Avoiding Docker socket access for a backup container that needs to stop/start Directus during restore

I’m building a backup & restore extension for Directus that runs as a sidecar container alongside Directus (see my showcase post for context).

The problem

During a full restore, the backup container needs to stop Directus before running pg_restore, then start it again afterward. The reason: if Directus keeps running during pg_restore, it holds open database connections that interfere with the schema wipe (DROP SCHEMA public CASCADE) and the subsequent restore. This can lead to data corruption or failed restores.

Currently, I solve this by mounting the Docker socket (/var/run/docker.sock) into the backup container and using Dockerode to stop() and start() the Directus container. It works, but giving a sidecar container access to the Docker socket is a security concern — it effectively grants control over the entire Docker host.

An alternative I’m considering: PostgreSQL-level lockout

Instead of stopping the Directus container via Docker, I could lock out the Directus database user at the PostgreSQL level:

  1. ALTER USER directus NOLOGIN; — prevents any new DB connections
  2. SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = 'directus'; — kills existing connections
  3. Directus loses its DB connection, crashes, and Docker’s restart policy tries to bring it back — but login is denied, so it keeps crashing
  4. pg_restore runs uninterrupted (using a separate backup_admin DB superuser)
  5. ALTER USER directus LOGIN; — re-enables login
  6. Docker’s next restart attempt succeeds, Directus boots normally

This requires a second PostgreSQL user (e.g. backup_admin with SUPERUSER) so the backup service doesn’t lock itself out.
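
A minimal sketch of the sequence above, assuming the backup service runs on Node/TypeScript. The `runSql` and `runRestore` parameters are hypothetical placeholders (not part of any library): `runSql` stands for whatever sends a statement to PostgreSQL as backup_admin, and `runRestore` wraps the pg_restore call. The main point is the `try`/`finally`, which re-enables login even when the restore throws.

```typescript
// Hypothetical helper type: sends one SQL statement as the backup_admin user.
type RunSql = (sql: string) => Promise<void>;

async function restoreWithLockout(
  runSql: RunSql,
  runRestore: () => Promise<void>, // hypothetical: wraps the pg_restore call
): Promise<void> {
  // Step 1: block new connections for the Directus database user.
  await runSql("ALTER USER directus NOLOGIN");
  try {
    // Step 2: terminate the sessions that are already open.
    await runSql(
      "SELECT pg_terminate_backend(pid) FROM pg_stat_activity " +
        "WHERE usename = 'directus'",
    );
    // Step 4: restore while the user is locked out.
    await runRestore();
  } finally {
    // Step 5: always re-enable login, even if the restore failed.
    // Note: this does not run if the process itself is killed.
    await runSql("ALTER USER directus LOGIN");
  }
}
```

The `finally` block covers a failing pg_restore but not a hard crash of the backup container, so it narrows the stuck-NOLOGIN case rather than eliminating it.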

Known trade-offs and open concerns

Restart backoff delay: Docker’s restart policies (including unless-stopped) add an increasing delay before each restart when a container keeps crashing. While Directus is locked out and repeatedly failing to connect, the delay doubles each time (100ms → 200ms → 400ms → …). As far as I can tell from the Docker documentation, it is capped at 1 minute and only resets once the container has stayed up for at least 10 seconds. By the time LOGIN is re-enabled after a 30–60s restore, Docker might be waiting 30–60 seconds before the next restart attempt. There is no way to reset this backoff without Docker socket access, so total downtime could be noticeably longer compared to an explicit docker start.
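
To put rough numbers on this, here is a small model of the restart schedule. It is a sketch, not a description of Docker’s internals: the constants (100 ms initial delay, doubling, 1 minute cap, reset after a 10 second run) are assumptions taken from my reading of Docker’s restart policy documentation, `runMs` (how long Directus runs before it exits) is a guess you would have to measure, and the model treats only restart attempts that begin after LOGIN is re-enabled as successful.

```typescript
// Assumed constants; verify against the documentation for your Docker version.
const INITIAL_DELAY_MS = 100;
const MAX_DELAY_MS = 60000;
const RESET_AFTER_RUN_MS = 10000;

// Times (ms since the first crash) at which the container would be started
// again, assuming every run lasts `runMs` before the process exits.
function restartTimes(runMs: number, horizonMs: number): number[] {
  const times: number[] = [];
  let now = 0;
  let delay = 0;
  while (true) {
    // A sufficiently long run resets the delay; otherwise it doubles up to the cap.
    delay =
      delay === 0 || runMs >= RESET_AFTER_RUN_MS
        ? INITIAL_DELAY_MS
        : Math.min(delay * 2, MAX_DELAY_MS);
    now += delay;
    if (now > horizonMs) return times;
    times.push(now);
    now += runMs; // the container runs, then crashes again
  }
}

// Extra wait between re-enabling LOGIN and the next restart attempt.
function waitAfterUnlock(unlockAtMs: number, runMs: number): number {
  const horizon = unlockAtMs + runMs + 2 * MAX_DELAY_MS;
  const next = restartTimes(runMs, horizon).find((t) => t >= unlockAtMs);
  return next === undefined ? NaN : next - unlockAtMs;
}

console.log(waitAfterUnlock(45000, 0)); // 6100
console.log(waitAfterUnlock(60000, 0)); // 42300
```

With an instantly crashing container, a 45 s restore leaves about 6 s of extra waiting, while a 60 s restore lands just after an attempt and waits about 42 s. Under this model the extra delay depends heavily on where the unlock falls in the schedule, which is what makes it hard to predict.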

UI workflow compatibility: The extension currently lets you trigger a full restore directly from Directus Studio. The UI shows a fullscreen loading overlay, polls the backup container’s health endpoint, and reloads once Directus is back. With the Docker socket approach, Directus comes back quickly and predictably after restore. With the lockout approach, the unpredictable restart delay makes the UX less smooth — the user stares at the loading screen longer, with no way to tell whether it’s “still restoring” or “waiting for Docker to retry”.

Error recovery: If something goes wrong mid-restore and the backup service crashes before re-enabling LOGIN, Directus stays permanently locked out. With the Docker socket approach, a stopped container can at least be manually started. A stuck NOLOGIN requires someone to connect to PostgreSQL directly and run ALTER USER directus LOGIN; — which is less obvious to troubleshoot.
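
One way to soften the stuck-NOLOGIN case would be a check when the backup service starts: if the directus user cannot log in and no restore is running, re-enable it. The sketch below assumes a hypothetical `Db` wrapper (not a real library API) whose `canLogin` would run something like `SELECT rolcanlogin FROM pg_roles WHERE rolname = 'directus'`, and a `restoreInProgress` flag that the service would have to track itself.

```typescript
// Hypothetical database wrapper, connected as backup_admin.
type Db = {
  // e.g. SELECT rolcanlogin FROM pg_roles WHERE rolname = 'directus'
  canLogin: (role: string) => Promise<boolean>;
  exec: (sql: string) => Promise<void>;
};

// Returns true if a stale lockout was found and lifted.
async function recoverStaleLockout(
  db: Db,
  restoreInProgress: boolean,
): Promise<boolean> {
  // Never unlock while a restore is actually running.
  if (restoreInProgress) return false;
  // Nothing to do if the user can already log in.
  if (await db.canLogin("directus")) return false;
  await db.exec("ALTER USER directus LOGIN");
  return true;
}
```

This only restores access. If the crash happened in the middle of pg_restore, the database may still be half-restored, so the service would probably want to log this loudly or retry the restore before unlocking.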

Questions

  • Has anyone solved a similar problem (controlling a sibling container without Docker socket access)?
  • Does the PostgreSQL lockout approach sound reasonable despite the trade-offs above, or are there pitfalls I’m missing?
  • Is there a way to reset Docker’s restart backoff for a container without socket access?
  • Any other ideas I haven’t considered?

Thanks for any input!