Deploying 4 Services to GCE With 2 Hours Left in the Hackathon

hackathon · infrastructure · gcp

It's 7 AM. The hackathon deadline is in 2 hours. I just discovered that LightPanda — the headless browser my agent uses — has no graphics rendering engine. It can navigate the DOM, but it cannot see. The competition requires multimodal analysis. If I can't get Playwright back into the Docker image in time, we fail that criterion entirely.

The 2-hour deployment constraint

G.O.L.E.M. is an autonomous security agent I built for Google's Gemini Live Agent Challenge. It uses Gemini 3 Flash via ADK-Go to find business-logic vulnerabilities in web applications. The competition required agents to be "hosted on Google Cloud."

The deployment work happened on March 16, 2026, the competition deadline. It was 8 AM on my side (GMT+8) and I'd been up all night. From first deploy attempt to first success: 2 hours and 18 minutes, 11 failed workflow runs, 8 fix PRs merged.

Why GCE instead of Cloud Run? Two reasons: cost and familiarity. In a previous project (Supacrawler), I deployed a similar multi-service stack using Docker Swarm on a VPS. That experience taught me how to structure deploy workflows from scratch. GCE with Docker Compose on a single VM was the same pattern — cheaper (~$25/month on free credits) and something I could debug at 7 AM on zero sleep.


Why 4 services share one VM

Four services, one VM, Traefik in front:

[Architecture diagram: four services on one VM, with Traefik routing external traffic to the observer]

Only the observer gets a Traefik label for external routing. The agent, scraper, demo target, and Redis are internal — Docker network only, no port mapping to the host.
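As a compose sketch (router names, hostname, and the observer's port are made up here, and the Traefik service itself is omitted), the split looks like:

```yaml
services:
  observer:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.observer.rule=Host(`golem.example.com`)"
      - "traefik.http.services.observer.loadbalancer.server.port=3000"
    networks: [internal]
  golem:      # agent — no labels, no ports: reachable only on the Docker network
    networks: [internal]
  scraper:
    networks: [internal]
networks:
  internal:
```

Because the internal services publish no ports, nothing reaches them except over the shared Docker network.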

The deploy setup lives in six files:

- deploy.yml
- build-service.yml
- cleanup-gar.yml
- ci.yml
- docker-compose.prod.yml
- .env.prod.example

The services share data through Docker volumes — the scraper writes screenshots, the agent writes trace files, and the observer reads both. This is also why it's Docker Compose and not individual containers: the volume mounts create a data dependency between services.
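The sharing pattern looks roughly like this — the volume names (screenshot-data, trace-data) match the stack, but the mount paths are illustrative:

```yaml
volumes:
  screenshot-data:
  trace-data:

services:
  scraper:
    volumes:
      - screenshot-data:/data/screenshots        # writes screenshots
  golem:
    volumes:
      - trace-data:/data/traces                  # writes trace files
  observer:
    volumes:
      - screenshot-data:/data/screenshots:ro     # reads both, read-only
      - trace-data:/data/traces:ro
```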


Partial deploys on a 4-service stack

Every push to main triggers a 4-stage pipeline:

1. Detect changes — dorny/paths-filter@v3 checks which apps/* directories changed. If triggered via workflow_dispatch (manual), all services are marked changed for a full rebuild.

2. Build (parallel, conditional) — Four build jobs run in parallel, each conditional on its service having changes. Each uses a reusable build-service.yml workflow: WIF auth, Docker Buildx with GitHub Actions layer cache, push to Google Artifact Registry with a 7-char SHA tag plus latest. If only the observer changed, only the observer builds. The other three skip entirely.
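The detect-and-gate wiring can be sketched like this (job and filter names are illustrative, not the exact workflow):

```yaml
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      observer: ${{ steps.filter.outputs.observer }}
      golem: ${{ steps.filter.outputs.golem }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            observer: ['apps/observer/**']
            golem: ['apps/golem/**']

  build-observer:
    needs: changes
    # build only if the observer changed, or on a manual full rebuild
    if: needs.changes.outputs.observer == 'true' || github.event_name == 'workflow_dispatch'
    uses: ./.github/workflows/build-service.yml
```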

3. Cleanup (parallel, after builds) — After each build, the corresponding cleanup job prunes old images from Artifact Registry. The retention policy keeps the 3 most recent versions (configurable via keep_recent_versions). This means if a deploy goes wrong, I can SSH in and roll back to either of the previous two images immediately. The latest tag and the current deploy SHA are always protected from deletion.
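The retention decision itself is simple enough to sketch as a pure-shell helper (hypothetical — the real job feeds a plan like this into `gcloud artifacts docker images delete`): read tags newest-first, keep the N most recent, and never emit a protected tag.

```shell
# prune_plan KEEP [PROTECTED...]: read tags (newest first) on stdin,
# print the tags that should be deleted.
prune_plan() {
  keep="$1"; shift
  n=0
  while IFS= read -r tag; do
    skip=false
    for p in "$@"; do
      # protected tags (latest, the live deploy SHA) are never deleted
      [ "$tag" = "$p" ] && skip=true
    done
    $skip && continue
    n=$((n + 1))
    [ "$n" -gt "$keep" ] && echo "$tag"
  done
}

# Six tags in the registry, keep 3, protect "latest" and the running SHA:
printf '%s\n' latest abc1234 9f8e7d6 5a4b3c2 1122334 0000000 \
  | prune_plan 3 latest abc1234
# prints: 0000000
```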

4. Deploy — SSHs into the VM, copies docker-compose.prod.yml, writes .env.prod from GitHub secrets, pulls only the images that changed, runs docker compose up -d. The key piece is a resolve_tag function that handles partial deploys — if only the observer changed, the golem image stays at its current running tag:

Bash
resolve_tag() {
  local svc="$1" changed="$2"
  if [ "$changed" = "true" ]; then
    # service changed in this push: deploy the freshly built image
    echo "$SHA"
  else
    # unchanged: keep whatever tag the running container already uses
    local current
    current=$(docker inspect --format='{{.Config.Image}}' \
      "golem-${svc}-1" 2>/dev/null | awk -F: '{print $NF}') || true
    echo "${current:-latest}"  # fall back to latest if nothing is running
  fi
}

This avoids pulling all four images on every push. A push that only touches the observer frontend takes under 3 minutes end-to-end. Conditional builds are worth the complexity — a dozen lines of shell saved minutes per push for 3 out of 4 services.


CI identity vs runtime identity

This is the part I see people get wrong, and I got wrong at first too. There are two service accounts serving two different auth patterns:

golem-sa — for the GitHub Actions runner (keyless via WIF)

The runner pushes images to Artifact Registry. Instead of a stored JSON key, it uses Workload Identity Federation: GitHub's OIDC token is exchanged for a short-lived GCP credential, scoped to the repository:

Bash
gcloud iam workload-identity-pools providers create-oidc github-provider \
  --workload-identity-pool=golem-github-pool \
  --issuer-uri=https://token.actions.githubusercontent.com \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
  --attribute-condition="assertion.repository=='antoineross/golem'"

A fork cannot use this identity. The credential expires with the job. golem-sa only has roles/artifactregistry.writer.
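On the workflow side, the exchange is a single auth step — a sketch assuming google-github-actions/auth@v2, with a made-up project number and service-account email (the pool and provider names come from the setup above):

```yaml
- uses: google-github-actions/auth@v2
  with:
    workload_identity_provider: >-
      projects/123456789/locations/global/workloadIdentityPools/golem-github-pool/providers/github-provider
    service_account: golem-sa@PROJECT_ID.iam.gserviceaccount.com
```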

vps-gar-puller — for the VM itself (JSON key)

The VM pulls images from GAR, but it has no OIDC identity — it's just an SSH session from GitHub Actions. So it uses a traditional service account JSON key with roles/artifactregistry.reader. The key is stored as a GitHub secret (GCP_SA_KEY), SCP'd to the VM at deploy time, used for gcloud auth activate-service-account, then deleted.

GitHub Actions (push to GAR)  →  WIF, no stored key
GCE VM (pull from GAR)         →  SA key, ephemeral on disk
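The ephemeral-key lifecycle on the VM can be sketched in shell — the trap guarantees the key is deleted even if an intermediate deploy step fails. Paths and the commented gcloud call are illustrative:

```shell
KEY=$(mktemp)
printf '{}' > "$KEY"    # stand-in for the SCP'd GCP_SA_KEY
(
  # cleanup runs when this subshell exits, success or failure
  trap 'rm -f "$KEY"' EXIT
  # gcloud auth activate-service-account --key-file="$KEY"
  # docker compose -f docker-compose.prod.yml pull && docker compose up -d
  echo "deploy steps done"
)
[ -f "$KEY" ] || echo "key removed"   # prints: key removed
```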

One non-obvious gotcha: WIF requires iamcredentials.googleapis.com to be enabled on the GCP project. It's not enabled by default. The error you get is a generic 403 that looks like an auth misconfiguration, not a missing API. Run gcloud services enable iamcredentials.googleapis.com before any WIF setup. When a credential error isn't about credentials, check the API surface.


Adding vision to a blind scraper

At around 7 AM, with the competition deadline hours away, I discovered that LightPanda — the headless browser I was using for the scraper — has no graphics rendering engine. It's a DOM-only headless browser. LightPanda is excellent at what it does: fast, tiny (24.6 MB image vs 1.51 GB for Playwright), and it avoids bot detection. But it can navigate a page without ever seeing it. Screenshot capture is architecturally impossible — there are no pixels to capture.

This was a problem. The competition's multimodal criterion required the agent to analyze visual page state. Gemini needed to see what the page looked like, not just read its DOM. Without screenshots, we'd fail that criterion entirely.

The fix was a hybrid architecture: keep LightPanda for browsing (DOM scraping, navigation, form filling) and add Playwright specifically for screenshots. This meant the scraper Dockerfile went from a clean Go binary on debian:bookworm-slim to something considerably larger:

Dockerfile
# Install Node.js for the Playwright screenshot script
RUN apt-get update && apt-get install -y --no-install-recommends \
  ca-certificates tzdata curl netcat-openbsd gnupg \
  && curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
    | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg \
  && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] \
    https://deb.nodesource.com/node_22.x nodistro main" \
    > /etc/apt/sources.list.d/nodesource.list \
  && apt-get update && apt-get install -y nodejs

# Install Playwright + Chromium (this is the expensive line)
RUN npm install -g playwright && npx playwright install --with-deps chromium

The Go scraper calls a Node.js script (playwright-screenshot.js) that launches Chromium, navigates to the URL, takes the screenshot, and returns the result as JSON to stdout. The scraper then saves it to the shared volume.

Getting Playwright to run in a Docker container is notoriously painful — the Chromium dependency tree is large and version-sensitive, and most examples online ship bloated 2 GB images. But I'd already solved this problem in Supacrawler, where I spent weeks trimming the Playwright Docker setup. Having a working reference Dockerfile meant this took 20 minutes instead of potentially blowing the deadline. Prior infra work compounds — reference solutions reduce deadline risk more than raw speed. (PR #69)

The scraper now runs both engines: LightPanda serves WebSocket CDP connections for browsing, while Playwright handles screenshot requests. The SCREENSHOT_ENGINE environment variable controls the default.


The failures that mattered

Not all 11 failed deploys are worth discussing. Most were small fixes — wrong Go version in the Dockerfile, a typo in a path. Here are the ones that taught something transferable:

GitHub Actions startup_failure with zero logs. The workflow didn't start. No jobs ran, no error message, just "this workflow could not be started." The caller workflow (deploy.yml) was missing permissions: id-token: write. Without it, the reusable build-service.yml couldn't request an OIDC token for WIF, and GitHub killed the run at startup. The fix was one line, but finding it required reading GitHub's docs on inherited permissions for reusable workflows. I wrote scripts/check-workflow-permissions.sh as a pre-commit hook to catch this class of bug going forward. (PR #55, PR #56)
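The one-line fix, in caller-workflow terms (job name and `secrets: inherit` are illustrative): the caller has to grant id-token: write, or the reusable workflow's OIDC request is denied before any job starts.

```yaml
# deploy.yml (caller)
permissions:
  id-token: write   # lets build-service.yml request an OIDC token for WIF
  contents: read

jobs:
  build-observer:
    uses: ./.github/workflows/build-service.yml
    secrets: inherit
```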

Wrong binary as Docker entrypoint. The golem service has two binaries: cmd/golem (a one-shot CLI agent) and cmd/server (an HTTP server with /api/health, /api/run, /api/status). The Dockerfile was building and running cmd/golem, which fires a hardcoded Gemini prompt on startup, hits 429 rate limits, exits, gets restarted by Docker, hits 429 again — infinite loop. The healthcheck on :8080 never passed because there was no HTTP server. The fix was building both binaries and making cmd/server the entrypoint, with cmd/golem spawned as a subprocess via the API. This is an architecture problem disguised as a rate limit problem. (PR #66)
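A sketch of the corrected Dockerfile shape — base images and the Go version are assumptions, since the post doesn't pin them:

```dockerfile
FROM golang:1.23 AS build
WORKDIR /src
COPY . .
# Build both binaries: the one-shot CLI agent and the HTTP server
RUN CGO_ENABLED=0 go build -o /out/golem ./cmd/golem && \
    CGO_ENABLED=0 go build -o /out/server ./cmd/server

FROM debian:bookworm-slim
COPY --from=build /out/golem /out/server /usr/local/bin/
EXPOSE 8080
# cmd/server owns :8080 (so the healthcheck passes) and spawns the
# golem CLI as a subprocess when /api/run is called
ENTRYPOINT ["server"]
```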

Overlapping read-only volume mounts. The observer mounted screenshot-data at /data/tmp:ro and trace-data at /data/tmp/tests:ro. Docker can't create a mountpoint inside a read-only parent: OCI runtime create failed: read-only file system. Non-overlapping paths fixed it. (PR #67)
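In compose terms the bug and the fix look roughly like this — the corrected paths are illustrative, since the post only says they stopped overlapping:

```yaml
# Before — /data/tmp/tests needs a mountpoint inside the read-only /data/tmp
observer:
  volumes:
    - screenshot-data:/data/tmp:ro
    - trace-data:/data/tmp/tests:ro

# After — sibling paths, no nesting
observer:
  volumes:
    - screenshot-data:/data/screenshots:ro
    - trace-data:/data/traces:ro
```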


Deploy first, debug later

Features fail fast. Infrastructure fails silently. That difference determines build order.

I built G.O.L.E.M. in one day — agent, scraper, observer UI, demo target, and deployment pipeline. I started the deployment work about 4 hours before the competition deadline. That was a mistake.

A missing permissions: id-token: write doesn't throw an error — it kills the workflow at startup with no log output. A wrong entrypoint doesn't crash — it loops silently. Overlapping volume mounts don't warn — they fail at container creation with a cryptic OCI message. These problems don't surface in local Docker Compose. They only appear when you push to main and watch the workflow fail, fix, push, watch, fix, push. Eleven times.

The agent code was iterative — a broken prompt still returns something, a slow model is still a model. Deploy is binary. Every failed run costs 5-10 minutes of waiting, reading logs, pushing a fix, and waiting again. At 3 AM, those minutes compound in a way that feature bugs never do.

If you're entering a hackathon: deploy to production on day one. Not when the product is ready — when the repository exists.