Database
Platz keeps all of its state in a single PostgreSQL database — users, envs, clusters, deployments, tasks, secrets, k8s resource snapshots, helm chart metadata. Every backend worker connects to the same database; there is no separate cache, queue, or event store. PostgreSQL's LISTEN/NOTIFY is what powers the live UI updates.
This page covers what to use, how to size it, how Platz handles migrations, and how to back it up.
Supported versions
Platz is built and tested against PostgreSQL 17. Older versions back to 12 are likely to work — the schema uses uuid, jsonb, timestamptz, and LISTEN/NOTIFY, all of which have been available for years — but Platz CI doesn't exercise them.
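To confirm which server version a database is running before pointing Platz at it, a small shell check along these lines can help. This is a sketch, not something Platz ships: the function name pg_version_supported is hypothetical, and the threshold of 12 reflects the "likely to work" range above, not anything CI verifies.

```shell
# Hypothetical helper: succeeds when a server_version_num value (e.g. 170002)
# belongs to PostgreSQL 12 or newer. On PostgreSQL 10 and later the major
# version is server_version_num divided by 10000.
pg_version_supported() {
  major=$(( $1 / 10000 ))
  [ "$major" -ge 12 ]
}

# Literal value for illustration; against a live server you would pass
# "$(psql -Atc 'SHOW server_version_num')" instead.
if pg_version_supported 170002; then
  echo "supported"
else
  echo "older than 12"
fi
```

Using server_version_num avoids parsing the free-form server_version string, which can carry distribution suffixes.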
If you're standing up a new database, pick the latest PostgreSQL the rest of your infrastructure supports. Managed offerings work fine:
- AWS RDS for PostgreSQL: production default. Use db.t4g.medium or larger for production workloads; db.t4g.micro is fine for staging.
- AWS Aurora Serverless v2: works, but watch ACU floor settings. Platz's LISTEN/NOTIFY traffic keeps the connection warm, so a too-low floor can pin you above your ideal cost.
- Google Cloud SQL for PostgreSQL: works.
- Self-hosted on a VM or in-cluster — fine for staging and dev. Production self-hosting is mostly a question of how comfortable your team is with operating Postgres.
The chart does not provision Postgres for you. Previous chart versions bundled a Bitnami subchart; that dependency was removed in v0.6.x to give operators full control over the database. The pre-0.6.3 chart versions that depended on it have since been removed from the Helm index because the Bitnami image they pulled is no longer published.
Sizing
For most installs, Platz's database workload is modest:
- A few writes per second during a wave of deployments.
- A few hundred event broadcasts per second when many clusters and resources change at once.
- A long tail of read queries from the API.
Sensible defaults to start with:
- CPU: 2 vCPUs.
- Memory: 4 GiB.
- Storage: 50 GiB GP3 (or equivalent), with autoscaling if your cloud supports it.
- Connections: 100 max connections is plenty. The chart's default replica counts open ~20 connections in total (each worker keeps a small pool). If you scale workers, watch max_connections and Postgres' shared-buffer sizing.
Scale up if you see the API's response time creep above 100ms for list endpoints, or if LISTEN/NOTIFY events back up (you'll see this as stale data in the UI).
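One way to keep an eye on connection headroom is to compare the number of client backends with max_connections. The sketch below is illustrative only: the helper name conn_usage_pct is made up for this example, and the literal numbers stand in for values you would read from the server.

```shell
# Hypothetical helper: integer percentage of max_connections currently in use.
conn_usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Literal values for illustration. Against a live server you would read them with:
#   psql -Atc "SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend'"
#   psql -Atc "SHOW max_connections"
in_use=20
limit=100
echo "using $(conn_usage_pct "$in_use" "$limit")% of max_connections"
```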
Required Postgres features
Platz needs:
- uuid-ossp or pgcrypto is NOT required. UUIDs are generated by the application, not the database.
- LISTEN/NOTIFY: used by the API to broadcast change events to the WebSocket. This is built into Postgres, no extension needed. Some managed Postgres offerings disable LISTEN/NOTIFY over PgBouncer in transaction-pool mode, so make sure you connect directly or in session mode.
- jsonb: used for env node selectors, tolerations, deployment configs, reported status, task operations. Built-in since 9.4.
- Standard SQL features: WITH/CTE, window functions, etc.
If you're putting PgBouncer in front, use session pooling. Transaction pooling breaks LISTEN/NOTIFY and the WebSocket goes silent.
Migrations
Platz uses Diesel for schema management. Every commit to platzio/backend that changes the schema ships a migration in db/migrations/. The API pod runs all pending migrations at startup before opening the HTTP listener, so there is no separate migration job and no manual diesel migration run.
The implication for upgrades:
- The API pod must be able to acquire an exclusive lock on __diesel_schema_migrations during startup. If two API pods start at once, one will wait. (The default replica count is 1, so this is rarely an issue.)
- A failed migration crashes the API. The pod will CrashLoopBackOff until the migration succeeds. Check pod logs for the SQL error.
- Rolling forward only. If you need to roll back, restore from a backup or run diesel migration revert against the database manually; there's no in-process down-migration support.
For paranoia-grade upgrades, take a manual database snapshot before bumping the Platz Helm release. The chart's automated backup CronJob (see below) also helps, but a fresh snapshot before a known schema change is a good habit.
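To see which migrations a database has already recorded, before or after an upgrade, you can read Diesel's bookkeeping table directly. The sketch below only builds and prints the query; the helper name migrations_query is hypothetical, and the version and run_on columns are the ones Diesel creates in __diesel_schema_migrations.

```shell
# Hypothetical helper: emits a read-only query listing the N most recent
# migrations Diesel has recorded (default 5).
migrations_query() {
  printf 'SELECT version, run_on FROM __diesel_schema_migrations ORDER BY version DESC LIMIT %d;\n' "${1:-5}"
}

# Dry run: print the SQL. To execute it, pipe it into psql with the PG*
# variables set, e.g.:  migrations_query 3 | psql -At
migrations_query 3
```

Comparing the newest version here with the newest directory under db/migrations/ for the release you are deploying tells you whether the startup run will have work to do.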
Connecting to the database
Each Platz pod reads its connection settings from the secret named by database.secretName (default postgres-creds). The five keys (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE) are injected as environment variables and Diesel picks them up via libpq.
There's no DATABASE_URL knob; the chart doesn't construct a connection string. If you need TLS or special connection parameters:
- For sslmode: set PGSSLMODE=require (or verify-full) in api.extraEnv, k8sAgent.extraEnv, etc.
- For a CA bundle: mount it via extraVolumes / extraVolumeMounts (also requires customizing the deployment templates; open an issue if you need this).
Connection pool tuning
Each Platz pod opens its own pool against Postgres. Defaults match what most installs want — 50 max connections per pod, 30 second connection timeout, 600 second idle timeout, 1800 second max lifetime. Override any of these from values.yaml:
database:
  pool:
    maxSize: "100"               # ceiling for concurrent connections
    minIdle: "5"                 # keep this many warm
    connectionTimeoutSecs: "10"  # fail fast if Postgres is unreachable
    idleTimeoutSecs: "300"       # close idle connections sooner
    maxLifetimeSecs: "900"       # recycle connections more aggressively
Each entry maps to a DB_POOL_* environment variable injected by the chart's database.envVars helper, so every pod (api, chart-discovery, k8s-agent, resource-sync, status-updates) sees the same settings. Leave fields blank to fall back to the built-in defaults.
Sizing notes:
- The chart's default replica count opens up to maxSize × 5 connections across all pods. With the defaults that is 50 × 5 = 250 connections, which exceeds PostgreSQL's default max_connections=100, so the default pool size is only safe if you've also raised max_connections server-side. For small installs, drop maxSize to 10–20.
- If you put PgBouncer in front (session mode, see above), maxSize is the pool size against PgBouncer, not against Postgres. The Postgres-side connection count is whatever PgBouncer's default_pool_size dictates.
- Lowering connectionTimeoutSecs is the right move during incidents: it surfaces "Postgres is gone" faster instead of letting requests stack up on the queue.
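The worst-case arithmetic from the first note is easy to check before changing maxSize. This is a rough sketch: pool_ceiling is a hypothetical helper, and it assumes one pod for each of the five workers listed above.

```shell
# Hypothetical helper: worst-case connection count = maxSize x number of pods.
pool_ceiling() {
  echo $(( $1 * $2 ))
}

max_size=50          # chart default per-pod pool ceiling
pods=5               # api, chart-discovery, k8s-agent, resource-sync, status-updates
max_connections=100  # PostgreSQL's stock default

ceiling=$(pool_ceiling "$max_size" "$pods")
if [ "$ceiling" -gt "$max_connections" ]; then
  echo "ceiling $ceiling exceeds max_connections $max_connections"
else
  echo "ceiling $ceiling fits within max_connections $max_connections"
fi
```

With maxSize at 20, the same five pods top out at 100 connections, which leaves no headroom for psql sessions or backup jobs on a stock server. That is why the lower end of the 10–20 range is the safer pick for small installs.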
The Terraform module exposes the same knobs via the database_pool variable (object with max_size, min_idle, connection_timeout_secs, idle_timeout_secs, max_lifetime_secs).
Connecting from outside the cluster (for debugging)
When you need to poke at the database — running ad-hoc queries, debugging a stuck task, reading a deployment's config blob — you can port-forward from the pod that's hosting your Postgres, or use any Postgres client with the credentials from the secret.
# Read the secret values out
kubectl -n platzio get secret postgres-creds -o yaml | yq '.data | map_values(@base64d)'
# psql via port-forward (assuming Postgres is in-cluster)
kubectl -n my-db port-forward svc/postgres 15432:5432
PGHOST=127.0.0.1 PGPORT=15432 PGUSER=platz PGPASSWORD=... PGDATABASE=platz psql
For an RDS database, you'll need the bastion / VPN setup your network team uses.
⚠️ Be very careful running write queries against the production database. Platz reflects changes back to clients via LISTEN/NOTIFY; a manual UPDATE will trigger UI updates and may surprise users. Stick to read-only queries unless you're recovering from a known-broken state.
Backups
The chart includes an optional backup CronJob that streams pg_dump output to an encrypted S3 object once an hour:
backupJob:
  enabled: true
  config:
    bucketName: my-platz-backups
    bucketRegion: us-east-1
    bucketPrefix: prod/
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/platz-backup
You'll also need a Kubernetes secret named backup-config with key encryptionKey containing a random 32-byte secret used to symmetrically encrypt the dump before upload.
Caveats:
- The schedule is hard-coded to hourly (0 * * * *). If you want something else, override the CronJob via your own manifest or fork the chart.
- The backup is a full pg_dump, not a base backup. Restore is pg_restore, which means you'll lose any post-snapshot activity; RPO is one hour at best.
- For lower RPOs, layer your own provider's point-in-time recovery on top (RDS automated backups, Cloud SQL automated backups). The CronJob backup is a belt-and-suspenders option, not a replacement for managed PITR.
To restore: decrypt the S3 object, then pg_restore into a fresh database. Update postgres-creds to point at the new database, redeploy Platz, and the API pod's migration run will bring the schema to the current version (which should already match the dump).
Database growth and retention
The tables that grow over time:
- deployment_tasks: one row per operation per deployment. Hundreds of bytes plus a JSONB blob for the operation payload. Even busy installs accumulate this slowly.
- k8s_resources: one row per tracked Kubernetes resource per cluster. Reflects the live cluster state, so it grows with your fleet size but doesn't grow forever.
- helm_charts: one row per chart version ever discovered. Mostly metadata; relatively small.
There is no built-in retention policy for tasks. If your audit trail is years old and the database has grown larger than you'd like, you can DELETE FROM deployment_tasks WHERE created_at < NOW() - INTERVAL '1 year' — the foreign keys will keep the deployments intact. Take a backup first.
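Before running a DELETE like that, it can be worth counting how many rows it would remove. The sketch below builds a read-only count with the same predicate; stale_tasks_query is a hypothetical helper name, and the table and column names are taken from the statement above.

```shell
# Hypothetical helper: emits a read-only count of the deployment_tasks rows
# older than the given interval, mirroring the DELETE's WHERE clause.
stale_tasks_query() {
  printf "SELECT count(*) FROM deployment_tasks WHERE created_at < NOW() - INTERVAL '%s';\n" "$1"
}

# Dry run: print the SQL. To execute it, pipe it into psql with the PG*
# variables set, e.g.:  stale_tasks_query '1 year' | psql -At
stale_tasks_query '1 year'
```

If the count looks plausible, run the DELETE inside a transaction so that an unexpected row count can still be rolled back.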