Database

Platz keeps all of its state in a single PostgreSQL database — users, envs, clusters, deployments, tasks, secrets, k8s resource snapshots, helm chart metadata. Every backend worker connects to the same database; there is no separate cache, queue, or event store. PostgreSQL's LISTEN/NOTIFY is what powers the live UI updates.

This page covers what to use, how to size it, how Platz handles migrations, and how to back it up.

Supported versions

Platz is built and tested against PostgreSQL 17. Older versions back to 12 are likely to work — the schema uses uuid, jsonb, timestamptz, and LISTEN/NOTIFY, all of which have been available for years — but Platz CI doesn't exercise them.

If you're standing up a new database, pick the latest PostgreSQL the rest of your infrastructure supports. Managed offerings work fine:

AWS RDS for PostgreSQL — production default. Use db.t4g.medium or larger for production workloads; db.t4g.micro is fine for staging.
AWS Aurora Serverless v2 — works, but watch ACU floor settings. Platz's LISTEN/NOTIFY traffic keeps the connection warm, so a too-low floor can pin you above your ideal cost.
Google Cloud SQL for PostgreSQL — works.
Self-hosted on a VM or in-cluster — fine for staging and dev. Production self-hosting is mostly a question of how comfortable your team is with operating Postgres.

The chart does not provision Postgres for you. Previous chart versions bundled a Bitnami subchart; that dependency was removed in v0.6.x to give operators full control over the database. The pre-0.6.3 chart versions that depended on it have since been removed from the Helm index because the Bitnami image they pulled is no longer published.

Sizing

For most installs, Platz's database workload is modest:

A few writes per second during a wave of deployments.
A few hundred event broadcasts per second when many clusters and resources change at once.
A long tail of read queries from the API.

Sensible defaults to start with:

CPU: 2 vCPUs.
Memory: 4 GiB.
Storage: 50 GiB GP3 (or equivalent), with autoscaling if your cloud supports it.
Connections: 100 max connections is plenty for typical load. Connections are opened on demand, and an idle install holds a few dozen at most. The default pool ceiling is 50 connections per pod, though, and across the five pod types that adds up to 250, so either cap database.pool.maxSize or raise max_connections accordingly (see Connection pool tuning below).

Scale up if the API's response time for list endpoints creeps above 100 ms, or if LISTEN/NOTIFY events back up (this shows up as stale data in the UI).

Required Postgres features

Platz relies only on features built into Postgres:

LISTEN / NOTIFY: used by the API to broadcast change events to WebSocket clients. No extension is needed. LISTEN does not work through PgBouncer in transaction-pooling mode, which some managed Postgres offerings put in front of the database, so connect directly or in session mode.
jsonb: used for env node selectors, tolerations, deployment configs, reported status, and task operations. Built in since 9.4.
Standard SQL features: WITH/CTE, window functions, etc.

Neither uuid-ossp nor pgcrypto is required. UUIDs are generated by the application, not the database.

If you're putting PgBouncer in front, use session pooling. Transaction pooling breaks LISTEN/NOTIFY and the WebSocket goes silent.

Migrations

Platz uses diesel for schema management. Every commit to platzio/backend that changes the schema ships a migration in db/migrations/. The API pod runs all pending migrations at startup before opening the HTTP listener — no separate migration job, no manual diesel migration run.

The implications for upgrades:

The API pod must be able to acquire an exclusive lock on __diesel_schema_migrations during startup. If two API pods start at once, one will wait. (The default replica count is 1, so this is rarely an issue.)
A failed migration crashes the API. The pod will CrashLoopBackOff until the migration succeeds. Check pod logs for the SQL error.
Rolling forward only. If you need to roll back, restore from a backup or run diesel migration revert against the database manually — there's no in-process down-migration support.
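
To see which migrations a database has already applied (for example, after a CrashLoopBackOff), you can read Diesel's bookkeeping table directly. This is a minimal sketch: the helper name applied_migrations_query is made up for illustration, and the version and run_on columns are the ones Diesel creates by default, so check them against your own database first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a read-only query listing the most recently
# applied migrations. Assumes Diesel's default bookkeeping columns
# (version, run_on) on __diesel_schema_migrations.
applied_migrations_query() {
  local limit="${1:-5}"
  # %d makes printf reject a non-numeric limit instead of embedding it.
  printf 'SELECT version, run_on FROM __diesel_schema_migrations ORDER BY version DESC LIMIT %d;' "$limit"
}

# Example (psql picks up PGHOST, PGUSER, etc. from the environment):
#   psql -c "$(applied_migrations_query 5)"
```

Comparing the newest version in that output with the newest directory under db/migrations/ for the release you are deploying shows whether a migration is still pending.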

For paranoia-grade upgrades, take a manual database snapshot before bumping the Platz Helm release. The chart's automated backup CronJob (see below) also helps, but a fresh snapshot before a known schema change is a good habit.

Connecting to the database

Each Platz pod reads its connection settings from the secret named by database.secretName (default postgres-creds). The five keys (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE) are injected as environment variables, and the backend assembles them into a postgres:// connection URL itself. Platz does not link against libpq — it connects through tokio-postgres — so libpq-only environment variables (PGSSLMODE, PGSSLROOTCERT, a ~/.pgpass file, etc.) have no effect.
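
One way to create that secret is to render a manifest and pipe it to kubectl. The sketch below is illustrative, not the chart's own tooling: render_pg_secret is a hypothetical helper, the connection values in the usage comment are placeholders, and the platzio namespace is assumed to be where Platz is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a Secret manifest with the five PG* keys.
# stringData lets Kubernetes handle the base64 encoding.
# Values are placed inside double-quoted YAML strings as-is, so a value
# containing a double quote or backslash must be escaped by the caller.
render_pg_secret() {
  local name="$1" host="$2" port="$3" user="$4" password="$5" database="$6"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
type: Opaque
stringData:
  PGHOST: "${host}"
  PGPORT: "${port}"
  PGUSER: "${user}"
  PGPASSWORD: "${password}"
  PGDATABASE: "${database}"
EOF
}

# Example (placeholder values; namespace is an assumption):
#   render_pg_secret postgres-creds db.internal 5432 platz 'change-me' platz \
#     | kubectl -n platzio apply -f -
```

If you use a name other than postgres-creds, set database.secretName to match.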

The backend also honors a DATABASE_URL environment variable, which takes precedence over the five PG* variables. It's a deprecated legacy option and the chart never sets it; if you need it, inject it via api.extraEnv, k8sAgent.extraEnv, etc.

⚠️ TLS to Postgres is not currently supported. The backend's database connections (including the dedicated LISTEN/NOTIFY connection) are established without a TLS connector, so they are plaintext. Keep the network path between the Platz pods and Postgres private — same-VPC private subnets with tight security groups, or an encrypted overlay/service mesh if your environment requires encryption in transit. If your managed Postgres enforces rejectUnencrypted/ssl=on for all clients, Platz won't be able to connect; open an issue if you need native TLS support.

Connection pool tuning

Each Platz pod opens its own pool against Postgres. The defaults suit most installs: 50 max connections per pod, a 30-second connection timeout, a 600-second idle timeout, and an 1800-second max lifetime. Override any of these from values.yaml:

database:
  pool:
    maxSize: "100" # ceiling for concurrent connections
    minIdle: "5" # keep this many warm
    connectionTimeoutSecs: "10" # fail fast if Postgres is unreachable
    idleTimeoutSecs: "300" # close idle connections sooner
    maxLifetimeSecs: "900" # recycle connections more aggressively

Each entry maps to a DB_POOL_* environment variable injected by the chart's database.envVars helper, so every pod (api, chart-discovery, k8s-agent, resource-sync, status-updates) sees the same settings. Leave fields blank to fall back to the built-in defaults.

Sizing notes:

The chart's default replica count allows up to maxSize × 5 connections across all pods (api, chart-discovery, k8s-agent, resource-sync, status-updates). The default ceiling of 50 × 5 = 250 exceeds PostgreSQL's default max_connections=100. Connections are opened on demand (no minimum is kept warm unless you set minIdle), so a quiet install sits far below the ceiling — but under load you can hit it. Either raise max_connections server-side or drop maxSize to 10–20 for small installs.
If you put PgBouncer in front (session mode — see above), maxSize is the pool size against PgBouncer, not against Postgres. The Postgres-side connection count is whatever PgBouncer's default_pool_size dictates.
Lowering connectionTimeoutSecs is the right move during incidents: it surfaces "Postgres is gone" faster instead of letting requests pile up in the queue.
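
The ceiling arithmetic from the first note can be sketched as two small shell helpers. Both function names are hypothetical, and the default of 10 reserved connections (headroom for psql sessions, backups, and superuser access) is an assumption to adjust for your environment.

```shell
#!/usr/bin/env bash
# pool_ceiling MAX_SIZE [POD_COUNT]
# Worst-case connection count if every pod fills its pool.
# POD_COUNT defaults to 5 (api, chart-discovery, k8s-agent,
# resource-sync, status-updates at one replica each).
pool_ceiling() {
  local max_size="$1" pods="${2:-5}"
  echo $(( max_size * pods ))
}

# max_size_for MAX_CONNECTIONS [POD_COUNT] [RESERVED]
# Largest per-pod maxSize that keeps the ceiling within max_connections
# after setting aside RESERVED connections (assumed default: 10).
max_size_for() {
  local max_conn="$1" pods="${2:-5}" reserved="${3:-10}"
  echo $(( (max_conn - reserved) / pods ))
}

pool_ceiling 50      # prints 250
max_size_for 100     # prints 18
```

To feed in the server's actual limit, read it with psql -Atc 'SHOW max_connections' and pass the result as the first argument to max_size_for.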

The Terraform module exposes the same knobs via the database_pool variable (object with max_size, min_idle, connection_timeout_secs, idle_timeout_secs, max_lifetime_secs).

Connecting from outside the cluster (for debugging)

When you need to poke at the database (running ad-hoc queries, debugging a stuck task, reading a deployment's config blob), you can port-forward to the pod or service hosting your Postgres, or use any Postgres client with the credentials from the secret.

# Read the secret values out
kubectl -n platzio get secret postgres-creds -o yaml | yq '.data | map_values(@base64d)'

# psql via port-forward (assuming Postgres is in-cluster)
kubectl -n my-db port-forward svc/postgres 15432:5432
PGHOST=127.0.0.1 PGPORT=15432 PGUSER=platz PGPASSWORD=... PGDATABASE=platz psql

For an RDS database, you'll need the bastion / VPN setup your network team uses.

⚠️ Be very careful running write queries against the production database. Platz reflects changes back to clients via LISTEN/NOTIFY; a manual UPDATE will trigger UI updates and may surprise users. Stick to read-only queries unless you're recovering from a known-broken state.

Backups

The chart includes an optional backup CronJob that streams pg_dump output to an encrypted S3 object once an hour:

backupJob:
  enabled: true
  config:
    bucketName: my-platz-backups
    bucketRegion: us-east-1
    bucketPrefix: prod/
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/platz-backup

You'll also need a Kubernetes secret named backup-config with key encryptionKey containing a random 32-byte secret used to symmetrically encrypt the dump before upload.

Caveats:

The schedule is hard-coded to hourly (0 * * * *). If you want something else, override the CronJob via your own manifest or fork the chart.
The backup is a full pg_dump, not a base backup. Restoring means running pg_restore on the most recent dump, so anything written after that dump is lost: with an hourly schedule you can lose up to an hour of activity (an RPO of one hour).
For lower RPOs, layer your own provider's point-in-time recovery on top (RDS automated backups, Cloud SQL automated backups). The CronJob backup is a belt-and-suspenders option, not a replacement for managed PITR.

To restore: decrypt the S3 object, then pg_restore into a fresh database. Update postgres-creds to point at the new database, redeploy Platz, and the API pod's migration run will bring the schema to the current version (which should already match the dump).

Database growth and retention

The tables that grow over time:

deployment_tasks — one row per operation per deployment. Hundreds of bytes plus a JSONB blob for the operation payload. Even busy installs accumulate this slowly.
k8s_resources — one row per tracked Kubernetes resource per cluster. Reflects the live cluster state, so it grows with your fleet size but doesn't grow forever.
helm_charts — one row per chart version ever discovered. Mostly metadata; relatively small.

There is no built-in retention policy for tasks. If your audit trail is years old and the database has grown larger than you'd like, you can DELETE FROM deployment_tasks WHERE created_at < NOW() - INTERVAL '1 year' — the foreign keys will keep the deployments intact. Take a backup first.
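
Before deleting anything, it helps to count the rows the same WHERE clause would match. The sketch below builds both statements from one clause so they cannot drift apart; retention_sql is a hypothetical helper, and the table and column names are the ones used in the paragraph above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints either a COUNT or a DELETE for old task rows.
# The age argument is interpolated into the SQL unescaped, so pass only
# trusted literals such as "1 year" or "6 months".
retention_sql() {
  local mode="$1" age="${2:-1 year}"
  local where="WHERE created_at < NOW() - INTERVAL '${age}'"
  case "$mode" in
    count)  echo "SELECT count(*) FROM deployment_tasks ${where};" ;;
    delete) echo "DELETE FROM deployment_tasks ${where};" ;;
    *)      echo "usage: retention_sql count|delete [age]" >&2; return 1 ;;
  esac
}

# Example: dry run first, then delete once the count looks right.
#   psql -c "$(retention_sql count '1 year')"
#   psql -c "$(retention_sql delete '1 year')"
```

Running the count first also gives a rough idea of how long the DELETE will hold locks on a large table.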
