Keystone Operator
The Keystone operator deploys and manages the OpenStack Identity Service as a Kubernetes-native workload. It is the reference implementation for all CobaltCore service operators — the patterns established here (CRD layout, sub-reconciler chain, webhooks, finalizers, instrumentation) will be replicated for Nova, Neutron, Glance, and other OpenStack service operators.
This page is a feature catalogue and entry point. Each item links to the in-depth reference doc for that area.
Lifecycle and Reconciliation
- Sub-reconciler chain. A focused pipeline of sub-reconcilers (Secrets → Config → FernetKeys / CredentialKeys / NetworkPolicy → Database → PolicyValidation → Deployment → HTTPRoute → HealthCheck → HPA → Bootstrap → TrustFlush), each emitting a typed sub-condition that aggregates into Ready. See Reconciler Architecture.
- Parallel execution group. FernetKeys, CredentialKeys and NetworkPolicy run concurrently via errgroup to cut tail latency on cold reconciles.
- Two finalizers. The standard cleanup finalizer cascades deletion to owned resources; the OpenBao finalizer gates deletion on ESO PushSecret cleanup so Fernet/credential key backups in OpenBao stay consistent.
- Watch-driven reactivity. Field-indexed Secret watches and a PushSecret name-match mapper with a predicate filter wake the workqueue only on the transitions the state machine branches on, not on every ESO sync tick.
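The interplay of ordering, the parallel group and Ready aggregation can be illustrated with a minimal Python sketch. It is not the operator's Go code: each sub-reconciler is modelled as a plain function returning an (ok, reason) pair, a ThreadPoolExecutor stands in for errgroup, the chain is assumed to stop at the first step that is not ready, and the sketch emits one condition per step rather than the operator's actual set of typed sub-conditions. The reason strings are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Condition:
    type: str
    status: bool
    reason: str


# Hypothetical signature: a sub-reconciler returns (ok, reason).
SubReconciler = Callable[[], Tuple[bool, str]]

# A step is one sub-reconciler name, or a list of names run concurrently.
Step = Union[str, List[str]]

CHAIN: List[Step] = [
    "Secrets", "Config", ["FernetKeys", "CredentialKeys", "NetworkPolicy"],
    "Database", "PolicyValidation", "Deployment", "HTTPRoute",
    "HealthCheck", "HPA", "Bootstrap", "TrustFlush",
]


def step_names(chain: List[Step]) -> List[str]:
    """Flatten the chain into the ordered list of sub-reconciler names."""
    return [n for step in chain for n in (step if isinstance(step, list) else [step])]


def run_chain(chain: List[Step], funcs: Dict[str, SubReconciler]) -> Dict[str, Condition]:
    """Run steps in order, stopping after the first step that is not ready (assumed)."""
    conditions: Dict[str, Condition] = {}
    for step in chain:
        names = step if isinstance(step, list) else [step]
        # Fan out the group and wait for every member, as errgroup's Wait does in Go.
        # An exception raised by any member propagates when results are collected.
        with ThreadPoolExecutor(max_workers=len(names)) as pool:
            results = list(pool.map(lambda n: funcs[n](), names))
        for name, (ok, reason) in zip(names, results):
            conditions[name] = Condition(name, ok, reason)
        if not all(ok for ok, _ in results):
            break
    return conditions


def aggregate_ready(conditions: Dict[str, Condition], expected: List[str]) -> Condition:
    """Ready is True only if every expected sub-condition is present and True."""
    for name in expected:
        cond = conditions.get(name)
        if cond is None:
            return Condition("Ready", False, f"{name}Pending")
        if not cond.status:
            return Condition("Ready", False, f"{name}NotReady")
    return Condition("Ready", True, "AllSubConditionsReady")
```

If the Database function reports not ready, later steps such as PolicyValidation are never evaluated and aggregate_ready reports DatabaseNotReady, while the FernetKeys group that ran earlier keeps its True condition. The fan-out is only safe for steps that do not depend on one another's output.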
CRD Surface
- Comprehensive spec. Replicas, image, database, cache, fernet, credentialKeys, trustFlush, bootstrap, federation, middleware, plugins, policy overrides, autoscaling, networkPolicy, gateway, resources, uwsgi, graceful-termination knobs, topologySpreadConstraints, priorityClassName, rollout strategy, and free-form extraConfig.
- Status with sub-conditions. Eleven typed sub-conditions plus installedRelease, targetRelease, upgradePhase, and endpoint, surfaced via kubectl get keystones printer columns.
- Validating + Defaulting webhooks. CEL validation rules enforced by the API server (database/cache exclusivity, autoscaling targets, replica/key minimums, graceful-termination invariants), plus defaults injected by the webhook for replicas, resources, and graceful-termination knobs.
- Stable sub-resource naming. All emitted resources are named after the CR with no -api suffix; cluster-internal DNS aligns with the public Gateway hostname.
See CRD API Reference and Controller Events.
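The split between defaulting and validation can be sketched in Python. This is a hedged illustration only: a flat dict stands in for the CR spec, the default values are hypothetical placeholders rather than what the operator injects, and the graceful-termination rule shown (the preStop sleep must be shorter than the grace period) is an assumed example of the kind of invariant the bullet describes.

```python
from typing import Any, Dict, List

# Hypothetical defaults, chosen only for illustration; they are not the
# values the operator's defaulting webhook actually injects.
DEFAULTS: Dict[str, Any] = {
    "replicas": 3,
    "terminationGracePeriodSeconds": 60,
    "preStopSleepSeconds": 10,
}


def apply_defaults(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Defaulting step: return a copy of spec with unset fields filled in."""
    out = dict(spec)
    for key, value in DEFAULTS.items():
        out.setdefault(key, value)
    return out


def validate(spec: Dict[str, Any]) -> List[str]:
    """Validation step: collect every violated rule (rules are assumptions)."""
    errors: List[str] = []
    if spec["replicas"] < 1:
        errors.append("replicas must be >= 1")
    # Assumed invariant: the preStop sleep and the SIGTERM shutdown both count
    # against the grace period, so the sleep must end before the period does.
    if spec["preStopSleepSeconds"] >= spec["terminationGracePeriodSeconds"]:
        errors.append(
            "preStopSleepSeconds must be less than terminationGracePeriodSeconds"
        )
    return errors
```

Calling apply_defaults before validate mirrors Kubernetes admission order, where mutating admission runs before validation, so the rules always see a fully populated spec.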
Encryption Key Management
- Fernet token keys. A per-CR CronJob with configurable schedule and maxActiveKeys; the rotation script is delivered via ConfigMap.
- Credential keys. Same rotation model, but each rotation is automatically followed by a credential_migrate step.
- Automatic rolling restart on rotation. The pod template carries keystone.c5c3.io/fernet-keys-hash and credential-keys-hash annotations, so any key change triggers a Deployment rollout.
- OpenBao backup via ESO PushSecret. Keys are mirrored to OpenBao for disaster recovery; staging Secrets are owner-referenced for cache eviction on rotation.
- Watch-driven backup finalizer. A PushSecret watch with a predicate filter eliminates per-sync workqueue churn and cuts delete latency to under 15 seconds.
See the Key Rotation Guide.
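A Python sketch makes the maxActiveKeys semantics and the hash annotation concrete. It models a Fernet key repository as a dict from index to key and follows the scheme used by keystone-manage fernet_rotate: index 0 is the staged key, the highest index is the primary, and rotation promotes the staged key, stages a new one and purges the oldest secondaries so that maxActiveKeys keys remain. The SHA-256 digest over sorted Secret entries is an assumed way to compute a keys-hash annotation, not the operator's documented algorithm, and the Secret layout (one entry per key index) is likewise an assumption.

```python
import hashlib
from typing import Dict

ANNOTATION = "keystone.c5c3.io/fernet-keys-hash"


def rotate(keys: Dict[int, bytes], max_active_keys: int, new_key: bytes) -> Dict[int, bytes]:
    """Promote the staged key, stage new_key, and purge the oldest secondaries."""
    if max_active_keys < 2:
        # Sketch-only guard: fewer than two keys leaves no primary after rotation.
        raise ValueError("max_active_keys must be at least 2")
    out = dict(keys)
    new_primary = max(out) + 1
    out[new_primary] = out.pop(0)  # staged key 0 becomes the new primary
    # Keep the newest max_active_keys - 1 indices; the new staged key fills the last slot.
    keep = sorted(out, reverse=True)[: max_active_keys - 1]
    out = {i: out[i] for i in keep}
    out[0] = new_key
    return out


def keys_hash(secret_data: Dict[str, bytes]) -> str:
    """SHA-256 digest of Secret data, independent of dict order (assumed scheme)."""
    h = hashlib.sha256()
    for name in sorted(secret_data):
        for part in (name.encode(), secret_data[name]):
            h.update(len(part).to_bytes(8, "big"))  # length prefix avoids ambiguity
            h.update(part)
    return h.hexdigest()


def stamp(template_annotations: Dict[str, str], secret_data: Dict[str, bytes]) -> Dict[str, str]:
    """Return pod-template annotations carrying the current keys hash.

    A changed value alters the pod template, which makes the Deployment
    controller start a rolling update.
    """
    out = dict(template_annotations)
    out[ANNOTATION] = keys_hash(secret_data)
    return out
```

After two rotations of {0: k0, 1: k1} with maxActiveKeys set to 3, the repository holds the new staged key 0, the primary 3 and one secondary 2; key 1 has been purged, so tokens encrypted with it no longer validate.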
Database Lifecycle
- Managed mode via mariadb-operator. The operator emits Database/User/Grant CRs and waits for the upstream MariaDB cluster to report healthy before running db_sync.
- Schema drift detection. A read-only schema-check Job runs after db_sync and fails the reconcile if the database schema deviates from the expected Alembic head. See Schema Drift Detection.
- Expand-migrate-contract upgrades. When spec.image.tag advances to a new OpenStack release, the operator drives phased database migrations while keeping the API available. Upgrade paths are sequential-only; patch revisions skip migration entirely. See Upgrade Flow.
- oslo.config env-var overrides. Database credentials and other runtime knobs are injected as OS_<GROUP>__<OPTION> environment variables rather than baked into the rendered config, so credential rotation does not require a ConfigMap re-render.
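Two of the decisions above can be sketched briefly in Python. oslo_env_name follows the OS_<GROUP>__<OPTION> pattern of oslo.config's environment source, upper-casing both parts, and secret_env uses the standard valueFrom.secretKeyRef shape of a Kubernetes container env entry. Everything in plan_upgrade is hypothetical: the release list, the <release>-<patch> tag format, and the choice to reject skipped releases and downgrades with an error.

```python
from typing import Any, Dict, List


def oslo_env_name(group: str, option: str) -> str:
    """Name read by oslo.config's environment source: OS_<GROUP>__<OPTION>."""
    return f"OS_{group.upper()}__{option.upper()}"


def secret_env(group: str, option: str, secret: str, key: str) -> Dict[str, Any]:
    """Container env entry whose value comes from a Secret key, not the ConfigMap."""
    return {
        "name": oslo_env_name(group, option),
        "valueFrom": {"secretKeyRef": {"name": secret, "key": key}},
    }


# Hypothetical: supported releases in upgrade order, tags shaped "<release>-<patch>".
RELEASES: List[str] = ["2024.1", "2024.2", "2025.1", "2025.2"]


def plan_upgrade(installed_tag: str, target_tag: str) -> str:
    """Classify an image tag change under the hypothetical rules above."""
    installed = installed_tag.partition("-")[0]
    target = target_tag.partition("-")[0]
    if installed_tag == target_tag:
        return "none"
    if installed == target:
        return "patch"  # same release: roll the image, no database migration
    # list.index raises ValueError for an unknown release.
    distance = RELEASES.index(target) - RELEASES.index(installed)
    if distance == 1:
        return "expand-migrate-contract"
    # Assumed policy: skipping a release or going backwards is refused.
    raise ValueError(f"unsupported upgrade path {installed} -> {target}")
```

Because the connection string reaches the container through the Secret reference, rotating the database password changes the Secret but leaves the rendered config, and therefore the config ConfigMap, untouched.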
Networking and Exposure
- Cluster-internal Service. ClusterIP with a named port, reachable at the stable address <name>.<namespace>.svc.cluster.local:5000.
- Gateway API integration. Optional HTTPRoute rendered from spec.gateway. The presence of gateway.networking.k8s.io/v1 is detected at startup via the manager's RESTMapper, and the HTTPRoute watch is registered only when the CRD is installed. status.endpoint reflects the Gateway hostname.
- Per-CR NetworkPolicy. Auto-derived egress to database, cache, ESO and OpenBao; configurable ingress.
- Operator NetworkPolicy. Chart-level, default-off, opt-in hardening of the operator pod itself with fail-closed render guards. See Operator NetworkPolicy and the enablement guide.
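The two Gateway API decisions, whether to register the HTTPRoute watch and what to publish in status.endpoint, can be sketched in Python. A set of served group/versions stands in for the RESTMapper lookup, and the https/http schemes and the fallback to the in-cluster Service URL when no Gateway is configured are assumptions rather than documented operator behaviour.

```python
from typing import Optional, Set

GATEWAY_API = "gateway.networking.k8s.io/v1"


def should_watch_httproutes(served_group_versions: Set[str]) -> bool:
    """Stand-in for a RESTMapper lookup done once at manager startup."""
    return GATEWAY_API in served_group_versions


def service_url(name: str, namespace: str, port: int = 5000) -> str:
    """In-cluster URL of the ClusterIP Service named after the CR."""
    return f"http://{name}.{namespace}.svc.cluster.local:{port}"


def status_endpoint(name: str, namespace: str, gateway_hostname: Optional[str],
                    gateway_api_served: bool) -> str:
    """Value for status.endpoint; the schemes and the fallback are assumptions."""
    if gateway_hostname and gateway_api_served:
        return f"https://{gateway_hostname}"
    return service_url(name, namespace)
```

Doing the discovery check once at startup keeps the controller from failing to start its watches on clusters without the Gateway API CRDs, at the cost of requiring an operator restart if the CRDs are installed later.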
Observability
- Active HTTP health check. A probe against the Keystone API endpoint drives the KeystoneAPIReady condition; the HTTP client is injectable for tests.
- Kubernetes Events for every state transition: bootstrap, db_sync, upgrade phases, key generation, and deployment rollout. Catalogued in Controller Events.
- Prometheus metrics + ServiceMonitor. Reconcile duration, per-condition error counts, key rotation age, and db_sync outcomes and duration, contract-tested against the documented metric catalogue. See Operator Metrics and the enablement guide.
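The injectable-client pattern behind the health check can be sketched in Python: check_api takes a fetch function, so production code passes a real HTTP call and tests pass a lambda. The /v3 probe path, the reason strings and the five-second timeout are assumptions chosen for illustration.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable

# A fetcher takes a URL and returns the HTTP status code.
Fetcher = Callable[[str], int]


def urllib_fetcher(url: str) -> int:
    """Production fetcher; error statuses arrive as HTTPError and are unwrapped."""
    try:
        with urllib.request.urlopen(url, timeout=5.0) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


@dataclass
class Condition:
    type: str
    status: bool
    reason: str


def check_api(endpoint: str, fetch: Fetcher = urllib_fetcher) -> Condition:
    """Probe the identity API and translate the result into KeystoneAPIReady."""
    url = endpoint.rstrip("/") + "/v3"  # assumed probe path
    try:
        code = fetch(url)
    except OSError as err:  # refused connection, DNS failure, timeout
        return Condition("KeystoneAPIReady", False, f"Unreachable: {err}")
    if code == 200:
        return Condition("KeystoneAPIReady", True, "Healthy")
    return Condition("KeystoneAPIReady", False, f"HTTP{code}")
```

A unit test can then assert on the condition without a network, for example check_api("http://ks:5000", lambda url: 503) yields a False condition with reason HTTP503.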
Day-2 Operations
- Bootstrap Job. An idempotent keystone-manage bootstrap run establishing the admin project, user and role, the region, and the public endpoint.
- Trust flush CronJob. Optional periodic cleanup of expired trust delegations.
- Policy validation. An oslopolicy-validator Job blocks rollouts on invalid policy overrides.
- Graceful-termination knobs. terminationGracePeriodSeconds, preStopSleepSeconds, and the rollout strategy are exposed on the CR with webhook-enforced invariants.
- HPA lifecycle. The HPA is created when spec.autoscaling is set and removed when it is cleared, with CPU and/or memory targets.
- Topology spread + PriorityClass. Sensible defaults across zone and hostname; the webhook validates that referenced PriorityClasses exist.
- ConfigMap rotation pruning. Stale <name>-config-<hash> ConfigMaps are pruned after rollout, retaining the three most recent revisions for fast rollback.
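ConfigMap pruning reduces to a filter, a sort and a slice. The Python sketch below assumes hex hashes in the <name>-config-<hash> names and orders revisions by creation time. It also adds a safeguard of its own that the bullet does not mention: the ConfigMap the live Deployment still mounts is never returned for deletion, even if it has fallen outside the three newest, for example after a rollback.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ConfigMapRef:
    name: str
    created: datetime


def configmaps_to_prune(cr_name: str, configmaps: List[ConfigMapRef],
                        in_use: str, keep: int = 3) -> List[str]:
    """Names of stale <cr_name>-config-<hash> ConfigMaps, newest first."""
    pattern = re.compile(rf"^{re.escape(cr_name)}-config-[0-9a-f]+$")  # hex hash assumed
    revisions = sorted(
        (cm for cm in configmaps if pattern.match(cm.name)),
        key=lambda cm: cm.created,
        reverse=True,
    )
    retained = {cm.name for cm in revisions[:keep]}
    retained.add(in_use)  # sketch safeguard: never delete the mounted revision
    return [cm.name for cm in revisions if cm.name not in retained]
```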
Where to go next
- New to the operator? Start with the Quick Start.
- Running it in production? Read Day 2 Operations, Observability & Diagnostics, and Multi-Tenant Deployment.
- Diving into the code? Begin with Reconciler Architecture and follow the links into the individual sub-reconcilers.