
High Availability

HA Architecture (Hypervisor Operator + HA Agent)

Note: HA is implemented through the interaction of the OpenStack Hypervisor Operator (which runs as a Deployment in the Hypervisor Cluster) and the HA Agent (a DaemonSet on each node). The Hypervisor HA Service already exists but is not yet publicly available; this will change in the future.

```text
                    ┌──────────────────────────────┐
                    │   Hypervisor Operator        │
                    │   (Hypervisor Cluster)       │
                    │                              │
                    │  - Watches: K8s Nodes        │
                    │  - Manages: Hypervisor CRDs  │
                    │  - Handles: Eviction CRDs    │
                    │  - Handles: Migration CRDs   │
                    └──────────────┬───────────────┘
                                   │
                             K8s API (CRDs)
                                   │
           ┌───────────────────────┼───────────────────────┐
           │                       │                       │
           ▼                       ▼                       ▼
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   HA Agent       │    │   HA Agent       │    │   HA Agent       │
│   (Node 1)       │    │   (Node 2)       │    │   (Node N)       │
│   (DaemonSet)    │    │   (DaemonSet)    │    │   (DaemonSet)    │
│                  │    │                  │    │                  │
│  LibVirt Events: │    │  LibVirt Events: │    │  LibVirt Events: │
│  - Lifecycle     │    │  - Lifecycle     │    │  - Lifecycle     │
│  - Reboots       │    │  - Reboots       │    │  - Reboots       │
│  - Watchdog      │    │  - Watchdog      │    │  - Watchdog      │
│  - I/O Errors    │    │  - I/O Errors    │    │  - I/O Errors    │
│                  │    │                  │    │                  │
│  Creates/Updates │    │  Creates/Updates │    │  Creates/Updates │
│  Eviction CRDs   │    │  Eviction CRDs   │    │  Eviction CRDs   │
└──────────────────┘    └──────────────────┘    └──────────────────┘
```

For the CRD definitions (Hypervisor, Eviction, Migration), see CRDs. For the agent architecture, see Component Interaction.
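An Eviction custom resource created by the HA Agent might look like the following sketch. All field names, the API group, and the namespace are illustrative assumptions, not the published schema; see CRDs for the authoritative definitions.

```yaml
# Illustrative sketch only — the API group, field names, and values are
# assumptions, not the published Eviction CRD schema.
apiVersion: hypervisor.example.org/v1alpha1
kind: Eviction
metadata:
  name: node-3-eviction
  namespace: hypervisor-system
spec:
  # Node whose VMs should be migrated away
  hypervisor: node-3
  # Why the HA Agent requested the eviction
  reason: libvirt-lifecycle-crash
status:
  phase: Pending
```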

Control Plane HA

All infrastructure services in the Control Plane Cluster are deployed redundantly; the failure of individual instances is compensated for automatically.

```text
┌─────────────────────────────────────────────────────────────────────────┐
│                    CONTROL PLANE HA STACK                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  MariaDB Galera (3 Nodes)                                       │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐                    │    │
│  │  │  Node 1   │◀▶│  Node 2   │◀▶│  Node 3   │  Synchronous Multi-│    │
│  │  │  (R/W)    │  │  (R/W)    │  │  (R/W)    │  Master Replication│    │
│  │  └───────────┘  └───────────┘  └───────────┘                    │    │
│  │           ▲            ▲            ▲                           │    │
│  │           └────────────┼────────────┘                           │    │
│  │                   MaxScale Proxy                                │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  RabbitMQ Cluster (3 Nodes)                                     │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐                    │    │
│  │  │  Node 1   │◀▶│  Node 2   │◀▶│  Node 3   │  Quorum Queues,    │    │
│  │  │           │  │           │  │           │  pause_minority    │    │
│  │  └───────────┘  └───────────┘  └───────────┘                    │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  Valkey Sentinel (3 Nodes with Sentinel Sidecars)               │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐                    │    │
│  │  │  Primary  │─▶│ Replica 1 │  │ Replica 2 │  Automatic         │    │
│  │  │+ Sentinel │  │+ Sentinel │  │+ Sentinel │  Failover via      │    │
│  │  └───────────┘  └───────────┘  └───────────┘  Sentinel Quorum   │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  OVN NB/SB (3 Replicas each, Raft Consensus)                    │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐                    │    │
│  │  │  Leader   │◀▶│ Follower  │◀▶│ Follower  │  Automatic         │    │
│  │  │  (R/W)    │  │  (R/O)    │  │  (R/O)    │  Leader Election   │    │
│  │  └───────────┘  └───────────┘  └───────────┘                    │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  Memcached (memcached-operator, Deployment + Headless Service)  │    │
│  │  ┌───────────┐  ┌───────────┐                                   │    │
│  │  │ Instance 1│  │ Instance 2│  DNS-based Discovery,             │    │
│  │  └───────────┘  └───────────┘  Token Caching for Keystone       │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

Components in Detail:

| Component | Replicas | Consensus Mechanism | Special Features |
|---|---|---|---|
| MariaDB Galera | 3 | Synchronous multi-master replication | Automatic rejoin after partition healing, MaxScale Proxy for read/write splitting |
| RabbitMQ | 3 | pause_minority partition handling | Quorum queues for guaranteed message delivery, automatic cluster recovery |
| Valkey Sentinel | 3 | Sentinel quorum (majority) | 3 nodes with Sentinel sidecars, automatic failover of the primary node |
| OVN NB/SB | 3 each | Raft consensus | Automatic leader election, deployed via ovn-operator |
| Memcached | 2+ | No consensus (stateless) | memcached-operator, anti-affinity + PDB, DNS-based discovery |

For the OVN Raft architecture, see Network Architecture.
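The common thread in the table above is the odd replica count: with three replicas, a strict majority (two) survives any single failure, which is what Galera, RabbitMQ's pause_minority handling, Sentinel's quorum, and OVN's Raft all rely on. A minimal sketch of this quorum arithmetic:

```python
def has_quorum(total_nodes: int, failed_nodes: int) -> bool:
    """A cluster keeps quorum while a strict majority of members is alive."""
    alive = total_nodes - failed_nodes
    return alive > total_nodes // 2

# 3-node clusters tolerate one failure ...
assert has_quorum(3, 1) is True
# ... but not two simultaneous failures — that requires manual intervention:
assert has_quorum(3, 2) is False
```

This is also why the failure scenario matrix below lists "simultaneous failure of 2+ nodes" as the only case needing manual intervention for the 3-node services.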

Data Plane HA

Data Plane HA ensures the availability of virtual machines on hypervisor nodes. For the complete hypervisor state machine and eviction process, see Hypervisor Lifecycle.

LibVirt Event Subscription:

The HA Agent subscribes to the local LibVirt daemon and reacts to the following event types:

| Event Type | Description | HA Agent Reaction |
|---|---|---|
| Lifecycle event | VM state change (start, stop, crash, suspend) | On unexpected stop/crash: creates an Eviction CRD for automatic recovery |
| Reboot event | VM restart detected | Verifies the restart succeeded; on failure: Eviction CRD |
| Watchdog event | Guest watchdog triggered (e.g., QEMU i6300esb) | VM restart on the current host; on failure: migration to an alternative host |
| I/O error event | Disk or network I/O error of the VM | Notification via Eviction CRD; on persistent errors: migration |
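The reaction table above amounts to a small dispatch rule. The sketch below is illustrative pseudologic, not the HA Agent's actual code; the event and reaction names are plain strings mirroring the table, not real libvirt or operator identifiers.

```python
# Illustrative sketch of the HA Agent's reaction table — not the actual
# agent code. Event and reaction names mirror the table above.

def react(event: str, detail: str = "") -> str:
    """Map a LibVirt event to the HA Agent's reaction."""
    if event == "lifecycle" and detail in ("crashed", "stopped-unexpectedly"):
        return "create-eviction-crd"   # automatic recovery via Eviction CRD
    if event == "reboot":
        return "verify-restart"        # escalates to an Eviction CRD on failure
    if event == "watchdog":
        return "restart-vm-locally"    # migrate if the local restart fails
    if event == "io-error":
        return "create-eviction-crd"   # migrate on persistent errors
    return "ignore"                    # e.g., a normal, expected state change

assert react("lifecycle", "crashed") == "create-eviction-crd"
assert react("watchdog") == "restart-vm-locally"
assert react("lifecycle", "suspended") == "ignore"
```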

Component Interaction During Automatic Recovery:

```text
┌──────────┐    ┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────┐
│ LibVirt  │───▶│ HA Agent │───▶│ Eviction CRD │───▶│ Eviction     │───▶│ Nova API │
│ Event    │    │          │    │ (K8s API)    │    │ Controller   │    │          │
└──────────┘    └──────────┘    └──────────────┘    └──────────────┘    └──────────┘
                                                                             │
                                                                             ▼
                                                                    ┌──────────────┐
                                                                    │ Live         │
                                                                    │ Migration    │
                                                                    └──────────────┘
```
  1. LibVirt detects an event (e.g., VM crash, watchdog trigger)
  2. HA Agent receives the event via LibVirt event subscription
  3. HA Agent creates/updates an Eviction CRD in the Hypervisor Cluster
  4. Eviction Controller (part of the Hypervisor Operator) detects the Eviction CRD and performs preflight checks
  5. Eviction Controller calls the Nova API to initiate a live migration
  6. Nova orchestrates the migration to a suitable target hypervisor
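Steps 4–6 above can be sketched as follows. `reconcile_eviction` and its preflight check are hypothetical stand-ins for the Eviction Controller's logic and the Nova API call, not real function names from the operator or the Nova client.

```python
# Hypothetical sketch of the Eviction Controller's reconcile step (steps 4-6
# above). The function and its capacity model are illustrative stand-ins,
# not real operator or Nova client APIs.

def reconcile_eviction(vm: str, source: str, free_slots: dict) -> str:
    """Preflight-check target capacity, then request a live migration."""
    # Preflight: at least one *other* hypervisor must have free capacity.
    targets = [h for h, free in free_slots.items() if h != source and free > 0]
    if not targets:
        return "failed: no target capacity"   # manual intervention required
    # Nova's scheduler would pick the real target; here we take the emptiest host.
    target = max(targets, key=free_slots.get)
    return f"live-migrating {vm} from {source} to {target}"

result = reconcile_eviction("vm-42", "node-1",
                            {"node-1": 0, "node-2": 3, "node-3": 1})
assert result == "live-migrating vm-42 from node-1 to node-2"
```

This mirrors the failure scenario matrix below: recovery is automatic as long as target capacity exists, and manual intervention is needed only when the preflight check fails.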

Kubernetes Node Conditions:

The Hypervisor Operator watches the Kubernetes Node objects in the Hypervisor Cluster and reacts to condition changes:

| Node Condition | Hypervisor Operator Reaction |
|---|---|
| Ready=False | Mark node as NotReady; after timeout: automatic eviction of all VMs |
| Ready=Unknown | Mark node as Unreachable; after timeout: automatic eviction of all VMs |
| DiskPressure=True | Warning; do not schedule new VMs |
| MemoryPressure=True | Warning; do not schedule new VMs |
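The condition handling above falls into two reaction classes: evict after a timeout for Ready problems, and stop scheduling for pressure conditions. A hedged sketch of that logic — the timeout value and all names are assumptions, not documented operator defaults:

```python
from datetime import timedelta

# Illustrative sketch of the operator's condition handling — the 5-minute
# timeout and all function/reaction names are assumptions.
EVICTION_TIMEOUT = timedelta(minutes=5)

def reaction(condition: str, status: str, elapsed: timedelta) -> str:
    if condition == "Ready" and status in ("False", "Unknown"):
        if elapsed >= EVICTION_TIMEOUT:
            return "evict-all-vms"
        return "mark-not-ready" if status == "False" else "mark-unreachable"
    if condition in ("DiskPressure", "MemoryPressure") and status == "True":
        return "warn-and-stop-scheduling"
    return "none"

assert reaction("Ready", "Unknown", timedelta(minutes=1)) == "mark-unreachable"
assert reaction("Ready", "False", timedelta(minutes=10)) == "evict-all-vms"
assert reaction("DiskPressure", "True", timedelta(0)) == "warn-and-stop-scheduling"
```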

ovn-controller Graceful Degradation:

If the connection to the OVN Southbound DB is lost, the local ovn-controller on each hypervisor node continues to operate in cached mode (for OVN architecture details, see Network Architecture):

  • Existing OpenFlow rules in ovs-vswitchd remain active
  • Running VM traffic is not interrupted
  • New network configuration changes are only applied after reconnect
  • Security group updates are cached and applied after reconnect
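The cached mode described above can be pictured as a buffer of pending changes: flows already programmed into ovs-vswitchd stay active, new configuration queues up, and the queue is drained on reconnect. A minimal illustrative model of this behavior — not ovn-controller's actual implementation:

```python
# Minimal illustrative model of graceful degradation — not ovn-controller's
# actual implementation. Installed flows stay active; new changes queue up
# until the Southbound DB connection returns.

class CachedController:
    def __init__(self):
        self.connected = True
        self.active_flows = []   # flows already programmed into ovs-vswitchd
        self.pending = []        # changes cached while disconnected

    def apply(self, change: str):
        if self.connected:
            self.active_flows.append(change)
        else:
            self.pending.append(change)   # cached until reconnect

    def reconnect(self):
        self.connected = True
        self.active_flows.extend(self.pending)   # drain the cache
        self.pending.clear()

c = CachedController()
c.apply("allow vm1->vm2")
c.connected = False                # Southbound DB unreachable
c.apply("security-group update")   # cached; existing flows keep working
assert c.active_flows == ["allow vm1->vm2"]
c.reconnect()
assert c.active_flows == ["allow vm1->vm2", "security-group update"]
```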

Failure Scenario Matrix

| Component | Failure Behavior | Auto-Recovery | Manual Intervention |
|---|---|---|---|
| Single hypervisor node | VMs on this node unreachable | HA Agent detects the failure, Eviction CRDs are created, automatic live migration to available nodes | Only if no target capacity is available or migration fails |
| MariaDB node (1 of 3) | Galera cluster continues with 2 nodes, no data loss | Automatic rejoin after node recovery, IST/SST synchronization | Only on simultaneous failure of 2+ nodes |
| RabbitMQ node (1 of 3) | Quorum queues remain available, messages processed on the remaining nodes | Automatic cluster rejoin, queue synchronization | Only with a pause_minority split and simultaneous multi-node failure |
| Valkey Sentinel | Sentinel quorum elects a new primary within seconds | Automatic failover, replica promoted to primary | Only on simultaneous failure of the primary plus a majority of sentinels |
| OVN NB/SB Raft leader | Raft cluster automatically elects a new leader, brief interruption (<5 s) | Automatic leader election, a follower takes over | Only on simultaneous failure of 2+ Raft nodes |
| c5c3-operator pod | No new provisioning/orchestration, existing workloads unaffected | Kubernetes Deployment automatically restarts the pod | Only on persistent errors (CrashLoopBackOff) |
| Complete Control Plane Cluster | No API operations (VM create/delete/migrate); running VMs continue to work | VMs run unchanged, the network stays up (ovn-controller cached flows); no automatic control-plane recovery | Control Plane Cluster must be restored; VMs are not manageable during the outage |
| Network partition between clusters | Hypervisor nodes lose the connection to the control plane; VMs continue to run | ovn-controller works in cached mode, VMs remain reachable, RabbitMQ enters pause_minority | Resolve the network partition; all components then reconnect automatically |

For HA behavior during OpenStack upgrades, see Upgrades.