
Observability

This chapter describes the observability architecture of CobaltCore: Metrics, Logging, Tracing, and LibVirt telemetry.

Scope: this chapter covers the Control Plane Cluster, Hypervisor Cluster, and Management Cluster. Storage Cluster telemetry (Prysm) is not covered here; see Prysm for Storage observability. For the overall four-cluster topology, see Architecture Overview.

Architecture Overview

┌──────────────────────────────────────────────────────────────────────────────────┐
│                           MANAGEMENT CLUSTER (Hub)                               │
│                                                                                  │
│  ┌────────────────────────────────────────────────────────────────────────────┐  │
│  │                          Greenhouse                                        │  │
│  │                                                                            │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │  │
│  │  │   Grafana    │  │ Alertmanager │  │   Loki /     │  │  Jaeger /    │    │  │
│  │  │  Dashboards  │  │              │  │  OpenSearch  │  │  Tempo       │    │  │
│  │  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘    │  │
│  │         │                 │                 │                 │            │  │
│  │         └────────┬────────┴──────────┬──────┴──────────┬──────┘            │  │
│  │                  │                   │                 │                   │  │
│  │           Prometheus          Log Aggregation    Trace Collection          │  │
│  │          (Federation)          (central)          (central)                │  │
│  └──────────┬───────────────────┬───────────────────┬─────────────────────────┘  │
│             │                   │                   │                            │
│  ┌──────────┴───────────────────┴───────────────────┴─────────────────────────┐  │
│  │                     Prometheus (local)                                     │  │
│  │                     Fluent Bit / Vector                                    │  │
│  │                     OTEL Collector                                         │  │
│  └──────────┬─────────────────────────────────────────────────────────────────┘  │
└─────────────┼────────────────────────────────────────────────────────────────────┘
              │ Federation / Shipping / Export

     ┌────────┴────────┬──────────────────┐
     │                 │                  │
     ▼                 ▼                  ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ CONTROL      │ │ HYPERVISOR   │ │ STORAGE      │
│ PLANE        │ │ CLUSTER      │ │ CLUSTER      │
│ CLUSTER      │ │              │ │              │
├──────────────┤ ├──────────────┤ │ (out of      │
│ Prometheus   │ │ Prometheus   │ │  scope,      │
│ Fluent Bit   │ │ Fluent Bit   │ │  see         │
│ OTEL Coll.   │ │ OTEL Coll.   │ │  Prysm)      │
│              │ │              │ │              │
│ Metrics:     │ │ Metrics:     │ └──────────────┘
│  Service Op. │ │  node-export │
│  MariaDB     │ │  libvirt-exp │
│  RabbitMQ    │ │  OVS Stats   │
│  Valkey      │ │              │
│              │ │ Logs:        │
│ Logs:        │ │  Agent Logs  │
│  OS Services │ │  LibVirt     │
│  Infra       │ │              │
│              │ │ Traces:      │
│ Traces:      │ │  Agent Spans │
│  API Spans   │ │              │
└──────────────┘ └──────────────┘
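The "Federation / Shipping / Export" arrow above covers three separate paths. For metrics, a common way to implement Prometheus federation is for the hub Prometheus to scrape each local Prometheus through its `/federate` endpoint. The Python sketch below generates such `scrape_configs` entries. It is a minimal sketch under assumptions: the target hostnames, the `cluster` label, and the `match[]` selector are hypothetical, and this chapter does not state which federation mechanism CobaltCore actually configures. The Storage Cluster is left out because it is out of scope here.

```python
import json

# Hypothetical addresses: hostnames, ports, and TLS settings of the local
# Prometheus instances are not specified in this chapter.
LOCAL_PROMETHEUS = {
    "control-plane": "prometheus.control-plane.example:9090",
    "hypervisor": "prometheus.hypervisor.example:9090",
}

# Assumed selector: federate every series that carries a job label.
FEDERATE_MATCH = ['{job=~".+"}']


def federation_scrape_configs(targets: dict, match: list) -> list:
    """Build one scrape_config per workload cluster that pulls from the
    cluster's local Prometheus through its /federate endpoint."""
    configs = []
    for cluster, address in sorted(targets.items()):
        configs.append({
            "job_name": f"federate-{cluster}",
            # Keep the job/instance labels of the federated series as they are.
            "honor_labels": True,
            "metrics_path": "/federate",
            "params": {"match[]": list(match)},
            "static_configs": [{
                "targets": [address],
                # Hypothetical origin label; scraped labels win on conflict
                # because honor_labels is set.
                "labels": {"cluster": cluster},
            }],
        })
    return configs


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be merged into prometheus.yml.
    config = {"scrape_configs": federation_scrape_configs(LOCAL_PROMETHEUS, FEDERATE_MATCH)}
    print(json.dumps(config, indent=2))
```

Setting `honor_labels` means labels already present on the federated series (such as `job` and `instance`) are preserved instead of being renamed to `exported_job` and `exported_instance`.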

Signal-Cluster Matrix

| Signal | Control Plane Cluster | Hypervisor Cluster | Management Cluster |
|---|---|---|---|
| Metrics | Service Operators, MariaDB, RabbitMQ, Valkey | node-exporter, libvirt-exporter, OVS statistics | Greenhouse aggregation, federation |
| Logs | OpenStack Services, Infrastructure logs | Agent logs, LibVirt logs | Central log store |
| Traces | API request traces (oslo.metrics, OTEL) | Agent spans | Trace backend (Jaeger / Tempo) |
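The Signal-Cluster Matrix can also be read as a lookup table keyed by (signal, cluster), which is handy for generating documentation or checking that every cell is covered. The Python sketch below transcribes it that way. The `Signal` and `Cluster` enums and the `sources` and `find` helpers are hypothetical names for illustration, not part of CobaltCore.

```python
from enum import Enum


class Signal(Enum):
    METRICS = "metrics"
    LOGS = "logs"
    TRACES = "traces"


class Cluster(Enum):
    CONTROL_PLANE = "control-plane"
    HYPERVISOR = "hypervisor"
    MANAGEMENT = "management"
    # The Storage Cluster is intentionally absent: Prysm covers its telemetry.


# Transcription of the Signal-Cluster Matrix; entries are descriptive labels only.
MATRIX = {
    (Signal.METRICS, Cluster.CONTROL_PLANE): ("Service Operators", "MariaDB", "RabbitMQ", "Valkey"),
    (Signal.METRICS, Cluster.HYPERVISOR): ("node-exporter", "libvirt-exporter", "OVS statistics"),
    (Signal.METRICS, Cluster.MANAGEMENT): ("Greenhouse aggregation", "federation"),
    (Signal.LOGS, Cluster.CONTROL_PLANE): ("OpenStack Services", "Infrastructure logs"),
    (Signal.LOGS, Cluster.HYPERVISOR): ("Agent logs", "LibVirt logs"),
    (Signal.LOGS, Cluster.MANAGEMENT): ("Central log store",),
    (Signal.TRACES, Cluster.CONTROL_PLANE): ("API request traces (oslo.metrics, OTEL)",),
    (Signal.TRACES, Cluster.HYPERVISOR): ("Agent spans",),
    (Signal.TRACES, Cluster.MANAGEMENT): ("Trace backend (Jaeger / Tempo)",),
}


def sources(signal: Signal, cluster: Cluster) -> tuple:
    """Return the telemetry sources listed in one cell of the matrix."""
    return MATRIX[(signal, cluster)]


def find(term: str) -> list:
    """Return every (signal, cluster) cell whose sources mention `term`,
    compared case-insensitively, in matrix order."""
    needle = term.lower()
    return [cell for cell, srcs in MATRIX.items() if any(needle in s.lower() for s in srcs)]
```

For example, `find("libvirt")` returns the Hypervisor Cluster's metrics and logs cells.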

Principles

  • Per-Signal Architecture: Metrics, logs, and traces are treated as independent signals with dedicated pipelines per cluster.
  • Local Collection, Central Aggregation: Each cluster collects telemetry locally. The Management Cluster aggregates across all clusters.
  • Greenhouse as Hub: Greenhouse in the Management Cluster provides the central interface for dashboards, alerting, and correlation.
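To illustrate local collection with central aggregation on a dedicated per-signal pipeline, the Python sketch below builds an OpenTelemetry Collector configuration for a workload cluster's local OTEL Collector. It receives OTLP traces locally, stamps them with the cluster name, batches them, and forwards them over OTLP to a collector in the Management Cluster. The endpoint, the choice of the `k8s.cluster.name` attribute, and the use of the `resource` processor (shipped in the collector's contrib distribution) are assumptions; this chapter does not specify the collector configuration. TLS settings are omitted, and the OTLP exporter expects TLS by default.

```python
import json

# Hypothetical central endpoint in the Management Cluster; the real address
# and TLS settings are not defined in this chapter.
CENTRAL_OTLP_ENDPOINT = "otel-collector.management.example:4317"


def local_collector_config(cluster: str, central_endpoint: str) -> dict:
    """Build a collector config for one workload cluster: receive OTLP
    locally, tag with the cluster name, batch, and forward to the hub."""
    return {
        "receivers": {
            "otlp": {"protocols": {"grpc": {"endpoint": "0.0.0.0:4317"}}},
        },
        "processors": {
            "batch": {},
            # Upsert a resource attribute so the hub can tell clusters apart.
            "resource": {
                "attributes": [
                    {"key": "k8s.cluster.name", "value": cluster, "action": "upsert"},
                ],
            },
        },
        "exporters": {
            "otlp": {"endpoint": central_endpoint},
        },
        "service": {
            "pipelines": {
                # Traces only: in the diagram, metrics and logs leave each
                # cluster through Prometheus and Fluent Bit instead.
                "traces": {
                    "receivers": ["otlp"],
                    "processors": ["resource", "batch"],
                    "exporters": ["otlp"],
                },
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can serve as the collector config file.
    print(json.dumps(local_collector_config("hypervisor", CENTRAL_OTLP_ENDPOINT), indent=2))
```

Placing `resource` before `batch` tags spans before they are grouped, so every exported batch already carries its origin cluster.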

Subchapters

| Document | Description |
|---|---|
| Metrics | Prometheus, Federation, Greenhouse, Alerting |
| Logging | OpenStack logs, centralization, audit |
| Tracing | OpenTelemetry, distributed tracing |
| LibVirt Telemetry | LibVirt metrics, logs, events |