Observability
This chapter describes the observability architecture of CobaltCore: Metrics, Logging, Tracing, and LibVirt telemetry.
Scope: Control Plane Cluster, Hypervisor Cluster, and Management Cluster. Storage Cluster telemetry (Prysm) is not covered in this chapter; see Prysm for storage observability. For the overall four-cluster topology, see Architecture Overview.
Architecture Overview
```text
┌──────────────────────────────────────────────────────────────────────────────────┐
│ MANAGEMENT CLUSTER (Hub)                                                         │
│                                                                                  │
│ ┌──────────────────────────────────────────────────────────────────────────────┐ │
│ │ Greenhouse                                                                   │ │
│ │                                                                              │ │
│ │ ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐ │ │
│ │ │   Grafana    │    │ Alertmanager │    │    Loki /    │    │   Jaeger /   │ │ │
│ │ │  Dashboards  │    │              │    │  OpenSearch  │    │    Tempo     │ │ │
│ │ └──────┬───────┘    └──────┬───────┘    └──────┬───────┘    └──────┬───────┘ │ │
│ │        │                   │                   │                   │         │ │
│ │        └─────────┬─────────┴─────────┬─────────┴─────────┬─────────┘         │ │
│ │                  │                   │                   │                   │ │
│ │             Prometheus        Log Aggregation    Trace Collection            │ │
│ │            (Federation)          (central)           (central)               │ │
│ └──────────────────┬───────────────────┬───────────────────┬───────────────────┘ │
│                    │                   │                   │                     │
│ ┌──────────────────┴───────────────────┴───────────────────┴───────────────────┐ │
│ │                              Prometheus (local)                              │ │
│ │                             Fluent Bit / Vector                              │ │
│ │                                OTEL Collector                                │ │
│ └────────────────┬─────────────────────────────────────────────────────────────┘ │
└──────────────────┼───────────────────────────────────────────────────────────────┘
                   │ Federation / Shipping / Export
                   │
         ┌─────────┴─────────┬───────────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
  │ CONTROL      │    │ HYPERVISOR   │    │ STORAGE      │
  │ PLANE        │    │ CLUSTER      │    │ CLUSTER      │
  │ CLUSTER      │    │              │    │              │
  ├──────────────┤    ├──────────────┤    │ (out of      │
  │ Prometheus   │    │ Prometheus   │    │ scope,       │
  │ Fluent Bit   │    │ Fluent Bit   │    │ see          │
  │ OTEL Coll.   │    │ OTEL Coll.   │    │ Prysm)       │
  │              │    │              │    └──────────────┘
  │ Metrics:     │    │ Metrics:     │
  │ Service Op.  │    │ node-export  │
  │ MariaDB      │    │ libvirt-exp  │
  │ RabbitMQ     │    │ OVS Stats    │
  │ Valkey       │    │              │
  │              │    │              │
  │ Logs:        │    │ Logs:        │
  │ OS Services  │    │ Agent Logs   │
  │ Infra        │    │ LibVirt      │
  │              │    │              │
  │ Traces:      │    │ Traces:      │
  │ API Spans    │    │ Agent Spans  │
  └──────────────┘    └──────────────┘
```
Signal-Cluster Matrix
| Signal | Control Plane Cluster | Hypervisor Cluster | Management Cluster |
|---|---|---|---|
| Metrics | Service Operators, MariaDB, RabbitMQ, Valkey | node-exporter, libvirt-exporter, OVS statistics | Greenhouse aggregation, federation |
| Logs | OpenStack Services, Infrastructure logs | Agent logs, LibVirt logs | Central log store |
| Traces | API request traces (oslo.metrics, OTEL) | Agent spans | Trace backend (Jaeger / Tempo) |
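Read row by row, the matrix is a lookup from (signal, cluster) to the telemetry sources expected there. The Python sketch below encodes it so that a deployment check could flag sources a collector configuration does not yet cover. The enum names, the source slugs, and the `missing_sources` helper are hypothetical illustrations, not part of CobaltCore.

```python
from enum import Enum


class Signal(Enum):
    METRICS = "metrics"
    LOGS = "logs"
    TRACES = "traces"


class Cluster(Enum):
    CONTROL_PLANE = "control-plane"
    HYPERVISOR = "hypervisor"
    MANAGEMENT = "management"


# Expected sources per (signal, cluster), transcribed from the Signal-Cluster Matrix.
# The source slugs are hypothetical identifiers, not real component names.
# The Management Cluster is omitted: it aggregates signals rather than emitting them.
SIGNAL_MATRIX: dict[tuple[Signal, Cluster], frozenset[str]] = {
    (Signal.METRICS, Cluster.CONTROL_PLANE): frozenset(
        {"service-operators", "mariadb", "rabbitmq", "valkey"}
    ),
    (Signal.METRICS, Cluster.HYPERVISOR): frozenset(
        {"node-exporter", "libvirt-exporter", "ovs-statistics"}
    ),
    (Signal.LOGS, Cluster.CONTROL_PLANE): frozenset({"openstack-services", "infrastructure"}),
    (Signal.LOGS, Cluster.HYPERVISOR): frozenset({"agent", "libvirt"}),
    (Signal.TRACES, Cluster.CONTROL_PLANE): frozenset({"api-requests"}),
    (Signal.TRACES, Cluster.HYPERVISOR): frozenset({"agent"}),
}


def missing_sources(signal: Signal, cluster: Cluster, configured: list[str]) -> list[str]:
    """Return the expected sources for (signal, cluster) absent from `configured`, sorted."""
    expected = SIGNAL_MATRIX.get((signal, cluster), frozenset())
    return sorted(expected - set(configured))
```

For example, `missing_sources(Signal.METRICS, Cluster.HYPERVISOR, ["node-exporter", "libvirt-exporter"])` returns `["ovs-statistics"]`, pointing at the OVS statistics source that the hypothetical configuration forgot.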
Principles
- Per-Signal Architecture: Metrics, logs, and traces are treated as independent signals with dedicated pipelines per cluster.
- Local Collection, Central Aggregation: Each cluster collects telemetry locally. The Management Cluster aggregates across all clusters.
- Greenhouse as Hub: Greenhouse in the Management Cluster provides the central interface for dashboards, alerting, and correlation.
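For metrics, "Local Collection, Central Aggregation" corresponds to Prometheus federation: the central Prometheus scrapes each local Prometheus on its `/federate` endpoint and passes one `match[]` series selector per set of series it wants to pull. The Python sketch below only builds those scrape URLs; the endpoint hostnames in `LOCAL_PROMETHEUS` and the selectors are hypothetical, and no scrape is performed.

```python
from urllib.parse import urlencode

# Hypothetical local Prometheus endpoints, one per cluster; real hostnames will differ.
LOCAL_PROMETHEUS = {
    "control-plane": "http://prometheus.control-plane.example:9090",
    "hypervisor": "http://prometheus.hypervisor.example:9090",
}


def federate_url(base_url: str, selectors: list[str]) -> str:
    """Build a /federate URL returning the series that match any of `selectors`."""
    if not selectors:
        raise ValueError("federation needs at least one match[] selector")
    # Each selector becomes a repeated, URL-encoded match[] query parameter.
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


def federation_targets(selectors: list[str]) -> dict[str, str]:
    """Map each cluster name to the federation URL the central Prometheus would scrape."""
    return {cluster: federate_url(url, selectors) for cluster, url in LOCAL_PROMETHEUS.items()}
```

For example, `federate_url("http://prometheus.hypervisor.example:9090", ['{job="libvirt-exporter"}'])` yields `http://prometheus.hypervisor.example:9090/federate?match%5B%5D=%7Bjob%3D%22libvirt-exporter%22%7D`. The federation example in the Prometheus documentation pairs such a scrape job with `honor_labels: true`, so labels set by the source Prometheus are kept rather than overwritten.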
Subchapters
| Document | Description |
|---|---|
| Metrics | Prometheus, Federation, Greenhouse, Alerting |
| Logging | OpenStack logs, centralization, audit |
| Tracing | OpenTelemetry, distributed tracing |
| LibVirt Telemetry | LibVirt metrics, logs, events |