What Really Is a Metric?
A metric is not reality. It’s a lossy measurement with assumptions baked in.
I said it. I’ll own it. But when we talk about observability, what really is a metric?
Observability vendors charge by the byte. They want you to send everything. The industry tells you to measure everything. So you instrument everything, send it all to your vendor, and wait for clarity.
Instead, you get an observability bill that's higher than your AWS or GCP compute costs. And you still can't answer basic questions: Is my application healthy? Are customers experiencing problems right now? Should I be paging someone?
Even with a top-tier vendor and unlimited budget, more data doesn’t equal more clarity. You’re drowning in metrics, dashboards, and alerts — but you still don’t know what actually matters.
Quick reference for the Prometheus Query Language (PromQL) with practical examples for monitoring, alerting, and SLO calculations. Covers essential functions, aggregations, and common patterns for effective observability.
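A few of the patterns that reference covers, sketched here against a hypothetical `http_requests_total` counter and `http_request_duration_seconds` histogram (substitute your own metric and label names):

```promql
# Per-second request rate over the last 5 minutes, summed per service
sum by (service) (rate(http_requests_total[5m]))

# Error ratio: share of requests returning 5xx
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))

# 95th-percentile latency estimated from histogram buckets
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```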
Every alert should have a runbook (sometimes called a playbook). A runbook is a guide for SREs, DevOps, on-call engineers, and software developers that prescribes potential remediations for a specific alert. The goal is to reduce MTTR and improve incident response with structured troubleshooting, verification steps, and escalation paths. It's a place to build and share knowledge about an incident before it happens.
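A minimal runbook skeleton might look like the following; the alert name and every step here are hypothetical placeholders, not a prescribed format:

```markdown
## Alert: HighErrorRate (hypothetical example)

**Meaning:** The 5xx error ratio has crossed the alerting threshold.
**Impact:** Some users may be seeing failed requests.

### Verify
- Confirm on the error-ratio dashboard that this isn't a scrape gap or a single noisy pod.

### Remediate
- If the spike aligns with a recent deploy, roll it back.
- If saturation metrics (CPU, connections) are elevated, scale out the service.

### Escalate
- Page the service owner if the error ratio hasn't recovered within 15 minutes.
```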
Calculating error budget over 30 days should be simple, but naive Prometheus queries time out on high-cardinality metrics. This tool uses a Riemann-sum-inspired technique that pre-computes error ratios at 5-minute intervals, turning an expensive range query into a single fast aggregation. The result: accurate error budget calculations that scale.
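The pre-computation idea can be sketched as a Prometheus recording rule plus one cheap aggregation. The rule and metric names below are illustrative assumptions, not the tool's actual output:

```yaml
groups:
  - name: slo-precompute
    interval: 5m          # evaluate every 5 minutes (the "Riemann sum" step width)
    rules:
      - record: slo:error_ratio:rate5m
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m]))
```

With the 5-minute samples already stored, the 30-day error ratio collapses into a single aggregation over a low-cardinality series instead of a range query over the raw counters:

```promql
avg_over_time(slo:error_ratio:rate5m[30d])
```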
Why does a 5% error rate trigger an alert at 2 AM? Is it catastrophic during peak traffic or meaningless during low usage? Traditional static thresholds can’t tell you. SLO-based alerting asks a better question: “Are we consuming our error budget faster than planned?” This approach ties alerts directly to user-impacting reliability issues, eliminating arbitrary thresholds and reducing alert fatigue while catching real problems early.
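One common way to express "consuming the budget faster than planned" is a multi-window burn-rate alert, as popularized by the Google SRE Workbook. Assuming hypothetical recording rules `sli:error_ratio:rate5m` and `sli:error_ratio:rate1h` and a 99.9% SLO (allowed error ratio 0.001), a page-worthy fast burn might be:

```promql
# Burn rate 14.4 ≈ spending 2% of a 30-day budget in 1 hour.
# The short window confirms the burn is still happening, not a past spike.
(
  sli:error_ratio:rate1h > (14.4 * 0.001)
and
  sli:error_ratio:rate5m > (14.4 * 0.001)
)
```

The threshold scales with traffic automatically: a 5% error rate at 2 AM only fires if it actually burns budget fast enough to matter.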
You’ve deployed the generated SLO rules to Prometheus. Now what? The recording rules are pre-computing your SLO metrics every minute, but how do you actually check whether you’re meeting your targets, monitor error budget consumption, or build dashboards? This guide shows you the essential PromQL queries to unlock the full power of your SLO monitoring, from checking current status to visualizing long-term trends.
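As a taste of those queries, here are two status checks sketched against an assumed recording rule `slo:error_ratio:rate5m` and a 99.9% target (allowed error ratio 0.001); your generated rule names will differ:

```promql
# Current compliance: observed success ratio over the 30-day window
1 - avg_over_time(slo:error_ratio:rate5m[30d])

# Fraction of the 30-day error budget still remaining
# (1.0 = untouched budget, 0 or below = budget exhausted)
1 - (avg_over_time(slo:error_ratio:rate5m[30d]) / 0.001)
```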
Your monitoring just went down because Prometheus got OOM-killed again. Or maybe you’re paying for 32GB of RAM when 8GB would suffice. Sizing Prometheus shouldn’t be guesswork: it’s predictable math based on three inputs: active time series, scrape interval, and retention period. Our Resource Calculator does the math for you, showing memory, CPU, and disk requirements with visual guidance on safe ranges and real-world scaling examples.
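To make the math concrete, here is a worked example for an assumed setup of 1,000,000 active series, a 30s scrape interval, and 15-day retention. Disk follows the formula from the Prometheus operational docs (retention × ingestion rate × bytes per sample, with roughly 1-2 bytes per compressed sample); the memory figure is only a rough rule of thumb, since real usage depends on churn, queries, and label sizes:

```text
ingestion rate = 1,000,000 series / 30s scrape interval ≈ 33,333 samples/s

disk   ≈ 15d retention × 33,333 samples/s × ~2 B/sample
       ≈ 1,296,000 s × 33,333 × 2 B ≈ 86 GB

memory ≈ 1,000,000 series × a few KB per active series
       ≈ several GB baseline, before query and churn overhead
```

Plugging your own three inputs into the same formulas is exactly what the calculator automates, along with the safe-range guidance.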