FAQ | Prometheus Alert Rule Generator & SLO Calculator

What is an SLO and why should I use SLO-based alerts?

October 20, 2025 • Cardinality Cloud • 9 min read

Traditional infrastructure alerts page you when CPU hits 80%, even though your users are fine. Meanwhile, degraded API performance goes unnoticed because no arbitrary threshold was crossed. An SLO (Service Level Objective) changes this: it’s a reliability target that measures what users actually experience, like “99.9% of requests succeed over 30 days.” Born from Google’s Site Reliability Engineering (SRE) practices, SLO-based alerting pages only when user experience is genuinely at risk, cutting alert fatigue while catching real issues early.

Why is burn rate alerting useful?

October 18, 2025 • Cardinality Cloud • 2 min read

Traditional threshold alerts fire on every spike, creating alert fatigue. Burn rate alerting is different: it tracks how quickly you’re consuming your error budget and alerts only when errors are sustained enough to threaten your reliability target. This gives you an early warning before your SLO is at risk, while dramatically reducing noise.
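As a rough illustration of the idea, burn rate can be read as the observed error ratio divided by the error ratio the SLO allows, so a burn rate of 1 spends the budget exactly over the SLO window. The sketch below is a minimal example under assumptions: the function names are hypothetical, and the default threshold of 14.4 is the commonly cited example for a 1-hour window against a 30-day SLO, not necessarily the value this tool generates.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors occur.

    error_ratio: fraction of failed requests in the window (0.0 to 1.0).
    slo_target:  reliability target as a fraction, e.g. 0.999 for 99.9%.
    """
    allowed_error_ratio = 1.0 - slo_target
    return error_ratio / allowed_error_ratio


def should_page(
    long_window_ratio: float,
    short_window_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed example threshold, not from the source
) -> bool:
    """Page only when both a long and a short window burn too fast.

    The long window shows the errors are sustained; the short window
    shows they are still happening right now.
    """
    return (
        burn_rate(long_window_ratio, slo_target) >= threshold
        and burn_rate(short_window_ratio, slo_target) >= threshold
    )


if __name__ == "__main__":
    # 2% errors against a 99.9% SLO burns budget about 20x too fast.
    print(burn_rate(0.02, 0.999))
    print(should_page(0.02, 0.02, 0.999))
```

Requiring both windows to exceed the threshold is one way to ignore brief spikes (short window high, long window low) and incidents that have already ended (long window high, short window low).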

faq slo burn-rate alerting

How does this tool efficiently calculate error budget over long SLO windows?

October 16, 2025 • Cardinality Cloud • 2 min read

Calculating error budget over 30 days should be simple, but naive Prometheus queries can time out on high-cardinality metrics. This tool uses a technique inspired by Riemann sums: it pre-computes error ratios at 5-minute intervals, turning an expensive range query over raw data into a single fast aggregation over the pre-computed values. The result: accurate error budget calculations that scale.
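The aggregation step can be sketched outside Prometheus as follows. This is a minimal illustration, not the tool's actual rules: it assumes the 5-minute error ratios are already available as a plain list, and it averages them with equal weight, which matches the true window error ratio only when traffic is roughly even across intervals. The function name is hypothetical.

```python
from typing import Sequence


def error_budget_remaining(
    interval_error_ratios: Sequence[float], slo_target: float
) -> float:
    """Estimate the fraction of error budget left over the window.

    interval_error_ratios: one pre-computed error ratio per 5-minute
        interval (a 30-day window has 30 * 24 * 12 = 8640 of them).
    slo_target: reliability target as a fraction, e.g. 0.999.

    Returns 1.0 when no budget is spent, 0.0 when it is exactly spent,
    and a negative number when the SLO has been violated.
    """
    if not interval_error_ratios:
        raise ValueError("need at least one interval sample")

    # Riemann-sum style approximation: each equal-width interval
    # contributes its error ratio, and the mean approximates the
    # error ratio over the whole window.
    window_error_ratio = sum(interval_error_ratios) / len(interval_error_ratios)

    allowed_error_ratio = 1.0 - slo_target
    return 1.0 - window_error_ratio / allowed_error_ratio


if __name__ == "__main__":
    samples_per_window = 30 * 24 * 12  # 5-minute intervals in 30 days
    # Hypothetical data: 10% of intervals saw 0.5% errors, the rest none.
    ratios = [0.005] * 864 + [0.0] * (samples_per_window - 864)
    print(error_budget_remaining(ratios, 0.999))  # about half the budget left
```

Because the input is a few thousand pre-computed points rather than every raw sample in the window, the cost of the final aggregation stays small regardless of request volume.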

faq technical prometheus performance

How do I query these generated rules in Prometheus to monitor my application?

October 14, 2025 • Cardinality Cloud • 2 min read

You’ve deployed the generated SLO rules to Prometheus. Now what? The recording rules pre-compute your SLO metrics every minute, but how do you actually check whether you’re meeting your targets, monitor error budget consumption, or build dashboards? This guide shows you the essential PromQL queries for SLO monitoring, from checking current status to visualizing long-term trends.

faq prometheus promql monitoring

How do I size my Prometheus deployment?

October 12, 2025 • Cardinality Cloud • 3 min read

Your monitoring just went down because Prometheus got OOM-killed again. Or maybe you’re paying for 32 GB of RAM when 8 GB would suffice. Sizing Prometheus shouldn’t be guesswork: it’s predictable math based on three inputs (active time series, scrape interval, and retention period). Our Resource Calculator does the math for you, showing memory, CPU, and disk requirements with visual guidance on safe ranges and real-world scaling examples.

faq prometheus capacity-planning resources

What is an Error Budget?

October 10, 2025 • Cardinality Cloud • 2 min read

Engineering wants to slow down and fix stability issues. Product wants to ship faster and hit deadlines. Who’s right? Both, and neither. The real question isn’t “should we prioritize reliability or velocity?” but “how much unreliability can we tolerate while still meeting our promises?” That’s your error budget: the quantitative answer that turns endless debates into data-driven decisions. With a 99.9% SLO, you get 43.2 minutes of downtime per 30-day month (0.1% of 43,200 minutes) to spend on innovation, experiments, or controlled risks.
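The 43.2-minute figure follows from simple arithmetic: a 30-day window has 43,200 minutes, and a 99.9% target leaves 0.1% of them as budget. A minimal sketch of that calculation, with a hypothetical function name and a 30-day window assumed as the default:

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the allowed downtime in minutes for a time-based SLO.

    slo_target:  reliability target as a fraction, e.g. 0.999 for 99.9%.
    window_days: length of the SLO window in days (30 is an assumption).
    """
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - slo_target)


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        minutes = error_budget_minutes(target)
        print(f"{target:.2%} SLO -> {minutes:.2f} minutes per 30 days")
```

Each extra nine cuts the budget by a factor of ten: roughly 432 minutes at 99%, 43.2 at 99.9%, and 4.32 at 99.99%.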

faq slo error-budget