Free Prometheus Alert Rule and SLO Generator

Tools for Prometheus monitoring: SLO-based PromQL generator, error budget calculator, and scaling to avoid OOMs.

Brought to you by Cardinality Cloud, LLC.

How do I size my Prometheus deployment?

Your monitoring just went down because Prometheus got OOM-killed again. Or maybe you’re paying for 32GB of RAM when 8GB would suffice. Sizing Prometheus shouldn’t be guesswork - it’s actually predictable math based on three inputs: active time series, scrape interval, and retention period. Our Resource Calculator does the math for you, showing memory, CPU, and disk requirements with visual guidance on safe ranges and real-world scaling examples.

The Three Key Inputs

  1. Active Time Series: The number of unique time series your Prometheus instance tracks
  2. Scrape Interval: How often Prometheus collects metrics (default: 60 seconds)
  3. Retention Period: How long to store historical data (default: 30 days)

Finding Your Active Time Series Count

If you already have Prometheus running, query:

prometheus_tsdb_head_series

This metric shows the current number of time series in the Prometheus TSDB (Time Series Database) head block. If you’re planning a new deployment, estimate based on:

  • Number of targets (servers, containers, etc.)
  • Metrics per target (typically 500-2000 per host, 50-200 per container)
  • Expected growth over the retention period
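
Putting those estimates together is simple multiplication plus headroom. Below is a minimal Python sketch; the host and container counts, per-target series figures, and growth factor are illustrative assumptions drawn from the ranges above, not measured values:

```python
# Rough active-series estimate for a planned deployment.
# All inputs are illustrative assumptions; substitute your own inventory.

def estimate_active_series(hosts: int, containers: int,
                           series_per_host: int = 1000,
                           series_per_container: int = 100,
                           growth_factor: float = 1.3) -> int:
    """Estimate active time series, padded for expected growth."""
    base = hosts * series_per_host + containers * series_per_container
    return int(base * growth_factor)

# Example: 50 hosts and 400 containers with 30% growth headroom.
print(estimate_active_series(hosts=50, containers=400))  # ~117,000 series
```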

Resource Calculations

Memory Requirements

Prometheus memory usage scales linearly with active time series:

$$\text{Memory (GB)} = \frac{\text{time\_series} \times 7.5 \text{ KiB}}{1024^2}$$
  • Recommended: 7.5 KiB per time series
  • Safe Range: 7-9 KiB per time series

The actual memory usage depends on:

  • Series churn rate: How frequently time series appear and disappear
  • Label cardinality: Number of unique label combinations
  • Sample rate: Higher scrape frequencies increase memory pressure
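
As a sketch of the memory formula above, the calculation reduces to a one-liner. The 7.5 KiB default is the calculator's rule of thumb, not a guarantee:

```python
def memory_gib(active_series: int, kib_per_series: float = 7.5) -> float:
    """Memory (GiB) = active_series * KiB-per-series / 1024^2."""
    return active_series * kib_per_series / 1024**2

# Example: 1 million active series.
print(round(memory_gib(1_000_000), 1))        # ~7.2 GiB at the recommended 7.5 KiB
print(round(memory_gib(1_000_000, 9.0), 1))   # ~8.6 GiB at the top of the safe range
```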

CPU Requirements

CPU usage is harder to predict because it depends on query complexity, but as a general rule:

$$\text{CPU Cores} = \max\left(2, \left\lfloor\frac{\text{Memory (GB)}}{4}\right\rfloor\right)$$

Allocate 1 core per 4GB of memory, with a minimum of 2 cores. CPU load increases with:

  • Recording rules: Pre-computing aggregations
  • Alert rule evaluations: Complex PromQL queries
  • Query load: Dashboard refreshes, API queries, ad-hoc exploration
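
The same rule in code, with the 2-core floor applied. The 1-core-per-4-GB ratio is the rule of thumb stated above, not a measured requirement:

```python
import math

def cpu_cores(memory_gb: float) -> int:
    """CPU cores = max(2, floor(memory_gb / 4))."""
    return max(2, math.floor(memory_gb / 4))

print(cpu_cores(7.2))   # 2  (small instance, the minimum applies)
print(cpu_cores(32.0))  # 8  (1 core per 4 GB of memory)
```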

Disk Space Requirements

Disk usage depends on retention period and sample density:

$$\text{Disk (GB)} = \frac{\text{time\_series} \times \text{samples\_per\_series} \times 1.5 \text{ bytes}}{1024^3} \times 1.2$$

Where:

$$\text{samples\_per\_series} = \frac{\text{retention\_days} \times 86400}{\text{scrape\_interval}}$$
  • 1.5 bytes per sample: Prometheus’s efficient compression based on the Gorilla compression algorithm from Facebook
  • 1.2x multiplier: 20% overhead for WAL (Write-Ahead Log) and temporary data
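
Combining the two expressions gives a short disk estimator. This is a sketch of the formula above, using the 1.5 bytes/sample and 20% WAL overhead figures as given:

```python
def disk_gb(active_series: int, retention_days: int = 30,
            scrape_interval_s: int = 60,
            bytes_per_sample: float = 1.5,
            overhead: float = 1.2) -> float:
    """Disk (GB) = series * samples_per_series * bytes_per_sample / 1024^3 * overhead."""
    samples_per_series = retention_days * 86400 / scrape_interval_s
    return active_series * samples_per_series * bytes_per_sample / 1024**3 * overhead

# Example: 1 million series, 30-day retention, 60 s scrape interval.
print(round(disk_gb(1_000_000)))  # ~72 GB
```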

Interpreting the Chart

The Resource Calculator shows:

  • Green shaded area: Safe memory range (7-9 KiB per series)
  • Solid green line: Recommended allocation (7.5 KiB)
  • Blue dots: Example configurations at common scales
  • Red dot: Your specific configuration
  • Logarithmic scale: Better visualization across orders of magnitude (1K to 10M+ series)

Important Caveats

These estimates are starting points, not guarantees. Actual resource usage varies based on:

  • Recording rules: Each recording rule creates new time series, increasing memory
  • Alert rules: Complex alert evaluations increase CPU usage
  • Query patterns: Heavy dashboard loads or complex queries require more CPU
  • Remote write: Sending data to remote storage adds CPU and network overhead
  • Cardinality explosions: Poorly designed metrics can create millions of series unexpectedly

Monitoring Your Actual Usage

After deployment, monitor these Prometheus metrics:

# Memory usage
process_resident_memory_bytes

# Active time series
prometheus_tsdb_head_series

# Disk usage
prometheus_tsdb_storage_blocks_bytes

# Ingestion rate (samples per second)
rate(prometheus_tsdb_head_samples_appended_total[5m])
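
If you prefer to pull these numbers programmatically rather than from a dashboard, Prometheus exposes the same queries over its HTTP API at /api/v1/query. A minimal sketch, assuming a server reachable at http://localhost:9090 and the requests library installed:

```python
import requests

PROMETHEUS_URL = "http://localhost:9090"  # assumption: local Prometheus instance

QUERIES = {
    "memory_bytes": "process_resident_memory_bytes",
    "active_series": "prometheus_tsdb_head_series",
    "disk_bytes": "prometheus_tsdb_storage_blocks_bytes",
    "samples_per_sec": "rate(prometheus_tsdb_head_samples_appended_total[5m])",
}

for name, query in QUERIES.items():
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
    resp.raise_for_status()
    for result in resp.json()["data"]["result"]:
        # Each result carries the series labels and a (timestamp, value) pair.
        print(name, result["metric"].get("instance", ""), result["value"][1])
```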

Learn More