
Tempo 3.0: Unlocking Scalable Observability with TraceQL Metrics

5 min read · Grafana Blog

Practitioner — Hands-on experience recommended

Tempo 3.0 is designed to tackle the challenges of scaling observability in complex distributed systems. Traditional tracing backends often struggle with performance and cost as infrastructure grows. Tempo 3.0 introduces a Kafka-compatible architecture that separates the read and write paths. Ingestion and querying can therefore scale independently, and load on one does not degrade the other. This shift improves performance and reduces operational overhead, which makes Tempo 3.0 a compelling option for organizations optimizing their observability stack.

In the new architecture, the distributor receives trace data and writes it to a Kafka-compatible system, which acts as a durable buffer between ingestion and the rest of Tempo. Block-builders consume from that buffer and write Parquet blocks to object storage. Live-stores serve recent traces, so newly ingested data can be queried without delay. Tempo 3.0 also introduces TraceQL metrics, which let you run ad-hoc metric queries directly against your trace data. That makes it simpler to answer critical questions about performance, error rates, and service behavior across a distributed system.
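The toy model below makes the decoupling concrete. `Log`, `Consumer`, and `distributor_write` are hypothetical names invented for this sketch. They are not Tempo components or a Kafka client API. An in-memory list stands in for the Kafka-compatible topic, and each consumer tracks its own offset. The block-builder and the live-store can therefore progress at different speeds without ever blocking the distributor.

```python
# Hypothetical toy model of Tempo 3.0's decoupled write/read paths.
# Not Tempo's code and not a Kafka client API.
from dataclasses import dataclass, field


@dataclass
class Log:
    """In-memory stand-in for a Kafka-compatible topic (append-only, offset-addressed)."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset: int, max_records: int) -> list:
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Reader that commits its own offset, like a member of a Kafka consumer group."""
    name: str
    log: Log
    offset: int = 0

    def poll(self, max_records: int = 100) -> list:
        batch = self.log.read_from(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def lag(self) -> int:
        """Records appended to the log that this consumer has not read yet."""
        return len(self.log.records) - self.offset


def distributor_write(log: Log, spans: list) -> list:
    """Write path: the distributor only appends; it never waits on any reader."""
    return [log.append(span) for span in spans]


log = Log()
block_builder = Consumer("block-builder", log)  # writes Parquet blocks in the real system
live_store = Consumer("live-store", log)        # serves recent traces in the real system

distributor_write(log, [
    {"trace_id": "a1", "name": "GET /api"},
    {"trace_id": "b2", "name": "POST /pay"},
])

recent = live_store.poll()                       # live-store catches up to the head of the log
first_batch = block_builder.poll(max_records=1)  # block-builder proceeds at its own pace
```

In this model, a slow block-builder shows up as consumer lag rather than as back-pressure on the distributor. That is the property that lets the two paths scale separately.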

In production, a few details deserve attention. The DRAIN-based span name sanitizer keeps span names manageable by clustering similar values into templates and replacing sensitive data with placeholders. Enable it in your configuration to keep your trace data clean and your span names low-cardinality. Also keep in mind that alerting on TraceQL metrics is still experimental, so be careful when building alerts on these metrics. Tempo 3.0 is a significant step forward, but understanding these details is key to getting its full value.
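The clustering idea can be sketched in Python. This is a simplified, DRAIN-style illustration, not Tempo's sanitizer. The mask patterns, the 0.5 similarity threshold, the tokenization on whitespace and `/`, and the class name `SpanNameSanitizer` are all assumptions. Like DRAIN, it routes each name to a group by token layout and leading token, then compares the name with that group's existing templates. When a name is similar enough, the positions that disagree become wildcards.

```python
import re

# Hypothetical, simplified DRAIN-style sanitizer; not Tempo's implementation.
# The mask patterns and the similarity threshold are illustrative assumptions.
MASKS = [
    (re.compile(r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"), "<UUID>"),
    (re.compile(r"^\d+$"), "<NUM>"),
    (re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"), "<EMAIL>"),
]
WILDCARD = "<*>"


def tokenize(name):
    """Split on whitespace and '/', keeping separators so names can be re-joined."""
    return [t for t in re.split(r"(\s+|/)", name) if t]


def is_separator(token):
    return token == "/" or token.isspace()


def mask(token):
    """Replace obviously variable or sensitive tokens with a placeholder."""
    for pattern, placeholder in MASKS:
        if pattern.match(token):
            return placeholder
    return token


def similarity(template, tokens):
    """Fraction of word positions (separators ignored) where template and name agree."""
    pairs = [(a, b) for a, b in zip(template, tokens) if not is_separator(b)]
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


class SpanNameSanitizer:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        # DRAIN routes through a fixed-depth prefix tree; this sketch flattens that
        # into one dict keyed by separator layout (implies token count) and first token.
        self.groups = {}

    def sanitize(self, name):
        tokens = [mask(t) for t in tokenize(name)]
        if not tokens:
            return name
        layout = tuple(t if is_separator(t) else None for t in tokens)
        candidates = self.groups.setdefault((layout, tokens[0]), [])
        best = max(candidates, key=lambda tpl: similarity(tpl, tokens), default=None)
        if best is None or similarity(best, tokens) < self.threshold:
            candidates.append(tokens)  # start a new cluster with this name as its template
            return "".join(tokens)
        for i, (a, b) in enumerate(zip(best, tokens)):
            if a != b:
                best[i] = WILDCARD  # positions that disagree become wildcards
        return "".join(best)
```

With this sketch, `GET /users/42` and `GET /users/alice` collapse into `GET /users/<*>`. `POST /users/42` stays in its own cluster because the first token is part of the routing key.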

Key takeaways

→Leverage the Kafka-compatible architecture for scalable trace ingestion and querying.
→Use TraceQL metrics to query performance data directly from your traces.
→Enable the DRAIN-based span name sanitizer to manage span names effectively.
→Use the include_any policy for flexible OR-style inclusion rules in your configurations.

Why it matters

In production, Tempo 3.0's architecture lets ingestion and querying scale independently, which can lower operational costs while improving performance. The practical payoff is faster troubleshooting and better insight into distributed systems.

Code examples

TraceQL

{} | rate() by (resource.service.name)

TraceQL

{ resource.service.name != "tempo-all" } | rate() by (resource.service.name)
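To build intuition for what the two queries above return, here is a rough Python model of `{ ... } | rate() by (resource.service.name)`. It counts matching spans per service in fixed time buckets and divides by the bucket width, giving spans per second. This is a hypothetical sketch. The function name `rate_by`, the span dictionaries, and bucketing on span start time are assumptions, and Tempo's real query engine handles step alignment and sampling in its own way.

```python
from collections import defaultdict


def rate_by(spans, attribute, start, end, step, where=lambda span: True):
    """Rough, hypothetical model of `{ <where> } | rate() by (<attribute>)`.

    Counts spans passing `where` per group in `step`-second buckets over
    [start, end) and divides by `step` (spans per second). Bucketing on the
    span start time is an assumption of this sketch.
    """
    n_buckets = int((end - start) // step)
    counts = defaultdict(lambda: [0] * n_buckets)
    for span in spans:
        if not where(span):
            continue  # plays the role of the `{ ... }` span filter
        idx = int((span["start"] - start) // step)
        if 0 <= idx < n_buckets:
            counts[span.get(attribute)][idx] += 1
    return {group: [c / step for c in series] for group, series in counts.items()}


spans = [
    {"start": 0.5, "resource.service.name": "checkout"},
    {"start": 1.0, "resource.service.name": "checkout"},
    {"start": 12.0, "resource.service.name": "checkout"},
    {"start": 3.0, "resource.service.name": "tempo-all"},
]

# Like `{} | rate() by (resource.service.name)`:
everything = rate_by(spans, "resource.service.name", 0, 20, 10)
# → {'checkout': [0.2, 0.1], 'tempo-all': [0.1, 0.0]}

# Like `{ resource.service.name != "tempo-all" } | rate() by (resource.service.name)`:
without_tempo = rate_by(
    spans, "resource.service.name", 0, 20, 10,
    where=lambda s: s["resource.service.name"] != "tempo-all",
)
# → {'checkout': [0.2, 0.1]}
```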

YAML

metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        # Main rule: only public/server traffic.
        - include:
            match_type: strict
            attributes:
              - key: kind
                value: SPAN_KI
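The difference between a plain `include` rule and the `include_any` policy from the key takeaways comes down to AND versus OR. The sketch below assumes that a plain `include` rule requires every listed attribute to match, as the OR-style description of `include_any` implies. The function names and the `http.route` attribute are hypothetical, and only `match_type: strict` (exact equality) is modeled.

```python
# Hypothetical evaluator illustrating AND-style `include` vs OR-style `include_any`
# with `match_type: strict`. Not Tempo's code; only exact string matching is modeled.


def attr_matches(span_attrs, rule):
    """Strict match: the span attribute must equal the configured value exactly."""
    return span_attrs.get(rule["key"]) == rule["value"]


def passes_include(span_attrs, attributes):
    """`include`-style rule: every listed attribute must match (logical AND)."""
    return all(attr_matches(span_attrs, rule) for rule in attributes)


def passes_include_any(span_attrs, attributes):
    """`include_any`-style rule: at least one listed attribute must match (logical OR)."""
    return any(attr_matches(span_attrs, rule) for rule in attributes)


rules = [
    {"key": "kind", "value": "SPAN_KIND_SERVER"},
    {"key": "http.route", "value": "/health"},  # illustrative second attribute
]
checkout = {"kind": "SPAN_KIND_SERVER", "http.route": "/pay"}

and_result = passes_include(checkout, rules)     # False: http.route does not match
or_result = passes_include_any(checkout, rules)  # True: kind matches
```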

When NOT to use this

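The official docs don't call out specific anti-patterns here. One caveat does apply: alerting on TraceQL metrics is still experimental, so avoid relying on it for critical alerts for now. Otherwise, use your judgment based on your scale and requirements.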

