Observability Part 1 - Introduction

Created: 2025-04-15

Tags: observability

Observability allows us to see the inner workings of our systems, no matter how complicated.
What would you say if you could solve problems inside your system within seconds, or at worst minutes?
Dive into this series to find out how.

Modern systems are complex. Distributed systems are even more so.
Today, we are facing challenges that traditional monitoring can no longer solve.
There are limits to how much context our minds can hold.
Now imagine a system spanning tens—or even hundreds—of services, running concurrently.
How can you trace an issue through uncorrelated logs, metrics, and traces?
Short answer: you can’t. Not without custom, time-consuming event stitching.
Observability to the rescue!

Parts in this series:

Part 1 - Introduction (this post)
Part 2 - OTeL Collector

What to Expect from This Series

This blog series has no fixed length. I aim to cover as much ground as needed to make observability understandable and practical.
We will explore:

What observability truly means
How to implement telemetry collection using the OpenTelemetry Collector and Node Exporter
How to set up the Grafana backend: Prometheus, Loki, Tempo
How to instrument applications—no "Hello World" nonsense
How to build distributed, concurrent Python systems worth observing
How to collect system metrics and ingest logs
How to build custom telemetry processors
How to apply zero-trust principles where possible

Each part will be hands-on, exploratory, and grounded in real examples.

Observability Basics

Observability means understanding the internal state of a system purely from its outputs.
See Definitions at the end of this post for terms like "instrumentation" and "event".

Instrumentation methods:

Automatic Instrumentation

The easiest way to get started. Minimal effort, limited customization.
Currently supported languages:

.NET
Go
Java
JavaScript
PHP
Python

Library Instrumentation

Enables telemetry inside popular frameworks and libraries.
It gives you more control, at the cost of a few extra lines of code.
Useful for third-party tools such as web frameworks, HTTP clients, or database drivers.

Examples:

Django
FastAPI
Flask
HTTPX
MySQL
MongoDB

See the full list in the OpenTelemetry Python Contrib docs.
For all languages, browse the OpenTelemetry Registry.
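With the Flask package, for instance, the extra lines usually amount to a call like `FlaskInstrumentor().instrument_app(app)`. For WSGI frameworks, this kind of instrumentation essentially wraps the application in middleware that records every request. The sketch below is a stdlib-only toy version of that idea. `TracingMiddleware`, `hello_app`, and the span field names are hypothetical and are not the OpenTelemetry implementation.

```python
from wsgiref.util import setup_testing_defaults

spans: list = []


def hello_app(environ, start_response):
    """A plain WSGI app standing in for a framework such as Flask or Django."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


class TracingMiddleware:
    """Hypothetical middleware: records one span-like dict per HTTP request."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        span = {
            "name": f"{environ['REQUEST_METHOD']} {environ.get('PATH_INFO', '/')}",
            "http.method": environ["REQUEST_METHOD"],
        }

        def recording_start_response(status, headers, exc_info=None):
            # Capture the status code, then pass the call through unchanged.
            span["http.status_code"] = int(status.split()[0])
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            # Simplified: the span ends when the app returns, not when the body is sent.
            spans.append(span)


# The "few extra lines": wrap the existing app once.
app = TracingMiddleware(hello_app)

environ: dict = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/users"
body = app(environ, lambda status, headers, exc_info=None: None)
print(list(body), spans)
```

The framework code (`hello_app`) is untouched. Only the wiring line changes, which is what separates library instrumentation from manual instrumentation.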

Manual Instrumentation

This is where the magic happens.
Full customization, full control—at the cost of writing more code.
But it’s worth it.

Supported SDKs:

C++
.NET
Erlang / Elixir
Go
Java
JavaScript / TypeScript
PHP
Python
Ruby
Rust
Swift

Quick Introduction to Observability

If you haven’t heard of observability, or if your understanding of it is shallow, I recommend the book Observability Engineering by Charity Majors, Liz Fong-Jones, and George Miranda (O’Reilly).
This is not an affiliate link. I gain nothing from it—except the joy of sharing something valuable.

When I read the early release, I knew right away: this was something special.
It took me a while to return to the topic, but once I did, I went all in—for weeks.
What I learned was transformational.
That’s what I want to share with you.

Observability is the ability to understand any internal state your system can reach, no matter how novel or bizarre, using only the telemetry it emits.
No redeploying. No guesswork.
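The "no redeploying" part is easiest to see with wide events. Suppose each request emits one event with many fields. You can then answer a question you never planned for by slicing on any field, without adding new log lines and shipping again. The toy sketch below uses made-up events. The field names and the `slowest_by` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up wide events, one per request (field names are illustrative only).
events = [
    {"route": "/checkout", "status": 200, "duration_ms": 120, "customer": "acme", "region": "eu"},
    {"route": "/checkout", "status": 200, "duration_ms": 135, "customer": "globex", "region": "us"},
    {"route": "/checkout", "status": 500, "duration_ms": 2400, "customer": "acme", "region": "eu"},
    {"route": "/search", "status": 200, "duration_ms": 80, "customer": "acme", "region": "eu"},
    {"route": "/checkout", "status": 200, "duration_ms": 2100, "customer": "acme", "region": "eu"},
]


def slowest_by(events: list, dimension: str, route: str) -> list:
    """Average latency for one route, grouped by any field, slowest first."""
    groups = defaultdict(list)
    for e in events:
        if e["route"] == route:
            groups[e[dimension]].append(e["duration_ms"])
    return sorted(((k, mean(v)) for k, v in groups.items()), key=lambda kv: kv[1], reverse=True)


# A question nobody planned for: is /checkout slow for everyone, or for one customer?
print(slowest_by(events, "customer", "/checkout"))
```

Asking "is it one region?" is only a different argument, `slowest_by(events, "region", "/checkout")`. The data was already there, so nothing has to be redeployed.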

Observability starts with instrumenting your application and setting up a collector.
The collector receives telemetry (logs, metrics, and traces) and forwards it to a backend for visualization and analysis.

To standardize how applications are instrumented and how telemetry is collected, the Cloud Native Computing Foundation (CNCF), part of the Linux Foundation, hosts an open-source project named OpenTelemetry.

OpenTelemetry provides a neat tool called the OpenTelemetry Collector (OTel Collector). It can run as a Docker container, and it either receives telemetry pushed by your applications or scrapes it from them.

It acts as the central hub: it gathers traces, metrics, and logs from your services, processes them, and exports them to your preferred backend.
What makes it so appealing is that you instrument your code once and then send its telemetry to whatever backend you choose, commercial or open source.
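The Collector organizes this work into pipelines of receivers, processors, and exporters, configured in YAML. The real Collector ships processors for tasks such as batching, filtering, and adding attributes. The toy sketch below mirrors only that shape in plain Python, to show why "instrument once" works: the application only hands data to the pipeline, and the backend is simply a choice of exporter. Every name here (`Pipeline`, `drop_health_checks`, `add_environment`, the backends) is hypothetical. This is not the Collector's code or configuration format.

```python
from typing import Callable, Iterable

# Hypothetical aliases: a processor transforms a batch, an exporter ships it.
Processor = Callable[[list], list]
Exporter = Callable[[list], None]


def drop_health_checks(batch: list) -> list:
    """Processor: filter out noisy records before they reach any backend."""
    return [r for r in batch if r.get("route") != "/health"]


def add_environment(batch: list) -> list:
    """Processor: enrich every record with a shared attribute."""
    return [{**r, "environment": "staging"} for r in batch]


class Pipeline:
    """Toy receive -> process -> export flow (not the Collector's actual code)."""

    def __init__(self, processors: Iterable[Processor], exporters: Iterable[Exporter]):
        self.processors = list(processors)
        self.exporters = list(exporters)

    def receive(self, batch: list) -> None:
        for process in self.processors:
            batch = process(batch)
        # Fan the same processed batch out to every configured backend.
        for export in self.exporters:
            export(batch)


# Two stand-in backends; adding or swapping one changes only this wiring.
backend_a: list = []
backend_b: list = []
pipeline = Pipeline(
    processors=[drop_health_checks, add_environment],
    exporters=[backend_a.extend, backend_b.extend],
)

pipeline.receive([
    {"route": "/health", "duration_ms": 1},
    {"route": "/checkout", "duration_ms": 120},
])
print(backend_a)
```

Replacing `backend_b.extend` with any other exporter changes where the data goes, not how the application produced it.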

Definitions

Instrumentation – Adding code or tooling to collect telemetry from an application or system
Event – Arbitrarily wide, structured data emitted by the system
Metric – Numeric measurement over time (e.g., request count)
Log – Timestamped textual or structured output
Trace – Full journey of a request through the system
Span – A unit of work within a trace
Dimensionality – Breadth of data in a single event or span
Telemetry – The collection of logs, metrics, and traces
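The trace, span, and dimensionality definitions become concrete once you see how a backend rebuilds a trace. Spans arrive as a flat list, and the trace is reconstructed by following each span's parent ID. The toy sketch below uses made-up spans that all belong to one trace, so the trace ID is omitted. The dict layout and `build_tree` are assumptions for illustration. "dims" simply counts the fields a span carries.

```python
from collections import defaultdict

# Made-up spans of one trace, arriving out of order as a backend might receive them.
spans = [
    {"span_id": "b", "parent_id": "a", "name": "SELECT orders", "db.system": "mysql"},
    {"span_id": "a", "parent_id": None, "name": "GET /orders", "http.method": "GET", "http.status_code": 200},
    {"span_id": "c", "parent_id": "a", "name": "GET /prices", "http.method": "GET"},
]


def build_tree(spans: list) -> list:
    """Return the trace as indented lines, rebuilt from parent-child links."""
    children = defaultdict(list)
    for s in spans:
        children[s["parent_id"]].append(s)
    lines = []

    def walk(parent_id, depth):
        for s in children[parent_id]:
            # Dimensionality here = number of fields carried by the span.
            lines.append(f"{'  ' * depth}{s['name']} (dims={len(s)})")
            walk(s["span_id"], depth + 1)

    walk(None, 0)
    return lines


print("\n".join(build_tree(spans)))
```

The root span is the request as a whole, its children are units of work within it, and the wider each span is, the more questions it can answer later.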

In the next part, Part 2 - OTeL Collector, you will learn how to set up the OTel Collector to collect and export telemetry.