Observability Part 1 - Introduction
Observability allows us to see the inner workings of our systems, no matter how complicated.
What would You say if You could solve problems inside Your system within seconds—or minutes?
Dive into this series to find out how.
Modern systems are complex. Distributed systems are even more so.
Today, we are facing challenges that traditional monitoring can no longer solve.
There are limits to how much context our minds can hold.
Now imagine a system spanning tens—or even hundreds—of services, running concurrently.
How can You trace an issue through uncorrelated logs, metrics, and traces?
Short answer: You can’t. Not without custom, time-consuming event stitching.
Observability to the rescue!
Parts in this series:
- Part 1 - Introduction (current one)
- Part 2 - OTeL Collector
What to Expect from This Series
This blog series has no fixed length. I aim to cover as much ground as needed to make observability understandable and practical.
We will explore:
- What observability truly means
- How to implement telemetry collection using the OpenTelemetry Collector and Node Exporter
- How to set up the Grafana backend: Prometheus, Loki, Tempo
- How to instrument applications—no "Hello World" nonsense
- How to build distributed, concurrent Python systems worth observing
- How to collect system metrics and ingest logs
- How to build custom telemetry processors
- How to apply zero-trust principles where possible
Each part will be hands-on, exploratory, and grounded in real examples.
Observability Basics
Observability means understanding the internal state of a system purely from its outputs.
See Definitions for clarification on terminology like "instrumentation" and "event".
Instrumentation methods:
Automatic Instrumentation
The easiest way to get started. Minimal effort, limited customization.
Currently supported languages:
- .NET
- Go
- Java
- JavaScript
- PHP
- Python
Library Instrumentation
Turns on telemetry inside popular frameworks and libraries.
It gives You more control, at the cost of a few extra lines of code.
Useful for third-party tools like web frameworks or databases.
Examples:
- Django
- FastAPI
- Flask
- HTTPX
- MySQL
- MongoDB
See the full list at the OpenTelemetry Python Contrib docs.
Check out the full OpenTelemetry Registry.
Manual Instrumentation
This is where the magic happens.
Full customization, full control—at the cost of writing more code.
But it’s worth it.
Supported SDKs:
- C++
- .NET
- Erlang / Elixir
- Go
- Java
- JavaScript / TypeScript
- PHP
- Python
- Ruby
- Rust
- Swift
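To make the idea of manual instrumentation concrete, here is a minimal sketch that uses only the Python standard library. It is not the OpenTelemetry SDK: the `start_span` helper, the `FINISHED_SPANS` list, and the `handle_checkout` function are hypothetical names invented for illustration. It only shows the kind of data You record by hand: a trace ID shared by related work, a parent/child link between spans, custom attributes, and a duration.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# Finished spans land here; a real SDK would hand them to an exporter instead.
FINISHED_SPANS = []

# Remembers the active span so that nested spans can find their parent.
_current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def start_span(name, **attributes):
    """Hypothetical helper: record one unit of work as a span."""
    parent = _current_span.get()
    span = {
        "name": name,
        # Child spans inherit the trace ID; a root span starts a new trace.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "attributes": dict(attributes),
    }
    token = _current_span.set(span)
    started = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        # Keep the failure on the span, then let the error propagate.
        span["attributes"]["error"] = repr(exc)
        raise
    finally:
        span["duration_s"] = time.perf_counter() - started
        _current_span.reset(token)
        FINISHED_SPANS.append(span)


def handle_checkout(user_id, cart_size):
    with start_span("handle_checkout", user_id=user_id) as span:
        # Attributes can be added while the work is in progress.
        span["attributes"]["cart_size"] = cart_size
        with start_span("charge_card", provider="example-pay"):
            time.sleep(0.01)  # stand-in for a call to a payment service


handle_checkout(user_id=42, cart_size=3)
for s in FINISHED_SPANS:
    print(s["name"], s["trace_id"], s["parent_id"], round(s["duration_s"], 3))
```

Both printed spans carry the same trace ID, and the inner span points at the outer one through `parent_id`. That link is what lets a backend draw a request as a tree instead of a pile of unrelated records.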
Quick Introduction to Observability
If You haven’t heard of observability, or if Your understanding is shallow, I recommend the book Observability Engineering, published by O’Reilly.
This is not an affiliate link. I gain nothing from it—except the joy of sharing something valuable.
When I read the early release, I knew right away: this was something special.
It took me a while to return to the topic, but once I did, I went all in—for weeks.
What I learned was transformational.
That’s what I want to share with You.
Observability is the ability to understand any internal state Your system can reach—no matter how novel or bizarre—using only the telemetry it emits.
No redeploying. No guesswork.
Observability starts with instrumenting Your application and setting up a collector.
The collector receives telemetry (logs, metrics, traces) and forwards it to a backend for visualization and analysis.
To collect telemetry and instrument applications, there is a standardized, vendor-neutral solution named OpenTelemetry, hosted by the Cloud Native Computing Foundation (part of the Linux Foundation).
OpenTelemetry offers a neat tool called the OpenTelemetry Collector (OTel Collector for short), which can run as a Docker container and receives or scrapes telemetry.
It is the tool that gathers traces, metrics, and logs in one place, processes them, and sends them to the preferred backend.
What makes it so appealing is that You instrument Your code once and then send the telemetry to whatever backend You choose, be it commercial or open source.
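The receive, batch, and forward flow can be sketched in a few lines of plain Python. This is a toy model and not the real OpenTelemetry Collector (which is a separate program configured with a file): the `Collector` class, the two exporter functions, and the record shapes are all assumptions made up for this example. The point it illustrates is that the producing side stays the same while the backends are swappable.

```python
import json


class Collector:
    """Toy stand-in for a collector: receive, batch, export."""

    def __init__(self, exporters, batch_size=2):
        self.exporters = exporters      # callables that accept a list of records
        self.batch_size = batch_size
        self._buffer = []

    def receive(self, record):
        # Buffer incoming telemetry and export once a full batch is ready.
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        # Every configured backend gets the same batch.
        for export in self.exporters:
            export(batch)


def console_exporter(batch):
    """Hypothetical backend 1: print each record as a JSON line."""
    for record in batch:
        print(json.dumps(record, sort_keys=True))


archive = []


def archive_exporter(batch):
    """Hypothetical backend 2: keep records in memory."""
    archive.extend(batch)


collector = Collector([console_exporter, archive_exporter])
collector.receive({"signal": "log", "body": "user logged in"})
collector.receive({"signal": "metric", "name": "http_requests_total", "value": 1})
collector.receive({"signal": "trace", "span": "handle_checkout"})
collector.flush()  # push out the last, partially filled batch
```

Swapping a backend here means changing only the list passed to `Collector`; the code that emits the records does not change. That is the same separation the paragraph above describes.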
Definitions
- Instrumentation – Adding code or tooling to collect telemetry from an application or system
- Event – Arbitrarily wide, structured data emitted by the system
- Metric – Numeric measurement over time (e.g., request count)
- Log – Timestamped textual or structured output
- Trace – Full journey of a request through the system
- Span – A unit of work within a trace
- Dimensionality – Breadth of data in a single event or span
- Telemetry – The collection of logs, metrics, and traces
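The terms "event" and "dimensionality" are easier to grasp with data in hand. The sketch below uses invented sample events; the field names, values, and the `slow_by` helper are hypothetical. Each event is one wide, structured record, and every field is a dimension You can group by later without redeploying anything.

```python
from collections import Counter

# Four made-up wide events; each key is one dimension of the event.
events = [
    {"trace_id": "a1", "service": "checkout", "status": 200,
     "region": "eu", "app_version": "2.4.0", "duration_ms": 120},
    {"trace_id": "a2", "service": "checkout", "status": 500,
     "region": "eu", "app_version": "2.4.1", "duration_ms": 912},
    {"trace_id": "a3", "service": "checkout", "status": 200,
     "region": "us", "app_version": "2.4.1", "duration_ms": 870},
    {"trace_id": "a4", "service": "search", "status": 200,
     "region": "us", "app_version": "2.4.0", "duration_ms": 95},
]


def slow_by(dimension, threshold_ms=500):
    """Count slow events grouped by any field present on the events."""
    return Counter(
        event[dimension] for event in events
        if event["duration_ms"] > threshold_ms
    )


print("dimensions per event:", len(events[0]))
print("slow by region:", dict(slow_by("region")))
print("slow by app_version:", dict(slow_by("app_version")))
```

Grouping by region splits the slow requests evenly, while grouping by `app_version` puts both of them on one version. The more dimensions each event carries, the more such questions You can ask after the fact.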
In Part 2 - OTeL Collector, You will learn how to set up the OpenTelemetry Collector to collect and export telemetry.