Kubernetes Operators: Safety First Through Model Checkers

Today’s Kubernetes Operators aren’t just fancy toys; they are utilities managing critical infrastructure. Many best practices are already applied to increase their safety: unit and e2e testing, code reviews, and post-mortem analysis. This talk introduces a more recent addition to the working developer’s toolbox: model checkers. The likes of TLA+ and Alloy have already helped design many real-world systems, from S3 all the way to real-time operating systems (RTOSes), with great success. They let us design and model our systems in the abstract, state the system’s facts, assumptions, and the rules we expect to hold, and then analyze the model for inconsistencies or scenarios we haven’t thought of: like code review for system design, on steroids. This talk introduces model checkers, covers the motivation behind them, and finishes with a short example. [Slides, Video, etc.]
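
This is no substitute for TLA+ or Alloy, but to make the idea concrete, here is a toy explicit-state checker in Go: it enumerates every reachable state of a deliberately broken two-client lock and checks a mutual-exclusion invariant in each one. The states, transition rules, and invariant are invented for illustration only.

```go
// Toy explicit-state model checking: breadth-first search over all reachable
// states of a tiny system (two clients competing for a lock), checking a
// mutual-exclusion invariant in every state visited.
package main

import "fmt"

// State of the model: which clients believe they hold the lock.
type State struct {
	holderA, holderB bool
}

// next returns every state reachable from s in a single step.
// The deliberate bug: a client may acquire the lock even while the other holds it.
func next(s State) []State {
	var succ []State
	if !s.holderA {
		succ = append(succ, State{holderA: true, holderB: s.holderB}) // A acquires (unguarded!)
	} else {
		succ = append(succ, State{holderA: false, holderB: s.holderB}) // A releases
	}
	if !s.holderB {
		succ = append(succ, State{holderA: s.holderA, holderB: true}) // B acquires (unguarded!)
	} else {
		succ = append(succ, State{holderA: s.holderA, holderB: false}) // B releases
	}
	return succ
}

// invariant is the safety property we expect to hold in every reachable state:
// at most one client holds the lock at a time.
func invariant(s State) bool { return !(s.holderA && s.holderB) }

func main() {
	start := State{}
	seen := map[State]bool{start: true}
	queue := []State{start}

	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		if !invariant(s) {
			fmt.Printf("invariant violated in state %+v\n", s)
			return
		}
		for _, n := range next(s) {
			if !seen[n] {
				seen[n] = true
				queue = append(queue, n)
			}
		}
	}
	fmt.Println("invariant holds in all reachable states")
}
```

Real model checkers do essentially this, just over much richer specification languages and with clever state-space reduction; here the checker finds the state where both clients hold the lock and reports it.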

What does the kubelet say?

etcd says store, kube-proxy says route, the API server says 418, but what does the kubelet say?

The kubelet is one of the central components of a Kubernetes cluster. Most of us take it for granted that it will just work and start our containers. CNI handles the networking part and kube-proxy the service part, but the kubelet does more than just start containers. In this talk, I cover the kubelet at a high level before diving deep into the belly of the beast and its interfaces with CNI, the container runtime, and ultimately the Linux kernel.
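
For a taste of that runtime interface, here is a minimal sketch (not how the kubelet itself is wired up) that speaks CRI to a container runtime over gRPC. It assumes the k8s.io/cri-api v1 package and containerd’s default socket path; adjust both for your runtime.

```go
// Minimal CRI client sketch: ask the container runtime (assumed here to be
// containerd on its default socket) for its version and list its containers,
// using the same gRPC API the kubelet uses to manage containers.
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Assumed socket path; cri-o and other runtimes listen elsewhere.
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	client := runtimeapi.NewRuntimeServiceClient(conn)

	version, err := client.Version(ctx, &runtimeapi.VersionRequest{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("runtime: %s %s\n", version.RuntimeName, version.RuntimeVersion)

	containers, err := client.ListContainers(ctx, &runtimeapi.ListContainersRequest{})
	if err != nil {
		panic(err)
	}
	for _, c := range containers.Containers {
		fmt.Printf("%s\t%s\t%s\n", c.Id, c.Metadata.Name, c.State)
	}
}
```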

[Slides, Video, etc.]

Golang race detection

Data races are a nasty kind of bug: rare, hard to reproduce, and prone to striking at the worst possible moment. Their effects are undefined and detecting them is hard, nearly impossible without expensive formal verification and static analysis tools… Or is it?

This talk focuses on ThreadSanitizer, a library for detecting data races at run time. It originated in the Clang/C++ community, and its use has spread to Go (the -race flag), Rust, Java, and other languages.

The talk covers how it works conceptually, along with the background needed to understand it.
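
To make it concrete, here is a minimal, made-up example of the kind of bug the detector catches: run it with `go run -race main.go` and ThreadSanitizer reports both conflicting accesses, each with a stack trace.

```go
// A textbook data race: two goroutines increment a shared counter without
// any synchronization. Built or run with -race, the detector flags the
// conflicting reads and writes on `counter`.
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	counter := 0

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				counter++ // unsynchronized read-modify-write: the race
			}
		}()
	}

	wg.Wait()
	// Without synchronization the final value is undefined; -race tells you why.
	fmt.Println("counter =", counter)
}
```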

[Slides, Video, etc.]

Pragmatic execution tracing

This talk covers contemporary execution tracing technology: from gathering execution traces, through storage, to analysis.

Optimizing application latency is a difficult challenge, since many factors can cause a slowdown: network I/O, the CPU scheduler, waiting on a mutex, a database, other services… Many of these non-CPU-intensive activities don’t show up in traditional profilers (pprof) and aren’t visible on flame graphs. Tracing a call’s execution in time, from the moment it starts until it ends, annotated with metadata, gives us deeper insight into our program that would otherwise stay hidden. Additionally, distributed tracing tracks a request across multiple services, complementing logging in a microservice architecture.

This talk covers the simplest of these, the Chrome Trace Event format, and briefly presents two frameworks for distributed tracing: opentracing.io and OpenCensus.
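
For illustration, here is a small Go sketch (the span names and durations are made up) that writes a few “complete” events in the Chrome Trace Event format; the resulting trace.json can be loaded in chrome://tracing or the Perfetto UI.

```go
// Emit a handful of "complete" events (ph "X") in the Chrome Trace Event
// format. The simplest legal form of the format is a JSON array of events.
package main

import (
	"encoding/json"
	"os"
	"time"
)

// traceEvent mirrors the Chrome Trace Event fields we need: name, category,
// phase ("X" = complete), timestamp and duration in microseconds, plus the
// process and thread IDs used for grouping in the viewer.
type traceEvent struct {
	Name string `json:"name"`
	Cat  string `json:"cat"`
	Ph   string `json:"ph"`
	Ts   int64  `json:"ts"`
	Dur  int64  `json:"dur"`
	Pid  int    `json:"pid"`
	Tid  int    `json:"tid"`
}

// timed runs fn and returns a complete event describing how long it took.
func timed(name string, fn func()) traceEvent {
	start := time.Now()
	fn()
	return traceEvent{
		Name: name,
		Cat:  "demo",
		Ph:   "X",
		Ts:   start.UnixMicro(),
		Dur:  time.Since(start).Microseconds(),
		Pid:  os.Getpid(),
		Tid:  1,
	}
}

func main() {
	// Fake workload standing in for the real operations we would trace.
	events := []traceEvent{
		timed("load config", func() { time.Sleep(5 * time.Millisecond) }),
		timed("query database", func() { time.Sleep(20 * time.Millisecond) }),
		timed("render response", func() { time.Sleep(2 * time.Millisecond) }),
	}

	f, err := os.Create("trace.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := json.NewEncoder(f).Encode(events); err != nil {
		panic(err)
	}
}
```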

[Slides, Video, etc.]