A Redis Job Queue Was Enough
We almost reached for Kafka. Then we counted our jobs per second and put BullMQ on Redis instead. Two years later, here is why we have not regretted it.
Engineering Journal
Tradeoffs, failures, and design decisions from building an AI-infrastructure learning platform. Postgres, Redis, real-time systems, vector databases, Kubernetes, AIOps — and what actually breaks in production.
The order in which you commit a row and publish a real-time event determines whether your UI and your database can ever disagree. Always write first.
Postgres tops out at a few hundred concurrent connections. Workers and serverless handlers want thousands. Transaction-mode pooling is the cheap fix, and it comes with a small set of constraints you have to live with.
Why most AIOps tools stop at alerting, and what it takes to build systems that actually fix problems.
Explore different anomaly detection techniques and how to choose the right approach for your infrastructure monitoring needs.
Understanding semantic network failures like DNS issues, certificate expiry, and configuration errors that cause real outages.