From System Freeze to 250ms
Building a Scalable Code Runner

When I set out to build Neuron, a managed code execution engine, I thought the hard part would be parsing the code. I was wrong. The hard part was concurrency.
Here is the story of how I crashed my own server, engineered my way out of it, and cut execution time from 2.5 seconds to 250ms.
Phase 1: The Naive Approach
"Just spin up a Docker container!"
My initial architecture was simple (and efficient... or so I thought):
- User sends code via API.
- API spins up a brand new Docker container.
- Code runs, output is captured.
- Container is destroyed.
It worked beautifully for local testing. I felt like a genius.
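For reference, here's roughly what that Phase 1 flow looks like in code. This is a minimal sketch, assuming the docker CLI is driven through exec.Command and Python as the guest language; the handler name, image, and resource limits are illustrative, not Neuron's actual code:

```go
// Naive approach: one fresh container per request (the design that later froze the server).
package main

import (
	"context"
	"log"
	"net/http"
	"os/exec"
	"time"
)

func runCodeHandler(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
	defer cancel()

	// Spin up a brand-new container, pipe the submitted code into it,
	// capture stdout/stderr, and let --rm destroy it afterwards.
	cmd := exec.CommandContext(ctx, "docker", "run", "--rm", "-i",
		"--memory=128m", "--cpus=0.5",
		"python:3.12-alpine", "python", "-")
	cmd.Stdin = r.Body

	out, err := cmd.CombinedOutput()
	if err != nil {
		http.Error(w, string(out), http.StatusUnprocessableEntity)
		return
	}
	w.Write(out)
}

func main() {
	http.HandleFunc("/run", runCodeHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```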
The Crash 💥
Then, I decided to run a stress test. I fired up 1,000 concurrent requests.
Result: My server froze. Entirely.
The kernel panicked trying to spin up 1,000 Docker containers simultaneously. The CPU usage hit 100%, memory was swallowed whole, and requests started timing out left and right.
Lesson Learned
You cannot simply "spin up" resources on demand at scale. You need backpressure.
Why Not Just Run Synchronously?
You might ask: "Why queue at all? Why not just exec.Command() and return the result?"
Why Synchronous Execution is a Trap
- Blocked Connections: If execution takes 2 seconds, that HTTP connection is open for 2 seconds. With 1,000 users, you exhaust your file descriptors instantly.
- No Backpressure: If traffic spikes to 5x your capacity, a synchronous server crashes immediately. An async server just has a longer queue.
- Isolation: If the Runner crashes (e.g., segfault), it shouldn't take down the API server receiving new requests.
The Solution: Decouple the Receiver (API) from the Processor (Worker) using a Queue.
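Here's a minimal sketch of what that decoupling looks like on the API side. The Queue interface is a placeholder (Phase 2 picks the concrete backend), and the handler, payload limit, and job-ID scheme are illustrative assumptions:

```go
// Decoupled API: accept the code, enqueue it, and return a job ID immediately
// instead of holding the connection open for the whole execution.
package api

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
)

// Queue is a placeholder for whatever backend the worker consumes from.
type Queue interface {
	Enqueue(jobID string, code []byte) error
}

type SubmitHandler struct{ Q Queue }

func (h *SubmitHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	code, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 64<<10)) // cap payload size
	if err != nil {
		http.Error(w, "payload too large", http.StatusRequestEntityTooLarge)
		return
	}

	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	jobID := hex.EncodeToString(buf)

	if err := h.Q.Enqueue(jobID, code); err != nil {
		// Backpressure: if the queue is full or down, shed load with a 503
		// instead of letting the whole server freeze.
		http.Error(w, "busy, retry later", http.StatusServiceUnavailable)
		return
	}

	// 202 Accepted: the connection is free after a few milliseconds;
	// the client polls (or subscribes) for the result by job ID.
	w.WriteHeader(http.StatusAccepted)
	fmt.Fprintf(w, `{"job_id":%q}`, jobID)
}
```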
Phase 2: The Queue (Kafka vs. Redis)
I needed a buffer. I needed a queue.
Attempt A: Apache Kafka
My first instinct was "Enterprise Scale™", so I deployed Kafka.
- The Good: It handled the throughput easily.
- The Bad: The latency. By the time a job went from API → Kafka → Consumer, we were seeing 700ms - 1000ms of overhead just to queue the job.
For a real-time code runner, this was too slow.
Attempt B: Redis Streams
I stripped out Kafka and implemented Redis Streams.
- The Result: Queue latency dropped to ~3ms.
- Why: Redis keeps the stream in memory, so enqueueing or claiming a job is little more than a single round trip to Redis.
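For the curious, here's a minimal sketch of the producer/consumer side using the github.com/redis/go-redis/v9 client. The stream and group names (neuron:jobs, runners) are illustrative, and it assumes the consumer group has already been created:

```go
// Redis Streams plumbing: XADD to enqueue, XREADGROUP to consume, XACK to finish.
package queue

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// RedisQueue wraps one stream plus one consumer group. Assumes the group was
// created once, e.g. via XGroupCreateMkStream(ctx, "neuron:jobs", "runners", "$").
type RedisQueue struct {
	rdb *redis.Client
}

func New(addr string) *RedisQueue {
	return &RedisQueue{rdb: redis.NewClient(&redis.Options{Addr: addr})}
}

// Enqueue appends the job to the stream. XADD against an in-memory stream is
// what brings queueing overhead down to a few milliseconds.
func (q *RedisQueue) Enqueue(ctx context.Context, jobID string, code []byte) error {
	return q.rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "neuron:jobs",
		Values: map[string]interface{}{"job_id": jobID, "code": code},
	}).Err()
}

// Dequeue blocks until a never-delivered message is handed to this group member.
func (q *RedisQueue) Dequeue(ctx context.Context, consumer string) (redis.XMessage, error) {
	res, err := q.rdb.XReadGroup(ctx, &redis.XReadGroupArgs{
		Group:    "runners",
		Consumer: consumer,
		Streams:  []string{"neuron:jobs", ">"}, // ">" = new messages only
		Count:    1,
		Block:    0, // block until a job arrives
	}).Result()
	if err != nil {
		return redis.XMessage{}, err
	}
	return res[0].Messages[0], nil
}

// Ack marks the message as processed so the group will not redeliver it.
func (q *RedisQueue) Ack(ctx context.Context, msgID string) error {
	return q.rdb.XAck(ctx, "neuron:jobs", "runners", msgID).Err()
}
```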
The Lesson
Don't just jump to "Fancy Tech" because it's popular or "Enterprise". Kafka is amazing for high-throughput log aggregation, but for low-latency job queuing, it was the wrong tool.
Each technology has a specific use case. For us, simplicity won.
Phase 3: Scaling the Execution Core
Now that the queue was fast, the bottleneck moved to the worker. I tried three different strategies to handle the load.
Attempt 1: Spin Up On Demand (Unlimited)
"Just run a new container for every request."
- Logic: Simple to implement.
- Result: System Freeze.
- Why: When 1,000 requests hit at once, the server tried to boot 1,000 OS processes. The kernel panicked, swapping memory like crazy.
Attempt 2: Capped Concurrency (Fixed X Number)
"Okay, let's limit it to 50 containers at a time."
- Logic: Protect the server by making jobs wait in line.
- Result: Huge Latency Spikes.
- The Math: If execution takes 2s (startup) + 0.5s (run), each worker handles only ~0.4 jobs/sec.
- The Backlog: With 50 workers, we could only process 20 jobs/sec. A 1,000-job spike meant a 50-second wait time for the last user.
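A sketch of that cap, assuming a buffered channel used as a counting semaphore; the image, limits, and Job type are again illustrative:

```go
// Capped concurrency: at most maxContainers jobs launch containers at once;
// everyone else waits for a free slot.
package worker

import (
	"bytes"
	"context"
	"os/exec"
)

const maxContainers = 50

// slots holds one token per allowed in-flight container.
var slots = make(chan struct{}, maxContainers)

type Job struct {
	ID   string
	Code []byte
}

func RunJob(ctx context.Context, job Job) ([]byte, error) {
	select {
	case slots <- struct{}{}: // acquire a slot (blocks while all 50 are busy)
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	defer func() { <-slots }() // release the slot when the container exits

	// Same cold-start path as Phase 1: every job still pays ~2s of docker startup
	// before ~0.5s of execution, which is why a 1,000-job spike left the last
	// user waiting ~50 seconds.
	cmd := exec.CommandContext(ctx, "docker", "run", "--rm", "-i",
		"--memory=128m", "--cpus=0.5", "python:3.12-alpine", "python", "-")
	cmd.Stdin = bytes.NewReader(job.Code)
	return cmd.CombinedOutput()
}
```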
Attempt 3: The "Pre-Warmed" Pool (The Solution)
"Why are we waiting 2 seconds for startup?"
I realized we were spending 80% of our time waiting for Docker to boot, and only 20% running code.
The Fix: Treat containers like database connections.
- Warm Up: Boot 50 containers before traffic hits. Pause them.
- Execute: When a job arrives, unpause an existing container and run inside it. Startup cost: ~0ms.
- Recycle: Clean the container and put it back in the pool.
The Impact: My total execution time dropped from 2500ms to ~250ms. We achieved high throughput without melting the server.
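Here's a minimal sketch of the pool idea, assuming docker pause/unpause driven through the CLI; the image, limits, and helper names (warmOne, etc.) are illustrative rather than Neuron's actual code:

```go
// Pre-warmed pool: containers are booted and paused ahead of time, then
// checked out like database connections.
package pool

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

type Pool struct {
	idle chan string // IDs of paused, ready-to-use containers
}

// New boots size containers before traffic hits, so no request pays the ~2s cold start.
func New(ctx context.Context, size int) (*Pool, error) {
	p := &Pool{idle: make(chan string, size)}
	for i := 0; i < size; i++ {
		id, err := p.warmOne(ctx)
		if err != nil {
			return nil, err
		}
		p.idle <- id
	}
	return p, nil
}

// warmOne starts a locked-down container, then pauses it until a job needs it.
func (p *Pool) warmOne(ctx context.Context) (string, error) {
	out, err := exec.CommandContext(ctx, "docker", "run", "-d",
		"--network=none", "--memory=128m", "--cpus=0.5",
		"python:3.12-alpine", "sleep", "infinity").Output()
	if err != nil {
		return "", fmt.Errorf("warm-up failed: %w", err)
	}
	id := strings.TrimSpace(string(out))
	if err := exec.CommandContext(ctx, "docker", "pause", id).Run(); err != nil {
		return "", err
	}
	return id, nil
}

// Run checks out a warm container, executes code inside it, and recycles it.
func (p *Pool) Run(ctx context.Context, code string) ([]byte, error) {
	id := <-p.idle // "startup" is just receiving from a channel: ~0ms
	defer func() {
		// Recycle: re-pause and return to the pool. A real recycle step would also
		// clean scratch space and apply the dirty checks described in the next section.
		exec.CommandContext(ctx, "docker", "pause", id).Run()
		p.idle <- id
	}()

	if err := exec.CommandContext(ctx, "docker", "unpause", id).Run(); err != nil {
		return nil, err
	}
	cmd := exec.CommandContext(ctx, "docker", "exec", "-i", id, "python", "-")
	cmd.Stdin = strings.NewReader(code)
	return cmd.CombinedOutput()
}
```

Sizing the pool is the same trade-off as a database connection pool: too small and jobs queue up, too large and the warm containers eat the memory the jobs themselves need.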
Handling the "Dirty" Ones
Reusing containers introduces a new risk: State Pollution. We implemented a Health & Rotation Policy:
- Isolation: Containers are locked down (no network, limited disk).
- Dirty Checks: If a container returns a TLE (Time Limit Exceeded) or OOM (Out of Memory), it is marked "Dirty" and destroyed.
- Continuous Health Checks: We run periodic health checks on idle containers. If a container becomes unresponsive or unhealthy, it is automatically removed and replaced.
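A sketch of how that policy could hang off the pool above. The Verdict type, the 30-second interval, and the release helper (which would replace the simple re-pause step in Run) are assumptions for illustration:

```go
// Rotation logic layered on the pool: dirty or unhealthy containers are
// destroyed and replaced instead of being returned to the idle channel.
package pool

import (
	"context"
	"os/exec"
	"strings"
	"time"
)

type Verdict int

const (
	OK  Verdict = iota
	TLE         // time limit exceeded
	OOM         // out of memory
)

// release puts a clean container back in rotation; dirty ones are replaced.
func (p *Pool) release(ctx context.Context, id string, v Verdict) {
	if v == TLE || v == OOM {
		// Dirty: its state can no longer be trusted, so destroy and replace it.
		exec.CommandContext(ctx, "docker", "rm", "-f", id).Run()
		if fresh, err := p.warmOne(ctx); err == nil {
			p.idle <- fresh
		}
		return
	}
	exec.CommandContext(ctx, "docker", "pause", id).Run()
	p.idle <- id
}

// healthLoop periodically inspects idle containers; anything not in the
// expected "paused" state is removed and replaced.
func (p *Pool) healthLoop(ctx context.Context) {
	tick := time.NewTicker(30 * time.Second)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-tick.C:
			id := <-p.idle
			out, err := exec.CommandContext(ctx,
				"docker", "inspect", "-f", "{{.State.Status}}", id).Output()
			if err != nil || strings.TrimSpace(string(out)) != "paused" {
				exec.CommandContext(ctx, "docker", "rm", "-f", id).Run()
				if fresh, werr := p.warmOne(ctx); werr == nil {
					p.idle <- fresh
				}
				continue
			}
			p.idle <- id
		}
	}
}
```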
🚀 The Result
Today, Neuron runs on a modest server but handles high concurrency with ease.
Building this taught me that "Scalability" isn't just about adding more servers. It's about Resource Management.
