Rusting with AI: Learnings from a 100K-Line Paxos Engine

In recent months, the capabilities of AI coding agents have soared, prompting many of us to explore their potential for building production-grade systems. I embarked on an ambitious project: developing a modern multi-Paxos consensus engine in Rust, aiming to push the boundaries of AI-assisted development. This initiative was born from a need to modernize Azure's decade-old Replicated State Library (RSL), a robust but aging cornerstone of many Azure services.

The original RSL, while foundational, had several limitations on modern hardware and workloads. It lacked pipelining, so new requests queued while a vote was in progress, inflating latency. Crucially, it didn't support Non-Volatile Memory (NVM), which is now common in Azure data centers and can drastically reduce commit times. Furthermore, RSL wasn't built to leverage RDMA (Remote Direct Memory Access), a pervasive technology that could unlock significant latency and throughput gains. My goal was to address these gaps by building a functionally equivalent but heavily optimized Rust-based system.
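To make the pipelining gap concrete, here is a minimal sketch of a leader-side window that lets several proposals await votes at once. Everything in it is hypothetical and simplified: the Pipeline type, its propose and on_accepted methods, a fixed quorum size, and vote counting that does not deduplicate votes per replica. It is not RSL's or the new engine's actual design. Setting max_in_flight to 1 reproduces the non-pipelined behavior, where every new request waits for the current vote.

```rust
use std::collections::BTreeMap;

/// Hypothetical leader-side pipeline: up to `max_in_flight` proposals may be
/// waiting for votes at once. With `max_in_flight == 1` it degenerates to the
/// one-vote-at-a-time behaviour where new requests queue behind the vote.
struct Pipeline {
    max_in_flight: usize,
    quorum: usize,
    next_slot: u64,
    committed: u64,              // slots [0, committed) are committed
    votes: BTreeMap<u64, usize>, // in-flight slot -> accept votes so far
}

impl Pipeline {
    fn new(max_in_flight: usize, quorum: usize) -> Self {
        Pipeline { max_in_flight, quorum, next_slot: 0, committed: 0, votes: BTreeMap::new() }
    }

    /// Assigns the next log slot if the window has room; `None` means the
    /// caller must hold the request until an earlier slot commits.
    fn propose(&mut self) -> Option<u64> {
        if self.votes.len() >= self.max_in_flight {
            return None;
        }
        let slot = self.next_slot;
        self.next_slot += 1;
        self.votes.insert(slot, 0);
        Some(slot)
    }

    /// Records one accept vote for `slot` (simplification: votes are counted,
    /// not deduplicated per replica), then commits every leading slot that has
    /// reached quorum. Slots commit strictly in log order, even if a later
    /// slot reaches quorum first. Returns the new commit point.
    fn on_accepted(&mut self, slot: u64) -> u64 {
        if let Some(v) = self.votes.get_mut(&slot) {
            *v += 1;
        }
        loop {
            let (first, v) = match self.votes.first_key_value() {
                Some((&k, &v)) => (k, v),
                None => break,
            };
            if v < self.quorum {
                break;
            }
            self.votes.remove(&first);
            self.committed = first + 1;
        }
        self.committed
    }
}

fn main() {
    // Three replicas => quorum of 2; allow two proposals in flight.
    let mut p = Pipeline::new(2, 2);
    let a = p.propose().unwrap();
    let b = p.propose().unwrap();
    assert!(p.propose().is_none()); // window full: this request must wait
    // Slot `b` reaches quorum first but cannot commit ahead of slot `a`.
    p.on_accepted(b);
    assert_eq!(p.on_accepted(b), 0);
    p.on_accepted(a);
    assert_eq!(p.on_accepted(a), 2); // both slots commit together
    assert_eq!(p.propose(), Some(2)); // window has room again
}
```

Committing strictly in slot order is the central design choice: later slots may collect votes early, but the commit point only advances over a contiguous prefix of the log.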

The results were striking: approximately 130,000 lines of Rust code covering the full RSL feature set—multi-Paxos, leader election, log replication, snapshotting, and configuration changes—were written in about four weeks. Performance optimization was equally impressive, boosting throughput from 23,000 operations/sec to 300,000 operations/sec in roughly three weeks on a single laptop. Beyond these raw numbers, the project illuminated key strategies for leveraging AI effectively in complex system development.

Unprecedented Productivity with AI Agents

To achieve this pace, I utilized a suite of AI coding agents, including GitHub Copilot, Claude Code, Codex, Augment Code, Kiro, and Trae. My primary tools evolved to be Claude Code and Codex CLI, with VS Code handling diffs and minor edits. A command-line interface (CLI) workflow proved particularly effective, fostering an asynchronous coding flow that maximized my engagement. Interestingly, a psychological motivator played a role: the monthly subscription cost for advanced AI plans created a positive pressure to continuously kick off coding tasks, ensuring constant progress.

Ensuring Correctness with AI-Driven Code Contracts

The most frequent question I encountered was how AI could correctly implement something as intricate as Paxos. While comprehensive testing—over 1,300 tests ranging from unit to multi-replica integration tests with injected failures—formed the initial defense, the true breakthrough came from AI-driven code contracts.

Code contracts define preconditions, postconditions, and invariants for critical functions. During testing, these contracts are converted into runtime asserts, which can be disabled in production builds for performance. I applied this methodology at three levels:

AI-Generated Contracts: Advanced AI models like GPT-5 High proved excellent at drafting contracts. My role shifted to reviewing and refining these. For instance, the process_2a method, crucial for Paxos phase 2a message handling, included 16 distinct contracts.
Contract-Derived Test Generation: Once contracts were established, AI agents excelled at generating targeted test cases for each postcondition, automatically exploring meaningful edge cases.
Property-Based Testing (PBT) for Contracts: This was a game-changer. AI translated contracts into property-based tests, letting the system explore a vast, randomized input space. Any contract violation triggered a panic, surfacing deep-seated bugs early. This approach notably caught a subtle Paxos safety violation, preventing a potential replication consistency issue before production deployment.
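As an illustration of contracts as runtime asserts, and of a randomized test driving them, here is a minimal sketch built on a hypothetical, heavily simplified Acceptor (integer ballots, one value per slot). Its process_2a borrows only the name of the method mentioned above and carries a handful of contracts rather than the real method's 16. Rust's debug_assert! is checked in unoptimized builds and compiled out of default release builds, which matches the "disable in production" behavior. The random driver uses a hand-rolled xorshift generator to keep the example dependency-free; a crate such as proptest would also shrink failing inputs.

```rust
use std::collections::HashMap;

/// Hypothetical single-acceptor state; names are illustrative, not RSL's.
#[derive(Clone, Debug, Default)]
struct Acceptor {
    promised: u64,                      // highest ballot promised so far
    accepted: HashMap<u64, (u64, u64)>, // slot -> (ballot, value)
}

impl Acceptor {
    /// Invariant: no accepted ballot exceeds the promised ballot.
    fn invariant_holds(&self) -> bool {
        self.accepted.values().all(|&(b, _)| b <= self.promised)
    }

    /// Phase 1a: promise to ignore ballots lower than `ballot`.
    fn process_1a(&mut self, ballot: u64) -> bool {
        debug_assert!(self.invariant_holds(), "pre: invariant");
        let old_promised = self.promised;
        let ok = ballot > self.promised;
        if ok {
            self.promised = ballot;
        }
        debug_assert!(self.promised >= old_promised, "post: promise never decreases");
        debug_assert!(self.invariant_holds(), "post: invariant");
        ok
    }

    /// Phase 2a: accept `value` for `slot` unless a higher ballot was promised.
    fn process_2a(&mut self, ballot: u64, slot: u64, value: u64) -> bool {
        debug_assert!(self.invariant_holds(), "pre: invariant");
        let before = self.clone(); // snapshot for postconditions (unconditional for simplicity)
        let ok = ballot >= self.promised;
        if ok {
            self.promised = ballot;
            self.accepted.insert(slot, (ballot, value));
        }
        debug_assert!(self.promised >= before.promised, "post: promise never decreases");
        debug_assert!(!ok || self.accepted.get(&slot) == Some(&(ballot, value)), "post: value recorded");
        debug_assert!(
            ok || (self.promised == before.promised && self.accepted == before.accepted),
            "post: rejection leaves state unchanged"
        );
        debug_assert!(self.invariant_holds(), "post: invariant");
        ok
    }
}

/// Minimal xorshift64 PRNG so the example needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Property test: feed random 1a/2a messages and check the invariant after
/// every step. Contract violations panic in debug builds; the explicit
/// assert! keeps the invariant check active in release builds too.
/// Returns how many 2a messages were accepted.
fn property_test(seed: u64, steps: usize) -> usize {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let mut acc = Acceptor::default();
    let mut accepted_count = 0;
    for _ in 0..steps {
        let ballot = rng.next() % 10;
        if rng.next() % 2 == 0 {
            acc.process_1a(ballot);
        } else if acc.process_2a(ballot, rng.next() % 4, rng.next() % 100) {
            accepted_count += 1;
        }
        assert!(acc.invariant_holds(), "invariant violated");
    }
    accepted_count
}

fn main() {
    for seed in 1..=20 {
        property_test(seed, 10_000);
    }
    println!("all contracts held");
}
```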

Lightweight Spec-Driven Development

My approach to Spec-Driven Development (SDD) evolved considerably. Initially, a rigid SDD process, progressing from a requirements markdown to a design markdown and then a task list, proved cumbersome and made iterative changes difficult. I moved to a more lightweight approach. For new features like snapshotting, I used tools such as Spec Kit's /specify command to generate a specification markdown outlining user stories and acceptance criteria.

Following this, the /clarify command became invaluable. I'd prompt the AI to self-critique and enhance the user stories and criteria, often asking it to propose additional stories. This clarification phase became my primary focus. Once satisfied, I'd enter a "plan mode," asking the AI to generate an implementation plan for a single user story—this unit of work proved ideal for current AI agent capabilities. This flexible approach allowed for easy adjustments and discoveries during the coding session.

Aggressive Performance Optimization

Performance optimization, traditionally a labor-intensive endeavor, truly shone with AI assistance. After establishing initial correctness, I dedicated three weeks purely to throughput tuning, with AI serving as an indispensable co-pilot. We iteratively boosted throughput from 23,000 to 300,000 operations/sec through a systematic loop:

Instrumentation: Asking the AI to instrument latency metrics across all code paths.
Testing & Tracing: Running performance tests and collecting trace logs.
Analysis: Leveraging AI to analyze latency breakdowns, often by generating Python scripts to calculate quantiles and pinpoint bottlenecks.
Optimization & Iteration: Requesting the AI to propose optimizations, implementing one, re-measuring performance, and repeating the cycle.
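The instrument, measure, and analyze steps above can be sketched in a few lines. The analysis scripts described were written in Python; to keep one language here, this hypothetical LatencyRecorder does both steps in Rust: it times code paths under named stages and reports nearest-rank quantiles per stage. Stage names and the synthetic samples are illustrative, not data from the project.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-stage latency recorder; stage names are illustrative.
#[derive(Default)]
struct LatencyRecorder {
    samples: HashMap<&'static str, Vec<Duration>>,
}

impl LatencyRecorder {
    /// Instrumentation: time a closure and record its latency under `stage`.
    fn time<T>(&mut self, stage: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.record(stage, start.elapsed());
        out
    }

    /// Record an already-measured latency (e.g. parsed from a trace log).
    fn record(&mut self, stage: &'static str, d: Duration) {
        self.samples.entry(stage).or_default().push(d);
    }

    /// Analysis: nearest-rank quantile (q in [0, 1]) for one stage.
    fn quantile(&self, stage: &str, q: f64) -> Option<Duration> {
        let mut v = self.samples.get(stage)?.clone();
        if v.is_empty() {
            return None;
        }
        v.sort_unstable();
        let rank = ((q * v.len() as f64).ceil() as usize).clamp(1, v.len());
        Some(v[rank - 1])
    }
}

fn main() {
    let mut rec = LatencyRecorder::default();
    // Synthetic samples standing in for parsed trace-log entries.
    for ms in 1..=100 {
        rec.record("append_log", Duration::from_millis(ms));
    }
    // Live instrumentation of a code path.
    let total: u64 = rec.time("apply_batch", || (1..=1_000u64).sum());
    println!("apply_batch result: {total}");
    for (label, q) in [("p50", 0.5), ("p99", 0.99)] {
        println!("append_log {label}: {:?}", rec.quantile("append_log", q));
    }
}
```

Comparing quantiles stage by stage, rather than one end-to-end number, is what points at the specific path (a lock, a copy, a spawn) worth optimizing next.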

This process uncovered subtle bottlenecks such as lock contention on asynchronous paths, redundant memory copies, and unnecessary task spawns. Rust's safety model was pivotal here. It allowed aggressive optimizations (minimizing allocations, employing zero-copy techniques, avoiding locks, and selectively removing async overhead) to be made with confidence, because the compiler catches memory-safety errors and data races in safe code.
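Zero-copy parsing is one place where the borrow checker earns its keep. The sketch below assumes a hypothetical wire layout (slot, payload length, payload) and hypothetical names (AcceptView, parse_accept); it returns a view whose payload borrows directly from the receive buffer instead of copying it. Because the view's lifetime is tied to the buffer, freeing or reusing the buffer while the view is still in use is a compile error rather than a latent use-after-free.

```rust
/// Hypothetical wire layout for an accept message:
/// [slot: u32 big-endian][payload length: u32 big-endian][payload bytes].
#[derive(Debug, PartialEq)]
struct AcceptView<'a> {
    slot: u32,
    payload: &'a [u8], // points into the caller's buffer; nothing is copied
}

/// Reads a big-endian u32 at `at`, or `None` if the buffer is too short.
fn read_u32(buf: &[u8], at: usize) -> Option<u32> {
    let b = buf.get(at..at.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses without allocating. The lifetime on the return type ties the view
/// to `buf`, so the compiler rejects any use of the view after `buf` is
/// dropped or mutated.
fn parse_accept(buf: &[u8]) -> Option<AcceptView<'_>> {
    let slot = read_u32(buf, 0)?;
    let len = read_u32(buf, 4)? as usize;
    let payload = buf.get(8..8usize.checked_add(len)?)?;
    Some(AcceptView { slot, payload })
}

fn main() {
    let mut buf = Vec::new();
    buf.extend_from_slice(&7u32.to_be_bytes());
    buf.extend_from_slice(&3u32.to_be_bytes());
    buf.extend_from_slice(b"abc");

    let view = parse_accept(&buf).expect("well-formed message");
    // The payload slice starts inside `buf` itself: no copy was made.
    assert_eq!(view.payload.as_ptr(), buf[8..].as_ptr());
    // buf.clear(); // would not compile: `buf` is still borrowed by `view`
    println!("slot {} payload {:?}", view.slot, view.payload);
}
```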

Wish List for AI-Assisted Coding

Reflecting on this experience, my wish list for future AI-assisted coding focuses on greater autonomy and sophistication:

End-to-End User Story Execution: While I prefer defining user stories, I envision AI taking greater ownership of the execution, handling refactoring, ensuring test coverage, and making minor fixes, only flagging genuine architectural or correctness issues for my review.
Automated Contract Workflows: The entire contract lifecycle—from generation and review to test derivation, consistency checks, and property-based testing—could be largely automated. The AI would debug failing tests and autonomously fix trivial issues, notifying me only for fundamental problems in the contract or implementation.
Autonomous Performance Optimization: I'd love to suggest high-level optimization avenues, with AI autonomously executing experiments, analyzing results, and applying sophisticated techniques to large codebases, moving beyond small, isolated code blocks.

This project underscores that AI is not just a coding assistant but a profound partner capable of driving unprecedented productivity and correctness in complex system development, particularly when paired with a robust language like Rust.

FAQ

Q: What is multi-Paxos and why is it considered complex to implement?

A: Multi-Paxos extends single-decree Paxos so that a distributed system's replicas agree on a sequence of values (a replicated log), even in the presence of failures. It is notoriously complex because of its multiple phases (Prepare/Promise, Accept/Accepted, and commit notification), intricate state management across nodes, leader election, and the need to handle failure scenarios such as network partitions and node crashes while always preserving safety and, under reasonable network conditions, liveness. Correctly implementing all these interactions and corner cases is a significant challenge.

Q: How do AI-driven code contracts differ from traditional unit tests?

A: Traditional unit tests verify specific outputs for specific inputs. Code contracts instead specify what must hold before a function call (preconditions), after it (postconditions), and throughout its execution (invariants). Unit tests check what a function does in particular cases; contracts state what it must guarantee in every case. AI helps on both sides: it drafts the contracts themselves and then derives diverse unit and property-based tests from them, giving much broader correctness checking than hand-picked examples alone.

Q: What role does Rust's safety model play in AI-assisted development, especially during aggressive optimization?

A: Rust's compile-time safety, enforced primarily through its ownership and borrowing system, prevents common classes of bugs such as use-after-free, dangling references, null pointer dereferences, and data races in safe code. (It does not prevent memory leaks or deadlocks, and the compiler cannot fully check code inside unsafe blocks.) This is crucial during aggressive performance optimization, where developers often manipulate low-level details like memory allocations and concurrency primitives. The compiler acts as a vigilant guardian, ensuring that even highly optimized, complex safe code remains memory-safe and free from undefined behavior. This lets developers push the boundaries of performance without fear of introducing subtle, hard-to-debug memory corruption, which is especially valuable when AI is proposing and implementing optimizations.
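A small example of the data-race guarantee at work: the hypothetical bump_halves below updates the two halves of a slice on two threads using std::thread::scope. It compiles only because split_at_mut returns two non-overlapping mutable borrows; a version that let both threads write through the same slice would be rejected at compile time.

```rust
use std::thread;

/// Increments every entry, processing the two halves of the slice on two
/// threads. `split_at_mut` hands out two non-overlapping `&mut` slices, so
/// the compiler can prove the threads never write the same element.
fn bump_halves(entries: &mut [u64]) {
    let mid = entries.len() / 2;
    let (left, right) = entries.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(move || left.iter_mut().for_each(|e| *e += 1));
        s.spawn(move || right.iter_mut().for_each(|e| *e += 1));
        // Spawning a third closure that touched `entries` directly would be
        // a compile error: it is already mutably borrowed via left/right.
    });
    // thread::scope joins both threads before returning.
}

fn main() {
    let mut log = vec![10u64, 20, 30, 40, 50];
    bump_halves(&mut log);
    println!("{log:?}");
}
```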