Rusting with AI: Learnings from a 100K-Line Paxos Engine
This article details learnings from building a 100K-line Rust multi-Paxos consensus engine with AI, modernizing Azure's RSL. It highlights massive productivity boosts, leveraging AI for code contracts to ensure correctness, adopting lightweight spec-driven development, and achieving aggressive performance optimization. Key takeaways include AI's role in complex system development and a wish list for future AI capabilities.
In recent months, the capabilities of AI coding agents have soared, prompting many of us to explore their potential for building production-grade systems. I embarked on an ambitious project: developing a modern multi-Paxos consensus engine in Rust, aiming to push the boundaries of AI-assisted development. This initiative was born from a need to modernize Azure's decade-old Replicated State Library (RSL), a robust but aging cornerstone of many Azure services.
The original RSL, while foundational, presented several limitations for modern hardware and workloads. It lacked pipelining, causing requests to queue during votes and inflate latency. Crucially, it didn't support Non-Volatile Memory (NVM), which is now common in Azure data centers and can drastically reduce commit times. Furthermore, RSL wasn't built to leverage RDMA (Remote Direct Memory Access), a pervasive technology that could unlock significant latency and throughput gains. My goal was to address these gaps, building a functionally equivalent, yet profoundly optimized, Rust-based system.
The results were striking: approximately 130,000 lines of Rust code covering the full RSL feature set—multi-Paxos, leader election, log replication, snapshotting, and configuration changes—were written in about four weeks. Performance optimization was equally impressive, boosting throughput from 23,000 operations/sec to 300,000 operations/sec in roughly three weeks on a single laptop. Beyond these raw numbers, the project illuminated key strategies for leveraging AI effectively in complex system development.
Unprecedented Productivity with AI Agents
To achieve this pace, I utilized a suite of AI coding agents, including GitHub Copilot, Claude Code, Codex, Augment Code, Kiro, and Trae. My primary tools evolved to be Claude Code and Codex CLI, with VS Code handling diffs and minor edits. A command-line interface (CLI) workflow proved particularly effective, fostering an asynchronous coding flow that maximized my engagement. Interestingly, a psychological motivator played a role: the monthly subscription cost for advanced AI plans created a positive pressure to continuously kick off coding tasks, ensuring constant progress.
Ensuring Correctness with AI-Driven Code Contracts
The most frequent question I encountered was how AI could correctly implement something as intricate as Paxos. While comprehensive testing—over 1,300 tests ranging from unit to multi-replica integration tests with injected failures—formed the initial defense, the true breakthrough came from AI-driven code contracts.
Code contracts define preconditions, postconditions, and invariants for critical functions. During testing, these contracts are converted into runtime asserts, which can be disabled in production builds for performance. I applied this methodology at three levels:
- AI-Generated Contracts: Advanced AI models like GPT-5 High proved excellent at drafting contracts, and my role shifted to reviewing and refining them. For instance, the process_2a method, crucial for Paxos phase 2a message handling, included 16 distinct contracts.
- Contract-Derived Test Generation: Once contracts were established, AI agents excelled at generating targeted test cases for each postcondition, automatically exploring meaningful edge cases.
- Property-Based Testing (PBT) for Contracts: This was a game-changer. AI translated contracts into property-based tests, allowing the system to explore a vast, randomized input space. Any contract violation would trigger a panic, uncovering deep-seated bugs early. Notably, this approach caught a subtle Paxos safety violation before production deployment, preventing a potential replication consistency issue.
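The three levels above can be sketched in Rust. This is a minimal illustration, not the author's code: ReplicaLog, accept_up_to, commit_up_to, and the requires!/ensures! macros are hypothetical names, and the contracts are plain debug_assert! checks. These are active whenever debug assertions are enabled (debug and test builds) and compile away in a default release build. The hand-rolled random driver stands in for a property-based testing framework such as the proptest crate, so the sketch needs no external dependencies.

```rust
/// Hypothetical contract macros: thin wrappers over debug_assert!, so the
/// checks run when debug assertions are enabled and vanish otherwise.
macro_rules! requires {
    ($cond:expr) => {
        debug_assert!($cond, "precondition violated: {}", stringify!($cond))
    };
}
macro_rules! ensures {
    ($cond:expr) => {
        debug_assert!($cond, "postcondition violated: {}", stringify!($cond))
    };
}

/// Hypothetical, simplified replica log state (not the author's types).
#[derive(Debug, Default)]
pub struct ReplicaLog {
    accepted: u64,  // highest log index accepted locally
    committed: u64, // highest log index known to be chosen
}

impl ReplicaLog {
    /// Invariant: nothing is committed before it has been accepted.
    fn invariant(&self) -> bool {
        self.committed <= self.accepted
    }

    pub fn accept_up_to(&mut self, index: u64) {
        requires!(self.invariant());
        let old = self.accepted;
        self.accepted = self.accepted.max(index);
        ensures!(self.accepted >= old); // accepted index never moves backwards
        ensures!(self.invariant());
    }

    pub fn commit_up_to(&mut self, index: u64) {
        requires!(self.invariant());
        requires!(index <= self.accepted); // cannot commit what was never accepted
        let old = self.committed;
        self.committed = self.committed.max(index);
        ensures!(self.committed >= old); // commit index is monotonic
        ensures!(self.committed == old.max(index));
        ensures!(self.invariant());
    }
}

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Property-style driver: feed random operation sequences through the
/// contracts. Any violated contract panics via debug_assert!.
pub fn random_walk(seed: u64, steps: usize) -> ReplicaLog {
    let mut rng = XorShift(seed | 1); // xorshift state must be non-zero
    let mut log = ReplicaLog::default();
    for _ in 0..steps {
        if rng.next() % 2 == 0 {
            log.accept_up_to(rng.next() % 1_000);
        } else {
            // Like a PBT generator, only produce inputs that meet the precondition.
            let idx = rng.next() % (log.accepted + 1);
            log.commit_up_to(idx);
        }
    }
    log
}
```

In a real project, each ensures! line would map naturally to one derived test case, and the random driver would be replaced by generated strategies with shrinking, so a failing sequence is reduced to a minimal counterexample.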
Lightweight Spec-Driven Development
My journey through Spec-Driven Development (SDD) evolved considerably. Initially, a rigid SDD process, progressing from a requirements markdown to a design markdown and then a task list, proved cumbersome and made iterative changes difficult. I transitioned to a more lightweight approach. For new features like snapshotting, I used tools such as GitHub Spec Kit's /specify command to generate a specification markdown outlining user stories and acceptance criteria.
Following this, the /clarify command became invaluable. I'd prompt the AI to self-critique and enhance the user stories and criteria, often asking it to propose additional stories. This clarification phase became my primary focus. Once satisfied, I'd enter a "plan mode," asking the AI to generate an implementation plan for a single user story—this unit of work proved ideal for current AI agent capabilities. This flexible approach allowed for easy adjustments and discoveries during the coding session.
Aggressive Performance Optimization
Performance optimization, traditionally a labor-intensive endeavor, truly shone with AI assistance. After establishing initial correctness, I dedicated three weeks purely to throughput tuning, with AI serving as an indispensable co-pilot. We iteratively boosted throughput from 23,000 to 300,000 operations/sec through a systematic loop:
- Instrumentation: Asking the AI to instrument latency metrics across all code paths.
- Testing & Tracing: Running performance tests and collecting trace logs.
- Analysis: Leveraging AI to analyze latency breakdowns, often by generating Python scripts to calculate quantiles and pinpoint bottlenecks.
- Optimization & Iteration: Requesting the AI to propose optimizations, implementing one, re-measuring performance, and repeating the cycle.
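The instrumentation and analysis steps of this loop can be sketched in Rust. The author's analysis used AI-generated Python scripts. To keep one language here, the sketch below computes quantiles in Rust instead. LatencyRecorder and its methods are hypothetical names, and the quantile is the simple nearest-rank definition.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-stage latency recorder: keeps raw samples so quantiles
/// (p50, p99, ...) can be computed after a performance run.
#[derive(Default)]
pub struct LatencyRecorder {
    samples: Vec<Duration>,
}

impl LatencyRecorder {
    /// Time a closure, record its duration, and return its result.
    pub fn time<T>(&mut self, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.samples.push(start.elapsed());
        out
    }

    /// Record a duration measured elsewhere (e.g. parsed from a trace log).
    pub fn record(&mut self, d: Duration) {
        self.samples.push(d);
    }

    /// Nearest-rank quantile for q in (0, 1]; None if q is out of range
    /// (including NaN) or there are no samples.
    pub fn quantile(&self, q: f64) -> Option<Duration> {
        if self.samples.is_empty() || !(q > 0.0 && q <= 1.0) {
            return None;
        }
        let mut sorted = self.samples.clone();
        sorted.sort_unstable();
        // rank = ceil(q * n), then converted to a 0-based index
        let rank = (q * sorted.len() as f64).ceil() as usize;
        Some(sorted[rank.clamp(1, sorted.len()) - 1])
    }
}
```

Keeping one recorder per code path (for example, per pipeline stage) yields the latency breakdown. Comparing the p50 and p99 of each stage before and after a change is one way to close the re-measure step of the loop.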
This process unveiled subtle bottlenecks, such as lock contention on asynchronous paths, redundant memory copies, and unnecessary task spawns. Rust's robust safety model was pivotal here, allowing aggressive optimizations—minimizing allocations, employing zero-copy techniques, avoiding locks, and selectively removing async overhead—with confidence, mitigating fears of memory corruption.
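As one illustration of the zero-copy idea, the sketch below assumes a leader that sends the same log payload to several followers. AcceptMsg, fan_out_copying, and fan_out_shared are hypothetical names, not the author's API. Sharing the bytes through Arc<[u8]> replaces a per-follower allocation and memcpy with a reference-count increment. Because Arc only hands out shared references, the compiler also guarantees that no follower can mutate the buffer the others are reading. Crates such as bytes offer a richer Bytes type built on the same idea.

```rust
use std::sync::Arc;

/// Hypothetical outbound message to one follower. The payload is a shared,
/// immutable, reference-counted slice.
pub struct AcceptMsg {
    pub slot: u64,
    pub payload: Arc<[u8]>,
}

/// Copying version: one heap allocation and byte copy per follower.
pub fn fan_out_copying(slot: u64, payload: &[u8], followers: usize) -> Vec<(u64, Vec<u8>)> {
    (0..followers).map(|_| (slot, payload.to_vec())).collect()
}

/// Shared version: the payload is moved into a single Arc allocation once,
/// and each follower receives a handle (an atomic increment, no byte copy).
pub fn fan_out_shared(slot: u64, payload: Vec<u8>, followers: usize) -> Vec<AcceptMsg> {
    let shared: Arc<[u8]> = Arc::from(payload);
    (0..followers)
        .map(|_| AcceptMsg { slot, payload: Arc::clone(&shared) })
        .collect()
}
```

For large payloads and many followers, the difference grows with payload size times follower count. The shared version's cost stays roughly constant per follower.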
Wish List for AI-Assisted Coding
Reflecting on this experience, my wish list for future AI-assisted coding focuses on greater autonomy and sophistication:
- End-to-End User Story Execution: While I prefer defining user stories, I envision AI taking greater ownership of the execution, handling refactoring, ensuring test coverage, and making minor fixes, only flagging genuine architectural or correctness issues for my review.
- Automated Contract Workflows: The entire contract lifecycle—from generation and review to test derivation, consistency checks, and property-based testing—could be largely automated. The AI would debug failing tests and autonomously fix trivial issues, notifying me only for fundamental problems in the contract or implementation.
- Autonomous Performance Optimization: I'd love to suggest high-level optimization avenues, with AI autonomously executing experiments, analyzing results, and applying sophisticated techniques to large codebases, moving beyond small, isolated code blocks.
This project underscores that AI is not just a coding assistant but a profound partner capable of driving unprecedented productivity and correctness in complex system development, particularly when paired with a robust language like Rust.
FAQ
Q: What is multi-Paxos and why is it considered complex to implement?
A: Multi-Paxos extends the basic Paxos consensus protocol so that a distributed system's replicas agree on a sequence of values (a replicated log), even in the presence of failures. With a stable leader, the Prepare phase runs once and each new log entry then needs only the Accept phase. It's notoriously complex due to its multiple phases (e.g., Prepare, Accept, Commit), intricate state management across multiple nodes, leader election, and the need to handle various failure scenarios (network partitions, node crashes) while maintaining strict safety and liveness properties. Correctly implementing all these interactions and corner cases is a significant challenge.
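To make the phase rules concrete, here is a minimal single-acceptor sketch of the Prepare (phase 1) and Accept (phase 2) rules in a multi-Paxos log. It is illustrative only. Ballot, Acceptor, on_prepare, on_accept, and value_to_propose are hypothetical names. Values are plain Strings, and networking, persistence, quorum counting, and commit notification are omitted. A single Prepare covers every slot from from_slot onward, which is how a stable leader avoids repeating phase 1 for each entry.

```rust
use std::collections::BTreeMap;

/// Ballots order lexicographically by (round, node), so two proposers can
/// never use the same ballot.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Ballot {
    pub round: u64,
    pub node: u32,
}

/// Hypothetical acceptor state for a replicated log.
#[derive(Default)]
pub struct Acceptor {
    promised: Option<Ballot>,
    accepted: BTreeMap<u64, (Ballot, String)>, // slot -> (ballot, value)
}

impl Acceptor {
    /// Phase 1b: promise to ignore lower ballots and report every value
    /// already accepted at slots >= from_slot.
    pub fn on_prepare(&mut self, b: Ballot, from_slot: u64) -> Option<Vec<(u64, Ballot, String)>> {
        if matches!(self.promised, Some(p) if b <= p) {
            return None; // already promised an equal or higher ballot
        }
        self.promised = Some(b);
        Some(
            self.accepted
                .range(from_slot..)
                .map(|(s, (ab, v))| (*s, *ab, v.clone()))
                .collect(),
        )
    }

    /// Phase 2b: accept unless a higher ballot has been promised.
    pub fn on_accept(&mut self, b: Ballot, slot: u64, value: String) -> bool {
        if matches!(self.promised, Some(p) if b < p) {
            return false; // stale leader
        }
        self.promised = Some(b);
        self.accepted.insert(slot, (b, value));
        true
    }
}

/// Leader side, after collecting promises from a majority: for a slot, it
/// must re-propose the reported value with the highest ballot, if any.
pub fn value_to_propose(promises: &[Vec<(u64, Ballot, String)>], slot: u64) -> Option<String> {
    promises
        .iter()
        .flatten()
        .filter(|e| e.0 == slot)
        .max_by_key(|e| e.1)
        .map(|e| e.2.clone())
}
```

The safety-critical rule is in value_to_propose. A new leader may choose its own value for a slot only when no promising acceptor reported one. This is why entries that may already have been chosen survive leader changes.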
Q: How do AI-driven code contracts differ from traditional unit tests?
A: Traditional unit tests verify specific behaviors or outputs for given inputs. Code contracts instead state what must hold before a function call (preconditions), after it (postconditions), and at all times (invariants, typically checked on entry to and exit from each operation). While unit tests check what a function does for selected inputs, contracts specify what a function must ensure for every input. AI enhances this by generating the contracts themselves and then deriving diverse unit and property-based tests from them, providing much broader correctness coverage than hand-picked examples alone.
Q: What role does Rust's safety model play in AI-assisted development, especially during aggressive optimization?
A: Rust's compile-time safety, primarily through its ownership and borrowing system, prevents common classes of bugs such as null pointer dereferences, use-after-free errors, and data races. This is crucial during aggressive performance optimization, where developers often manipulate low-level details (like memory allocations and concurrency primitives). With Rust, the compiler acts as a vigilant guardian, ensuring that even highly optimized, complex code remains memory-safe and free from undefined behavior as long as it stays within safe Rust. This allows developers to confidently push the boundaries of performance without fear of introducing subtle, hard-to-debug memory corruption issues, which is especially valuable when AI is proposing and implementing optimizations.
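As a small illustration of the concurrency side, the sketch below replaces a Mutex-guarded counter with an atomic one. This is the kind of lock-avoidance rewrite mentioned earlier. The function names are hypothetical. In both versions the threads borrow total through std::thread::scope. Handing those threads a plain &mut u64 instead would fail to compile, so the compiler, not a code review, rejects the data race.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;
use std::thread;

/// Baseline: every increment takes a lock.
pub fn count_with_mutex(threads: usize, per_thread: u64) -> u64 {
    let total = Mutex::new(0u64);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    *total.lock().unwrap() += 1;
                }
            });
        }
    }); // scope joins all spawned threads here
    total.into_inner().unwrap()
}

/// Lock-free rewrite using an atomic fetch_add.
pub fn count_with_atomic(threads: usize, per_thread: u64) -> u64 {
    let total = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Relaxed suffices: the final value is read only after
                    // thread::scope has joined every thread.
                    total.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    total.into_inner()
}
```

Both functions return the same count. The atomic version simply avoids lock acquisition on the hot path, and the type system checks that the rewrite still shares data safely.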