Our AI Development Workflow: Two Models, Zero Compromise
AI-assisted coding is useful, but it has a blind spot: echo chambers.
If you use a single model (like Claude 3.5 Sonnet) to write your code, and then ask that same model to review its own work, it will almost always tell you it did a great job. It suffers from confirmation bias. It inherently "agrees" with its own architectural decisions and logic patterns.
After shipping multiple products at Uptrail, I developed a strict "Two-Pass" methodology that eliminates this problem.
The Core Concept
The methodology is simple: use different model families for the build and the review to maximise coverage.
- The Builder (Gemini 1.5 Pro): Fast, handles massive codebases without dropping context, and is excellent at scaffolding and churning out feature implementations.
- The Reviewer (Claude 3 Opus): Slower, more expensive, but is better at catching subtle issues. It acts as the lead architect and security auditor.
Phase 1: The Build (Gemini)
I start every sprint with an Antigravity workflow via a slash command (e.g., /sprint-04-build). This command points Gemini to the architecture documents, design system tokens, and the feature specifications.
Gemini's job is pure execution.
// Example of the turbo workflow pattern
Task: Build the project detail page template.
Constraint: Read DESIGN-GUIDE.md for typography.
// turbo
pnpm lint
// turbo
pnpm build
Gemini writes the components, wires up the data layer, and ensures the build passes. But I do not merge this code yet.
Phase 2: The Review (Opus)
Once the build is "complete," I switch tools or instances and bring in Claude 3 Opus. I run a /sprint-review command.
Opus does not write feature code. Its entire prompt is configured to be hostile, pedantic, and obsessed with edge cases.
I specifically instruct Opus: "Assume the previous model made architectural mistakes. Look for race conditions, accessibility failures, and deviations from PATTERNS.md."
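Concretely, a /sprint-review command could be structured like this. This is an illustrative sketch of such a prompt file, not the actual one; only the quoted instruction above comes from my real setup:

```markdown
<!-- Hypothetical /sprint-review prompt (sketch) -->
Role: hostile lead architect and security auditor. You do NOT write feature code.
Assume the previous model made architectural mistakes.

Check, in order:
1. Race conditions, especially code that assumes a long-lived process.
2. Accessibility failures (focus order, prefers-reduced-motion, contrast).
3. Deviations from PATTERNS.md.

Output: a numbered list of findings, each with a severity and a suggested fix.
```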
Real Examples of the Two-Pass System in Action
Here's what Opus routinely catches that Gemini (and often I) miss:
- The Accessibility Trap: Gemini often builds beautiful interactive components (like the 3D tilt cards on this site) but forgets prefers-reduced-motion. Opus flags this immediately.
- The State Mutation Bug: In Sprint 03, Gemini wrote a perfectly functional UI that mutated a React state object directly before setting it. It worked locally. Opus caught the anti-pattern and enforced immutability.
- The Race Condition: When building the ModelMesh demo API, Gemini used a simple in-memory queue. Opus flagged that this would fail in a Vercel serverless environment and rewrote it to use Upstash Redis.
The Flywheel: PATTERNS.md
Every time Opus flags a mistake, my retro workflow automatically logs it in a file called PATTERNS.md.
If a mistake happens three times, it gets promoted to critical status and injected into the root .agents/rules directory.
This means that Gemini actually learns from Opus's reviews over time. The builder gets smarter, the reviews get faster, and the codebase remains pristine.
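For a sense of what gets logged, an entry in PATTERNS.md might look like this. This format is a hypothetical sketch, not a dump of the real file:

```markdown
<!-- Hypothetical PATTERNS.md entry (illustrative format) -->
## Direct state mutation before setState
- Occurrences: 3 (promoted to critical; injected into .agents/rules)
- First seen: Sprint 03
- Rule: Never mutate React state objects in place; spread into new objects and arrays.
- Caught by: Opus /sprint-review
```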
Using two different models forces a "clash of perspectives." What one model considers standard practice, the other questions. It is the closest thing I have found to true pair-programming with AI.
If you are just using one model, you aren't pair programming. You're just talking to a very fast typist who agrees with everything you say.