When Vibe Coding Breaks Down, Add Another Vibe
I’ve been using Claude Code for a personal project lately. The workflow is exactly what you’d expect from “vibe coding” - describe what I want, watch the code appear, iterate. It works remarkably well, until it doesn’t.
Every now and then, Claude would hit a wall. The context would get compacted, the conversation would get long, and suddenly changes would start making things worse. This happened especially when writing test cases - the kind of work that requires holding a lot of context about what the code should do versus what it actually does.
Out of frustration, I started throwing the problematic code at Codex for a second opinion. “Why isn’t this working?” I’d ask. Codex would analyse the code, spot issues Claude had missed, and offer a different perspective.
What surprised me was how often this worked. Sometimes Codex’s insight solved the problem directly. Other times, I’d feed Codex’s feedback back to Claude, the two would go back and forth (with me as the messenger), and the solution that emerged was often more elegant than either would have produced alone. I’ve since added Gemini-3 to the mix.
Then, as happens with side projects, the actual project became secondary. The interesting question became: what if I could automate this? What if two models could debate their way to a consensus on a bug fix or feature implementation?
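As a thought experiment, the core loop might look something like this - a minimal sketch in Python, assuming each model is wrapped in a plain `ask(prompt) -> str` callable. The wrappers, the CONSENSUS marker, and the round limit are all hypothetical choices of mine, not anything Claude Code or Codex actually expose:

```python
# Hypothetical sketch of a two-model debate loop. Each "model" is just a
# callable that takes a prompt and returns text; how it's backed is up to you.

from typing import Callable

Model = Callable[[str], str]  # e.g. a thin wrapper around a provider SDK


def debate(author: Model, reviewer: Model, task: str, max_rounds: int = 4) -> str:
    """Let one model draft a fix and another critique it until they agree."""
    draft = author(f"Task:\n{task}\n\nPropose a fix.")
    for _ in range(max_rounds):
        critique = reviewer(
            f"Task:\n{task}\n\nProposed fix:\n{draft}\n\n"
            "If the fix is correct, reply with exactly CONSENSUS. "
            "Otherwise, list the problems."
        )
        if critique.strip() == "CONSENSUS":
            return draft  # both models accept the fix
        # Play the messenger: hand the critique back to the author.
        draft = author(
            f"Task:\n{task}\n\nYour previous fix:\n{draft}\n\n"
            f"A reviewer raised these issues:\n{critique}\n\nRevise the fix."
        )
    return draft  # no consensus within the round limit; return the latest draft
```

The interesting design question is the stopping condition: a bare CONSENSUS token is crude, but it forces the reviewer to commit to a verdict instead of hedging forever.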
I’m still working through this idea. But I recently discovered Karpathy’s llm-council project, which explores similar territory - using multiple LLMs to deliberate and reach collective judgments.
This has also led me to another question: how do you actually manage multiple AI models in practice? Routing, orchestration and observability across different providers - suddenly AI gateways start looking very relevant. But that’s a whole other discussion.
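To make the sketch above concrete, here is one way those `ask` wrappers might look, using the official OpenAI and Anthropic Python SDKs. The model names are placeholders, and a gateway would centralise exactly this kind of routing, plus the retries and logging I’m glossing over:

```python
# Hypothetical provider wrappers for the debate() sketch above.
# Model names are placeholders; swap in whatever you actually use.

from openai import OpenAI
import anthropic

_openai = OpenAI()                  # reads OPENAI_API_KEY from the environment
_anthropic = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY


def ask_codex(prompt: str) -> str:
    resp = _openai.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def ask_claude(prompt: str) -> str:
    resp = _anthropic.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text
```

With those two in hand, `debate(ask_claude, ask_codex, task)` is the whole automation - which is roughly the point where routing and observability stop being optional.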
There’s something here. The failure mode of a single model in a long context isn’t “wrong answers” - it’s confident drift. A second model doesn’t share that drift. It has fresh eyes.
Maybe the future of AI-assisted coding isn’t one really smart model. Maybe it’s several models keeping each other honest.

