I Shipped an Entire App Using Only AI Agents
Flashcards Alarm was built 100% by Claude and Gemini — from architecture to App Store submission. Here's exactly how that worked.

Let me be upfront: every line of code in Flashcards Alarm was generated by AI. Not 'AI-assisted' — fully AI-generated. Claude Opus 4.6 handled architecture decisions and complex logic. Gemini handled content generation pipelines. I directed the process, reviewed the output, and made product decisions. The machines wrote the code.
The tooling setup was the foundation of the entire process. I configured Cursor IDE with Claude as the primary coding agent, connected to MCP servers for file system access, terminal operations, and browser automation. The key insight was establishing a project structure and coding standards upfront — a detailed rules file that defined naming conventions, architectural patterns, and testing requirements. This gave Claude consistent guardrails to work within, dramatically reducing the amount of back-and-forth needed for each feature. Without clear constraints, AI agents produce inconsistent code that becomes harder to maintain with each addition.
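The article doesn't publish the actual rules file, but a sketch of the kind of constraints it describes might look like this (every rule below is an invented example, not the real configuration):

```markdown
# .cursorrules — illustrative excerpt (hypothetical)

- Feature code lives in lib/features/<feature>/ with ui/, logic/, and data/ subfolders.
- All Firestore access goes through a repository class; never query Firestore directly from a widget.
- Every public function gets a unit or widget test before the feature is considered done.
- File naming: *_repository.dart, *_provider.dart, *_screen.dart.
- Prefer small, composable widgets; no build method over 80 lines.
```

The value of a file like this is less any individual rule than that the agent re-reads it on every task, so each generated feature lands in the same shape as the last one.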
The workflow started with architecture. I described the product to Claude in natural language: an alarm app that forces you to study flashcards before you can dismiss it. Claude proposed the tech stack (Flutter, Firebase, Cloud Functions), designed the data models, and scaffolded the entire project structure.
State management was where Claude's architectural instincts really shone. I described the data flow requirements — user authentication state, flashcard deck management, alarm scheduling, and subscription status — and Claude proposed a clean provider-based architecture with clear separation between UI state, business logic, and data persistence. The resulting state management layer used Riverpod for dependency injection, with repository patterns that abstracted Firestore operations behind clean interfaces. When I later needed to add offline caching, the abstraction layer meant changes were isolated to the repository implementations without touching any UI code.
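The repository abstraction described above is the piece that made the later offline-caching change cheap. A minimal sketch of the pattern, written in Python for brevity (the real app uses Dart/Riverpod, and all class names here are invented):

```python
from abc import ABC, abstractmethod

class DeckRepository(ABC):
    """Interface the UI layer depends on — no Firestore types leak out."""
    @abstractmethod
    def get_deck(self, deck_id: str) -> dict: ...

class FirestoreDeckRepository(DeckRepository):
    """Production implementation backed by the remote store (hypothetical client API)."""
    def __init__(self, client):
        self._client = client

    def get_deck(self, deck_id: str) -> dict:
        return self._client.fetch("decks", deck_id)

class CachingDeckRepository(DeckRepository):
    """Offline caching added later: wraps any DeckRepository.

    Only this layer changed when caching was introduced; callers that
    depend on the DeckRepository interface are untouched.
    """
    def __init__(self, inner: DeckRepository):
        self._inner = inner
        self._cache: dict[str, dict] = {}

    def get_deck(self, deck_id: str) -> dict:
        if deck_id not in self._cache:
            self._cache[deck_id] = self._inner.get_deck(deck_id)
        return self._cache[deck_id]
```

Because the UI only ever sees `DeckRepository`, swapping `FirestoreDeckRepository` for `CachingDeckRepository(FirestoreDeckRepository(...))` is a one-line change at the injection site.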
For each feature, I'd describe the desired behavior, edge cases, and constraints. Claude would generate the implementation, I'd review it, request changes where needed, and iterate. The feedback loop was surprisingly similar to working with a human engineer — except Claude never gets tired and responds in seconds.
Testing with AI agents followed a counterintuitive workflow. Instead of writing tests after implementation, I described the expected behavior first, had Claude generate comprehensive tests, then had Claude implement the feature to pass those tests. This test-first approach, traditionally difficult to maintain because of the discipline it requires, became natural when the AI was doing both the test writing and the implementation. The resulting test suite covers 87% of the codebase, with particularly thorough coverage of edge cases in the spaced repetition algorithm and alarm scheduling logic.
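To make the test-first loop concrete: the assertions pin down the behavior before any implementation exists, then the agent writes code to satisfy them. The interval rule below is a deliberately simplified, SM-2-flavored sketch in Python — not the app's actual spaced repetition algorithm, whose details the article doesn't publish:

```python
def next_interval(prev_interval_days: int, ease: float, correct: bool) -> int:
    """Return the next review interval in days (simplified illustration)."""
    if not correct:
        return 1  # a missed card comes back tomorrow
    if prev_interval_days == 0:
        return 1  # first successful review
    # grow the interval by the ease factor, always moving forward at least one day
    return max(prev_interval_days + 1, int(prev_interval_days * ease))

# Tests written before the implementation, covering the edge cases
# (first review, failed review) that the article says got extra attention.
assert next_interval(0, 2.5, True) == 1    # brand-new card
assert next_interval(10, 2.5, False) == 1  # failure resets the schedule
assert next_interval(6, 2.5, True) == 15   # normal growth
```

The point isn't this particular formula — it's that the spec-as-tests existed first, so the implementation had an unambiguous target.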
Gemini's role was powering the AI content generation pipeline. When users want to auto-generate flashcards, the request goes to Cloud Functions, which calls Gemini to analyze the topic and produce question-answer pairs. The prompt engineering here was critical — poorly prompted models generate vague, unhelpful cards.
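The article doesn't show the actual Cloud Function or prompt, but the shape of the problem — constraining the model's output so cards are specific and machine-parseable — can be sketched like this (the prompt wording, format, and helper names are all assumptions):

```python
def build_prompt(topic: str, count: int = 10) -> str:
    """Constrain the model: concrete questions, short answers, fixed output format."""
    return (
        f"Generate {count} flashcards about '{topic}'.\n"
        "Each card must test one specific fact, not a vague concept.\n"
        "Answers must be at most 15 words.\n"
        "Output exactly one card per line as: Q: <question> | A: <answer>"
    )

def parse_cards(response: str) -> list[tuple[str, str]]:
    """Parse 'Q: ... | A: ...' lines; silently skip anything malformed."""
    cards = []
    for line in response.splitlines():
        if line.startswith("Q:") and "| A:" in line:
            question, answer = line.split("| A:", 1)
            cards.append((question[2:].strip(), answer.strip()))
    return cards
```

Forcing a rigid line format and discarding malformed lines is one common way to make an LLM pipeline robust: a partially bad response degrades to fewer cards rather than a crashed function.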
Debugging and error resolution revealed an interesting dynamic in AI-assisted development. When a bug surfaced — and they did surface regularly — I'd describe the symptom to Claude, who would then systematically read relevant files, hypothesize root causes, and propose fixes. The cycle time from bug report to fix was remarkably fast, typically under five minutes for logic errors. However, platform-specific bugs were harder: Claude could reason about Android's AlarmManager behavior from documentation, but couldn't reproduce device-specific quirks from Samsung or Xiaomi. These platform edge cases still required my human judgment and device-in-hand testing to resolve.
The hardest parts weren't coding — they were the same hard parts as any software project. Platform-specific alarm behavior. App Store review guidelines. Edge cases in subscription management. AI doesn't eliminate complexity; it changes who writes the code that handles it.
The App Store submission process was the one phase where AI couldn't fully substitute for human experience. Apple's review guidelines are extensive and sometimes subjectively enforced. Claude helped generate the app description, privacy policy, and screenshot metadata, but I made the final decisions on App Store Optimization keywords, pricing strategy, and which features to highlight. The first submission was rejected for a missing privacy disclosure about Gemini API data usage — a nuance that required understanding Apple's evolving stance on AI-powered features. After adding the disclosure and resubmitting, the app was approved within 48 hours.
Total development time: 6 weeks from concept to App Store submission. For context, a similar app built traditionally would take a solo developer 3-4 months. The 60% time reduction wasn't from faster typing — it was from eliminating the gap between thinking about code and having code.
Six months after launch, the AI-generated codebase has proven surprisingly maintainable. I've shipped 12 updates, each developed using the same AI-driven workflow. The code follows consistent patterns because Claude doesn't have tired Friday afternoon code versus fresh Monday morning code — every function gets the same level of attention. The areas that required the most human intervention were cross-cutting concerns that span multiple systems: subscription state affecting alarm behavior affecting flashcard selection affecting analytics events. AI agents excel at implementing features in isolation but still need human guidance to ensure these interconnected systems behave correctly as a unified product.
Tags: AI, Claude, Gemini, MCP, Automation
Key Facts
- Category: Dev
- Reading time: 20 min read
- Technologies: AI, Claude, Gemini