The JK Jung Times
TECHNOLOGY

From Camera to Flashcards: Building AI Smart Scan

How I turned a phone camera into a vocabulary extraction engine using Gemini's vision API and Cloud Functions.

By JK Jung, Staff Developer | Los Angeles Bureau | Thursday, April 16, 2026

The idea was simple: point your phone camera at a textbook page, and the app automatically extracts vocabulary words and creates flashcards. The implementation was anything but simple — it involved camera capture, image preprocessing, Gemini's vision API, structured output parsing, and graceful error handling for blurry photos.
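One common way to catch a blurry capture before wasting an API call is to measure edge sharpness. The sketch below (an illustration, not the app's actual code — the function names and the threshold value are assumptions) computes the variance of a 4-neighbour Laplacian over a grayscale image: sharp text produces strong edges and high variance, while a defocused page scores near zero.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over a grayscale image.

    `gray` is a list of rows of pixel intensities (0-255). Low variance
    means few sharp edges, i.e. a likely-blurry capture.
    """
    h, w = len(gray), len(gray[0])
    values = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            values.append(lap)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def looks_blurry(gray, threshold=100.0):
    # The threshold is illustrative; in practice it would be tuned
    # against a set of real sharp and blurry textbook captures.
    return laplacian_variance(gray) < threshold
```

Rejecting these frames client-side keeps obviously unusable images out of the extraction pipeline entirely.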

Multi-language support added significant complexity to the extraction pipeline. The system needed to handle English, Korean, Japanese, and Chinese text — often mixed on the same page. Language detection runs as a preprocessing step, analyzing character distributions to determine the primary language and any secondary languages present. Gemini's multilingual capabilities handled the actual extraction, but prompt templates had to be language-specific to produce natural-sounding definitions and properly handle linguistic nuances like Korean honorifics or Japanese kanji readings.
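The character-distribution pass described above can be sketched by bucketing code points into Unicode script ranges. This is a minimal illustration, not the production detector — the function names are hypothetical, and real text needs more ranges (Hangul Jamo extensions, CJK extension blocks, Latin with diacritics). One real nuance it does capture: Han ideographs are shared by Chinese and Japanese, so the presence of kana is what disambiguates Japanese kanji.

```python
from collections import Counter


def classify_char(ch):
    """Map a character to a coarse script bucket, or None for punctuation etc."""
    cp = ord(ch)
    if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:
        return "korean"        # Hangul syllables and jamo
    if 0x3040 <= cp <= 0x30FF:
        return "japanese"      # hiragana + katakana
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"           # CJK unified ideographs (Chinese or kanji)
    if ch.isascii() and ch.isalpha():
        return "english"
    return None


def detect_languages(text):
    """Return (primary_language, secondary_languages) from character counts."""
    counts = Counter(c for c in map(classify_char, text) if c)
    if not counts:
        return "unknown", []
    # Kana anywhere on the page means the Han characters are Japanese kanji;
    # otherwise treat them as Chinese.
    if counts.get("japanese"):
        counts["japanese"] += counts.pop("han", 0)
    elif counts.get("han"):
        counts["chinese"] = counts.pop("han")
    ranked = [lang for lang, _ in counts.most_common()]
    return ranked[0], ranked[1:]
```

The detected primary language then selects the language-specific prompt template, with secondary languages flagged so mixed-script pages are not silently mistranslated.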

The camera integration in Flutter uses platform channels to access native camera APIs. We needed high-resolution captures with auto-focus confirmation — a blurry image produces garbage vocabulary. The native layer

...

Tags: AI, Gemini, Computer Vision, Flutter