IRIS
From “find my keys” to fingers on the object in ~30 seconds. No app install.
A browser-based guidance system that takes a visually impaired user from a spoken request to physically touching the object — no app, no wearables. Voice in, haptic search, stereo-audio hand guidance, confirmed contact.
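The four stages above can be read as a small state machine. The sketch below only illustrates that structure. The stage names, the `GuidanceEvent` events, and the `step` function are hypothetical, and the source does not say what triggers each transition.

```typescript
// Hypothetical stage names mirroring the four stages in the description.
type Stage = "listening" | "searching" | "guiding" | "contact";

// Hypothetical events; the source does not say what triggers each transition.
type GuidanceEvent =
  | { kind: "targetSpoken"; target: string } // voice input names the object
  | { kind: "objectFound" }                  // detector locates it in view
  | { kind: "handTouching" };                // fingertip reaches the object

interface LoopState {
  stage: Stage;
  target: string | null;
}

// Advance the loop by one event; events that don't apply to the
// current stage are ignored and the state is returned unchanged.
function step(state: LoopState, event: GuidanceEvent): LoopState {
  switch (state.stage) {
    case "listening":
      return event.kind === "targetSpoken"
        ? { stage: "searching", target: event.target }
        : state;
    case "searching":
      return event.kind === "objectFound" ? { ...state, stage: "guiding" } : state;
    case "guiding":
      return event.kind === "handTouching" ? { ...state, stage: "contact" } : state;
    case "contact":
      return state; // terminal stage: the confirmation cue plays on entry
  }
}
```

Keeping the transitions in one pure function means out-of-order signals, such as a stray "touching" reading before any target has been spoken, cannot skip a stage.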
The problem
Object-finding tools for blind users usually need an app install, a fixed set of object classes, or extra wearables. Each of those adds friction at exactly the moment someone needs to find something.
Approach
A zero-shot pipeline: Gemini 2.5 Flash acts as an open-vocabulary detector that locates objects described in natural language, with no fixed class list, while MediaPipe hand tracking runs client-side in WebAssembly (WASM) at ~30 fps. Only one frame per second is sent to the server. The per-frame hand tracking never leaves the device, which keeps the loop responsive and the server cost low.
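One way to get that client/server split, sketched under assumptions: every camera frame goes to the local hand tracker, while a hypothetical `UploadThrottle` forwards at most one frame per second to the detector. The hypothetical `boxCenter` helper assumes the model is prompted to return a box as `[ymin, xmin, ymax, xmax]` scaled to 0-1000, which is the convention Google documents for Gemini object detection. The source does not state IRIS's actual prompt or response format.

```typescript
// Hypothetical upload gate: every frame still reaches local hand tracking,
// but at most one frame per `intervalMs` is forwarded to the detector server.
class UploadThrottle {
  private readonly intervalMs: number;
  private lastSentMs = -Infinity; // so the very first frame is always sent

  constructor(intervalMs = 1000) {
    this.intervalMs = intervalMs;
  }

  // Returns true if the frame captured at `nowMs` should be uploaded.
  shouldUpload(nowMs: number): boolean {
    if (nowMs - this.lastSentMs >= this.intervalMs) {
      this.lastSentMs = nowMs;
      return true;
    }
    return false;
  }
}

// Assumption: detector boxes arrive as [ymin, xmin, ymax, xmax] on a 0-1000
// scale (Gemini's documented detection convention, not confirmed for IRIS).
type Box1000 = [number, number, number, number];

// Normalized image coordinates in [0, 1], origin at top-left.
interface Point {
  x: number;
  y: number;
}

// Center of the detected box, converted to normalized coordinates so it can
// be compared directly with hand landmarks in the same space.
function boxCenter([ymin, xmin, ymax, xmax]: Box1000): Point {
  return { x: (xmin + xmax) / 2000, y: (ymin + ymax) / 2000 };
}
```

Driven at ~30 fps for three seconds (frames at `i * 1000 / 30` ms for `i` from 0 to 89), the gate lets through 3 of the 90 frames, at 0, 1000, and 2000 ms. Gating on elapsed time rather than on a frame counter keeps uploads near 1 Hz even if the tracking frame rate drops.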
The guidance loop
The user names the target by voice, then sweeps the phone while haptic feedback intensifies as the camera closes in on the object. Stereo audio then guides the hand toward it, and a final cue confirms contact. The whole loop runs in the browser; only the once-per-second detection frames go to the server.
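The three kinds of feedback in the loop can be reduced to small mapping functions. This is a sketch, not the IRIS implementation. `hapticPattern`, `stereoPan`, and `isTouching` are hypothetical names. The `closeness` score, pulse timings, `gain`, and contact margin are assumed values, and contact is approximated as the fingertip landing inside the detected box.

```typescript
// Normalized image coordinates in [0, 1], origin at top-left
// (the convention MediaPipe uses for hand landmarks).
interface Point {
  x: number;
  y: number;
}
interface Box {
  xmin: number;
  ymin: number;
  xmax: number;
  ymax: number;
}

const clamp = (v: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, v));

// Search phase. Assumption: `closeness` in [0, 1] is a hypothetical proximity
// score. Closer means shorter gaps between pulses, so the buzz feels stronger.
// Returns [vibrateMs, pauseMs].
function hapticPattern(closeness: number): number[] {
  const c = clamp(closeness, 0, 1);
  const pulseMs = 40;
  const gapMs = Math.round(600 - 540 * c); // 600 ms when far, 60 ms on target
  return [pulseMs, gapMs];
}

// Guidance phase: pan the cue toward the side the hand must move to.
// -1 = fully left, +1 = fully right. `gain` is an assumed sensitivity, and the
// camera image is assumed to be unmirrored (rear camera).
function stereoPan(hand: Point, target: Point, gain = 2): number {
  return clamp((target.x - hand.x) * gain, -1, 1);
}

// Contact: fingertip inside the target box, widened by an assumed margin.
function isTouching(tip: Point, box: Box, margin = 0.02): boolean {
  return (
    tip.x >= box.xmin - margin && tip.x <= box.xmax + margin &&
    tip.y >= box.ymin - margin && tip.y <= box.ymax + margin
  );
}
```

In the browser, the pattern could be replayed with `navigator.vibrate(...)` during the sweep, and the pan written to a `StereoPannerNode`'s `pan.value` (range -1 to 1) during guidance. The fingertip could come from MediaPipe hand landmark 8 (index fingertip). Stereo panning on its own only encodes the left-right offset.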
Results
Voice command to fingers on the object in roughly 30 seconds, entirely in a web browser — no install, no wearables, no fixed vocabulary.