Next.js · MediaPipe · Gemini 2.5 Flash · WASM

IRIS

From “find my keys” to fingers on the object in ~30 seconds. No app install.

~30s voice to contact · ~30fps hand tracking · 0 app installs

A browser-based guidance system that takes a visually impaired user from a spoken request to physically touching the object — no app, no wearables. Voice in, haptic search, stereo-audio hand guidance, confirmed contact.
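The four stages (voice in, haptic search, stereo-audio guidance, confirmed contact) can be read as a small state machine. The sketch below is one way to model that flow in TypeScript. The phase names, event names and transition rules are hypothetical, not taken from the IRIS code. It also assumes that losing the object during guidance drops the user back into search.

```typescript
// Hypothetical phase and event names; the write-up describes the stages, not their implementation.
type Phase = "listening" | "searching" | "guiding" | "contact";

type GuidanceEvent =
  | { kind: "targetSpoken"; target: string } // speech recognized, e.g. "keys"
  | { kind: "objectCentered" } // detector reports the target roughly in front of the camera
  | { kind: "objectLost" } // target no longer detected
  | { kind: "fingertipOnObject" }; // hand tracking places the fingertip on the target

interface GuidanceState {
  phase: Phase;
  target: string | null;
}

// Pure transition function: events that do not apply in the current phase leave the state unchanged.
function step(state: GuidanceState, event: GuidanceEvent): GuidanceState {
  switch (state.phase) {
    case "listening":
      if (event.kind === "targetSpoken") return { phase: "searching", target: event.target };
      break;
    case "searching":
      if (event.kind === "objectCentered") return { ...state, phase: "guiding" };
      break;
    case "guiding":
      if (event.kind === "fingertipOnObject") return { ...state, phase: "contact" };
      if (event.kind === "objectLost") return { ...state, phase: "searching" };
      break;
    case "contact":
      break; // terminal: contact has been confirmed
  }
  return state;
}

// Usage: replay a successful run.
const events: GuidanceEvent[] = [
  { kind: "targetSpoken", target: "keys" },
  { kind: "objectCentered" },
  { kind: "fingertipOnObject" },
];
let state: GuidanceState = { phase: "listening", target: null };
for (const e of events) {
  state = step(state, e);
  console.log(state.phase); // searching, then guiding, then contact
}
```

Keeping the transitions in a pure function makes the flow easy to test without a camera, microphone or network. Side effects such as speech, vibration and audio would hang off phase changes.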

The problem

Object-finding tools for blind users usually require an app install, support only a fixed set of object classes, or depend on extra wearables. Each of those raises the barrier to using the tool in the moment it is actually needed.

Approach

A zero-shot pipeline: Gemini 2.5 Flash acts as an open-vocabulary detector. It can locate anything described in natural language, with no fixed class list. Meanwhile, MediaPipe hand tracking runs client-side in WASM at ~30fps. Only one frame per second is sent to the server, which keeps the system responsive and cheap to run.
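This split gives two rates: hand tracking sees every frame locally, while the server-side detector sees about one frame per second. The sketch below is a minimal, timestamp-based version of that gate, simulated in plain TypeScript. `createFrameGate` is a hypothetical helper, not part of MediaPipe or the Gemini API, and the real system may throttle differently (for example, by skipping sends while a request is still in flight).

```typescript
// Hypothetical helper: decides which camera frames are forwarded to the server-side detector.
// Assumes millisecond timestamps, e.g. from requestAnimationFrame or performance.now().
function createFrameGate(intervalMs: number): (timestampMs: number) => boolean {
  let lastSentMs = Number.NEGATIVE_INFINITY;
  return (timestampMs: number): boolean => {
    if (timestampMs - lastSentMs >= intervalMs) {
      lastSentMs = timestampMs;
      return true;
    }
    return false;
  };
}

// Simulate 3 seconds of a 30fps camera loop.
const shouldSendToDetector = createFrameGate(1000);
let localFrames = 0;
let serverFrames = 0;
for (let i = 0; i < 90; i++) {
  const t = (i * 1000) / 30;
  localFrames++; // every frame would go to client-side hand tracking
  if (shouldSendToDetector(t)) serverFrames++; // at most one frame per second goes to the server
}
console.log(localFrames, serverFrames); // 90 3
```

Gating on frame timestamps keeps the decision inside the same per-frame loop that drives hand tracking, so no separate timer has to be kept in sync with the camera.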

The guidance loop

Voice input names the target. The user sweeps the phone while haptic feedback intensifies as the camera nears the object. Stereo audio then guides the hand toward it, and a final cue confirms contact — the whole loop runs in the browser.
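The loop boils down to mapping geometry onto feedback. The sketch below shows one possible set of mappings. It assumes coordinates are normalized to [0, 1] across the frame, which is the convention MediaPipe uses for hand landmark x/y, and that the detector's box has already been converted to it. It treats "nearing the object" as the box centre approaching the frame centre. The function names, gains and thresholds are all illustrative assumptions, not values from IRIS. The fingertip would typically be MediaPipe's index fingertip landmark (index 8).

```typescript
// Assumption: all coordinates are normalized to [0, 1] relative to the camera frame.
interface Point { x: number; y: number; }
interface Box { xMin: number; yMin: number; xMax: number; yMax: number; }

const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v));
const boxCenter = (b: Box): Point => ({ x: (b.xMin + b.xMax) / 2, y: (b.yMin + b.yMax) / 2 });

// Search phase (assumed mapping): the closer the object sits to the frame centre,
// the shorter the gap between vibration pulses, so the feedback "intensifies".
function hapticIntervalMs(object: Box, minMs = 80, maxMs = 600): number {
  const c = boxCenter(object);
  const offCenter = Math.hypot(c.x - 0.5, c.y - 0.5) / Math.SQRT1_2; // 0 at centre, ~1 at a corner
  return minMs + (maxMs - minMs) * clamp(offCenter, 0, 1);
}

// Guidance phase: horizontal offset from fingertip to object, mapped to a stereo pan in [-1, 1]
// (-1 = fully left, +1 = fully right). Assumes an unmirrored rear camera.
function stereoPan(fingertip: Point, object: Box, gain = 2): number {
  return clamp((boxCenter(object).x - fingertip.x) * gain, -1, 1);
}

// Contact check: fingertip inside the object box, grown by a small margin to absorb detector jitter.
function isContact(fingertip: Point, object: Box, margin = 0.02): boolean {
  return (
    fingertip.x >= object.xMin - margin &&
    fingertip.x <= object.xMax + margin &&
    fingertip.y >= object.yMin - margin &&
    fingertip.y <= object.yMax + margin
  );
}

// Usage: object to the right of the fingertip.
const keys: Box = { xMin: 0.6, yMin: 0.4, xMax: 0.8, yMax: 0.6 };
const tip: Point = { x: 0.3, y: 0.5 };
console.log(stereoPan(tip, keys) > 0); // true: cue plays toward the right ear
console.log(isContact(tip, keys)); // false: not touching yet
```

In a browser, the pan value fits the range of the Web Audio API's `StereoPannerNode.pan`, and the pulse interval could drive `navigator.vibrate`. Where the Vibration API is unavailable, some other haptic channel would be needed.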

Results

Voice command to fingers on the object in roughly 30 seconds, entirely in a web browser — no install, no wearables, no fixed vocabulary.