Voice Summary
Record once. Search the spoken word forever.
Voice Summary turns a phone call into a usable record. The recording goes in; a structured summary, an action list, and a searchable transcript come out — attached to the right customer and project automatically.
The two-hour problem
Most transcription services choke past twenty minutes. Real sales meetings, customer calls, and internal reviews run long. Voice Summary uses an ffmpeg-based chunker to split the audio into slices, sends those slices through the Gemini pipeline in parallel, then re-merges the results into one coherent narrative.
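A minimal sketch of the split-then-merge idea, assuming fixed 600-second slices and a hypothetical `transcribe` callable standing in for the Gemini request. The chunk length, the API call, and the real merge logic are not described on this page; `Chunk`, `plan_chunks`, and `transcribe_all` are illustrative names only.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    index: int      # position of the slice in the original recording
    start_s: float  # slice start, in seconds
    end_s: float    # slice end, in seconds


def plan_chunks(duration_s: float, chunk_s: float = 600.0) -> list[Chunk]:
    """Split [0, duration_s) into consecutive slices of at most chunk_s seconds."""
    chunks: list[Chunk] = []
    start, index = 0.0, 0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append(Chunk(index, start, end))
        start, index = end, index + 1
    return chunks


def transcribe_all(
    chunks: list[Chunk],
    transcribe: Callable[[Chunk], str],  # hypothetical stand-in for the Gemini call
    max_workers: int = 4,
) -> str:
    """Transcribe slices in parallel, then join them back in recording order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whatever order the slices finish in
        texts = list(pool.map(transcribe, chunks))
    return "\n".join(text.strip() for text in texts if text.strip())
```

For a 2-hour recording, `plan_chunks(7200)` yields twelve 600-second slices. Because `ThreadPoolExecutor.map` returns results in input order, slices can finish in any order without scrambling the transcript. This sketch simply concatenates; a production merge would likely also need to handle words cut at slice boundaries, for example by overlapping slices.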
The shortest path from a meeting to a follow-up email is the one without re-listening.
What you get back
- A clean, formatted transcript.
- A summary keyed to the customer and project.
- An action list with named owners.
- An ad-hoc query box: ask the recording anything, get an answer in seconds.
Long-form transcription
Internal chunker + Gemini pipeline handles 2-hour recordings without truncation. ffmpeg pre-processing strips silence and splits the audio into chunks that are processed in parallel batches.
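One way to express this pre-processing step as a single ffmpeg invocation, assuming silence is removed with ffmpeg's `silenceremove` filter and slices are cut with the `segment` muxer. The thresholds, the mono 16 kHz output, the 600-second slice length, and the `build_ffmpeg_args` helper are illustrative assumptions, not the product's actual settings.

```python
from __future__ import annotations


def build_ffmpeg_args(src: str, out_pattern: str, segment_s: int = 600) -> list[str]:
    """Build an ffmpeg argv that strips silence, then cuts the audio into slices."""
    silence_filter = (
        "silenceremove="
        "stop_periods=-1:"      # negative: keep removing silent stretches throughout the file
        "stop_duration=1:"      # a stretch must last at least 1 s to be removed
        "stop_threshold=-50dB"  # audio quieter than -50 dB is treated as silence
    )
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-i", src,
        "-af", silence_filter,
        "-ac", "1",             # downmix to mono (assumption)
        "-ar", "16000",         # resample to 16 kHz (assumption)
        "-f", "segment",        # segment muxer writes numbered output files
        "-segment_time", str(segment_s),
        out_pattern,            # e.g. "chunk_%03d.wav"
    ]
```

It can be run with `subprocess.run(build_ffmpeg_args("call.m4a", "chunk_%03d.wav"), check=True)`, and each resulting file handed to a parallel transcription step. Note that once silence is removed, slice offsets no longer line up with timestamps in the original recording, so anything that needs original times would have to track the removed spans separately.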
Structured business context
Extracted fields render as readable rows — customer name, project, action items, blockers — not raw JSON.
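A sketch of turning extracted fields into readable rows instead of raw JSON, assuming the pipeline returns a flat JSON object. The field keys (`customer_name`, `project`, `action_items`, `blockers`) and the `render_rows` helper are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical field keys mapped to the labels a reader sees, in display order.
LABELS = {
    "customer_name": "Customer",
    "project": "Project",
    "action_items": "Action items",
    "blockers": "Blockers",
}


def render_rows(fields: dict[str, Any]) -> list[str]:
    """Turn extracted fields into 'Label: value' rows, in a fixed display order."""
    rows: list[str] = []
    for key, label in LABELS.items():
        value = fields.get(key)
        if value is None or value == "" or value == []:
            continue  # skip empty fields rather than showing blank rows
        if isinstance(value, list):
            value = "; ".join(str(item) for item in value)
        rows.append(f"{label}: {value}")
    return rows


if __name__ == "__main__":
    raw = '{"customer_name": "Acme", "action_items": ["Send quote", "Book demo"], "blockers": []}'
    for row in render_rows(json.loads(raw)):
        print(row)
```

Keeping the label map separate from the rendering loop means the display order and wording can change without touching the extraction output.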
Ask anything, after the fact
The right-side AI panel answers ad-hoc questions about the recording, such as 'What did we promise on delivery?', in seconds.
Resilient to ops failures
Stuck-row sweep + manual retry button. A killed worker no longer leaves a recording spinning forever.
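A minimal sketch of a stuck-row sweep with a manual retry, assuming each recording row stores a status and a worker heartbeat timestamp. The `Recording` shape, the status names, the 30-minute timeout, and the `sweep_stuck`/`retry` helpers are hypothetical; a real deployment would run the sweep on a schedule against its database.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Recording:
    id: str
    status: str             # hypothetical states: "queued", "processing", "done", "failed"
    heartbeat_at: datetime  # last time a worker reported progress on this row


def sweep_stuck(
    rows: list[Recording],
    now: datetime,
    timeout: timedelta = timedelta(minutes=30),
) -> list[str]:
    """Mark rows whose worker went silent as failed; return their ids."""
    swept: list[str] = []
    for row in rows:
        if row.status == "processing" and now - row.heartbeat_at > timeout:
            row.status = "failed"  # a failed row can be retried instead of spinning forever
            swept.append(row.id)
    return swept


def retry(row: Recording, now: datetime) -> None:
    """Manual retry: put a failed row back in the queue."""
    if row.status != "failed":
        raise ValueError(f"recording {row.id} is {row.status}, not failed")
    row.status = "queued"
    row.heartbeat_at = now
```

The heartbeat, rather than the row's creation time, is what decides staleness, so a long but healthy 2-hour job that keeps reporting progress is never swept.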
CRM
Customers as a first-class object, not a row in a spreadsheet.
AI Chat Summary
LINE conversations become a clean ledger of what was actually agreed.
Projects
The work, the people, and the conversations — on one timeline.
How long can a single recording be?
Tested on 2-hour internal meetings. Audio is split into parallel chunks and re-merged transparently.
What languages?
Thai and English are the production targets. Other languages work but are not officially supported yet.
Where is the audio stored?
Inside your tenant's Drive (Google Workspace). The platform never holds raw audio outside the tenant boundary.