Voice AI's Second Wave Is Real
Latency, accuracy, and naturalness have all crossed thresholds that change what's buildable.
Voice as an interface has crossed three thresholds in the last 18 months: sub-300ms round-trip latency, near-flawless ASR on accented English, and TTS that can hold a register through a multi-turn conversation without sounding robotic. Together they open up product categories (outbound sales calls, scheduling, intake interviews) that were not credibly automatable a model generation ago.
Most of the interesting builders in this space are small, vertically focused teams shipping into narrow domains where the existing software is brittle.