Voice dictation will become huge over the next 2-3 years. Advances in AI have made it possible for us to ramble for several minutes and get back a polished transcript of what we meant. No umms and ahhs.
Still, it feels a bit weird to talk to your computer, especially in public. Dictation also feels wrong when I’m still formulating my thoughts. I’m typing this out, for example, because I haven’t formed a coherent opinion yet. For me at least, writing is thinking, and that means writing with a keyboard or a pen. This may change, though, and we may get used to dictation the same way we got used to keyboards.
I find dictation really useful for chatting with LLMs and for stream-of-consciousness journaling. I’m sure others can get more use out of it, and even write books.
There are two ways to do transcription: running a model locally or in the cloud. I don’t have a preference as long as it works, although for journaling it would be nice to use a provider that respects the user’s privacy. Cloud providers make transcription possible on low-cost hardware, but they come with a subscription cost that the people who need it most may not be able to afford.
There are a plethora of dictation options on Mac: MacWhisper, superwhisper, VoiceInk, Aqua Voice, Wispr Flow. I’m sure there are more. Some of these even support Windows.
Dictation on Linux, however, is in a sad state. I asked a friend why this is, and he said it’s because Linux users tend to build things themselves. And it’s true. There are hundreds of repositories on GitHub where Linux users have uploaded their code, usually a CLI, that runs a model locally and transcribes speech. I was able to hack together something myself. It runs the parakeet-v3 model locally, and uses dotool to mimic paste.
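To give a rough idea of what the "mimic paste" step can look like, here is a minimal sketch. It is not my actual script, and it skips the transcription part entirely. It assumes the text goes onto the clipboard first (wl-copy on Wayland, xclip on X11, both of which are my assumptions here) and that dotool then presses Ctrl+V. The function names are made up for illustration.

```python
import os
import subprocess


def clipboard_command(env=None):
    """Pick a clipboard tool for the current session.

    Assumption: wl-copy is installed on Wayland and xclip on X11.
    """
    env = os.environ if env is None else env
    if env.get("WAYLAND_DISPLAY"):
        return ["wl-copy"]
    return ["xclip", "-selection", "clipboard"]


def paste_script():
    """Return the dotool command that presses Ctrl+V."""
    return "key ctrl+v\n"


def paste_text(text, env=None):
    """Copy text to the clipboard, then have dotool simulate the paste."""
    # Both clipboard tools read the text to copy from stdin.
    subprocess.run(clipboard_command(env), input=text.encode("utf-8"), check=True)
    # dotool reads its commands from stdin, one per line.
    subprocess.run(["dotool"], input=paste_script().encode("utf-8"), check=True)
```

Calling `paste_text("hello from my microphone")` would drop the text into whatever window has focus. dotool itself works through uinput, so it does not care about the display server, but the clipboard tool does, which is exactly the kind of check I would rather not have to think about.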
What I would really like is one application that works across different Linux setups. Just to feel normal. I don't want to check what display server protocol I'm using before installing a desktop app. Handy is the only option I have found that even considers supporting different Linux variations. It includes the parakeet-v3 model that I used in my application and looks great aesthetically.