
Dictation now works
A while back, I started using Neovim. I wanted to code faster. I began learning vim motions. It took some time, but I got better. Overall, I was happy with the experience. I started with nvchad.com, and I’ve learned a lot. If you haven’t tried it, I highly recommend it. It’s easy to get started.
One day I was watching a coding video, and that’s when it hit me: I’m a slow typist. My speed is about 50 to 60 words per minute (wpm). I often look at the keyboard to find keys, a rookie mistake. To improve, I started training on monkeytype.com. I practiced once or twice a week. It was helpful. Now I don’t look at the keyboard much. But I still wanted to type faster. Sidenote: you should test your typing if you don’t know your wpm.
Then I read nice comments about SuperWhisper on Twitter. It’s a Mac app for dictation. At first, I tried it without much hope, since my previous experiences with dictation (Siri and the native Mac dictation) were horrendous, terrible, a waste of time, incredibly bad and so on. But after a few minutes I realized: the thing works!!!
It removes all the “eeehm” and “huum” from your sentences. It correctly picks up all the names and words. It punctuates correctly. It just works! Without any config required. I was amazed. You just speak, and it gets the text right. Whatever it does under the hood is not my business.
Then my free trial on SuperWhisper ended, and before buying I decided to try alternatives. That’s when I found VoiceInk, which is basically the same but uses only local models. I thought “hmmm, if it works like my local LLMs do, then it might not cut it”. I downloaded a relatively small 1 GB model and tried it. Worked straight away. No issues, no apparent loss in quality.
I’m amazed by VoiceInk. It took some time to learn its features. But with a bit of setup, it’s fantastic. It makes writing long texts easier. I wouldn’t say it one-shots them: I find myself making small modifications to the texts I end up using. It depends a lot on the situation. For quick chats, I just send them straight away. For emails and documentation, I end up doing minor tweaks.
VoiceInk has two modes of operation. In “plain” mode it only runs the local model and you get your text. In power mode it sends your dictated text, along with a prompt of your choosing, to OpenAI or another provider. Power mode feels like a big hammer. Definitely powerful, but then you will sound like ChatGPT :/
Most of the time I just use plain mode, but power mode has been helpful in some specific situations. Start with plain mode, then play around with power mode. It feels cool but you won’t end up using it much IMO.
The Cringe
It still feels strange though. I feel really weird using it around other people. I try to use it when my girlfriend is not around. We just laugh when she comes in and I’m talking to my computer, but still. Haven’t been able to get past that. Guess it’s another win for remote work. Can’t even imagine a whole office with people using voice dictation.
I’ve been sharing this a lot with friends lately, so I figured why not write a blog post about it. Even though I started writing it with power mode and the blog prompt I have, I ended up completely rewriting the article. I just didn’t like the writing. It didn’t sound like me. It felt foreign to me. It’s not really my voice. And why share something as “me” which is not really “me”? Who wants to read AI slop that not even the author has read? I still believe dictation is an awesome tool in “plain” mode, but for blogs I think it works better as a tool to write down thoughts rather than a speech-to-blog-post tool.