Production Services Series: Building Transcripts of your Work Using Descript

Published October 8, 2020

We cover gear a lot on the Lensrentals blog. Which lenses are sharpest? Which camera has the most features? What does the inside of a GFX100 look like? What we rarely address, especially on the video side, is the equally important field of post-production. It won’t matter what codec you shot or which lenses you used if you can’t turn your footage into a usable end product, and we’ve got just as much experience shaping footage in post as we do shooting in the field. So we thought we’d run a short series of articles in which we highlight some of our favorite post-production services. To be clear, none of these articles are paid promotions or partnerships. In cases where a service costs money, either Lensrentals or the author has signed up and paid exactly as any other consumer would.

I’ve written before about how important transcription is to my documentary workflow. In truth, I can’t imagine going into the editing process on any interview-heavy project without first having complete transcripts of every interview, and I’m guessing that’s a nearly universal opinion among documentary editors. My personal workflow, in fact, often begins purely on paper, or at least in a Google Doc. Rather than scrubbing through hours of talking heads in editing software, it’s both easier and more intuitive to read through the transcripts, copy/pasting the framework into a sort of script before you start actually cutting.

As integral as this process is, though, I’ve never really found a service I’d consider perfect. As I mentioned in the blog article linked to above, I used to use Adobe Story because it integrated well with Premiere Pro. That was four years ago, though, before Adobe stopped supporting Story and I grew to prefer different software (more on that later in this series). For a time after that I used paid manual transcription services, but they were too expensive, often not accurate enough, and didn’t integrate as directly with my footage as Adobe Story did. I then drifted aimlessly from service to service before I finally tried Descript.

Descript is AI-based transcription software, meaning that the work of actually recognizing different speakers and converting their speech to text is done by software, not by a human being. There are pros and cons to this approach, but, overall, I prefer it by far over manual transcription. First of all, it’s nearly instant. In my experience, it took Descript 3-5 minutes to transcribe an hour-long interview, something that would take even a super-capable transcriptionist at least a few hours to finish. As a result of that lack of human involvement, it’s also far cheaper. I’ll get to specific pricing later, but software-based transcription in general costs a small fraction of what manual transcription can cost. Admittedly, software transcription can be less accurate than human transcription, but I’ve found the difference to be relatively negligible as long as the source audio is good. Where you’ll often see the most mistakes is in punctuation and spelling of proper nouns like business names. I always listen through the interview a second time and make corrections to the transcript, though, with both human and software transcription, so, to me, the difference in accuracy (around 95% for software vs. around 99% for a human) is well worth the difference in price. If that difference is a deal-breaker for you, Descript also provides manual transcription for $2 per minute.

Overall, the process is pretty simple. Descript has a dedicated app that runs on either Mac or Windows computers. You just move your audio or video file into the app, then Descript generates a transcript. All your media and generated transcripts are backed up to cloud storage automatically, and project organization is far better than in any other similar service I’ve seen. A crucial feature of Descript and other software-based transcription services is that the resulting transcript is more directly connected to the file than it would be if the file were transcribed manually, reducing the need to constantly check timecode stamps. You just pick a point in the script and hit play, then Descript plays back your file directly from that point. There’s even a timeline that shows each word in the script directly above its corresponding waveform in the file. This direct connection between script and file is at the core of many of Descript’s most useful features. See, while I personally only use Descript to generate transcripts, it seems designed equally, if not primarily, as an editing tool.

Let’s say you’re an editing novice who wants to publish a podcast. While Pro Tools might be too intimidating, just about anyone can wrap their minds around text-based editing. In this case, you’d just import your tracks into a Descript composition, then edit the resulting transcript exactly as you would edit a word processing document. Remove a sentence from the script and the sentence is removed from the audio file. While this approach might not offer all the flexibility and fine-tuning of a more traditional DAW, it’s simple enough that just about anyone who’s written a document on a computer can understand it. Basically, my dad could edit a podcast this way, and it’s impressive that a piece of software can simplify such a complicated process so effectively.
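If you’re curious how text-based editing can work in principle, here’s a minimal sketch. To be clear, this is not Descript’s actual implementation, which I have no inside knowledge of. It simply assumes each transcribed word carries a start and end time, so deleting words from the text amounts to skipping those time ranges in the audio. The `Word` class, the `kept_segments` function, and the sample timings are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its position in the recording (hypothetical structure)."""
    text: str
    start: float  # seconds from the start of the file
    end: float    # seconds from the start of the file


def kept_segments(words, deleted):
    """Return the (start, end) audio ranges that remain after deleting words.

    `deleted` is a set of indexes into `words`. Consecutive kept words are
    merged into a single range so playback is not chopped word by word.
    """
    segments = []
    for i, word in enumerate(words):
        if i in deleted:
            continue  # this word was removed from the text, so skip its audio
        if segments and (i - 1) not in deleted:
            # The previous word was also kept: extend the current range.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or first kept word after a deletion.
            segments.append((word.start, word.end))
    return segments


# Illustrative timings only: remove the filler word "um" at index 1.
words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.8),
    Word("we", 0.8, 1.0),
    Word("started", 1.0, 1.6),
]
print(kept_segments(words, deleted={1}))
```

An editor built this way would play or export only the returned ranges, which is why cutting a sentence from the page also cuts it from the audio.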

I will say, though, that Descript’s focus on editing is the source of most of my minor gripes with the software. There are two editing modes within Descript, “Edit Media” and “Edit Text.” In the “Edit Media” mode, changes to the text are reflected in the playback. If you delete a sentence from the transcript, you won’t hear that sentence in the source file. While that’s exactly how my dad would want to edit a podcast, it’s not how I want to adjust a transcript. That’s where “Edit Text” mode comes in. That mode will allow you to make changes to your transcript without affecting playback of the associated media. Descript defaults to “Edit Media” mode, so I have to switch to “Edit Text” every time I work on a transcript.


This isn’t a big deal except as an indication that this software isn’t yet totally designed around my use case. That’s understandable because I’m obviously not the only person using this software. The target market seems to be podcast producers rather than low-budget documentary editors. Descript is a relatively new app, though, and the developers seem very open to feature additions, so I’m hoping that, as it grows, I’ll see changes that make my workflow even easier. For instance, given how often I copy/paste interview dialogue into a new editing script, I’d love to see a new composition mode that would allow me to select portions of interview dialogue from multiple transcripts and more easily collect and arrange them in a new document. I’d also love to see integrations (beyond XML export) with third-party software and services like Premiere Pro, DaVinci Resolve, and Frame.io.

Overall, though, Descript is as close as I’ve found to a perfect transcription tool, and it only seems to be getting better. The feature list is expanding all the time, and it includes collaboration features that I found extremely helpful but didn’t have room to cover here. It also looks great, and that goes a long way. Their basic paid plan is $12 per month, which includes all the features I listed here and 10 hours of software transcription per month. There’s also a free trial of 3 hours of transcription so you can get a feel for the software before you make a decision. If you do try it out, let us know what you think. And if you have any other transcription services you prefer, feel free to shout them out in the comments.
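To put the price gap in concrete terms, here’s a rough back-of-the-envelope calculation using only the figures quoted in this article: $2 per minute for Descript’s manual transcription, and $12 per month for a plan with 10 hours of software transcription. It assumes you actually use all 10 hours in a month, and the prices are simply the ones I saw at the time of writing. The function names are my own.

```python
MANUAL_RATE_PER_MINUTE = 2.00   # dollars, manual transcription price quoted above
PLAN_PRICE_PER_MONTH = 12.00    # dollars, basic paid plan quoted above
PLAN_HOURS_INCLUDED = 10        # hours of software transcription in that plan


def manual_cost(hours):
    """Cost in dollars of manually transcribing `hours` of audio."""
    return hours * 60 * MANUAL_RATE_PER_MINUTE


def plan_cost_per_minute():
    """Effective per-minute cost of the plan, assuming all included hours are used."""
    return PLAN_PRICE_PER_MONTH / (PLAN_HOURS_INCLUDED * 60)


print(f"Manual, 10 hours: ${manual_cost(10):,.2f}")
print(f"Software plan, per minute: ${plan_cost_per_minute():.2f}")
```

By this math, 10 hours of interviews would run $1,200 manually versus $12 on the plan, or about two cents per minute if you use the full allotment.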

Author: Ryan Hill

My name is Ryan and I am a video tech here at Lensrentals.com. In my free time, I mostly shoot documentary stuff, about food a lot of the time, as an excuse to go eat free food. If you need my qualifications, I have a B.A. in Cinema and Photography from Southern Illinois University in beautiful downtown Carbondale, Illinois.
