Last spring I was teaching Introduction to Computer Vision at Drexel when COVID-19 hit and, well, everyone had to adjust quickly. While my course always had an online section, I decided to make all of my lectures asynchronous in order to provide maximum flexibility to students (and myself).
After some prep, I booted up QuickTime on my Mac and screen-recorded my first lecture at 1080p. Even though QuickTime encoded using H.264, the file size was massive: nearly 5 GB for a single one-hour lecture.
No problem, I figured, probably just a high bitrate and a lack of B-frames. I re-encoded it using two-pass x264 but found the result was still hundreds of megabytes.
You might be wondering why this is a problem. Most videos are hundreds of megabytes, right? Well, I had two issues here:
- I needed to ensure all of my students could watch the videos, and I knew many of them might be in locations with poor internet access.
- Each lecture was only my slides. I average about one slide per minute, so an hour-long lecture should need only about 60 distinct frames of video, which should be very, very small.
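To put rough numbers on that, here is a back-of-the-envelope sketch that converts a file size and duration into an average bitrate. The `avg_kbps` helper is a hypothetical name, the 300 MB figure is only an assumed stand-in for "hundreds of megabytes", and the math uses decimal units (1 MB = 1000 kB).

```shell
# Average bitrate in kbit/s for a file of SIZE_MB megabytes lasting DURATION_S seconds.
# Integer arithmetic only, so results are rounded down.
avg_kbps() {
    size_mb=$1
    duration_s=$2
    echo $(( size_mb * 8 * 1000 / duration_s ))
}

avg_kbps 5000 3600   # ~5 GB, one-hour recording: prints 11111
avg_kbps 300 3600    # assumed 300 MB re-encode:  prints 666
```

Even the smaller figure is a lot of bits per second to spend on a picture that changes about once a minute.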
After some googling around, I found x264 had a feature for exactly such videos: Constant Rate Factor (CRF) encoding. With CRF you instruct the encoder to target a given quality level throughout the encode, rather than a given bitrate. This is great for low-motion videos like lecture slides: since there is very little movement, the encoder can hold that quality at a very low bitrate, which results in an extremely small file. I have the occasional video or animation in my slides, and CRF simply increases the bitrate at those times.
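One way to get a feel for CRF is to encode the same short clip at a few quality levels and compare the file sizes. This is a minimal sketch, assuming an ffmpeg build with libx264; the `crf_sweep` name, the 60-second clip length, and the output filenames are placeholders for illustration. Lower CRF values mean higher quality and larger files, and x264's default is 23.

```shell
# Encode the first 60 seconds of INPUT at three CRF values, video only.
# -t 60 limits the output duration, -an drops audio, -y overwrites old test files.
crf_sweep() {
    input=$1
    for crf in 18 23 28; do
        ffmpeg -y -i "$input" -t 60 -c:v libx264 -crf "$crf" -an "sweep_crf${crf}.mp4"
    done
}

# Example (not run here):
#   crf_sweep INPUT.mov
#   ls -l sweep_crf*.mp4
```

On slide-only footage the three files should all be small, with the differences showing up mostly around animations or embedded video.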
On top of that, I implemented a few more tricks such as:
- Telling x264 to optimize for still images
- Reducing the frame rate to 10 fps
- Adding fast start flags to the output file to make streaming easier and ensure compatibility with different devices.
My final ffmpeg command is:
ffmpeg -i INPUT.mov -pix_fmt yuv420p -c:v libx264 -crf 18 -preset veryslow -tune stillimage -r 10 -acodec aac -b:a 48k -f mp4 -movflags +faststart OUTPUT.mp4
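With a whole semester of recordings, it can help to wrap the video settings in a small script. The sketch below is one way to do that, not the exact workflow used here: `encode_lecture` and `encode_all` are hypothetical names, it assumes the recordings are `.mov` files in the current directory, and it leaves the audio bitrate at the AAC encoder's default.

```shell
# Encode one screen recording with the slide-friendly settings:
#   -pix_fmt yuv420p      widely supported pixel format
#   -crf 18               quality target rather than a bitrate target
#   -tune stillimage      x264 tuning for mostly static content
#   -r 10                 drop the output frame rate to 10 fps
#   -movflags +faststart  move the index to the front of the MP4 for streaming
encode_lecture() {
    input=$1
    output=$2
    ffmpeg -i "$input" \
        -pix_fmt yuv420p \
        -c:v libx264 -crf 18 -preset veryslow \
        -tune stillimage \
        -r 10 \
        -c:a aac \
        -f mp4 -movflags +faststart \
        "$output"
}

# Convert every .mov in the current directory, skipping ones already encoded.
encode_all() {
    for src in ./*.mov; do
        [ -e "$src" ] || continue    # the glob matched nothing
        dst="${src%.mov}.mp4"
        [ -e "$dst" ] && continue    # output already exists
        encode_lecture "$src" "$dst"
    done
}

# Example (not run here):
#   encode_all
```

The `veryslow` preset makes each encode take a while, so skipping files that already have an `.mp4` next to them keeps re-runs cheap.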
The resulting files average less than 1 MB per minute, most of which is audio, and have hardly any visible artifacts in the video.
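To check a given file against that figure, ffprobe can report the container's size and duration, and a little arithmetic turns those into megabytes per minute. Both helper names below are hypothetical, and the 45 MB, one-hour example is an assumed illustration rather than a measured result.

```shell
# Print lines such as duration=<seconds> and size=<bytes> for a media file.
show_size_and_duration() {
    ffprobe -v error -show_entries format=duration,size \
        -of default=noprint_wrappers=1 "$1"
}

# Megabytes per minute (decimal MB) from a size in bytes and a duration in seconds.
mb_per_minute() {
    awk -v s="$1" -v d="$2" 'BEGIN { printf "%.2f\n", (s / 1000000) / (d / 60) }'
}

# Example (not run here):
#   show_size_and_duration OUTPUT.mp4
#   mb_per_minute 45000000 3600    # a 45 MB, one-hour file gives 0.75
```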
Originally published at https://lou.dev.