AV1 Video Transcoding

Banner Image of Shanghai
Image Credit: https://unsplash.com/@yiranding

AV1 is a relatively recent video codec built by the Alliance for Open Media (AOM). Its encoders beat H264 in file size for a given quality, and the format is significantly less patent-encumbered than H265, H264's successor.

For completeness, AV1 competes quite favourably against H265 in quite a few tests, but it notably lacks widespread hardware encoder support for the time being. Intel's Arc GPUs do include an AV1 encoder, so broader hardware support is surely on the horizon.

One can hope.

Anyway, the point of this post is to cover a short but effective way to encode AV1. My examples are all from pretty high-bitrate H264-encoded video, since my GPU's built-in hardware encoder performs well there. The only issue with these recordings is the amount of disk space they consume: on average, about 330MiB per minute, or about 5.5MiB per second.

When encoded to AV1, they consume ~34MiB per minute, or ~0.56MiB per second. That's roughly a tenfold reduction (an hour of footage drops from about 19GiB to about 2GiB), which makes this far easier to store.

To be clear, the above information is by no means scientific. It's laxly controlled, involves napkin maths, and doesn't compare visual fidelity. For what I'm doing, the quality AV1 provides at CRF 36 is sufficient.

Concept

When I set out to transcode to AV1, I started with SVT-AV1, and what I found was a slowdown in Perceptual Quality mode for very long encodes. The solution I devised was to split the video into pieces and stitch them back together.

This has two benefits that I care about: it places keyframes in sensible locations (the scene changes), and it sidesteps the SVT-AV1 slowdown.

Or so I thought. The former is true, and the latter may be true, but I moved away from SVT to AOMedia's encoder due to YUV444 support.

To achieve this, we need to go through three steps as explained below.

Generating Scene Transitions

I'm no expert in detecting scene transitions, so I chose to cheap out and leverage PySceneDetect.

mkdir -p "$splitFolder"
scenedetect --input "$1" -o "$splitFolder" detect-content list-scenes -q

It's a pretty short step, so let's keep it sweet. The first line creates the temporary folder we'll use to house the chunked transcodes; the second runs PySceneDetect's content detector over the input and writes the detected scene list out as a CSV in that folder. You could make the folder path relative, but I have mixed drive types on my server, so I placed this temporary folder on my SSDs.
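
Before going further, it might help to see how the variables used in these snippets could be set up. The names ($splitFolder, $intermedFormat, $outname) and the positional parameters ($1 for the input, $3 and $4 for framerate and GOP size) come straight from the snippets; the values below are just my assumptions, shown for illustration.

# Hypothetical variable setup; only the names are taken from the snippets in this post.
# $1 is the input video, $3 the framerate and $4 the keyframe interval (GOP size) used later.
splitFolder="/mnt/ssd/av1-work"   # temporary working folder on an SSD (example path)
intermedFormat="mp4"              # container used for the intermediate chunks (assumption)
outname="${1%.*}_av1.mp4"         # final output name derived from the input (assumption)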

Generating Chunks

For chunk generation, we're going to use FFmpeg. This script is relatively simple, so I'll explain what's going on in two segments rather than line by line.

Firstly, we go looking for the split times in the CSV. It's not an elegant approach, but I don't use bash for tasks like this very often. We read the first line of the scene list and drop its first 15 characters (the header prefix), leaving just the comma-separated list of split timecodes we want.
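
To make the string slicing less magic, here's roughly what that first line looks like on my version of PySceneDetect and what the slice leaves behind. The timestamps are made up for illustration.

# Hypothetical first line of the scene list CSV written by list-scenes:
timecode="Timecode List:,00:00:05.139,00:00:12.867,00:00:21.400"
# "Timecode List:," is 15 characters, so slicing them off leaves only the timecodes:
echo "${timecode:15}"   # prints 00:00:05.139,00:00:12.867,00:00:21.400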

Secondly, we do the heavy lifting in FFmpeg. We ask FFmpeg to generate presentation timestamps with -fflags +genpts, create new keyframes exactly on the split points we generated with -force_key_frames:v "$timec", and encode the video using AV1 with -c:v libaom-av1 ....

for filename in "$splitFolder"/*.csv; do
  echo "$filename"
  timecode=$(head -n 1 "$filename")
done

timec=${timecode:15}

ffmpeg -fflags +genpts -i "$1" -map 0 -force_key_frames:v "$timec" -c:v libaom-av1 -pix_fmt yuv444p10le -r "$3" -g "$4" -crf 36 -row-mt 1 -b:v 0k -threads 12 -denoise-noise-level 0 -cpu-used 6 -c:a copy -f segment -segment_times "$timec" -segment_format mp4 -reset_timestamps 1 "$splitFolder/chunk_%08d.$intermedFormat"

The rest of the settings are directly related to the encoding process, so let's cover them here.

pix_fmt is used here to force a specific pixel format. In my case, I want to match my input: 10-bit little-endian YUV 4:4:4 (yuv444p10le).
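
If you're not sure what your input uses, ffprobe can tell you. This is a generic sketch with a placeholder filename, not part of my script.

# Print the pixel format of the first video stream (input.mp4 is a placeholder)
ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt -of default=noprint_wrappers=1:nokey=1 input.mp4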

r is used to set the framerate, and g is used to set the keyframe frequency. I set these to 30 and 150 respectively for 30FPS content, which works out to a keyframe at least every five seconds on top of the ones forced at the scene splits.

crf specifies our quality setting. For the content I'm handling, I've found 36 to be reasonable for 2560x1440 content. If you're doing 1920x1080, you may want to lower this. In the future, when AV1 encoders are common, you can probably lower this even further and decrease cpu-used to keep the file size in check.

row-mt apparently enables row-based multithreading. AOM's encoder doesn't saturate anything I've tested it on, so I'm going by the documentation here.

b:v sets the target bitrate. Setting it to 0 puts libaom into constant quality mode, so crf behaves like x264's CRF does for H264; with a non-zero bitrate, crf instead acts as constrained quality with the bitrate as a cap.
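
As a rough illustration of that difference (filenames are placeholders and the bitrate cap is arbitrary):

# Constant quality: crf alone decides the quality, the size falls where it falls
ffmpeg -i input.mp4 -c:v libaom-av1 -crf 36 -b:v 0 output_cq.mkv
# Constrained quality: same crf target, but the average bitrate is capped
ffmpeg -i input.mp4 -c:v libaom-av1 -crf 36 -b:v 2000k output_capped.mkv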

threads unsurprisingly sets the number of threads the process can use.

denoise-noise-level sets the scale of the denoiser and of the grain that gets re-synthesised on playback. This is set to 0, disabling it, due to the considerable performance impact.

cpu-used controls the speed/quality trade-off: it decides how much CPU time the encoder spends on each frame, and it affects both file size and encode time. Note that lower values are slower: decreasing it increases encode time and should decrease file size at a given quality, and the reverse is also true.

-c:a copy tells ffmpeg to copy the audio track through without re-encoding it.

-f segment selects the segment muxer.

-segment_times is the counterpart of the -force_key_frames:v parameter we passed earlier: it does the actual splitting of the video at the same timestamps we forced FFmpeg to keyframe on, which lets us stitch the chunks back together relatively easily. The output filename needs a %XXd in it to generate segments. I've used %08d to avoid incorrect ordering in the next stage, as some of my transcodes produced more chunks than %03d keeps in order; the short demo below shows why.
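
The ordering problem comes from the later stage globbing the chunks in lexical order. A quick sketch of the difference, independent of FFmpeg and runnable in any shell:

# With plain %d (or a width the chunk count outgrows), lexical order breaks past 999:
printf 'chunk_%d\n' 999 1000 | sort      # chunk_1000 sorts before chunk_999
# With a fixed width of 8, lexical order matches numeric order:
printf 'chunk_%08d\n' 999 1000 | sort    # chunk_00000999 then chunk_00001000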

reset_timestamps resets the base time to 0 on the chunks so our stitch functions correctly.

Stitching It Back Together

FFmpeg will also do the stitching for us. The segment muxer has an equal-and-opposite counterpart, the concat demuxer, which we can use to undo the segmentation.

Before we call FFmpeg, however, we need to generate a list of the chunks. Thanks to the earlier %08d, the shell's glob conveniently lists them in order, so we just iterate over them and append them to a list file.

FFmpeg then gets called with copy codecs for both the audio and video tracks. Note that you can pass flags you want to show up in the final output here, as I've done with -movflags +faststart.

echo "# Start" > "$splitFolder/list.txt"
for filename in "$splitFolder"*.$intermedFormat; do
  echo "file '$filename'" >> "$splitFolder/list.txt"
done

ffmpeg -f concat -safe 0 -i "$splitFolder/list.txt" -c:v copy -fps_mode:v passthrough -movflags +faststart -c:a copy "$outname"
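
For reference, the generated list.txt ends up looking something like this; the paths are hypothetical and depend on where $splitFolder points.

# Start
file '/mnt/ssd/av1-work/chunk_00000000.mp4'
file '/mnt/ssd/av1-work/chunk_00000001.mp4'
file '/mnt/ssd/av1-work/chunk_00000002.mp4'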

I should note that you may or may not need -fps_mode:v passthrough in practice, but it's a leftover from when I was debugging something else.