AV1 Video Transcoding
AV1 is a relatively recent video codec built by the Alliance for Open Media (AOM). Its encoders beat H264 on file size for a given quality, and it is significantly less patent-encumbered than H265, H264's successor.
For completeness, AV1 also compares favourably against H265 in quite a few tests, but it notably lacks widespread hardware encoder support for now. Intel's Arc GPUs do have a hardware AV1 encoder, so surely more are on the horizon.
One can hope.
Anyway, the point of this post is to cover a short but effective way to encode AV1. My examples all start from fairly high-bitrate H264 video, because my GPU's built-in hardware encoder performs well at those settings. The only issue with these encodes is the amount of disk space they take up: on average, these files consume about 330MiB per minute, or about 5.5MiB per second.
When encoded to AV1, they consume ~34MiB per minute, or ~0.57MiB per second. That's roughly a tenfold reduction, which makes them far easier to store.
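If you want to check these figures against your own recordings, the arithmetic is easy to script. This is a minimal sketch, assuming ffprobe (which ships with FFmpeg) and awk are on your PATH; all function names are hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing recordings; not part of the original script.

# Print a file's average size in MiB per minute, using ffprobe for the duration.
mib_per_minute() {
    local file=$1 bytes seconds
    bytes=$(wc -c <"$file") || return 1
    seconds=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$file") || return 1
    awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.1f\n", (b / 1048576) / (s / 60) }'
}

# Convert MiB per minute to MiB per second.
mib_per_second() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 60 }'
}

# Convert MiB per minute to megabits per second (1 MiB = 8,388,608 bits).
mbit_per_second() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m * 8388608 / 60 / 1000000 }'
}

# How many times smaller the new size is than the old one.
reduction_factor() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", old / new }'
}
```

For the numbers above, mbit_per_second 330 prints 46.1, mbit_per_second 34 prints 4.8, and reduction_factor 330 34 prints 9.7.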
Concept
When I set out to transcode to AV1, I started with SVT-AV1, and found that it slowed down in Perceptual Quality mode on very long encodes. The solution I devised was to split the video into pieces, encode each one, and stitch them back together.
This has two benefits that I care about: it places keyframes at scene changes, and it sidesteps the SVT-AV1 slowdown.
Or so I thought. The former is true, and the latter may be true, but I moved away from SVT-AV1 to AOMedia's encoder (libaom) anyway because it supports YUV444.
To achieve this, we need to go through three steps as explained below.
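The commands in the following sections rely on a few variables the post doesn't define: $splitFolder, $intermedFormat and $outname, plus the positional parameters $1 (input), $3 (frame rate) and $4 (keyframe interval). The sketch below is one hypothetical way to set them up; treating $2 as the output name, the SPLIT_ROOT override, the /tmp fallback and the mp4 extension are all assumptions rather than the author's actual setup.

```shell
#!/usr/bin/env bash
# Hypothetical setup for the variables used by the chunk, encode and stitch steps.
# Assumed invocation: script INPUT OUTPUT FPS GOP
#   $1 input video, $2 output name (assumption), $3 frame rate, $4 keyframe interval
init_transcode_vars() {
    if [ "$#" -ne 4 ]; then
        echo "usage: INPUT OUTPUT FPS GOP" >&2
        return 1
    fi
    # Scratch space for the scene list and chunks. SPLIT_ROOT and the /tmp fallback
    # are assumptions; point it at fast storage. The trailing slash keeps both
    # "$splitFolder"*.csv and "$splitFolder/chunk_..." style paths valid.
    splitFolder="${SPLIT_ROOT:-/tmp}/av1-chunks-$(basename "$1")/"
    # Chunks are muxed with -segment_format mp4, so use a matching extension.
    intermedFormat="mp4"
    outname="$2"
}
```

Calling init_transcode_vars "$@" || exit 1 near the top of the script would fail early on a wrong argument count; $3 and $4 are still read directly by the ffmpeg command.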
Generating Scene Transitions
I'm no expert in detecting scene transitions, so I chose to cheap out and leverage PySceneDetect.
mkdir -p "$splitFolder"
scenedetect --input "$1" -o "$splitFolder" detect-content list-scenes -q
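It's a short step, so let's keep it sweet. The first line creates the temporary folder we'll use to house the scene list and the chunked transcodes. You could use a relative path here, but I have mixed drive types on my server, so I placed this folder on my SSDs. The second line runs PySceneDetect's content-based detector (detect-content) on the input file ($1) and, via list-scenes, writes the list of scenes to a CSV file in that folder; -q stops it from also printing the table to the terminal.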
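Because mkdir -p doesn't empty an existing folder, a CSV left over from an earlier run could be picked up later: the loop in the next section keeps whichever CSV the glob returns last. A slightly more defensive version of this step runs the same scenedetect command and insists on exactly one CSV. This is a sketch; detect_scenes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the scene-detection step.
# Usage: detect_scenes INPUT DIR   -> prints the path of the scene-list CSV.
detect_scenes() {
    local input=$1 dir=$2
    mkdir -p "$dir" || return 1
    # Same command as above: content-based detection, scene list written as CSV, quiet output.
    scenedetect --input "$input" -o "$dir" detect-content list-scenes -q || return 1
    # Refuse to continue unless exactly one CSV exists, so a leftover file
    # from an earlier run can't be used by mistake.
    set -- "$dir"/*.csv
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one scene CSV in $dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```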
Generating Chunks
For chunk generation, we're going to use FFmpeg. This script is relatively simple, so I'll explain what's going on in two segments rather than line by line.
Firstly, we go looking for the split times in the CSV. It's not an elegant approach, but I don't use bash for tasks like this very often. The first row of the CSV is the label "Timecode List:", a comma, and then the comma-separated cut timecodes, so dropping the first 15 characters leaves just the list of split points we want.
Secondly, we do the heavy lifting in FFmpeg. We ask FFmpeg to generate presentation timestamps with -fflags +genpts, create new keyframes exactly at the split points with -force_key_frames:v $timec, and encode the video to AV1 with -c:v libaom-av1 and the options that follow it.
for filename in "$splitFolder"/*.csv; do
    echo "$filename"
    timecode=$(head -n 1 "$filename")
done
# Drop the leading "Timecode List:," (15 characters), leaving only the comma-separated cut times
timec=${timecode:15}
ffmpeg -fflags +genpts -i "$1" -map 0 -force_key_frames:v "$timec" \
    -c:v libaom-av1 -pix_fmt yuv444p10le -r "$3" -g "$4" -crf 36 -row-mt 1 -b:v 0k -threads 12 \
    -denoise-noise-level 0 -cpu-used 6 -c:a copy \
    -f segment -segment_times "$timec" -segment_format mp4 -reset_timestamps 1 "$splitFolder/chunk_%08d.$intermedFormat"
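The 15-character offset depends on the exact label at the start of the CSV's first row. A sketch of a slightly more defensive parser, under the same assumption about the file layout, strips the label by name instead of by length and drops a stray carriage return if one is present; scene_cut_list is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the comma-separated cut list from a PySceneDetect CSV.
# Assumes row 1 looks like: Timecode List:,00:00:05.000,00:00:12.345,...
scene_cut_list() {
    # Print line 1 only, and only if it starts with the expected label;
    # tr removes a Windows-style \r, which is harmless if there isn't one.
    sed -n '1s/^Timecode List:,//p' "$1" | tr -d '\r'
}
```

Following timec=$(scene_cut_list "$filename") with [ -n "$timec" ] || exit 1 stops the script when no cut list is found (for example, likely for a video with a single scene), rather than handing an empty value to -force_key_frames and -segment_times.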
The rest of the options control the encode itself and the segmenting, so let's cover them here.
-pix_fmt specifies the pixel format to encode to. In my case, I want to forcefully match my input with 10-bit little-endian YUV 4:4:4 (yuv444p10le).
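-r sets the output frame rate, and -g sets the maximum keyframe interval in frames. They come from the script's third and fourth arguments ($3 and $4); I set them to 30 and 150 respectively for 30FPS content, which means a keyframe at least every 5 seconds.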
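Since -g counts frames rather than seconds, the keyframe spacing in seconds is g divided by the frame rate (150 / 30 = 5 here). If you'd rather think in seconds, a small hypothetical helper can do the conversion, also accepting the fractional rates ffprobe reports, such as 30000/1001.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keyframe interval in frames for a target spacing in seconds.
# Usage: gop_frames FPS SECONDS   (FPS may be a fraction such as 30000/1001)
gop_frames() {
    awk -v fps="$1" -v secs="$2" 'BEGIN {
        # Turn "num/den" into a decimal frame rate; plain numbers pass through.
        if (split(fps, part, "/") == 2) fps = part[1] / part[2]
        # Round to the nearest whole frame.
        printf "%d\n", fps * secs + 0.5
    }'
}
```

gop_frames 30 5 prints 150, matching the value used above.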
-crf specifies our quality setting; lower values mean higher quality and larger files. For the content I'm handling, I've found 36 to be reasonable at 2560x1440. If you're encoding 1920x1080, you may want to lower it. In the future, when AV1 encoders are more common, you can probably lower it even further, then decrease -cpu-used to keep the file size in check.
-row-mt 1 enables row-based multithreading, according to the documentation. AOM's encoder doesn't saturate the CPU on anything I've tested it on, so I'm taking the documentation's word for it.
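-b:v sets the target bitrate. Setting it to 0 alongside -crf puts libaom into constant-quality mode, so crf behaves much like it does with H264; with a non-zero bitrate, crf instead works in constrained-quality mode, capped by that bitrate.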
-threads unsurprisingly sets the number of threads the process can use.
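-denoise-noise-level sets the strength of the denoiser used for film grain synthesis: the encoder removes noise before encoding and signals grain parameters so the decoder can regenerate the grain on playback. I set it to 0, which disables this, because of the considerable performance impact.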
-cpu-used controls libaom's speed/efficiency trade-off, i.e. how much CPU time the encoder spends per frame. Counterintuitively, higher values make the encode faster at the cost of compression efficiency, so files at the same crf tend to be larger; lower values increase encode time and should decrease file size.
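Because good values for crf and cpu-used depend on the content and resolution, it can be worth trying a few combinations on a short excerpt before committing to a multi-hour encode. The sketch below encodes one excerpt per call with the same libaom options as the chunking command (minus the segmenting and audio) and reports size and wall-clock time. The function name, the .mkv output and the file naming are assumptions; -ss and -t are standard FFmpeg input options.

```shell
#!/usr/bin/env bash
# Hypothetical tuning helper: encode a short excerpt with one crf/cpu-used pair
# and print "crf cpu-used size_MiB seconds".
# Usage: sample_encode INPUT START DURATION CRF CPU_USED
sample_encode() {
    local input=$1 start=$2 duration=$3 crf=$4 cpu=$5
    local out="sample_crf${crf}_cpu${cpu}.mkv" t0 t1 bytes
    t0=$(date +%s)
    # -ss/-t before -i: seek to START and read DURATION seconds of input.
    # Audio is dropped (-an) because only the video settings are being compared.
    ffmpeg -v error -y -ss "$start" -t "$duration" -i "$input" \
        -c:v libaom-av1 -pix_fmt yuv444p10le -crf "$crf" -b:v 0 -row-mt 1 \
        -denoise-noise-level 0 -cpu-used "$cpu" -an "$out" || return 1
    t1=$(date +%s)
    bytes=$(wc -c <"$out")
    awk -v c="$crf" -v u="$cpu" -v b="$bytes" -v s="$((t1 - t0))" \
        'BEGIN { printf "%s %s %.2f %d\n", c, u, b / 1048576, s }'
}
```

For example, for crf in 32 36 40; do sample_encode in.mkv 300 60 "$crf" 6; done prints one line per crf value; it's still worth watching the samples back rather than trusting the sizes alone.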
-c:a copy tells FFmpeg to copy the audio track without re-encoding it.
-f segment selects the segment muxer, which splits the output into multiple files.
-segment_times is the second half of the -force_key_frames:v parameter we passed earlier. It does the actual segmentation, cutting the video at the same timestamps where we forced FFmpeg to place keyframes, so every chunk starts on a keyframe. This allows us to stitch the chunks back together relatively easily. The output filename needs a %XXd pattern in it so that each segment gets its own number. I've used %08d to avoid incorrect ordering in the next stage, as some of my transcodes produced more chunks than %03d can hold (more than 999).
-reset_timestamps 1 resets the timestamps of each chunk so that it starts at 0, which lets our stitch work correctly.
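Since the stitch step trusts the shell's alphabetical ordering, a quick check that the chunk numbers form an unbroken sequence catches both missing chunks and names that sort out of order (as happens once a counter outgrows its padding: chunk_1000 sorts before chunk_999). This sketch assumes the chunk_NNNNNNNN.ext naming used above; check_chunk_sequence is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-stitch check: are chunk_N.EXT files in DIR numbered consecutively
# in the order the glob returns them? Prints the number of chunks on success.
# Usage: check_chunk_sequence DIR EXT
check_chunk_sequence() {
    local dir=$1 ext=$2 f name num expected="" count=0
    for f in "$dir"/chunk_*."$ext"; do
        # An unmatched glob stays literal, which means there are no chunks at all.
        [ -e "$f" ] || { echo "no chunks found in $dir" >&2; return 1; }
        name=${f##*/}
        num=${name#chunk_}
        num=${num%."$ext"}
        # Strip leading zeros so $(( )) doesn't treat the number as octal.
        num=$(printf '%s' "$num" | sed 's/^0*//')
        num=${num:-0}
        if [ -n "$expected" ] && [ "$num" -ne "$expected" ]; then
            echo "expected chunk $expected but found $name" >&2
            return 1
        fi
        expected=$((num + 1))
        count=$((count + 1))
    done
    echo "$count"
}
```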
Stitching It Back Together
FFmpeg will also do the stitching for us. The segment muxer has an equal-and-opposite counterpart, the concat demuxer, that we can use to undo the segmentation.
Before we call FFmpeg, however, we need to generate a list of the chunks. Thanks to the zero-padded %08d names, the shell's glob conveniently expands them in order, so we just iterate over them and append each one to the list.
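FFmpeg then gets called with copy codecs for both the audio and video tracks, so nothing is re-encoded. Note that you can pass any flags you want applied to the final output here, as I have done with -movflags +faststart, which moves the MP4 index to the start of the file so playback can begin before the whole file has been read.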
echo "# Start" > "$splitFolder/list.txt"
for filename in "$splitFolder"/*."$intermedFormat"; do
    echo "file '$filename'" >> "$splitFolder/list.txt"
done
ffmpeg -f concat -safe 0 -i "$splitFolder/list.txt" -c:v copy -fps_mode:v passthrough -movflags +faststart -c:a copy "$outname"
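One caveat with the list-building loop: the concat demuxer resolves relative paths in the list relative to the list file's own location, so a relative $splitFolder would end up doubled, and a file name containing a single quote would break the quoting. A sketch that sidesteps both, assuming list.txt lives in the same folder as the chunks, writes bare file names and escapes quotes as '\'' (FFmpeg's documented quoting rule); write_concat_list is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write DIR/list.txt for the concat demuxer from DIR/*.EXT.
# Entries are bare file names, which the demuxer resolves relative to list.txt itself.
# Usage: write_concat_list DIR EXT
write_concat_list() {
    local dir=$1 ext=$2 f name
    : >"$dir/list.txt" || return 1
    for f in "$dir"/*."$ext"; do
        [ -e "$f" ] || { echo "no .$ext files in $dir" >&2; return 1; }
        name=${f##*/}
        # Inside single quotes, a literal ' has to be written as '\''.
        name=$(printf '%s' "$name" | sed "s/'/'\\\\''/g")
        printf "file '%s'\n" "$name" >>"$dir/list.txt"
    done
}
```

The same ffmpeg -f concat command can then read "$splitFolder/list.txt"; with plain names like chunk_00000001.mp4, -safe 0 is no longer strictly required, though it does no harm.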
I should note that you may or may not need -fps_mode:v passthrough in practice; it's a leftover from when I was debugging something else.
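After stitching, a cheap sanity check is to compare the container durations of the input and the output, since a lost or duplicated chunk usually shows up as a mismatch. This is a sketch, assuming ffprobe is available; the function names and the default one-second tolerance are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical post-stitch check: do INPUT and OUTPUT durations agree within TOLERANCE seconds?
# Usage: check_duration INPUT OUTPUT [TOLERANCE]
probe_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

check_duration() {
    local a b
    a=$(probe_duration "$1") || return 1
    b=$(probe_duration "$2") || return 1
    # awk exits 0 when the absolute difference is within tolerance, 1 otherwise.
    awk -v a="$a" -v b="$b" -v tol="${3:-1}" \
        'BEGIN { d = a - b; if (d < 0) d = -d; exit (d <= tol) ? 0 : 1 }' || {
        echo "duration mismatch: $1=${a}s $2=${b}s" >&2
        return 1
    }
}
```

Something like check_duration "$1" "$outname" || echo "check the chunks" at the end of the script would flag a bad stitch before the temporary folder is cleaned up.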