The H.264 Advanced Video Coding standard specifies two types of entropy coding: Context-based Adaptive Binary Arithmetic Coding (CABAC) and Variable-Length Coding (VLC). This document provides a short introduction to CABAC. Familiarity with the concept of Arithmetic Coding is assumed.
2 Context-based adaptive binary arithmetic coding (CABAC)
In an H.264 codec, when entropy_coding_mode is set to 1, an arithmetic coding system is used to encode and decode H.264 syntax elements. The arithmetic coding scheme selected for H.264, Context-based Adaptive Binary Arithmetic Coding or CABAC, achieves good compression performance through (a) selecting probability models for each syntax element according to the element’s context, (b) adapting probability estimates based on local statistics and (c) using arithmetic coding.
Coding a data symbol involves the following stages.
- Binarization: CABAC uses Binary Arithmetic Coding, which means that only binary decisions (1 or 0) are encoded. A non-binary-valued symbol (e.g. a transform coefficient or motion vector) is “binarized”, or converted into a binary code, prior to arithmetic coding. This process is similar to the process of converting a data symbol into a variable length code, but the binary code is further encoded (by the arithmetic coder) prior to transmission.
Stages 2, 3 and 4 (context model selection, arithmetic encoding and probability update) are repeated for each bit (or “bin”) of the binarized symbol.
- Context model selection: A “context model” is a probability model for one or more bins of the binarized symbol. This model may be chosen from a selection of available models depending on the statistics of recently-coded data symbols. The context model stores the probability of each bin being “1” or “0”.
- Arithmetic encoding: An arithmetic coder encodes each bin according to the selected probability model. Note that there are just two sub-ranges for each bin (corresponding to “0” and “1”).
- Probability update: The selected context model is updated based on the actual coded value (e.g. if the bin value was “1”, the frequency count of “1”s is increased).
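The per-bin loop described above (context model selection, arithmetic encoding, probability update) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the H.264 coding engine: it uses exact fractions instead of the standard's integer range, simple frequency counts as the probability model, and a hypothetical rule `ctx_by_position` that picks a model from the bin position alone. The names `ContextModel`, `encode_bins` and `decode_bins` are invented for this example.

```python
from fractions import Fraction


class ContextModel:
    """Adaptive probability model for a bin, kept as frequency counts."""

    def __init__(self):
        # Counts of "0" and "1"; starting at 1 keeps both probabilities non-zero.
        self.counts = [1, 1]

    def p_zero(self):
        return Fraction(self.counts[0], self.counts[0] + self.counts[1])

    def update(self, bin_value):
        self.counts[bin_value] += 1


def encode_bins(bins, select_context):
    """Narrow the interval [low, low + width) once per bin."""
    models = {}
    low, width = Fraction(0), Fraction(1)
    for i, b in enumerate(bins):
        model = models.setdefault(select_context(i), ContextModel())  # stage 2
        split = width * model.p_zero()  # stage 3: sub-range for "0", then "1"
        if b == 0:
            width = split
        else:
            low, width = low + split, width - split
        model.update(b)  # stage 4
    return low, width


def decode_bins(value, n_bins, select_context):
    """Recover the bins by tracking which sub-range contains `value`."""
    models = {}
    low, width = Fraction(0), Fraction(1)
    bins = []
    for i in range(n_bins):
        model = models.setdefault(select_context(i), ContextModel())
        split = width * model.p_zero()
        if value < low + split:
            b, width = 0, split
        else:
            b = 1
            low, width = low + split, width - split
        model.update(b)  # the decoder adapts exactly as the encoder did
        bins.append(b)
    return bins


def ctx_by_position(i):
    """Hypothetical context rule: bins 1 and 2 get their own model, the rest share one."""
    return min(i, 2)


bins = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
low, width = encode_bins(bins, ctx_by_position)
decoded = decode_bins(low + width / 2, len(bins), ctx_by_position)
```

Any value inside the final interval identifies the bin sequence, which is why the decoder can recover it as long as it repeats the same model selection and update steps as the encoder.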
3 The coding process
We will illustrate the coding process for one example, MVDx (motion vector difference in the x-direction).
- Binarize the value MVDx. Binarization is carried out according to the following table for |MVDx|<9 (larger values of MVDx are binarized using an Exp-Golomb codeword).
(Note that each of these binarized codewords is uniquely decodable.)
The first bit of the binarized codeword is bin 1; the second bit is bin 2; and so on.
- Choose a context model for each bin. One of 3 models is selected for bin 1, based on previously coded MVD values. The L1 norm of two previously coded values, ek (the sum of their magnitudes), is calculated, and its size determines which of the three models is used.
- Encode each bin. The selected context model supplies two probability estimates: the probability that the bin contains “1” and the probability that the bin contains “0”. These estimates determine the two sub-ranges that the arithmetic coder uses to encode the bin.
- Update the context models. For example, if context model 2 was selected for bin 1 and the value of bin 1 was “0”, the frequency count of “0”s is incremented. This means that the next time this model is selected, the probability of a “0” will be slightly higher. When the total number of occurrences of a model exceeds a threshold value, the frequency counts for “0” and “1” will be scaled down, which in effect gives higher priority to recent observations.
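The MVDx steps above can be sketched as follows. Several details are assumptions made for illustration: the binarization is taken to be a unary code (|MVDx| ones followed by a terminating zero, which is uniquely decodable), the thresholds 3 and 32 that map ek to models 0, 1 and 2 are assumed values, and the rescaling limit of 64 is hypothetical. The function and class names are invented for this example, and the sign of MVDx and the Exp-Golomb escape for larger values are not handled.

```python
def binarize_mvd_magnitude(magnitude):
    """Assumed unary binarization: `magnitude` ones, then a terminating zero."""
    if not 0 <= magnitude < 9:
        raise ValueError("larger magnitudes use an Exp-Golomb codeword (not shown)")
    return [1] * magnitude + [0]


def select_bin1_model(mvd_a, mvd_b, low_threshold=3, high_threshold=32):
    """Pick model 0, 1 or 2 for bin 1 from e_k = |mvd_a| + |mvd_b|.

    The two thresholds are assumptions, not values taken from the text.
    """
    e_k = abs(mvd_a) + abs(mvd_b)
    if e_k < low_threshold:
        return 0
    if e_k <= high_threshold:
        return 1
    return 2


class CountingModel:
    """Frequency-count context model with rescaling."""

    def __init__(self, limit=64):
        self.counts = [1, 1]  # occurrences of "0" and "1"
        self.limit = limit    # hypothetical rescaling threshold

    def probability(self, bin_value):
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value):
        self.counts[bin_value] += 1
        if sum(self.counts) > self.limit:
            # Halve both counts (never below 1) so that recent bins weigh more.
            self.counts = [max(1, c // 2) for c in self.counts]


bins = binarize_mvd_magnitude(3)                      # bin 1 is bins[0]
model_index = select_bin1_model(mvd_a=1, mvd_b=-1)    # e_k = 2
models = [CountingModel() for _ in range(3)]
p_before = models[model_index].probability(bins[0])
models[model_index].update(bins[0])
p_after = models[model_index].probability(bins[0])
```

After the update, `p_after` is larger than `p_before`: the model has become slightly more confident in the value it just observed for bin 1.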
4 The context models
Context models and binarization schemes for each syntax element are defined in the standard. There are a total of 267 separate context models, 0 to 266 (as of September 2002) for the various syntax elements. Some models have different uses depending on the slice type: for example, skipped macroblocks are not permitted in an I-slice and so context models 0-2 are used to code bins of mb_skip or mb_type depending on whether the current slice is Intra coded.
At the beginning of each coded slice, the context models are initialised depending on the initial value of the Quantization Parameter QP (since this has a significant effect on the probability of occurrence of the various data symbols).
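One way to picture QP-dependent initialisation is a linear rule per context model. The sketch below assumes the form used in the published H.264 standard, where each model carries two constants (called m and n here) and the initial state is derived from the slice QP; the specific (m, n) pairs in the usage lines are illustrative only and are not taken from the standard's tables. The helper names are invented for this example.

```python
def clip3(lo, hi, x):
    """Clamp x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def init_context(m, n, slice_qp):
    """Return (state_index, most_probable_symbol) for one context model.

    Assumed rule: a linear function of QP, clipped to 1..126, then split
    into a probability state (0..62) and the most probable symbol.
    """
    pre_state = clip3(1, 126, ((m * clip3(0, 51, slice_qp)) >> 4) + n)
    if pre_state <= 63:
        return 63 - pre_state, 0
    return pre_state - 64, 1


# Illustrative constants only: the same model starts in different states
# for different slice QP values.
state_low_qp = init_context(m=20, n=-15, slice_qp=26)
state_high_qp = init_context(m=20, n=-15, slice_qp=51)
```

With these illustrative constants the model starts further from the equiprobable state at QP 26 than at QP 51, which shows how the starting probability estimate can track the quantizer setting.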
5 The arithmetic coding engine
The arithmetic decoder is described in some detail in the Standard. It has three distinct properties:
- Probability estimation is performed by a transition process between 64 separate probability states for “Least Probable Symbol” (LPS, the least probable of the two binary decisions “0” or “1”).
- The range R representing the current state of the arithmetic coder is quantized to a small range of pre-set values before calculating the new range at each step, making it possible to calculate the new range using a look-up table (i.e. multiplication-free).
- A simplified encoding and decoding process is defined for data symbols with a near-uniform probability distribution.
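The look-up-table idea in the first two properties can be sketched as below. This is not the normative table from the standard: the table here is generated on the fly under assumptions, namely a 9-bit range R between 256 and 511, quantization of R into four cells by `(R >> 6) & 3`, a representative range value per cell, and 64 LPS probabilities that decay geometrically from 0.5 to 0.01875. All names are invented for this example, and the state transition rules and the simplified (bypass) mode are not shown.

```python
# Assumed LPS probabilities for states 0..63: geometric decay from 0.5 to 0.01875.
ALPHA = (0.01875 / 0.5) ** (1 / 63)
P_LPS = [0.5 * ALPHA ** s for s in range(64)]

# Assumed representative range value for each of the four quantization cells
# [256, 319], [320, 383], [384, 447], [448, 511].
REPRESENTATIVE_R = [288, 352, 416, 480]

# Pre-computed once; the coding loop below only indexes into it.
RANGE_LPS = [[max(2, round(r * p)) for r in REPRESENTATIVE_R] for p in P_LPS]


def split_range(range_r, state):
    """Return (MPS sub-range, LPS sub-range) without multiplying at coding time."""
    cell = (range_r >> 6) & 3          # quantize R to one of four pre-set cells
    r_lps = RANGE_LPS[state][cell]     # table look-up replaces R * p_LPS
    return range_r - r_lps, r_lps


r_mps, r_lps = split_range(400, state=10)
exact_lps = 400 * P_LPS[10]            # what a multiplication would have given
```

The looked-up `r_lps` differs from `exact_lps` only by the quantization error of R, which is the price paid for removing the multiplication from the inner loop.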
The definition of the decoding process is designed to facilitate low-complexity implementations of arithmetic encoding and decoding. Overall, CABAC provides improved coding efficiency compared with VLC at the expense of greater computational complexity.
Iain E Richardson, “The H.264 Advanced Video Compression Standard”, John Wiley & Sons, 2010.
About the author
Vcodex is led by Professor Iain Richardson, an internationally known expert on the MPEG and H.264 video compression standards. Based in Aberdeen, Scotland, he frequently travels to the US and Europe.
Professor Richardson is the author of “The H.264 Advanced Video Compression Standard”, a widely cited work in the research literature. He has written three further books and over 50 journal and conference papers on image and video compression. He regularly advises companies on video codec technology, video coding patents and mergers/acquisitions in the video coding industry. Professor Richardson leads an internationally renowned image and video coding research team, contributes to the MPEG industry standards group and is sought after as an expert witness and litigation consultant.