In an H.264/AVC codec, macroblock data are transformed and quantized prior to coding and rescaled and inverse transformed prior to reconstruction and display (Figure 1). Several transforms are specified in the H.264 standard: a 4×4 “core” transform, 4×4 and 2×2 Hadamard transforms and an 8×8 transform (High profiles only).
Figure 1 Transform and quantization in an H.264 codec
This paper describes a derivation of the forward and inverse transform and quantization processes applied to 4×4 blocks of luma and chroma samples in an H.264 codec. The transform is a scaled approximation to a 4×4 Discrete Cosine Transform that can be computed using simple integer arithmetic. A normalisation step is incorporated into forward and inverse quantization operations.
2 The H.264 transform and quantization process
The inverse transform and re-scaling processes, shown in Figure 2, are defined in the H.264/AVC standard. Input data (quantized transform coefficients) are re-scaled (a combination of inverse quantization and normalisation, see later). The re-scaled values are transformed using a “core” inverse transform. In certain cases, an inverse transform is applied to the DC coefficients prior to re-scaling. These processes (or their equivalents) must be implemented in every H.264-compliant decoder. The corresponding forward transform and quantization processes are not standardized but suitable processes can be derived from the inverse transform / rescaling processes (Figure 3).
3 Developing the forward transform and quantization process
The basic 4×4 transform used in H.264 is a scaled approximate Discrete Cosine Transform (DCT). The transform and quantization processes are structured such that computational complexity is minimized. This is achieved by reorganising the processes into a core part and a scaling part.
Consider a block of pixel data that is processed by a two-dimensional Discrete Cosine Transform (DCT) followed by quantization (dividing by a quantization step size, Qstep, then rounding the result) (Figure 4a).
Rearrange the DCT process into a core transform (Cf) and a scaling matrix (Sf) (Figure 4b).
Scale the quantization process by a constant ($2^{15}$) and compensate by dividing and rounding the final result (Figure 4c).
Combine Sf and the quantization process into Mf (Figure 4d), where:

$$M_f \approx \frac{S_f \cdot 2^{15}}{Q_{step}} \qquad \text{(Equation 1)}$$
4 Developing the rescaling and inverse transform process
Consider a re-scaling (or “inverse quantization”) operation followed by a two-dimensional inverse DCT (IDCT) (Figure 5a).
Rearrange the IDCT process into a core transform (Ci) and a scaling matrix (Si) (Figure 5b).
Scale the re-scaling process by a constant ($2^{6}$) and compensate by dividing and rounding the final result (Figure 5c).
Combine the re-scaling process and Si into Vi (Figure 5d), where:

$$V_i = S_i \cdot Q_{step} \cdot 2^{6} \qquad \text{(Equation 2)}$$
This approximation is chosen to minimise the complexity of implementing the transform (multiplication by Cf requires only additions and binary shifts) whilst maintaining good compression performance.
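The claim that multiplication by Cf needs only additions and binary shifts can be illustrated with a short sketch. The matrix `CF` below is the forward core transform matrix as widely published for H.264; it is stated here as an assumption because the matrix is not reproduced in the text above. The function names (`core_1d`, `forward_core`, `matmul`) are hypothetical helpers chosen for this illustration.

```python
# Assumed forward core transform matrix Cf (widely published for H.264).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def core_1d(a, b, c, d):
    """One 4-point pass of Cf using only adds, subtracts and left shifts."""
    s0, s1 = a + d, b + c          # sums of outer and inner pairs
    d0, d1 = a - d, b - c          # differences of outer and inner pairs
    return (s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1))

def forward_core(block):
    """Compute W = Cf . X . Cf^T for a 4x4 list of integer rows."""
    rows = [core_1d(*r) for r in block]        # transform every row
    cols = [core_1d(*c) for c in zip(*rows)]   # transform every column
    return [list(r) for r in zip(*cols)]       # transpose back to row order

def matmul(p, q):
    """Plain 4x4 matrix product, used only as a reference for comparison."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Arbitrary sample block (illustrative values only).
X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
W = forward_core(X)
```

The butterfly form should give the same result as the explicit matrix product Cf · X · CfT, which is a convenient way to check an implementation.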
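The rows of Cf have different norms. To restore the orthonormal property of the original matrix A, each row of Cf is scaled by the reciprocal of its norm, giving a new matrix A1 = Cf ⊙ Rf, where Rf holds the row scaling factors and ⊙ denotes element-by-element multiplication (Hadamard-Schur product). Note that the new matrix A1 is orthonormal.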
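The orthonormality of A1 can be checked numerically. This is a minimal sketch, assuming the widely published matrix Cf (not reproduced in the text above) and scaling each row by the reciprocal of its Euclidean norm; the names `CF`, `norms`, `A1` and `gram` are chosen for this illustration.

```python
import math

# Assumed forward core transform matrix Cf (widely published for H.264).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

# Euclidean norm of each row: 2, sqrt(10), 2, sqrt(10).
norms = [math.sqrt(sum(c * c for c in row)) for row in CF]

# Scale every element of a row by the reciprocal of that row's norm.
A1 = [[c / n for c in row] for row, n in zip(CF, norms)]

def gram(m):
    """Return m . m^T; this is the identity matrix when the rows are orthonormal."""
    return [[sum(m[i][k] * m[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]

G = gram(A1)
```

Up to floating-point error, `G` should be the 4×4 identity matrix.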
The two-dimensional transform (Equation 3) becomes:
The core inverse transform Ci and the rescaling matrix Vi are defined in the H.264 standard. Hence we now develop Vi and will then derive Mf.
7 Developing Vi
From Equation 2, Vi = Si ⋅ Qstep ⋅ $2^{6}$
H.264 supports a range of quantization step sizes Qstep. The precise step sizes are not defined in the standard; rather, the scaling matrix Vi is specified. Qstep values corresponding to the entries in Vi are shown in the following Table.
The ratio between successive Qstep values is chosen to be $2^{1/6} \approx 1.1225$ so that Qstep doubles in size when QP increases by 6. Any value of Qstep can be derived from the first 6 values in the table (QP0 – QP5) as follows:

$$Q_{step}(QP) = Q_{step}(QP \,\%\, 6) \cdot 2^{\lfloor QP/6 \rfloor}$$
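The periodic structure of Qstep can be sketched in a few lines. The function `qstep` implements only the relationship above. The base values in `BASE` are an illustrative assumption (0.625 at QP 0, with each later value scaled by 2^(1/6)), not values taken from the standard, which does not define the step sizes directly.

```python
def qstep(qp, base):
    """Qstep(QP) = Qstep(QP % 6) * 2**floor(QP / 6), given six base values."""
    return base[qp % 6] * 2 ** (qp // 6)

# Assumption: illustrative base step sizes for QP 0..5 with ratio 2**(1/6).
BASE = [0.625 * 2 ** (k / 6) for k in range(6)]

# Step sizes for QP 0..11; the second group of six is twice the first.
steps = [qstep(qp, BASE) for qp in range(12)]
```

Only the six base values need to be stored; every other step size follows by doubling.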
The values in the matrix Vi depend on Qstep (hence QP) and on the scaling factor matrix Si . These are shown for QP 0 to 5 in the following Table.
Note that there are only three unique values in each matrix Vi . These three values are defined as a table of values v in the H.264 standard, for QP=0 to QP=5 :
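| QP | v(r, 0): Vi positions (0,0), (0,2), (2,0), (2,2) | v(r, 1): Vi positions (1,1), (1,3), (3,1), (3,3) | v(r, 2): remaining Vi positions |
|----|----|----|----|
| 0 | 10 | 16 | 13 |
| 1 | 11 | 18 | 14 |
| 2 | 13 | 20 | 16 |
| 3 | 14 | 23 | 18 |
| 4 | 16 | 25 | 20 |
| 5 | 18 | 29 | 23 |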
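Hence for QP values from 0 to 5, Vi is obtained as:

$$V_i = \begin{bmatrix} v(QP,0) & v(QP,2) & v(QP,0) & v(QP,2) \\ v(QP,2) & v(QP,1) & v(QP,2) & v(QP,1) \\ v(QP,0) & v(QP,2) & v(QP,0) & v(QP,2) \\ v(QP,2) & v(QP,1) & v(QP,2) & v(QP,1) \end{bmatrix}$$

Denote this as:
Vi = v(QP, n)
where v(r, n) is row r, column n of v.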
For larger values of QP (QP>5), index the row of array v by QP%6 and then multiply by $2^{\lfloor QP/6 \rfloor}$. In general:

$$V_i = v(QP \,\%\, 6, n) \cdot 2^{\lfloor QP/6 \rfloor}$$
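The construction of Vi for any QP can be sketched as follows. The table `V` holds the three unique values per QP%6 as published for H.264. The helper names `position_class` and `rescale_matrix` are hypothetical and chosen for this illustration.

```python
# Table v: one row per QP % 6, one column per position class n.
V = [
    [10, 16, 13],
    [11, 18, 14],
    [13, 20, 16],
    [14, 23, 18],
    [16, 25, 20],
    [18, 29, 23],
]

def position_class(i, j):
    """Return the column n of v used at position (i, j) of Vi."""
    if i % 2 == 0 and j % 2 == 0:
        return 0    # positions (0,0), (0,2), (2,0), (2,2)
    if i % 2 == 1 and j % 2 == 1:
        return 1    # positions (1,1), (1,3), (3,1), (3,3)
    return 2        # all remaining positions

def rescale_matrix(qp):
    """Build the 4x4 matrix Vi = v(QP % 6, n) * 2**floor(QP / 6)."""
    scale = 1 << (qp // 6)
    return [[V[qp % 6][position_class(i, j)] * scale for j in range(4)]
            for i in range(4)]

V0 = rescale_matrix(0)   # [[10, 13, 10, 13], [13, 16, 13, 16], ...]
```

Because the scale factor is a power of two, the multiplication for QP>5 can be carried out as a left shift.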
The complete inverse transform and scaling process (for 4×4 blocks in macroblocks excluding 16×16-Intra mode) becomes:
(Note: rounded division by $2^{6}$ can be carried out by adding an offset and right-shifting by 6 bit positions).
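The re-scaling, core inverse transform and rounded division by 2^6 can be put together in a short sketch. It follows the simplified model described in this document (re-scale by v and a power of two, apply Ci with halved terms implemented as right shifts, then add 32 and shift right by 6). The row-then-column ordering and the function names are assumptions of this illustration, and bit-exact conformance should be checked against the standard itself.

```python
# Table v: one row per QP % 6, one column per position class n.
V = [[10, 16, 13], [11, 18, 14], [13, 20, 16],
     [14, 23, 18], [16, 25, 20], [18, 29, 23]]

def position_class(i, j):
    """Column of v used at position (i, j)."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2

def inverse_1d(d0, d1, d2, d3):
    """One 4-point pass of Ci^T; the factors of 1/2 become right shifts."""
    e0, e1 = d0 + d2, d0 - d2
    e2, e3 = (d1 >> 1) - d3, d1 + (d3 >> 1)
    return (e0 + e3, e1 + e2, e1 - e2, e0 - e3)

def rescale_and_inverse(z, qp):
    """Re-scale quantized coefficients z, inverse transform, divide by 2**6."""
    scale = 1 << (qp // 6)
    w = [[z[i][j] * V[qp % 6][position_class(i, j)] * scale for j in range(4)]
         for i in range(4)]
    rows = [inverse_1d(*r) for r in w]            # row pass
    cols = [inverse_1d(*c) for c in zip(*rows)]   # column pass
    # Rounded division by 64: add half the divisor, then shift right by 6.
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]

# A DC-only block: single coefficient of 16 at QP 4.
z = [[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
out = rescale_and_inverse(z, 4)
```

With this input every output sample is 4, since a lone DC coefficient reconstructs to a flat block.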
8 Deriving Mf
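Combining Equation 1 and Equation 2: Equation 2 gives Qstep = Vi / (Si ⋅ $2^{6}$), and substituting this into Equation 1 gives

$$M_f \approx \frac{S_i \cdot S_f \cdot 2^{21}}{V_i}$$

where the multiplication and division are element-by-element.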
Denote this as:
Mf = m(QP, n)
Where m (r,n) is row r, column n of m.
For larger values of QP (QP>5), index the row of array m by QP%6 and then divide by $2^{\lfloor QP/6 \rfloor}$. In general:

$$M_f = m(QP \,\%\, 6, n) \,/\, 2^{\lfloor QP/6 \rfloor}$$
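The table m can be derived numerically from v. Eliminating Qstep between Mf = Sf ⋅ 2^15 / Qstep and Vi = Si ⋅ Qstep ⋅ 2^6 gives Mf = Si ⋅ Sf ⋅ 2^21 / Vi, which the sketch below evaluates and rounds to the nearest integer. The per-class products Si ⋅ Sf used here (1/16, 1/25 and 1/20) are an assumption of this illustration, obtained from the row norms of the widely published core matrices, which are not reproduced in the text above.

```python
from fractions import Fraction

# Table v: one row per QP % 6, one column per position class n.
V = [[10, 16, 13], [11, 18, 14], [13, 20, 16],
     [14, 23, 18], [16, 25, 20], [18, 29, 23]]

# Assumed product Si * Sf for each position class n = 0, 1, 2.
SI_SF = [Fraction(1, 16), Fraction(1, 25), Fraction(1, 20)]

# m(r, n) = round(2**21 * Si*Sf / v(r, n)), using exact rational arithmetic.
M = [[round(Fraction(2 ** 21) * SI_SF[n] / V[r][n]) for n in range(3)]
     for r in range(6)]

# M[0] is [13107, 5243, 8066] and M[4] is [8192, 3355, 5243].
```

Exact fractions are used so that the rounding does not depend on floating-point behaviour.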
The complete forward transform, scaling and quantization process (for 4×4 blocks and for modes excluding 16×16-Intra) becomes:
(Note: rounded division by $2^{15}$ may be carried out by adding an offset and right-shifting by 15 bit positions).
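The scaling and quantization step can be sketched with integer arithmetic only. This is a minimal illustration, assuming the table m as widely published for H.264 and an input `w` that is already the output of the core transform. The division by 2^floor(QP/6) is folded into the final shift, so the total right shift is 15 + floor(QP/6). The offset of half the divisor gives round-to-nearest, which is one possible choice of offset. The names `quantize` and `position_class` are hypothetical.

```python
# Table m: one row per QP % 6, one column per position class n.
M = [[13107, 5243, 8066], [11916, 4660, 7490], [10082, 4194, 6554],
     [9362, 3647, 5825], [8192, 3355, 5243], [7282, 2893, 4559]]

def position_class(i, j):
    """Column of m used at position (i, j)."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2

def quantize(w, qp):
    """Quantize core-transform output w (4x4 integers) at the given QP."""
    shift = 15 + qp // 6            # 2**15 combined with 2**floor(QP/6)
    offset = 1 << (shift - 1)       # half the divisor: round to nearest
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            # Work on the magnitude so that rounding is symmetric about zero.
            mag = (abs(w[i][j]) * M[qp % 6][position_class(i, j)] + offset) >> shift
            row.append(mag if w[i][j] >= 0 else -mag)
        out.append(row)
    return out

# DC-only core-transform output of 64 (a flat block of samples equal to 4).
w = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
z = quantize(w, 4)
```

Here `z[0][0]` is 16 at QP 4, and the same input at QP 10 gives 8, reflecting the doubling of the step size every 6 QP values.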
I would like to thank Gary Sullivan for suggesting a treatment of the H.264 transform and quantization processes along these lines and for his helpful comments on earlier drafts of this document.
About the author
Vcodex is led by Professor Iain Richardson, an internationally known expert on the MPEG and H.264 video compression standards. Based in Aberdeen, Scotland, he frequently travels to the US and Europe.
Professor Richardson is the author of “The H.264 Advanced Video Compression Standard”, a widely cited work in the research literature. He has written three further books and over 50 journal and conference papers on image and video compression. He regularly advises companies on video codec technology, video coding patents and mergers/acquisitions in the video coding industry. Professor Richardson leads an internationally renowned image and video coding research team, contributes to the MPEG industry standards group and is sought after as an expert witness and litigation consultant.