Towards an AI-enhanced video coding standard

IBC2022: This Technical Paper describes the ongoing activities of the Enhanced Video Coding (EVC) project of MPAI.

Abstract

This paper describes the ongoing activities of the Enhanced Video Coding (EVC) project of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI). The project investigates how the performance of existing codecs can be improved by enhancing or replacing specific encoding tools with AI-based counterparts. The MPEG-5 EVC codec baseline profile has been chosen as the reference because it relies on encoding tools that are at least 20 years mature, yet achieves compression efficiency close to HEVC. A framework has been developed to interface the encoder/decoder with neural networks, independently of the specific learning toolkit, simplifying experimentation. So far, the EVC project has investigated the intra prediction and super resolution coding tools. The standard intra prediction modes have been complemented with a learnable predictor: experiments in standard test conditions show rate reductions for intra coded frames in excess of 4% over the reference. The use of super resolution at the decoder side, via a state-of-the-art deep-learning approach named Densely Residual Laplacian Network (DRLN), has been found to provide further gains over the reference in the order of 3% in the SD-to-HD context.

Introduction

MPAI is an international, unaffiliated, non-profit standards developing organisation whose mission is to develop Artificial Intelligence (AI) enabled data coding standards. Its standard development process corrects other standardisation bodies’ shortcomings by adding a clear Intellectual Property Rights (IPR) licensing framework. MPAI has already developed the AI Framework (AIF) standard (MPAI-AIF), specifying AIF as an environment capable of managing the life cycle of AI Workflows (AIW) and their components, called AI Modules (AIMs). AIWs are defined by their function, i.e. an MPAI-specified Use Case, the syntax and semantics of the input and output data, and the AIM topology. Similarly, AIMs are defined by their function (e.g. motion compensation) and the syntax and semantics of their input and output data, but not by their internals. Because the standards are based on AIMs, implementers of MPAI standards face a low entry barrier to an open, competitive market, since application implementers can find the AIMs they need on that market. The MPAI-AIF standard is currently being extended with the capability to access trusted services.
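
To make the AIM concept concrete, the sketch below shows, in purely illustrative Python, a module defined only by its function and its input/output contract, composed into a workflow. The class and method names (AIModule, AIWorkflow, process) are hypothetical and do not reflect the actual MPAI-AIF API.

```python
# Hypothetical sketch of the AIM/AIW concept: each module declares its
# function and I/O contract, but its internals are left to the implementer.
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class AIModule(ABC):
    """An AI Module (AIM): defined by function plus input/output syntax."""
    function: str                 # e.g. "motion compensation"
    input_spec: Dict[str, type]   # syntax of the expected inputs
    output_spec: Dict[str, type]  # syntax of the produced outputs

    @abstractmethod
    def process(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Internals are implementation-specific and not standardised."""

class AIWorkflow:
    """An AI Workflow (AIW): an ordered topology of AIMs for a Use Case."""
    def __init__(self, modules: List[AIModule]):
        self.modules = modules

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        for m in self.modules:    # pass the data through the AIM topology
            data = m.process(data)
        return data
```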

Since the day MPAI was announced, there has been considerable interest in the application of AI to video. Video content nowadays accounts for more than 70% of Internet traffic volume, hence the interest in efficient video coding technologies able to cope with tomorrow's bandwidth-demanding video services (4K video, immersive content, etc.).

Existing video coding standards, used in Internet streaming or in broadcasting over the air or cable, rely on a clever combination of hand-designed encoding tools, each bringing its own contribution to the overall codec performance.

Much of the compression comes from prediction: a picture can be predicted from neighbouring data within the same picture (known as intra-prediction) or from data previously signalled in other pictures (known as inter-prediction). Intra-prediction uses previously decoded values of neighbouring samples to predict the samples of the current block.
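
As an illustration of the principle, the minimal sketch below implements DC intra prediction, one of the simplest intra modes found in standard codecs: the block is predicted as the mean of the previously decoded neighbouring samples. The function name and block size are illustrative only.

```python
import numpy as np

def dc_intra_predict(top: np.ndarray, left: np.ndarray, size: int) -> np.ndarray:
    """DC intra prediction: fill the block with the mean of the
    previously decoded neighbouring samples above and to the left."""
    dc = np.concatenate([top, left]).mean()
    return np.full((size, size), np.round(dc), dtype=np.int16)

# Example: predict a 4x4 block from its reconstructed neighbours.
top = np.array([100, 102, 104, 106], dtype=np.int16)   # row above the block
left = np.array([98, 99, 101, 103], dtype=np.int16)    # column left of the block
pred = dc_intra_predict(top, left, 4)                  # flat block near the mean
```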

The residual signal is then transformed via the discrete cosine transform, allowing low-pass filtering in the transform domain. Coefficient decimation and the subsequent quantisation are the lossy part of the compression process, reducing the rate spent on high frequencies while keeping the resulting artefacts bearable to the human observer.
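
A minimal sketch of this step is given below, assuming a floating-point DCT from SciPy and a single uniform quantisation step. Real codecs use integer transform approximations and more elaborate quantisation, so this is illustrative only.

```python
import numpy as np
from scipy.fft import dctn, idctn

def transform_quantise(residual: np.ndarray, qstep: float) -> np.ndarray:
    """2-D DCT of the residual followed by uniform quantisation.
    Quantisation is the lossy step: it coarsens the coefficients,
    discarding mostly high-frequency detail."""
    coeffs = dctn(residual, norm='ortho')   # energy compacts into low frequencies
    return np.round(coeffs / qstep).astype(np.int32)

def dequantise_inverse(levels: np.ndarray, qstep: float) -> np.ndarray:
    """Decoder side: scale the levels back and apply the inverse DCT."""
    return idctn(levels * qstep, norm='ortho')

residual = np.random.randint(-16, 16, (8, 8)).astype(float)  # toy residual block
levels = transform_quantise(residual, qstep=10.0)
recon = dequantise_inverse(levels, qstep=10.0)               # lossy reconstruction
```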

The resulting signal is entropy encoded, which is a lossless form of compression.
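
For illustration, the snippet below implements an order-0 exponential-Golomb code, a simple universal code of the kind used for lossless coding of syntax elements in video standards; note that codecs such as MPEG-5 EVC rely on context-adaptive arithmetic coding for most data, so this is only a sketch of the lossless principle.

```python
def exp_golomb(value: int) -> str:
    """Order-0 exponential-Golomb code for a non-negative integer:
    a unary prefix of zeros followed by the binary representation."""
    v = value + 1
    prefix = '0' * (v.bit_length() - 1)
    return prefix + bin(v)[2:]

# Small values (frequent after quantisation) get short codewords.
for v in [0, 1, 2, 5]:
    print(v, exp_golomb(v))   # 0 -> '1', 1 -> '010', 2 -> '011', 5 -> '00110'
```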

Within the encoder, when some form of prediction is enabled, the encoded signal is reconstructed through a de-quantisation and inverse transformation step, and the input visual data is rebuilt by adding back the predicted signal. Filters, such as a deblocking filter and a sample adaptive offset filter, are used to improve the visual quality. The reconstructed picture is stored in a reference picture buffer for future reference, allowing the similarities between two pictures to be exploited.
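
The reconstruction step can be sketched as follows; the function name and the bare-list reference picture buffer are illustrative simplifications of what a real encoder maintains.

```python
import numpy as np
from typing import List

def reconstruct_block(pred: np.ndarray, recon_residual: np.ndarray,
                      bit_depth: int = 8) -> np.ndarray:
    """Add the decoded residual back to the prediction and clip to the
    valid sample range, mirroring the decoder inside the encoder."""
    return np.clip(pred + recon_residual, 0, (1 << bit_depth) - 1)

# Decoded pictures are kept for inter prediction. After all blocks of a
# picture are reconstructed and in-loop filters (deblocking, sample
# adaptive offset) are applied, the picture is appended to the buffer:
reference_picture_buffer: List[np.ndarray] = []
# reference_picture_buffer.append(filtered_picture)
```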

The inter-prediction exploits redundancies between pictures of visual data. The motion estimation process evaluates one or more candidate blocks in the reference pictures, minimising the distortion with respect to the current block. Motion compensation then uses the best-matching block to create a prediction for the current block, so that only the residual between the two needs to be coded. Reference pictures are thus used to reconstruct the pictures to be displayed, reducing the amount of data required to be transmitted or stored.
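
A minimal sketch of exhaustive (full-search) block matching with the sum of absolute differences (SAD) metric is given below; practical encoders use much faster search strategies and rate-distortion-aware cost functions, so this is illustrative only.

```python
import numpy as np

def full_search(cur_block: np.ndarray, ref: np.ndarray,
                x: int, y: int, search: int = 8) -> tuple:
    """Exhaustive block matching: return the motion vector (dx, dy)
    minimising the SAD within a +/- `search` window around (x, y)."""
    n = cur_block.shape[0]
    best, best_mv = np.inf, (0, 0)
    cur = cur_block.astype(int)              # avoid unsigned overflow
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry and ry + n <= ref.shape[0] and 0 <= rx and rx + n <= ref.shape[1]:
                sad = np.abs(cur - ref[ry:ry + n, rx:rx + n].astype(int)).sum()
                if sad < best:
                    best, best_mv = sad, (dx, dy)
    return best_mv

# Motion compensation then copies the matched reference block as the
# prediction; only the residual (current - prediction) is coded.
```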

However, as resolutions and frame rates keep increasing at the same time, relying on hardware advances alone is no longer sufficient for some applications. Over the past years, the research community has therefore investigated recent developments in Artificial Intelligence (AI) and Machine Learning (ML) to push the boundaries and deliver industry-leading video quality and hardware efficiency.

There are two main approaches in the AI-based video coding research community: 1) introducing learning-based algorithms into a traditional image/video codec, replacing individual coding blocks with AI-based counterparts; 2) an End-to-End (E2E) approach, which is mainly focused on replacing the entire chain with a purely deep-learning-based compression scheme.

Both research directions are being explored within MPAI, by the Enhanced Video Coding group (EVC) and the End-to-End Video Coding group (EEV), respectively. This document details the recent activities of the EVC group.

The primary goal of MPAI-EVC is to enhance the performance of traditional video codecs by integrating AI-based coding tools. The first step is the MPAI-EVC Evidence Project, whose intent is to demonstrate that AI tools can improve the MPEG-5 EVC efficiency by at least 25%. Two main tools have been investigated, namely intra prediction enhancement and super resolution. The EVC reference scheme is depicted in Figure 2.
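
To illustrate the decoder-side super resolution idea, the hedged PyTorch sketch below upscales a decoded SD frame to HD. The `sr_model` argument stands in for a trained network such as DRLN (not included here), with bicubic interpolation as a fallback baseline; the function name and frame sizes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def decoder_side_upscale(decoded_sd: torch.Tensor, sr_model=None) -> torch.Tensor:
    """Decoder-side super resolution sketch: the sequence is coded at SD,
    decoded, then upscaled to HD. `sr_model` is a placeholder for a
    trained network such as DRLN; without one, bicubic is the baseline."""
    if sr_model is not None:
        return sr_model(decoded_sd)          # learned SD -> HD upscaling
    return F.interpolate(decoded_sd, scale_factor=2, mode='bicubic',
                         align_corners=False)

# decoded = torch.rand(1, 3, 540, 960)       # a decoded SD frame (N, C, H, W)
# hd = decoder_side_upscale(decoded)         # 1080x1920 output
```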

A parallel activity to the MPAI-EVC Evidence Project is the MPAI End-to-End Video Coding project (MPAI-EEV). It addresses the needs of those who want not only an environment where academic knowledge is promoted, but also a body that develops common understanding, models and, eventually, standards-oriented end-to-end video coding solutions. MPAI-EEV can cover medium-to-long-term video coding needs. The group has developed a study of the state of the art of end-to-end video coding and has decided to start from the OpenDVC software to develop a reference model that will be used for collaborative investigations.

The rest of the paper describes in detail the activities of the EVC project on the intra prediction and super resolution tools.
