OTT Concepts – Transcoding

If a file format is not supported by a system, transcoding helps bridge the gap. Transcoding is the conversion of a file from one format to another, whether analog-to-analog or digital-to-digital. Video transcoding is the process of converting a video file from one format to another so that the content is viewable across different platforms and devices. It is performed when the target device does not support the format the original data is in, and it is also very useful for converting incompatible or obsolete file types into modern formats supported by new devices. Transcoding has become popular with video-sharing websites, which transcode uploaded video into one of the formats the site supports – something you may also need to do for your own website.

Each time a file is transcoded, however, some quality is lost, which is why transcoding is often termed a lossy process. If the input compression is lossless and the output is uncompressed or losslessly compressed, transcoding can be lossless.

Another important use of transcoding is adapting HTML and graphics files to the constraints of mobile devices and other Web-enabled products. These devices usually have smaller screens, less memory and lower bandwidth. Here, transcoding is performed by a transcoding proxy server or device, which receives the requested document or file and uses a specified annotation to adapt it to the client.

What is Transcoding?

Digital storage of any media requires conversion from analog to a digital format. The initial digital format after media production is still a raw file; for it to be stored and played back across different devices, it needs to be compressed into a digital format compatible with each device. This compression of video files to make them compatible with a target device is called encoding.

The terms encoding and transcoding are sometimes used interchangeably, despite the use-case difference between the two.

Generally speaking, encoding refers to the process of converting uncompressed data to the desired format. This is understood to be a lossy process. On the other hand, transcoding is the process of decoding a video file from one format to an uncompressed format and then encoding the uncompressed data to the desired format. Video transcoding is commonly used when the video file is being moved from a source to a different destination, and when the two support different file formats.

One of the most important uses of video transcoding is uploading video from a source – say, a desktop – to an online video hosting site, so that the format is supported by the hosting site.

Other terms in use around video transcoding and encoding include:

  • Transmuxing – Conversion to a different container format without changing the file itself
  • Transrating – Conversion to a different bitrate using the same file format
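As a rough sketch of the two operations on the command line (assuming ffmpeg is installed and a hypothetical local H.264/AAC file named input.mp4):

```shell
#!/bin/sh
# Hypothetical source file; any H.264/AAC MP4 would do.
INPUT="input.mp4"

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  # Transmuxing: copy the compressed streams into a new container (MP4 -> MKV).
  # "-c copy" means no re-encoding, so it is fast and lossless.
  ffmpeg -i "$INPUT" -c copy output.mkv

  # Transrating: keep the H.264 codec but re-encode the video at a lower bitrate.
  ffmpeg -i "$INPUT" -c:v h264 -b:v 1000k -c:a copy output_1000k.mp4
fi
```

The guard makes the script a no-op when ffmpeg or the input file is absent; in practice you would run the two ffmpeg commands directly.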

What is a Codec?

A video codec is any device or piece of software that compresses and decompresses video files. A device or program that only compresses an analog file is called an encoder, whereas one that only decompresses a compressed digital file back to analog is a decoder. The term ‘codec’ is a blend of the two terms encoding and decoding.

How do codecs work?

For any codec to work it needs to compress the frames. There are two types of frame compressions – inter-frame and intra-frame compression. In intra-frame compression, each frame is compressed independently of the adjacent frames. It is therefore essentially image-compression applied to video.

Inter-frame compression, on the other hand, identifies redundancies across frames to compress video. This includes any elements of the moving image that are static – say, a static background in a talking-head video. Inter-frame compression is much more efficient than intra-frame compression, and so most codecs are optimized to identify redundancies across frames.
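You can see the difference in practice by forcing an encoder into intra-only mode. The sketch below (assuming ffmpeg and a hypothetical clip.mp4) encodes the same clip twice; the inter-frame version is normally much smaller at comparable quality:

```shell
#!/bin/sh
CLIP="clip.mp4"   # hypothetical input file

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$CLIP" ]; then
  # Intra-only: -g 1 makes every frame a key frame, i.e. pure image compression.
  ffmpeg -i "$CLIP" -c:v h264 -g 1 -crf 23 intra_only.mp4
  # Inter-frame: a key frame every 250 frames (the libx264 default); the frames
  # in between store only the differences from their neighbours.
  ffmpeg -i "$CLIP" -c:v h264 -g 250 -crf 23 inter_frame.mp4
  # Compare the resulting file sizes:
  ls -l intra_only.mp4 inter_frame.mp4
fi
```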


What are some of the most prominent codecs?

MPEG (Moving Picture Experts Group) is the most common family of video codecs. The International Organization for Standardization (ISO), whose standards shape the computer and consumer electronics markets, has adopted the following codecs as standards:

  • MPEG-1 in 1993
  • MPEG-2 in 1994
  • MPEG-4 in 1999
  • AVC/H.264 in 2003

H.264 is the most prevalent compression standard in use today. For a period in the late 90s and early 2000s, RealNetworks and Microsoft competed to establish their own proprietary formats as the standards for codecs. RealNetworks’ RealVideo, Microsoft’s Windows Media Video, On2’s VP6 and Sorenson Video 3 were the dominant proprietary codecs. The H.264 codec was added to Apple’s QuickTime 7 in 2005, and in 2007 Adobe added H.264 support to Flash.

VP9 is an open, royalty-free video compression format and codec developed by Google as the successor to VP8, which originated at On2 Technologies, acquired by Google in 2010. Google makes VP9 available under a BSD-style license, with the reference implementation released as the libvpx codec SDK.

Theora is a free, lossy video compression format distributed without licensing fees. It is derived from the VP3 codec, which was released into the public domain. The Ogg container format uses Theora as its video compression format.

How is a media file format distinct from a codec?

A file format is a container, inside which is data that has been compressed by a video codec. A single file format may support multiple video codecs: for example, the Audio Video Interleave (AVI) file format can contain data compressed by any of a range of codecs.

What are some of the most prominent containers?

QuickTime File Format is a multimedia container format used by Apple’s multimedia framework.

MP4 is the most popular container format for storing digital audio and video. The MP4 format is based on Apple’s QuickTime file format, and both follow the MPEG-4 specification.

FLV is the file container format used for video content using Adobe Flash Player. Flash video content can be embedded within SWF files.

WebM is a royalty-free container format sponsored by Google. WebM uses the VP8 and VP9 codecs as its compression formats.

The Ogg container format is maintained by the Xiph.Org Foundation and is not encumbered by the software patents that cover H.264. Ogg is supported by the Wikipedia community.

Advanced Systems Format (ASF) is Microsoft’s proprietary video container format designed for streaming media.

The big difference between video transcoding and video encoding

Encoding converts uncompressed data to a specific format or codec and is a lossy process. Transcoding, at a higher level, takes encoded (already compressed) media content, decodes (decompresses) it to an uncompressed form, optionally alters the content in some way – for example by adding watermarks, graphics or logos – and then recompresses it. In other words, it re-encodes an existing video file (or ongoing stream) from one digital encoding format to another, with a different codec or settings, and can involve translating all three elements of a video file: the container, the resolution and the bitrate.

Video files are large and contain a lot of data, consuming a lot of storage and bandwidth during transmission, and some formats have compatibility issues with the ever-growing number of appliances consuming video content. To address these problems, codecs compress video files, removing extraneous data to reduce size (e.g. converting WMV to MP4) and resolving compatibility concerns while maintaining high-quality content.

The burgeoning variety of technologies – different generations of equipment, low- to high-speed networks, and over-the-top (OTT) services – creates different demands for video formats and qualities. Maintaining interoperability across this plethora of devices, while ensuring a high-quality experience for the end user, is where transcoding comes in. Transcoders offer audio conversion, packaging and metadata transfer, caption conversion, and more, enabling a provider to deliver several audio formats along with multiple video formats such as H.264 and MPEG-4.

Most OTT service providers, like Netflix, employ real-time transcoders: whenever a request is made to view a video, the transcoders process the request and transcode the video according to the capability and type of the requesting user’s device. The transcoder can also repackage into adaptive-bitrate formats such as Adobe HDS, Microsoft Smooth Streaming (MSS) and HTTP Live Streaming (HLS), and the client device can then select the optimal stream depending on the bandwidth available.

Transcoding is widely used as an umbrella term that covers multiple digital media tasks such as transmuxing, trans-sizing and trans-rating.

  • Trans-rating – changing the bitrate while keeping the same file format, for example taking a 4K video stream and converting it into one or several lower-bitrate streams; each such stream is also known as a rendition.
  • Trans-sizing – resizing the video frame, e.g. from 4K UHD (3840×2160) down to 1920×1080 (1080p).
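A minimal trans-sizing sketch with ffmpeg (assuming a hypothetical source_4k.mp4): the scale filter resizes the frame while the codec family stays the same.

```shell
#!/bin/sh
SRC="source_4k.mp4"   # hypothetical 3840x2160 input

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # Trans-sizing: downscale the frame from 4K UHD to 1080p.
  # The audio is copied untouched; only the video is re-encoded.
  ffmpeg -i "$SRC" -vf scale=1920:1080 -c:v h264 -c:a copy out_1080p.mp4
fi
```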

So, when referring to transcoding, you might mean any combination of these activities. One other essential point: video conversion (encoding) is a computationally intensive task and requires powerful hardware, ideally equipped with graphics-acceleration capabilities.

What Transcoding Is Not

Transcoding should not be confused with transmuxing, which refers to repackaging, rewrapping or packetizing – conversion to a different container format without changing the content itself. You take compressed video and audio and, without altering the actual content, repackage it into a different delivery format: for example, rewrapping H.264/AAC content to send it as HTTP Live Streaming (HLS), HTTP Dynamic Streaming (HDS), etc. The content remains unaltered, so the computational overhead is much smaller than for transcoding.
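For example, repackaging an H.264/AAC file into HLS can be done with a stream copy, no re-encoding involved (a sketch assuming ffmpeg and a hypothetical movie.mp4):

```shell
#!/bin/sh
SRC="movie.mp4"   # hypothetical H.264/AAC input

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # Transmuxing to HLS: -c copy keeps the compressed streams as they are,
  # so only the container changes; segments are cut at existing key frames.
  ffmpeg -i "$SRC" -c copy -hls_time 6 -hls_playlist_type vod movie.m3u8
fi
```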

Codec: Encoding and Decoding

The term codec comes from the fusion of the two terms encoding and decoding. A codec is therefore any software or device that compresses and/or decompresses a digital media file. A device or program that only compresses an analog file is known as an encoder, while one that only decompresses is known as a decoder.

To work, a codec needs to compress frames, and there are two approaches to frame compression – inter-frame and intra-frame.

Inter-frame compression identifies redundancies across frames to compress the video, while intra-frame compression is essentially image compression, since it compresses each frame independently. That is why inter-frame compression is more efficient and is used by most codecs.

4 Big Reasons to Transcode Video Files

Bridging the gap – Creating Multiple Video Formats

With diverse proprietary file formats and codec support, the need for media exchange between a plethora of systems is clear. Transcoding enables you to re-encode a video stream into multiple formats such as MPEG or HLS, to offer streaming to a range of appliances that only support certain formats. Production, post-production, distribution and archiving are all distinct stages that operate on their own standards, requirements and practices. Transcoding bridges the gaps among them, enables media exchange between disparate systems, and makes possible the vast range of uses to which digital media are now put.

Boosting QoE (Quality of Experience)

Live streaming and media service providers like Netflix are the top-tier users of transcoding, which allows them to serve their user base far more fittingly and efficiently. With video transcoding, broadcasters are able to adapt the bitrate of a video stream based on factors such as the device you’re using, the bandwidth available and the codecs supported. For example, HLS allows dynamic switching between video sources – say, 1080p and 720p versions of a stream – depending on the network speed and device.

If a user regularly experiences lagging, buffering, slow video startup or frequent failures to play the content altogether, the QoE goes down and with it the customer’s perception of your video quality. Maximizing QoE is crucial for any media broadcaster, and transcoding can help.

Reducing customer storage

The source files on which transcoding is performed are much larger than the asset files produced after the tweaking and polishing are done, so transcoding takes the burden off the user’s storage.

Custom requirements

Video content creators may have specialized design and implementation requirements – special formats, multi-lingual audio streams, clipping and trimming, etc. – to be applied to the video stream before it is delivered to the user.

Numerous factors go into video transcoding, and it is an essential part of getting your content ready to be delivered with the best user experience and efficiency.


Jenkins Job -[mp4 video (1080p) – transcode to 480p – convert into HLS]

Production-Ready Multi Bitrate HLS VOD stream creation.

HLS is one of the most prominent video streaming formats on desktop and mobile browsers. Since end users have different screen sizes and different network performance, we want to create multiple renditions of the video with different resolutions and bitrates that can be switched seamlessly; this concept is called MBR (multi-bitrate).
For this task we will use ffmpeg, a powerful tool that supports conversion between a wide range of video formats, including HLS both as input and output.

This guide will show a real-world use of ffmpeg to create an MBR HLS VOD stream from a static input file.

Installing FFMPEG

ffmpeg is a cross-platform program that can run on Windows and OS X as well as Linux.

Windows

  • Download latest version from here
  • Unzip the archive to a folder
  • Open a command prompt in the unzipped folder
  • Run ./ffmpeg – you should see FFmpeg version and build information

OS X

  • Install homebrew
  • Run brew install ffmpeg (extra options can be seen by running brew options ffmpeg)
  • Run ffmpeg (Homebrew puts it on your PATH) – you should see ffmpeg version and build information

Ubuntu

On Ubuntu 16.04 and later, ffmpeg is available from the default repositories:

sudo apt-get update  
sudo apt-get install -y ffmpeg

On older releases you may need a third-party PPA such as ppa:mc3man/trusty-media.

CentOS / Fedora

ffmpeg is not in the default repositories; enable the RPM Fusion repository (and EPEL on CentOS) first, then:

yum install -y ffmpeg

The latest binaries for all platforms, source code, and more information are available at ffmpeg’s official website.

Source Media

I will use an .mkv file, beach.mkv, though ffmpeg will consume most common video formats. Sample files can be downloaded from here and here.

To inspect the file properties run the following command:

ffprobe -hide_banner beach.mkv

The file is identified as MKV, 21 seconds long, with an overall bitrate of 19264 kbps, containing one 1920×1080 23.98 fps video stream in the H.264 codec and one AC3 audio stream at 48 kHz and 640 kbps.
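To pull out just those properties rather than reading the full ffprobe dump, -show_entries can select individual fields (guarded here so the script is a no-op if ffprobe or the sample file is missing):

```shell
#!/bin/sh
SRC="beach.mkv"

if command -v ffprobe >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # Container-level info: format name, duration and overall bitrate.
  ffprobe -v error -show_entries format=format_name,duration,bit_rate \
          -of default=noprint_wrappers=1 "$SRC"
  # First video stream: codec, resolution and frame rate.
  ffprobe -v error -select_streams v:0 \
          -show_entries stream=codec_name,width,height,avg_frame_rate \
          -of default=noprint_wrappers=1 "$SRC"
fi
```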

Multi Bitrate Conversion

First rendition

Let’s build the command for one rendition:

ffmpeg -i beach.mkv \
  -vf scale=w=1280:h=720:force_original_aspect_ratio=decrease \
  -c:a aac -ar 48000 -b:a 128k \
  -c:v h264 -profile:v main -crf 20 \
  -g 48 -keyint_min 48 -sc_threshold 0 \
  -b:v 2500k -maxrate 2675k -bufsize 3750k \
  -hls_time 4 -hls_playlist_type vod \
  -hls_segment_filename beach/720p_%03d.ts beach/720p.m3u8
  • -i beach.mkv – set beach.mkv as input file
  • -vf "scale=w=1280:h=720:force_original_aspect_ratio=decrease" – scale the video to maximum possible within 1280×720 while preserving the aspect ratio
  • -c:a aac -ar 48000 -b:a 128k – set audio codec to AAC with a sampling of 48kHz and bitrate of 128k
  • -c:v h264 – set video codec to be H264 which is the standard codec of HLS segments
  • -profile:v main – set H264 profile to main – this means support in modern devices read more
  • -crf 20 – Constant Rate Factor, high-level factor for overall quality
  • -g 48 -keyint_min 48 – IMPORTANT: create a key frame (I-frame) every 48 frames (~2 seconds) – this will later affect correct slicing of segments and alignment of renditions
  • -sc_threshold 0 – don’t create key frames on scene change – only according to -g
  • -b:v 2500k -maxrate 2675k -bufsize 3750k – limit video bitrate, these are rendition specific and depends on your content type – read more
  • -hls_time 4 – segment target duration in seconds – the actual length is constrained by key frames
  • -hls_playlist_type vod – adds the #EXT-X-PLAYLIST-TYPE:VOD tag and keeps all segments in the playlist
  • -hls_segment_filename beach/720p_%03d.ts – explicitly define segments files names
  • beach/720p.m3u8 – path of the playlist file – also tells FFmpeg to output HLS (.m3u8)

This will generate a VOD HLS playlist and segments in the beach folder.
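The relationship between -g, the frame rate and -hls_time can be sanity-checked with a little shell arithmetic (24 fps is an assumption, rounding up the 23.98 fps of the sample file):

```shell
#!/bin/sh
FPS=24          # assumed source frame rate (beach.mkv is ~23.98 fps)
GOP=48          # -g 48: one key frame every 48 frames
SEG_TARGET=4    # -hls_time 4: target segment length in seconds

GOP_SECONDS=$((GOP / FPS))              # 48 / 24 = 2 seconds between key frames
# Segments can only be cut on key frames, so the target duration should be a
# whole multiple of the GOP duration for evenly sized segments:
REMAINDER=$((SEG_TARGET % GOP_SECONDS)) # 4 % 2 = 0, i.e. boundaries line up

echo "key frame every ${GOP_SECONDS}s; segment target ${SEG_TARGET}s; remainder ${REMAINDER}"
```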

Multiple renditions

Each rendition requires its own parameters; however, ffmpeg supports multiple inputs and outputs, so all the renditions can be generated in parallel with one long command.
It is very important that, apart from the resolution and bitrate parameters, the commands are identical, so that the renditions are properly aligned – meaning key frames are set in exactly the same positions to allow smooth switching between them on the fly.

We will create 4 renditions with common resolutions:

  • 1080p 1920×1080 (original)
  • 720p 1280×720
  • 480p 842×480
  • 360p 640×360
ffmpeg -hide_banner -y -i beach.mkv \
  -vf scale=w=640:h=360:force_original_aspect_ratio=decrease -c:a aac -ar 48000 -c:v h264 -profile:v main -crf 20 -sc_threshold 0 -g 48 -keyint_min 48 -hls_time 4 -hls_playlist_type vod  -b:v 800k -maxrate 856k -bufsize 1200k -b:a 96k -hls_segment_filename beach/360p_%03d.ts beach/360p.m3u8 \
  -vf scale=w=842:h=480:force_original_aspect_ratio=decrease -c:a aac -ar 48000 -c:v h264 -profile:v main -crf 20 -sc_threshold 0 -g 48 -keyint_min 48 -hls_time 4 -hls_playlist_type vod -b:v 1400k -maxrate 1498k -bufsize 2100k -b:a 128k -hls_segment_filename beach/480p_%03d.ts beach/480p.m3u8 \
  -vf scale=w=1280:h=720:force_original_aspect_ratio=decrease -c:a aac -ar 48000 -c:v h264 -profile:v main -crf 20 -sc_threshold 0 -g 48 -keyint_min 48 -hls_time 4 -hls_playlist_type vod -b:v 2800k -maxrate 2996k -bufsize 4200k -b:a 128k -hls_segment_filename beach/720p_%03d.ts beach/720p.m3u8 \
  -vf scale=w=1920:h=1080:force_original_aspect_ratio=decrease -c:a aac -ar 48000 -c:v h264 -profile:v main -crf 20 -sc_threshold 0 -g 48 -keyint_min 48 -hls_time 4 -hls_playlist_type vod -b:v 5000k -maxrate 5350k -bufsize 7500k -b:a 192k -hls_segment_filename beach/1080p_%03d.ts beach/1080p.m3u8

Master playlist

The HLS player needs to know that there are multiple renditions of our video, so we create an HLS master playlist that points to them and save it alongside the other playlists and segments.

playlist.m3u8

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=842x480
480p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p.m3u8
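The playlist is plain text, so it can just as easily be written from a small script. Here is a sketch that recreates the file above (the BANDWIDTH values are taken from the video bitrates of the renditions):

```shell
#!/bin/sh
# Write the master playlist next to the rendition playlists and segments.
mkdir -p beach
cat > beach/playlist.m3u8 <<'EOF'
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=842x480
480p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p.m3u8
EOF
# One variant entry per rendition:
grep -c EXT-X-STREAM-INF beach/playlist.m3u8   # prints 4
```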

Example Conversion Script

Here is an example conversion script, create-vod-hls.sh

Running:

bash create-vod-hls.sh beach.mkv

will produce:

    beach/
      |- playlist.m3u8
      |- 360p.m3u8
      |- 360p_001.ts
      |- 360p_002.ts
      |- 480p.m3u8
      |- 480p_001.ts
      |- 480p_002.ts
      |- 720p.m3u8
      |- 720p_001.ts
      |- 720p_002.ts
      |- 1080p.m3u8
      |- 1080p_001.ts
      |- 1080p_002.ts  

FAQ

How to choose the right bitrate

Bitrate depends mostly on the resolution and the content type. If the bitrate is set too low, image pixelation will occur, especially in areas with rapid movement; if the bitrate is too high, the output files may be excessively big without adding value.

To choose the right bitrate, you must understand your type of content. Content with high motion, such as sports or news events, requires a higher bitrate to avoid pixelation, while low-motion content, such as concerts and interviews, can get by with a lower bitrate without apparent loss of quality.
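A common rule of thumb is bits-per-pixel (BPP): multiply width × height × frame rate by roughly 0.1 BPP for low-motion content, and more for high motion. A quick sketch in shell arithmetic (integer-only, so BPP is scaled by 100):

```shell
#!/bin/sh
WIDTH=1280
HEIGHT=720
FPS=30
BPP_X100=10   # 0.10 bits per pixel, a low-motion starting point

# width * height * fps * bpp = bits per second
BITRATE=$((WIDTH * HEIGHT * FPS * BPP_X100 / 100))
echo "suggested bitrate: $((BITRATE / 1000))k"   # prints "suggested bitrate: 2764k"
```

The result lands close to the 720p defaults listed below; raising BPP_X100 to 13–15 approximates the high-motion column.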

Here are some good defaults to start from:

Quality              Resolution   Bitrate (low motion)   Bitrate (high motion)   Audio bitrate
240p                 426×240      400k                   600k                    64k
360p                 640×360      700k                   900k                    96k
480p                 854×480      1250k                  1600k                   128k
HD 720p              1280×720     2500k                  3200k                   128k
HD 720p 60fps        1280×720     3500k                  4400k                   128k
Full HD 1080p        1920×1080    4500k                  5300k                   192k
Full HD 1080p 60fps  1920×1080    5800k                  7400k                   192k
4K                   3840×2160    14000k                 18200k                  192k
4K 60fps             3840×2160    23000k                 29500k                  192k

How do I feed FFmpeg through stdin?

ffmpeg has a special pipe: protocol that instructs it to consume stdin as media.

cat clip.mp4 | ffmpeg -f mp4 -i pipe: output.avi

What’s the difference between avconv and ffmpeg?

avconv is a fork of ffmpeg that was created by a group of ffmpeg developers over project-management disagreements. While both have been maintained, it is recommended to use FFmpeg, since it has the larger community, as explained here.
