Audio File Formats for Phone Systems

Find out which file formats are appropriate for your phone system, and what differentiates one audio file format from another. With dozens of manufacturers in existence and new platforms released regularly, there is a multitude of business phone systems that support professional IVR (interactive voice response) and MOH (music on hold) systems.

Audio Codecs for Telephony

Knowing which audio file formats your phone system supports is important. Some of the most prevalent file container formats in the telephony ecosystem are .wav, .mp3, .vox and .au. However, many platforms support only certain audio encoding methods within those containers, and the same encoding method may be described by more than one standard or naming convention.
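Because a .wav container can hold several different encodings, it can help to inspect the file header before uploading a prompt. The following is a minimal Python sketch, assuming a standard RIFF/WAVE layout; the function name `wav_encoding` and the short tag table are illustrative choices, and the table covers only a few of the registered format tags.

```python
import struct

# A few registered WAVE format tags; many others exist.
FORMAT_TAGS = {
    0x0001: "PCM",
    0x0006: "G.711 A-law",
    0x0007: "G.711 u-law",
    0x0011: "IMA ADPCM",
    0x0055: "MP3",
}

def wav_encoding(data: bytes):
    """Return (encoding, channels, sample rate, bits per sample) for WAV bytes."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    # Walk the chunk list until the "fmt " chunk is found.
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            name = FORMAT_TAGS.get(tag, f"unknown (0x{tag:04x})")
            return name, channels, rate, bits
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no fmt chunk found")

# Hand-built header for an empty 8 kHz, 8 bit, mono u-law file.
example_header = (
    b"RIFF" + struct.pack("<I", 36) + b"WAVE"
    + b"fmt " + struct.pack("<IHHIIHH", 16, 0x0007, 1, 8000, 8000, 1, 8)
    + b"data" + struct.pack("<I", 0)
)
example_result = wav_encoding(example_header)
```

For a real file, the same function could be called on the bytes read from disk, for example `wav_encoding(open(path, "rb").read())`, where `path` is whatever prompt file is being checked.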


PCM, or “Pulse Code Modulation”, is a very common digital audio encoding method, particularly in telephony environments. Linear PCM stores uncompressed samples, so the audio quality and file size of a PCM stream are determined by two properties: the “bit depth”, which is the number of bits used to store each sample, and the “sampling rate”, which is the number of times per second that a sample of the audio signal is taken. .wav files using this format are commonly referred to by their bit depth, then their sampling rate and encoding method. The most common PCM format for phone systems is a 16 bit, 8 kHz .wav file.
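To make the relationship between bit depth, sampling rate, and file size concrete, here is a small Python sketch using only the standard library `wave` module. It writes a one-second test tone as a 16 bit, 8 kHz mono PCM .wav in memory and reads the parameters back; the function names and the 440 Hz tone are arbitrary choices for illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (8 kHz)
BIT_DEPTH = 16      # bits per sample
CHANNELS = 1        # mono

def make_tone_wav(seconds=1.0, freq_hz=440.0):
    """Return the bytes of a 16 bit, 8 kHz mono PCM .wav holding a sine tone."""
    n = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(BIT_DEPTH // 8)  # sample width is given in bytes
        w.setframerate(SAMPLE_RATE)
        w.writeframes(frames)
    return buf.getvalue()

def describe(wav_bytes):
    """Summarize a PCM .wav as bit depth, sampling rate, and bit rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        bits = r.getsampwidth() * 8
        rate = r.getframerate()
        kbps = bits * rate * r.getnchannels() / 1000
    return f"{bits} bit, {rate / 1000:g} kHz PCM, {kbps:g} kbps"

tone = make_tone_wav()
summary = describe(tone)  # "16 bit, 8 kHz PCM, 128 kbps"
```

One second of audio at these settings takes 16,000 bytes of sample data (8,000 samples of 2 bytes each) plus a small header.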


Adaptive Differential Pulse Code Modulation (ADPCM) is an audio encoding method that stores the difference between successive samples and adapts the quantization step size to the signal, which results in a smaller file size with acceptable quality in many types of call environments. Audio that is ‘noisy’ or contains more complex sounds may perform worse using this encoding method. A standard for ADPCM files was developed by the Interactive Multimedia Association in the form of the IMA ADPCM format. IMA ADPCM (.wav), 8 kHz, 4 bit is the most commonly seen version of this file type. Occasionally, other containers such as .vox are used with this encoder.

Dialogic ADPCM

Sharing many attributes with ADPCM .wav formats, DADPCM (or “Dialogic”) VOX encodes an audio stream as 4-bit samples. The sample rate is commonly set at 8 kHz, but may occasionally be seen at 6 kHz as a requirement on some phone systems. The container for these files is .vox. VOX differs from WAV in that there are no file headers to signify the file specifications, meaning that the playback environment will often be hard-coded to a single specification (8 kHz vs 6 kHz).
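Since a .vox file carries no header, the sample rate has to be known in advance, and the same bytes play back at a different speed and length if the wrong rate is assumed. A minimal Python sketch of that arithmetic follows; the function name and the 24,000-byte file size are hypothetical.

```python
def vox_duration_seconds(num_bytes, sample_rate_hz):
    """Duration of headerless 4-bit ADPCM audio at an assumed sample rate."""
    samples = num_bytes * 2  # each byte holds two 4-bit samples
    return samples / sample_rate_hz

file_size = 24000  # bytes, hypothetical prompt file

as_8khz = vox_duration_seconds(file_size, 8000)  # 6.0 seconds
as_6khz = vox_duration_seconds(file_size, 6000)  # 8.0 seconds
```

A prompt encoded at 8 kHz but played by a system fixed at 6 kHz would therefore sound slower and lower in pitch, which is why the target rate should be confirmed before encoding.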

G.711 u-law

G.711 is a standard for PCM μ-law encoded audio, commonly delivered as .wav files, that was specified by the CCITT (the International Telegraph and Telephone Consultative Committee, now ITU-T). This is an extremely common file format for phone systems, due to its extensive documentation, very low processing overhead, and modest 64 kbps bandwidth usage. The initial standard for this encoding method was set in 1972, and it remains overwhelmingly popular to the present day. Frequency response is in the 300-3400 Hz range, and the sample rate is 8 kHz. G.711 files are commonly referred to as 8 kHz, 8 bit, u-law .wav, or as CCITT, 64 kbps, u-law .wav.
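As a rough illustration of how u-law fits a 16-bit linear sample into 8 bits, the following Python sketch follows the widely circulated segment-based reference approach (bias of 0x84, clip at 32635). The function names are illustrative, and this is a sketch for understanding rather than a validated G.711 implementation.

```python
BIAS = 0x84   # added before segment search so small values land in segment 0
CLIP = 32635  # largest magnitude that still fits after the bias is added

def linear_to_ulaw(sample):
    """Encode one signed 16-bit PCM sample as an 8-bit u-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, 7 down to 0.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # u-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

def ulaw_to_linear(byte):
    """Decode one 8-bit u-law byte back to a signed linear sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude

silence = linear_to_ulaw(0)                       # 0xFF
round_trip = ulaw_to_linear(linear_to_ulaw(1000)) # close to 1000, not exact
bit_rate_bps = 8000 * 8                           # 8 kHz x 8 bit = 64 kbps
```

The round trip is lossy by design: quiet samples are stored with fine steps and loud samples with coarse steps, which is what allows 8 bits per sample to cover the range of a 16-bit signal.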


MP3 (formally MPEG-1/MPEG-2 Audio Layer III) is an extremely common lossy audio encoding format, predominantly used for music or video streaming applications. Compared with dedicated telephony formats, MP3 offers much higher fidelity, covering a wider frequency spectrum than is necessary for pure human speech use cases. Many MP3 encoders are available, and they can encode in either constant bit rate (CBR) or variable bit rate (VBR) modes at a wide range of possible specifications. MP3 is typically overly bandwidth-intensive for phone applications: a very large number of manufacturers and services will accept MP3 files, but the files are normally converted automatically to a lower bandwidth format such as u-law. The most commonly seen specification for MP3 is a 128 kbps, CBR .mp3 file.
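To put the bandwidth comparison in numbers, the short Python sketch below estimates the audio payload for a 30-second prompt in the formats discussed so far. It assumes mono audio and ignores file headers and MP3 framing overhead, so the figures are approximations; the dictionary and function names are illustrative.

```python
# Nominal bit rates in kbps (bit depth x sampling rate in kHz, mono).
BIT_RATES_KBPS = {
    "PCM 16 bit, 8 kHz": 16 * 8,        # 128
    "IMA ADPCM 4 bit, 8 kHz": 4 * 8,    # 32
    "G.711 u-law 8 bit, 8 kHz": 8 * 8,  # 64
    "MP3 128 kbps CBR": 128,
}

def payload_bytes(kbps, seconds):
    """Approximate audio payload size, excluding headers and framing."""
    return kbps * 1000 * seconds // 8

sizes_30s = {name: payload_bytes(kbps, 30) for name, kbps in BIT_RATES_KBPS.items()}

for name, size in sizes_30s.items():
    print(f"{name:26s} {size:>8d} bytes")
```

Under these assumptions a 128 kbps MP3 is about the same size as 16 bit, 8 kHz PCM and roughly twice the size of the same prompt in u-law.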


G.729 is an ITU-T audio encoding standard based on code-excited linear prediction speech coding, and its use has historically required patent licenses. It compresses an audio stream at 8 kbps into frames that each cover 10 milliseconds of audio. Its main selling point is greatly reduced bandwidth while retaining good speech quality compared to other audio stream methods. It is popular on several VoIP platforms where there are extremely high numbers of simultaneous users. G.729 files cannot be read by most consumer audio playback systems. Phone systems that permit use of G.729 files may require an extra licensing fee for this feature to be enabled.
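The bandwidth advantage can be sketched with simple arithmetic. G.729 runs at 8 kbps, so each 10 millisecond frame carries 10 bytes, compared with 80 bytes for the same interval of 64 kbps G.711. The Python below is an illustrative calculation only: it counts codec payload and ignores RTP, UDP, and IP packet overhead, which is substantial in practice, and the 1024 kbps link is a hypothetical figure.

```python
def frame_payload_bytes(bit_rate_bps, frame_ms=10):
    """Bytes of codec payload produced per frame of the given duration."""
    return bit_rate_bps * frame_ms // 1000 // 8

def calls_per_link(link_kbps, codec_kbps):
    """Upper bound on simultaneous one-way streams, payload only."""
    return link_kbps // codec_kbps

g729_frame = frame_payload_bytes(8000)   # 10 bytes per 10 ms
g711_frame = frame_payload_bytes(64000)  # 80 bytes per 10 ms

g729_calls = calls_per_link(1024, 8)     # 128
g711_calls = calls_per_link(1024, 64)    # 16
```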


G.722 is an ITU Telecommunication Standardization Sector (ITU-T) format intended as one of the primary “wideband” encoders for telephony applications. Wideband formats are meant to capture a larger range of human voice frequencies, creating a new generation of “hi-def” telephony speech transmission. The original G.722 specification uses an encoding method referred to as “sub-band ADPCM”, which has higher complexity than base ADPCM encoders.

G.722.1 (Siren 7)

G.722.1 is a later wideband standard that shares the G.722 name but uses a different codec, based on the “Siren 7” encoder, which is maintained by what is now Polycom. This wideband format is more suitable for bandwidth-limited environments than the original G.722. Siren 7 supports bit rates of 16, 24, and 32 kbps, of which G.722.1 itself standardizes the 24 and 32 kbps modes. Polycom now offers royalty-free licenses to this standard.


Opus is an adaptive audio format, covering narrowband through wideband and beyond, that was developed by the Xiph.Org Foundation and adopted as a standard by the Internet Engineering Task Force (RFC 6716). It is optimized to handle speech and general audio in a single format, and it can operate efficiently in low latency environments. This is a format that may see much wider adoption in the future, because it compares favorably with MP3, with wideband formats such as G.722, and with traditional low latency / low bandwidth formats such as G.711. At present, it is seen primarily in web VoIP applications and some wideband VoIP applications. Opus is an open standard format.