Audio GUI Design

Designing a proper user interface for an audio application is not entirely straightforward. Here is a checklist that should help audio developers avoid some of the most common pitfalls. This page is based on personal observations, mostly from comparing various audio software with expensive studio hardware, which is usually very well tested and thus correctly designed.

Frequency selections and timings should be logarithmic

Presenting the user with a linear frequency selection is pretty much always wrong. Equalizers, filters, FFT displays etc. should all be logarithmic so that each octave has the same graphical size.

Also, in most cases where you want the user to choose something time related, a logarithmic slider is the better choice. Here are some examples of where a logarithmic input is desirable:

Frequency sliders (e.g. equalizers, cutoff frequency, pitch frequency)
Compressor/expander attack and release times
Compressor/expander ratio
The bandwidth on an equalizer
Attack time, Decay time, Release time on a synthesizer
The speed of an LFO
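
A logarithmic control of this kind can be sketched as an exponential mapping between the slider position and the value. This is only a minimal sketch: the function names are made up for illustration, and the 1 to 1000 ms range mentioned below is an assumed example, not a recommendation.

```cpp
#include <cassert>
#include <cmath>

// Map a normalized slider position (0 ... 1) to a value between
// minValue and maxValue so that equal slider steps give equal ratios.
// Hypothetical helper; both limits must be greater than zero.
double sliderToValue(double position, double minValue, double maxValue) {
    assert(minValue > 0.0 && maxValue > minValue);
    return minValue * std::pow(maxValue / minValue, position);
}

// Inverse mapping: the slider position that corresponds to a given value.
double valueToSlider(double value, double minValue, double maxValue) {
    assert(minValue > 0.0 && maxValue > minValue && value > 0.0);
    return std::log(value / minValue) / std::log(maxValue / minValue);
}
```

With an attack time range of 1 to 1000 ms, the middle of the slider lands at roughly 31.6 ms instead of the 500 ms a linear slider would give, which leaves far more room for the short times.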

Volume sliders are neither log nor linear

That's right. Linear volume sliders are poor because the range from "OFF" to "medium" is squeezed together in a very small area, making it hard to control low volumes. Logarithmic is even worse, because the area from "medium" to "loud" is now squeezed together too much. Also, logarithmic volume sliders/faders cannot entirely shut off the signal - comparable to a kitchen sink where the tap cannot be turned off entirely, so that it's always dripping a little bit. The answer is somewhat surprising as the proper formula is:

Amplitude = SliderPosition³ * 3.162278

Where "Amplitude" is the multiplication factor and "SliderPosition" is the graphical slider position from 0 ... 1. This weird formula seems to be the best match to how the human ear perceives loudness.

The * 3.162278 part allows for up to +10 dB boost, and can be left out if you don't want to allow boosting the signal. If you keep it, the 0 dB point will be at 0.681292.
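
The formula above could be written along these lines. It is a sketch only; the names kMaxGain, sliderToAmplitude and amplitudeToDb are hypothetical and not taken from any existing code.

```cpp
#include <cassert>
#include <cmath>

// Gain at the top of the slider: +10 dB = 10^(10/20).
const double kMaxGain = 3.162278;

// Convert a slider position (0 ... 1) into an amplitude multiplier
// using the cubic curve. Position 0 gives exactly 0 (silence).
double sliderToAmplitude(double position) {
    assert(position >= 0.0 && position <= 1.0);
    return position * position * position * kMaxGain;
}

// Amplitude multiplier expressed in dB, e.g. for a read-out next to
// the slider. Returns -infinity for an amplitude of 0.
double amplitudeToDb(double amplitude) {
    return 20.0 * std::log10(amplitude);
}
```

At half travel this gives roughly -8 dB, at position 0.681292 it gives 0 dB, and at the top it gives +10 dB.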

This formula applies for all volume related situations such as:

Volume sliders and knobs on a mixer
Volume buttons on synthesizers and amplifiers
Compressor/gate threshold level buttons
Gate threshold level buttons
The various volumes and the sustain level on a synthesizer

There is an exception:

Volume adjustments that offer a limited range of boost/attenuation (a relative volume change measured in dB), such as a -12 ... +12 dB range or similar, should rather appear linear on the dB scale, so that each dB has the same size visually.
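
For such a limited-range control, the slider position can simply be mapped linearly onto the dB range and then converted to a multiplication factor. A minimal sketch, with hypothetical function names:

```cpp
#include <cassert>
#include <cmath>

// Map a slider position (0 ... 1) linearly onto a dB range,
// e.g. -12 ... +12 dB, so that each dB has the same visual size.
double sliderToDb(double position, double minDb, double maxDb) {
    assert(maxDb > minDb);
    return minDb + position * (maxDb - minDb);
}

// Convert a dB value to the amplitude multiplier applied to the signal.
double dbToAmplitude(double db) {
    return std::pow(10.0, db / 20.0);
}
```

With a -12 ... +12 dB range, the center of the slider is 0 dB (a factor of 1), and the top is a factor of about 3.98.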

Volume sliders should offer boosting the signal

Some applications only offer a range of -inf to 0 dB. This is poor as it does not allow the user to boost the signal, which might be necessary depending on the purpose of the signal and the context. Always offer between 10 and 20 dB of boost. You can download the source for a Qt application here to see and try an example of a good volume control.

Pan should have 3 dB compensation

When panning far left or right, the signal on that side should be 3 dB louder than if panned center. This is to compensate for the loss of overall signal energy. You can use the sqrt() square root function to achieve the correct curve.
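
One common way to get this curve is the square-root pan law sketched below. The struct and function names are hypothetical, and the choice of 0 ... 1 for the pan position is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>

struct PanGains {
    double left;
    double right;
};

// pan: 0 = far left, 0.5 = center, 1 = far right.
// Square-root law: left^2 + right^2 is always 1, so the total
// signal energy stays the same wherever the pan control is set.
PanGains panToGains(double pan) {
    assert(pan >= 0.0 && pan <= 1.0);
    return PanGains{ std::sqrt(1.0 - pan), std::sqrt(pan) };
}
```

In this form the gain is 1.0 when panned fully to one side and about 0.707 (-3 dB) per side at center. If the center position should be unity gain instead, both gains can be multiplied by the square root of 2, which puts the fully panned side at +3 dB.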

Meters and read-outs

Linearity

I propose that volume level meters should also follow the x³ formula described above, so that signals down to -inf can be displayed, unlike logarithmic meters. Frequency-oriented meters such as graphical FFT views should have a logarithmic frequency scale if used for music, so this should be the default setting. Only hardware designers sometimes need the linear scale.

When designing FFT views, also keep in mind that the lower frequency bands sum up more semitones in each band than the high frequency bands, so you must offer a display mode that counteracts this, so that pink noise shows up as a horizontal graph. This is by far the most useful mode for music production.
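
For a meter, the x³ formula has to be used the other way around, going from a measured amplitude to a position on the meter. A sketch under the assumption that the meter tops out at +10 dB like the slider formula; the names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Amplitude shown at the very top of the meter (+10 dB).
const double kMeterTopAmplitude = 3.162278;

// Convert a peak amplitude (1.0 = 0 dB) to a meter position 0 ... 1.
// This inverts Amplitude = Position^3 * 3.162278; silence maps to 0.
double amplitudeToMeterPosition(double amplitude) {
    assert(amplitude >= 0.0);
    double position = std::cbrt(amplitude / kMeterTopAmplitude);
    return position > 1.0 ? 1.0 : position;   // clamp levels beyond the top
}
```

A 0 dB signal ends up at about 68% of the meter length, and complete silence sits exactly at the bottom.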

Headroom

How much headroom you display should be carefully considered. In digital systems 0 dB is often the maximum possible output level on the soundcard or output medium (Audio CD, mp3 etc.). In these cases you just want to display whether clipping occurred or not.

On the other hand, you typically use floating point inside music software, allowing audio above 0 dB (just like analog equipment). In these cases it would often be wise to display something like 10-12 dB of extra headroom. A red/yellow warning color could be used to indicate levels above 0 dB.

Peak hold

I recommend using a fairly fast update rate, e.g. 30 or 60 times per second, as the eye catches peaks very well - especially if the sound indicator is brighter than the background color of the meter. For this reason I also recommend a dark background inside the meter. Meters that update too slowly do not reveal the level variations well enough.

A peak hold feature (drawing a steady spot at the highest peak detected within the last 500 ms) is also recommended. This is especially important if people have slow display devices like TFT screens. Offering some kind of indication of the RMS level overlaid on top would further improve things. The RMS level does not need a peak hold option, as it moves more slowly.
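
A peak hold could be sketched roughly like this. The class is hypothetical and deliberately simplified: it holds the most recent peak for the hold time and then falls back to the current level, rather than tracking the true maximum over a sliding 500 ms window.

```cpp
#include <cassert>

// Holds the latest peak for holdMs milliseconds, then lets it drop
// to the current level.
class PeakHold {
public:
    explicit PeakHold(double holdMs = 500.0) : holdMs_(holdMs) {
        assert(holdMs > 0.0);
    }

    // Call once per meter refresh with the current level and a
    // timestamp in milliseconds. Returns the level at which the
    // peak marker should be drawn.
    double update(double level, double nowMs) {
        if (level >= peak_ || nowMs - peakTimeMs_ > holdMs_) {
            peak_ = level;          // new peak, or the old one has expired
            peakTimeMs_ = nowMs;
        }
        return peak_;
    }

private:
    double holdMs_;
    double peak_ = 0.0;
    double peakTimeMs_ = 0.0;
};
```

The meter bar itself is still drawn from the current level on every refresh; only the steady spot uses the value returned by update().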

If there is sufficient screen space, displaying the peak value in dB is also a good idea. If the signal peaks above 0 dB, it is then easy to see by how much it was too loud.
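
The numeric read-out only needs a conversion to dB plus a special case for silence. A small sketch, where the function name and the one-decimal format are assumptions made for the example:

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Format a peak amplitude (1.0 = 0 dB) as text for a numeric read-out.
std::string formatPeakDb(double amplitude) {
    assert(amplitude >= 0.0);
    if (amplitude == 0.0) {
        return "-inf dB";            // silence has no finite dB value
    }
    char text[32];
    std::snprintf(text, sizeof text, "%+.1f dB", 20.0 * std::log10(amplitude));
    return text;
}
```

Always showing the sign makes it obvious at a glance whether the peak went above 0 dB.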

Display an FFT power spectrum

There are two things to take into consideration when displaying audio separated into discrete frequency bands, or as a frequency curve:

The frequency scale must be logarithmic by default (as mentioned above).
The meter should be weighted after pink noise by default (and not white noise).

Due to the nature of the FFT transformation, each frequency band covers more and more of an octave towards the lower frequencies. And because each frequency band sums up all the sound within the range it covers, you must counter-adjust by 1/<frequency band>. This way pink noise is displayed as a horizontal line. This is what the mind perceives as being "linear".
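
One way to read this compensation: FFT bands all have the same width in Hz, so the fraction of an octave each band covers is proportional to 1/frequency. Dividing the power in a band by that fraction is the same as multiplying it by the band's frequency, which tilts the display by +3 dB per octave. The sketch below assumes you already have the power in each FFT band; the function name, the parameters and the 1000 Hz reference are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Tilt a linear-frequency power spectrum by +3 dB per octave so that
// pink noise (power proportional to 1/f) reads as a flat line.
// binPower[k] is the power in FFT bin k, whose center frequency is
// k * sampleRate / fftSize. Returns levels in dB, unchanged at
// referenceHz. Bin 0 (DC) has no place on a logarithmic axis and is
// skipped, so result[i] belongs to bin i + 1.
std::vector<double> pinkWeightedDb(const std::vector<double>& binPower,
                                   double sampleRate, std::size_t fftSize,
                                   double referenceHz = 1000.0) {
    assert(sampleRate > 0.0 && fftSize > 0 && referenceHz > 0.0);
    std::vector<double> result;
    for (std::size_t k = 1; k < binPower.size(); ++k) {
        double frequency = k * sampleRate / fftSize;
        double weighted = binPower[k] * (frequency / referenceHz);
        result.push_back(10.0 * std::log10(weighted));
    }
    return result;
}
```

Feeding this function a spectrum whose power falls off as 1/f gives the same dB value for every band, which is the horizontal line described above.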

Make the meter as fast as possible. In a frequency view, it is most important to know the peak levels of each frequency band. Once again a peak hold feature is highly recommended, and should probably hold peaks for several seconds, because of the complex nature of the displayed information.

A common mistake

When producing music, the pink noise weighted display is far easier to use, because a properly balanced signal (like music) will tend to have its peaks along a fairly horizontal line. Most applications today are weighted after white noise. This is more useful to programmers and scientists, as a normal signal like music would always seem to have "more bass than treble" and show up as a tilted line - and it is rather difficult to see if a line is "not tilted enough" or "too tilted".

Want to know more?

If you liked the above, you may also find my general design guidelines interesting.

Code snippets

I won't try to teach math or programming, but as a kind of FAQ I will demonstrate a few code examples in something that resembles C++ a bit:

Converting a slider value into a frequency:

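#include <cmath>
#include <cstdio>

int main() {
    const double startfrequency = 20.0;     // lowest frequency in Hz
    const double endfrequency = 20000.0;    // highest frequency in Hz
    const int sliderrange = 127;            // slider runs from 0 to sliderrange
    const double a = std::log(startfrequency);
    const double b = sliderrange / (std::log(endfrequency) - a);
    for (int in = 0; in <= sliderrange; in++) {
        double out = std::exp(a + in / b);  // logarithmic mapping
        std::printf("%03d --> %5.0f Hz\n", in, out);
    }
    return 0;
}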

Javascript Example

Website by Joachim Michaelis