Audio GUI Design
Designing a proper user interface for an audio application is not entirely straightforward. Here is a checklist that should help audio developers avoid some of the most common pitfalls. This page is based on personal observations, mostly from comparing various audio software with expensive studio hardware, which is usually very well tested and thus correctly designed.
Frequency selections and timings should be logarithmic
Presenting the user with a linear frequency selection is pretty much always wrong. Equalizers, filters, FFT displays etc. should all be logarithmic so that each octave has the same graphical size.
Also, in most cases where you want the user to choose something time related, a logarithmic slider is the better choice. Here are some examples of where a logarithmic input is desirable:
- Frequency sliders (e.g. equalizers, cutoff frequency, pitch frequency)
- Compressor/expander attack and release times
- Compressor/expander ratio
- The bandwidth on an equalizer
- Attack time, Decay time, Release time on a synthesizer
- The speed of an LFO
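As a rough sketch of what "logarithmic" means for any of the controls listed above: equal slider distances should give equal ratios, not equal differences. The function names and the normalized 0 ... 1 slider position are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Map a normalized slider position (0..1) to a value between minValue and
// maxValue so that equal slider distances give equal ratios.
double sliderToValue(double position, double minValue, double maxValue) {
    assert(minValue > 0.0 && maxValue > minValue);
    return minValue * std::pow(maxValue / minValue, position);
}

// Inverse mapping, used to place the slider handle for a given value.
double valueToSlider(double value, double minValue, double maxValue) {
    assert(value > 0.0 && minValue > 0.0 && maxValue > minValue);
    return std::log(value / minValue) / std::log(maxValue / minValue);
}
```

For an attack time control ranging from 1 ms to 1000 ms, the middle of the slider then lands at about 31.6 ms rather than at 500 ms.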
Volume sliders are neither log nor linear
That's right. Linear volume sliders are poor because the range from "OFF" to "medium" is squeezed together in a very small area, making it hard to control low volumes. Logarithmic is even worse, because the area from "medium" to "loud" is now squeezed together too much. Also, logarithmic volume buttons cannot entirely shut off the signal, which further goes to prove that this is not a proper solution. The answer is somewhat surprising as the proper formula is:
Amplitude = SliderPosition^3
Where "Amplitude" is the multiplication factor and "SliderPosition" is the graphical slider position from 0 ... 1. This weird formula seems to be a result of the human ear being designed the way it is.
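A minimal sketch of the cubic mapping, assuming the slider position is already normalized to 0 ... 1. The function names are made up for this example, and the dB conversion is only there for a text read-out.

```cpp
#include <cassert>
#include <cmath>

// Cubic volume taper: sliderPosition in 0..1, result is the factor the
// signal is multiplied by. Position 0 gives exact silence.
double sliderToAmplitude(double sliderPosition) {
    assert(sliderPosition >= 0.0 && sliderPosition <= 1.0);
    return sliderPosition * sliderPosition * sliderPosition;
}

// Same gain in decibels, for a text read-out next to the slider.
// Returns -infinity when the amplitude is 0.
double amplitudeToDb(double amplitude) {
    assert(amplitude >= 0.0);
    return 20.0 * std::log10(amplitude);
}
```

With this mapping the middle of the slider gives a factor of 0.125, which is roughly -18 dB.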
This formula applies to all volume related situations such as:
- Volume sliders and knobs on a mixer
- Volume buttons on synthesizers and amplifiers
- Compressor/gate threshold level buttons
- Gate threshold level buttons
- The various volumes and the sustain level on a synthesizer
There is an exception:
Volume adjustments that offer a limited range of boost/attenuation (a relative volume change measured in dB), such as a -12 ... +12 dB range or similar, should rather appear linear on the dB scale, so that each dB has the same size visually.
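For such a limited-range control, the slider can be made linear in dB and converted to a multiplication factor afterwards. A small sketch, with function names chosen for this example only:

```cpp
#include <cassert>
#include <cmath>

// Slider that is linear on the dB scale: position 0..1 spans minDb..maxDb.
double sliderToDb(double position, double minDb, double maxDb) {
    assert(position >= 0.0 && position <= 1.0 && maxDb > minDb);
    return minDb + position * (maxDb - minDb);
}

// Convert a gain in dB to the factor the signal is multiplied by.
double dbToAmplitude(double db) {
    return std::pow(10.0, db / 20.0);
}
```

For a -12 ... +12 dB range the middle of the slider is 0 dB (factor 1.0), and the top is a factor of about 3.98.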
Volume sliders should offer boosting the signal
Some applications only offer a range of -inf to 0 dB. This is poor as it does not allow the user to boost the signal, which might be necessary depending on the purpose of the signal and the context. Always offer between 10 and 20 dB of boost. You can download the source for a Qt application here to see and try an example of a good volume control.
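One possible way to combine the cubic taper with a boost range is to scale the curve so that the top of the slider reaches the desired boost. This is an assumption made for illustration, not necessarily how the downloadable example does it, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Cubic taper scaled so that the top of the slider gives +boostDb.
double sliderToAmplitudeWithBoost(double position, double boostDb) {
    assert(position >= 0.0 && position <= 1.0 && boostDb >= 0.0);
    const double maxGain = std::pow(10.0, boostDb / 20.0);
    return maxGain * position * position * position;
}

// Slider position at which the gain is exactly 0 dB (unity), e.g. for
// drawing a tick mark on the scale.
double unityGainPosition(double boostDb) {
    assert(boostDb >= 0.0);
    return std::pow(10.0, -boostDb / 60.0);
}
```

With 12 dB of boost, the 0 dB mark ends up at roughly 63% of the slider travel.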
Pan should have 3 dB compensation
When panned far left or right, the signal on that side should be 3 dB louder than when panned center. This compensates for the loss of overall signal energy. You can use the sqrt() square root function to achieve the correct curve.
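A sketch of the square root curve, assuming a pan position normalized to 0 ... 1. The struct and function names are invented for this example.

```cpp
#include <cassert>
#include <cmath>

struct StereoGain {
    double left;
    double right;
};

// pan: 0 = far left, 0.5 = center, 1 = far right.
// The squared gains always sum to 1, so the total power stays constant.
StereoGain panGains(double pan) {
    assert(pan >= 0.0 && pan <= 1.0);
    return { std::sqrt(1.0 - pan), std::sqrt(pan) };
}
```

At center both channels get a factor of about 0.707 (-3 dB), and at far left or right the active channel gets 1.0, which is the 3 dB difference described above.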
Meters and read-outs
I propose that volume level meters should also follow the x^3 formula described above, so that signals down to -inf can be displayed, unlike logarithmic meters. Frequency-oriented meters such as graphical FFT views should have a logarithmic frequency scale if used for music, so this should be the default setting. Only hardware designers sometimes need the linear scale.
When designing FFT views, also keep in mind that the lower frequency bands sum up more semitones in each band than the high frequency bands, so you must offer a display mode that counteracts this, so that pink noise shows up as a horizontal graph. This is by far the most useful mode for music production.
How much headroom you display should be carefully considered. In digital systems, 0 dB is often the maximum possible output level of the soundcard or output medium (audio CD, MP3 etc.). In these cases you just want to display whether clipping occurred or not.
On the other hand, you typically use floating point inside music software, allowing audio above 0 dB (just like analog equipment). In these cases it would often be wise to display something like 10-12 dB of extra headroom. A red/yellow warning color could be used to indicate levels above 0 dB.
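For a meter, the cubic formula has to be applied in reverse: the meter deflection is the cube root of the amplitude. The sketch below also places the top of the meter a given number of dB above 0 dB. The function names and the convention that an amplitude of 1.0 equals 0 dB are assumptions for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Meter deflection (0..1) for a linear peak amplitude, where 1.0 = 0 dB.
// headroomDb is how far above 0 dB the top of the meter sits.
double amplitudeToMeterPosition(double amplitude, double headroomDb) {
    assert(amplitude >= 0.0 && headroomDb >= 0.0);
    const double topAmplitude = std::pow(10.0, headroomDb / 20.0);
    return std::min(1.0, std::cbrt(amplitude / topAmplitude));
}

// True when the level should be drawn in the warning color.
bool isAboveZeroDb(double amplitude) {
    return amplitude > 1.0;
}
```

With 12 dB of headroom, a 0 dB signal fills about 63% of the meter and silence sits exactly at the bottom.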
I recommend using a fairly fast update rate, e.g. 30 or 60 times per second, as the eye catches peaks very well - especially if the sound indicator is brighter than the background color of the meter. For this reason I also recommend a dark background inside the meter. Meters that update too slowly do not reveal the level variations well enough.
A peak hold feature (drawing a steady spot at the highest peak detected within the last 500 ms) is also recommended. This is especially important if people have slow display devices like TFT screens. Offering some kind of indication of the RMS level overlaid on top would further improve things. The RMS level does not need a peak hold option, as it moves more slowly.
If there is sufficient screen space, displaying the peak value in dB is also a good idea. When a peak exceeds 0 dB, it is then easy to see by how much the signal was too loud.
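A simplified peak hold could look like the sketch below. Note the simplification: when the hold time runs out, the marker drops to the current level instead of the true maximum of the last 500 ms, which would require keeping a short history. The class name and interface are made up for this example.

```cpp
#include <cassert>

class PeakHold {
public:
    explicit PeakHold(double holdSeconds) : holdSeconds_(holdSeconds) {
        assert(holdSeconds > 0.0);
    }

    // Feed the current peak level and the time since the previous call.
    // Returns the level at which the hold marker should be drawn.
    double update(double level, double elapsedSeconds) {
        age_ += elapsedSeconds;
        if (level >= held_ || age_ >= holdSeconds_) {
            held_ = level;  // new maximum, or the old one has expired
            age_ = 0.0;
        }
        return held_;
    }

private:
    double holdSeconds_;
    double held_ = 0.0;
    double age_ = 0.0;
};
```

A meter would create one `PeakHold(0.5)` per channel and call `update()` once per redraw with the time since the previous redraw.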
Display an FFT power spectrum
There are two things to take into consideration when displaying audio separated into discrete frequency bands, or as a frequency curve:
Due to the nature of the FFT, each frequency band covers more and more of an octave towards the lower frequencies. And because each frequency band sums up all the sound within the range it covers, you must counter-adjust by 1/<frequency band>. This way pink noise is displayed as a horizontal line. This is what the mind perceives as being "linear".
Make the meter as fast as possible. In a frequency view, it is most important to know the peak levels of each frequency band. Once again a peak hold feature is highly recommended, and should probably hold peaks for several seconds, because of the complex nature of the displayed information.
- The frequency scale must be logarithmic by default (as mentioned above).
- The meter should be weighted after pink noise by default (and not white noise).
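One way to get the pink noise weighting is sketched below. It relies on the fact that the power density of pink noise falls off as 1/frequency, so multiplying the power in each FFT band by the band's frequency (a tilt of +3 dB per octave) makes pink noise come out flat. This is one reading of the adjustment described above. The function name, the parameters and the reference frequency are assumptions for this example, and the FFT itself is expected to come from elsewhere.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// power[k] is the power (squared magnitude) of FFT bin k, for bins
// 0 .. fftSize/2. Returns levels in dB, tilted by +3 dB per octave so that
// pink noise comes out flat. referenceHz is the frequency left unchanged.
std::vector<double> pinkWeightedDb(const std::vector<double>& power,
                                   double sampleRate, std::size_t fftSize,
                                   double referenceHz) {
    assert(sampleRate > 0.0 && fftSize > 0 && referenceHz > 0.0);
    std::vector<double> db(power.size());
    for (std::size_t k = 0; k < power.size(); ++k) {
        const double frequency = k * sampleRate / fftSize;  // bin center in Hz
        // Weight the power by frequency; bin 0 (DC) ends up at -infinity.
        db[k] = 10.0 * std::log10(power[k] * frequency / referenceHz);
    }
    return db;
}
```

A white noise input, which has the same power in every bin, then shows up as a line rising by about 3 dB per octave.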
A common mistake
When producing music, the pink noise weighted display is far easier to use, because a properly balanced signal (like music) will tend to have its peaks along a fairly horizontal line. Most applications today are weighted after white noise. This is more useful to programmers and scientists, as a normal signal like music would always seem to have "more bass than treble" and show up as a tilted line - and it is rather difficult to see if a line is "not tilted enough" or "too tilted".
Want to know more?
If you liked the above, you may also find my general design guidelines interesting.
I won't try to teach math or programming, but as a kind of FAQ I will demonstrate a few code examples in something that resembles C++ a bit:
Converting a slider value into a frequency:
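#include <cmath>
#include <cstdio>

int main() {
    const double startfrequency = 20.0;    // Hz at slider position 0
    const double endfrequency = 20000.0;   // Hz at the top slider position
    const int sliderrange = 127;           // highest slider value
    const double a = std::log(startfrequency);
    const double b = sliderrange / (std::log(endfrequency) - a);
    for (int in = 0; in <= sliderrange; in++) {
        const double out = std::exp(a + in / b);  // logarithmic mapping
        std::printf("%03d --> %5.0f Hz\n", in, out);
    }
    return 0;
}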
Website by Joachim Michaelis