Audio-sampling pre-release 4


by Lucian Wischik 1997-1999

There's a compiled example in the 'example' directory. Run it and have a look. It only shows anything if there is some audio input on your system. Right-click the speaker icon on the taskbar, choose Open Volume Controls, then Options|Properties, and view the options for 'Recording'. Make sure that CD-audio input is selected. Then put in an audio CD and start it playing with CD Player or something similar. Or you might want to experiment with a microphone.

Architecture

There is a TRecorder component. This samples sound from the currently selected recording source, such as an audio CD, the microphone or the speaker. You will probably want to set its 'AutoStart' property to true, so that it starts sampling as soon as the program runs.

You can have a TWaveView component driven from this. Plonk the TWaveView component on the form (it's the one with green wavy lines like an oscilloscope). Set its 'Source' property to your TRecorder component. Now compile and run your program. You'll probably want to have CD Player running so that there is some sound. The wave view should display its green wavy line.

Also, the TFFT non-visual component can be driven from the TRecorder. Plonk a TFFT down somewhere and set its 'Source' property to the TRecorder. This component performs a Fast Fourier Transform on the sampled sound to analyze its frequency components.
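To make "Fast Fourier Transform" concrete, here is a minimal Python sketch of the textbook recursive radix-2 Cooley-Tukey FFT. It is not TFFT's actual code: the names fft and magnitudes are hypothetical, and it assumes the buffer length is a power of two. It returns one magnitude per frequency bin, from 0 Hz up to half the sample rate.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^(-2*pi*i*k/n) rotates the odd half into place
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitudes(samples):
    """Magnitude of each bin from 0 Hz up to half the sample rate (N/2 + 1 bins)."""
    spectrum = fft(samples)
    return [abs(c) for c in spectrum[: len(samples) // 2 + 1]]


# A cosine that completes exactly 2 cycles in 8 samples peaks in bin 2.
sig = [math.cos(2 * math.pi * 2 * i / 8) for i in range(8)]
mags = magnitudes(sig)
print(max(range(len(mags)), key=mags.__getitem__))  # 2
```

Each recursion level halves the problem, which is what makes the transform cost grow as N log N rather than N squared.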

TEqualizerView is driven by the TFFT. Plonk a TEqualizerView down and set its 'Source' property to the TFFT. This component displays a bar chart like a graphic equalizer's. Run your program, set an audio CD playing, and watch it.
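One plausible way to turn an FFT result into equalizer bars is to average neighboring frequency bins into a handful of bands. The sketch below assumes equally wide bands; TEqualizerView may well group its bins differently (for example logarithmically), and equalizer_bands is a hypothetical name.

```python
def equalizer_bands(mags, num_bands):
    """Average FFT bin magnitudes into num_bands equally wide groups.

    Hypothetical helper: equal-width grouping is an assumption, not
    necessarily what TEqualizerView does.
    """
    if not 1 <= num_bands <= len(mags):
        raise ValueError("need 1 <= num_bands <= len(mags)")
    bands = []
    for b in range(num_bands):
        lo = b * len(mags) // num_bands
        hi = (b + 1) * len(mags) // num_bands   # every group gets at least one bin
        group = mags[lo:hi]
        bands.append(sum(group) / len(group))    # one bar height per band
    return bands


print(equalizer_bands([1, 1, 2, 2, 3, 3, 4, 4], 4))  # [1.0, 2.0, 3.0, 4.0]
```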

TFreqView is also driven by the TFFT. Plonk a TFreqView down and set its 'Source' property to the TFFT. This component has time on the horizontal axis and frequency on the vertical, and shows the intensity of every frequency in the sound. It's the kind of display (a spectrogram) that the police use when they take vocal fingerprints of criminals. Compile and run your program again and watch it move. Its output is particularly easy to read when a saxophone or clarinet is playing, and looks neat with a gravelly-voiced singer like Screamin' Jay Hawkins. Violins and other stringed instruments do not give such a clean picture. Drums occupy only a very small frequency range, right at the bottom of the display. "We don't need no education" by Pink Floyd looks particularly neat.
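A display like TFreqView can be built by turning each FFT result into one column of pixels and scrolling the columns along as time passes. This sketch assumes a decibel scale with a 60 dB range mapped to grey levels 0 to 255; the names column_to_pixels and Spectrogram, and the choice of scale, are assumptions rather than TFreqView's documented behavior.

```python
import math


def column_to_pixels(mags, floor_db=-60.0):
    """Map one FFT magnitude column to grey levels 0..255 on a decibel scale.

    The loudest bin becomes 255; bins floor_db or more below it become 0.
    (Assumed scaling, for illustration only.)
    """
    peak = max(mags)
    if peak <= 0:
        return [0] * len(mags)           # silence: an all-black column
    pixels = []
    for m in mags:
        db = 20 * math.log10(m / peak) if m > 0 else floor_db
        level = (db - floor_db) / -floor_db   # 0.0 at the floor, 1.0 at the peak
        pixels.append(round(255 * min(1.0, max(0.0, level))))
    return pixels


class Spectrogram:
    """Scrolling image: time runs left to right, one pixel column per FFT."""

    def __init__(self, width):
        self.width = width
        self.columns = []

    def add(self, mags):
        self.columns.append(column_to_pixels(mags))
        if len(self.columns) > self.width:
            self.columns.pop(0)          # the oldest column scrolls off the left


print(column_to_pixels([10, 1, 0.01, 0]))  # [255, 170, 0, 0]
```

A logarithmic (decibel) scale is used because the ear hears loudness roughly logarithmically; a linear scale would leave everything but the loudest partials nearly black.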

How it works

The TRecorder component samples a certain number of bytes of sound into its buffer, typically 256 or 512 bytes or thereabouts. If it is sampling at 44.1kHz in 16-bit stereo then these bytes fill up very quickly; if it is sampling at 11.025kHz in 8-bit mono then they fill up sixteen times more slowly (176,400 versus 11,025 bytes per second).
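The sixteen-times figure follows directly from the byte rates. This small sketch assumes the standard rates of 44,100 Hz and 11,025 Hz; bytes_per_second and fill_time_ms are illustrative helpers, not part of the components.

```python
def bytes_per_second(sample_rate, bits, channels):
    """Raw PCM data rate: samples/s times bytes per sample times channels."""
    return sample_rate * (bits // 8) * channels


def fill_time_ms(buffer_bytes, sample_rate, bits, channels):
    """How long the recorder takes to fill a buffer of buffer_bytes."""
    return 1000.0 * buffer_bytes / bytes_per_second(sample_rate, bits, channels)


cd = bytes_per_second(44100, 16, 2)   # 176400 bytes per second
low = bytes_per_second(11025, 8, 1)   # 11025 bytes per second
print(cd // low)                                    # 16
print(round(fill_time_ms(256, 44100, 16, 2), 2))    # 1.45
print(round(fill_time_ms(256, 11025, 8, 1), 2))     # 23.22
```

So a 256-byte buffer fills roughly 690 times a second in the first case and about 43 times a second in the second.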

When its buffer is full, it triggers its OnSample event. It also automatically sends a signal to all the components whose 'Source' property points to the recorder. So if your TWaveView and TFFT both point to the TRecorder, they will be alerted automatically.
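The notification scheme is a simple observer pattern. The Python classes below are a hypothetical model of it, not the Delphi API: Recorder, Listener, feed and receive are made-up names, and firing OnSample before notifying the listeners is an assumed order.

```python
class Recorder:
    """Hypothetical stand-in for TRecorder: fills a buffer, then notifies."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = bytearray()
        self.on_sample = None   # plays the role of the OnSample event handler
        self.listeners = []     # components whose Source points at this recorder

    def feed(self, data):
        """Accept raw bytes from the sound card; emit one chunk per full buffer."""
        self.buffer.extend(data)
        while len(self.buffer) >= self.buffer_size:
            chunk = bytes(self.buffer[: self.buffer_size])
            del self.buffer[: self.buffer_size]
            if self.on_sample is not None:
                self.on_sample(chunk)
            for listener in self.listeners:   # the automatic "signal"
                listener.receive(chunk)


class Listener:
    """Stand-in for a TWaveView or TFFT: just counts the chunks it receives."""

    def __init__(self):
        self.chunks = 0

    def receive(self, chunk):
        self.chunks += 1


rec = Recorder(256)
wave_view, fft_stage = Listener(), Listener()
rec.listeners += [wave_view, fft_stage]
rec.feed(bytes(600))                        # 600 bytes = two full buffers + 88 spare
print(wave_view.chunks, fft_stage.chunks)   # 2 2
```

The recorder does not need to know what kind of component is listening, which is why any number of views can hang off one source.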

The TFFT has a larger buffer of its own. When it receives its 256 bytes (or however many) from the TRecorder, it appends them to the end of this larger buffer, so the oldest data drops off the front. It then performs a Fourier transform on the whole of its internal buffer and sends out signals to the next stage, that is, to the components whose 'Source' points to it.
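The TFFT's buffering behaves like a fixed-length sliding window. A minimal sketch, assuming the recorder's bytes have already been decoded into samples; SlidingWindow is a hypothetical name, and a deque with maxlen discards the oldest samples automatically.

```python
from collections import deque


class SlidingWindow:
    """Fixed-length sample buffer: new chunks go on the end, the oldest fall off.

    Hypothetical model of the TFFT's internal buffer.
    """

    def __init__(self, size):
        # Start full of silence so every transform sees exactly `size` samples.
        self.samples = deque([0.0] * size, maxlen=size)

    def push(self, chunk):
        """Append a chunk and return the whole window, ready to be transformed."""
        self.samples.extend(chunk)   # maxlen makes the deque drop the oldest items
        return list(self.samples)


w = SlidingWindow(8)
print(w.push([1, 2, 3, 4]))   # [0.0, 0.0, 0.0, 0.0, 1, 2, 3, 4]
```

Each push replaces only len(chunk) of the window's samples, so most of what gets transformed each time is older data.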

Issues

The slowest step by far is the fast Fourier transform. If TRecorder has a very small buffer (e.g. 128 bytes), it fills up very frequently, so the TFFT component is asked to do a transform just as frequently. This makes everything very slow.

The Fourier transform is also slower if the TFFT's own buffer is very large: the work for an N-point FFT grows roughly in proportion to N log N.

Therefore, a small buffer for TRecorder combined with a large buffer for TFFT will make everything very sluggish.
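A rough way to compare settings is transforms per second multiplied by the N log2 N cost of each transform. This is a back-of-the-envelope estimate that assumes TFFT uses an N log N algorithm and ignores constant factors and the cost of drawing; fft_work_per_second is a hypothetical helper, and fft_size counts samples.

```python
import math


def fft_work_per_second(bytes_per_second, recorder_bytes, fft_size):
    """Relative FFT workload: transforms per second times N*log2(N) per transform.

    Constant factors are ignored, so only ratios between settings are meaningful.
    """
    transforms_per_second = bytes_per_second / recorder_bytes
    return transforms_per_second * fft_size * math.log2(fft_size)


rate = 44100 * 2 * 2   # 16-bit stereo at 44.1 kHz = 176400 bytes per second
slow = fft_work_per_second(rate, 128, 2048)   # small recorder buffer, big FFT
fast = fft_work_per_second(rate, 1024, 512)   # big recorder buffer, small FFT
print(round(slow / fast))                     # 39
```

Under these assumptions, a 128-byte recorder buffer feeding a 2048-point transform does roughly 39 times the work of a 1024-byte recorder buffer feeding a 512-point transform.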

If the TFFT component has a small internal buffer, its analysis will be coarse: each frequency bin covers a wide band (the sample rate divided by the buffer size), so it cannot separate frequencies that are close together, and low frequencies in particular are poorly resolved. If you want a high-quality analysis, set its buffer to 1024 or more. If all you want is a very quick response to the music, set it to 512 or less.
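The resolution of an FFT is easy to compute: with sample rate fs and an N-sample buffer, adjacent frequency bins are fs/N apart. The sketch assumes 44.1kHz and treats the buffer size as a count of samples; bin_width_hz is a hypothetical helper.

```python
def bin_width_hz(sample_rate, fft_size):
    """Spacing between adjacent FFT frequency bins, in hertz."""
    return sample_rate / fft_size


# The top of the range is sample_rate / 2 (22050 Hz here) whatever the size.
for n in (128, 512, 1024, 4096):
    print(n, round(bin_width_hz(44100, n), 1))
# 128 344.5
# 512 86.1
# 1024 43.1
# 4096 10.8
```

At 512 samples the bins are about 86 Hz apart, far wider than the spacing between adjacent low notes, which is only a few hertz.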

Latency is an issue: things will never respond exactly on time. Suppose a sound is made. First you have to wait for the TRecorder buffer to fill up. Next, that data is only appended to the end of the TFFT buffer, so its effect is diluted by the older samples already there. So to get the fastest response to the music, you need a small buffer in TRecorder and a small buffer in TFFT (maybe 64 and 128, or 128 and 128). But with values this small the analysis will be low-quality, and you'll only be able to make crude observations about the music.
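Two numbers bound the delay: the time to fill the TRecorder buffer, and the time span covered by the whole TFFT buffer, of which each new chunk is only a fraction. This sketch assumes both sizes are counted in mono samples at 44.1kHz (for 16-bit stereo, a byte count corresponds to a quarter as many sample frames); latency_ms is a hypothetical helper, and the figures ignore processing and drawing time.

```python
def latency_ms(sample_rate, recorder_samples, fft_samples):
    """Rough timing figures for how quickly a new sound shows up.

    Returns (wait_ms, window_ms, new_fraction):
      wait_ms      time to fill the recorder buffer before anything happens
      window_ms    time span covered by the whole TFFT buffer
      new_fraction share of the TFFT buffer replaced by each recorder chunk
    Processing and drawing time are ignored.
    """
    wait_ms = 1000.0 * recorder_samples / sample_rate
    window_ms = 1000.0 * fft_samples / sample_rate
    return wait_ms, window_ms, recorder_samples / fft_samples


for rec_size, fft_size in ((128, 128), (256, 2048)):
    wait, span, frac = latency_ms(44100, rec_size, fft_size)
    print(f"{rec_size}/{fft_size}: wait {wait:.1f} ms, window {span:.1f} ms, "
          f"new data per chunk {frac:.3f}")
```

With 128 and 128 the whole window spans under 3 ms; with 256 and 2048 each new chunk is only an eighth of a window about 46 ms long.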

One day I'm going to go to Masataka Goto's page and learn all about his beat-tracking system, and I'm going to write a screen saver which tracks beats and does something in response.