Building a simple online Karaoke site

by Pascal Rettig posted Dec 30, 2010

TL;DR - We built an online holiday-themed sing-along site to play around with recording web audio without a streaming server and synchronizing HTML5 Audio tags with playback. Check out FlashWavRecorder and read the notes at the bottom for our takeaways.

Step 1: Recording Audio on the Web

Recording audio on the Web had always been a gray area for me - Flash has had the capability for a long time, but I'd never dug into it and had assumed that most of the recording capabilities needed to be paired with a media server on the back end. This is not the case.

For the online language learning app we've been working on, we wanted to find a way to make it as easy as possible for users to record their voices (with an eye towards sending that information to the server for analysis down the road). We also wanted to create a simple way to record native voices directly into the page without having to set up the infrastructure for a streaming server.

We looked around to see if anyone had packaged something together, but we didn't find anything that quite fit the bill, so we decided to do it ourselves.

A little bit of Googling showed that getting the audio data directly from a microphone is pretty easy, so our other developer Doug took on the task and within a day we had a basic recorder with playback. After a little bit of work massaging the data with a WAV header and adding a way to do a multi-part upload, we had a nice way to record audio over the web without a lot of fuss.
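To give a feel for what "massaging the data with a WAV header" involves, here is a minimal sketch in JavaScript. The real recorder does this in Flash; the function name encodeWav and the mono, 16-bit PCM layout are assumptions made for illustration, not the recorder's actual code.

```javascript
// Sketch: wrap float samples (range -1..1) in a mono 16-bit PCM WAV container.
// encodeWav is a hypothetical helper, not part of FlashWavRecorder.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + sample data
  const view = new DataView(buffer);

  function writeAscii(offset, text) {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  }

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);    // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);              // fmt chunk size for PCM
  view.setUint16(20, 1, true);               // audio format 1 = PCM
  view.setUint16(22, 1, true);               // channels: mono (assumed)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);  // block align
  view.setUint16(34, 16, true);              // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each float and scale it to a signed 16-bit integer.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return buffer;
}
```

The resulting buffer can be sent as an ordinary file field in a multi-part upload, which is why the server side needs no special handling.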

Since our language learning app is in HTML, we wanted to keep the Flash as small as possible, both in file size and in footprint on the page. This approach required that we deal with a couple of issues:

  1. Users need to explicitly authorize use of the microphone via a standard Flash Player dialog. This means we need to (at least temporarily) make the applet large enough to display the authorization window.
  2. To actually save the audio data and post it, we need the user to click on something in the applet (see: Upload and download require user interaction).

To work around those issues, we came up with the following:

  1. When the authorization dialog appears, make the applet larger and switch it from transparent to windowed mode (required to show the authorization dialog)
  2. When audio has been recorded, show the upload image or button that POSTs to the server in the applet

In practice we ended up moving the applet around to put it in a spot that made sense on the page for whatever state it was in.

Enter FlashWavRecorder

The package we developed is called FlashWavRecorder. It consists of a tiny (6k) SWF to handle the recording, along with a JavaScript interface that handles most of the interaction.

The code is up on GitHub for perusal and can be built with the Flex SDK (you can also just use the pre-compiled SWF). Here's the example in the html/ directory in action:

Note: the upload doesn't work because it's getting served off of GitHub pages. 

For more details on the recording and playback, take a look at the recorder's source in the GitHub repository, which handles the sound recording and playback.

To use the recorder, there are three main things you'll need to understand:

  1. How to embed the recorder .swf into the page
  2. What event callbacks the recorder will send you
  3. What the JavaScript interface for interacting with the recorder looks like

The GitHub readme describes these three pieces, but html/index.html is probably the best place to start if you want to see a working example.

To set up the recorder, you'll need to embed the SWF on the page and set variables for the event handler and upload_image. There's a sample event handler in html/js/recorder.js, which also creates a global "Recorder" object that can be used to interact with the recorder on the page.

The event handler receives the 'ready' callback once the Flash is ready, and the 'microphone_user_request' callback if we need to display the permission screen. For the former, we set up the Recorder object with the JavaScript interface by calling connect; for the latter, we resize the applet to display the permission dialog.
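As a rough illustration of that event handling, here is a sketch that maps each event to the applet layout it calls for. The event names come from the recorder; the function layoutForEvent, the returned object shape, and the compact size are hypothetical. The 215x138 figure is the minimum size Flash Player needs to render its settings dialog.

```javascript
// Flash Player's settings dialog needs at least 215x138 pixels to render.
const PERMISSION_SIZE = { width: 215, height: 138 };

// Hypothetical compact size for when only a small button needs to show.
const COMPACT_SIZE = { width: 24, height: 24 };

// Decide how the applet should look for a given recorder event.
// The return shape ({ width, height, wmode, connect }) is made up for this sketch.
function layoutForEvent(eventName) {
  switch (eventName) {
    case 'ready':
      // Flash is loaded: stay small and transparent, and connect the JS interface.
      return { ...COMPACT_SIZE, wmode: 'transparent', connect: true };
    case 'microphone_user_request':
      // The permission dialog only displays in windowed mode at full size.
      return { ...PERMISSION_SIZE, wmode: 'window', connect: false };
    default:
      return null; // this event does not affect the layout
  }
}
```

A real handler would apply the returned size to the embedded object on the page and call connect when asked to; the sample handler in html/js/recorder.js is the place to see how the project actually does it.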

Once it's connected, using the recorder from JavaScript is as simple as calling:

Recorder.record(name,filename); // To start recording
Recorder.record(name,filename); // To stop recording
...; // To start playback
...; // To stop playback

The recorder supports multiple named recordings (hence the name field). This means you just need two buttons - record and play - that call Recorder.record and the playback method respectively, and you can have the event handler update the images on those buttons as necessary. Again, take a look at the sample code in html/ for the simplest possible working example.

Step 2: Saving and converting

Once we've got the audio in the recorder, we need to be able to upload the data to a server for further processing. As mentioned above, Flash insists that a user actually click in the Flash applet before it will send the data. Since we may have additional information that needs to be included in the upload, there's a callback called 'save_pressed' that gets triggered so we can update the additional form data.

The example code [Line 71 of recorder.js] calls 'Recorder.updateForm();', which uses jQuery to serialize the form and update the Flash recorder object with the data in the form (see line 161 of recorder.js).

When we originally called init on the recorder, we passed it the action of the form, so it knows where to post the data. It adds a standard file upload field containing a .WAV file to any additional fields you passed in and sends everything via POST to the server.

upload.php contains the minimum needed to write the .WAV file to disk, but the upload is just a standard multi-part file upload field, so no special handling is necessary.

Once the file is uploaded, you'll most likely want to convert it to something less weighty. Since we decided to use HTML5 Audio tags, we actually need to encode our files into two formats, ogg and mp3, in order for them to work across the range of browsers (.mp3 for Safari & IE, .ogg for Opera & Firefox; Chrome is happy with either).

We can use oggenc to encode to ogg as simply as:

oggenc filename.wav -o filename.ogg

and LAME (once, of course, you've purchased your MP3 encoding license from Fraunhofer) with:

lame filename.wav filename.mp3

Since the encoding may take a little bit of time, we ended up doing it in a background process and polling the server for status messages.
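The polling side of that can be sketched roughly as follows. This is an assumption-laden illustration rather than our production code: the function name pollEncodingStatus, the status strings ('processing', 'done', 'failed'), and the default interval are all made up, and fetchStatus stands in for whatever request asks the server how the encode is going.

```javascript
// Poll a status source until it reports a terminal state or we give up.
// fetchStatus(callback) must call callback(statusString) (hypothetical contract).
// options.schedule defaults to setTimeout and is injectable for testing.
function pollEncodingStatus(fetchStatus, onFinished, options) {
  const opts = options || {};
  const schedule = opts.schedule || setTimeout;
  const intervalMs = opts.intervalMs || 2000;
  const maxAttempts = opts.maxAttempts || 30;
  let attempts = 0;

  function check() {
    attempts += 1;
    fetchStatus(function (status) {
      if (status === 'done' || status === 'failed') {
        onFinished(status, attempts);        // the server reached a final state
      } else if (attempts >= maxAttempts) {
        onFinished('timeout', attempts);     // stop asking after too many tries
      } else {
        schedule(check, intervalMs);         // still encoding: ask again later
      }
    });
  }

  check();
}
```

Capping the number of attempts keeps a stuck encode from leaving the page polling forever.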

Step 3: Karaoke Time!

Once we had a recorder that could upload files, it became both obvious and necessary that we put together some sort of online sing-along hack.

What Karaoke setup would be complete without a bouncing ball? Well, luckily enough, someone wrote a jQuery plugin to do just that. Sure, it's marked as alpha code and a little buggy, but it gets the job done well enough for a quick hack.

All we needed to do was put in the correct annotation tags for the lyrics and tie it all together with a little bit o' JavaScript. For those that haven't used HTML5 Audio tags before, they are dead simple to create.

Here's our code for loading the sound for each caroller. We use jQuery to bind the event that tells us the audio is ready to play so we know we can show our carollers:

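caroller.sound = new Audio();
$(caroller.sound).bind('canplaythrough', function() {
  if (!caroller.loaded && --loadingcnt == 0) {
    // Last caroller finished loading: show the carollers here
  }
  caroller.loaded = true;
});
// Modernizr.audio.ogg is truthy when the browser can play Ogg Vorbis
caroller.sound.src = Modernizr.audio.ogg ? ogg : mp3;
caroller.sound.load();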

This creates the new audio tag, hides our carollers, adds a callback to show our carollers once we're done loading, sets the source file based on which audio format the browser supports (using Modernizr), and then calls load() to start loading the file.
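If you'd rather not pull in Modernizr just for this, the same format decision can be sketched with the audio element's own canPlayType method, which returns '', 'maybe', or 'probably'. The helper name pickAudioSource and the shape of the sources list are assumptions for illustration.

```javascript
// Pick the first source the audio element says it can play.
// Prefers a 'probably' answer, falls back to the first 'maybe'.
// sources is a list of { url, type } objects (hypothetical shape).
function pickAudioSource(audio, sources) {
  let fallback = null;
  for (const source of sources) {
    const verdict = audio.canPlayType(source.type);
    if (verdict === 'probably') return source.url;
    if (verdict === 'maybe' && fallback === null) fallback = source.url;
  }
  return fallback; // null when nothing is playable
}

// In a browser this might be used as:
//   caroller.sound.src = pickAudioSource(caroller.sound, [
//     { url: ogg, type: 'audio/ogg; codecs="vorbis"' },
//     { url: mp3, type: 'audio/mpeg' }
//   ]);
```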

We then just play our music track and all our carollers at the same time (muting any that aren't checked) and bingo - instant online Karaoke!
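Roughly, starting everything together looks like the sketch below. It assumes every track, including the music, is an HTML5 audio element and that each caroller object carries a checked flag; the function name startAll and those property names are hypothetical, and this is not a guarantee of sample-accurate sync.

```javascript
// Start the backing track and every caroller as close together as we can.
// backing is an audio element; carollers is a list of { sound, checked }.
function startAll(backing, carollers) {
  const sounds = [backing].concat(carollers.map(function (caroller) {
    // Unchecked carollers still play, just silently, so they stay in step
    // if the user checks them later.
    caroller.sound.muted = !caroller.checked;
    return caroller.sound;
  }));

  // Rewind everything first, then start playback in a tight loop.
  sounds.forEach(function (sound) { sound.currentTime = 0; });
  sounds.forEach(function (sound) { sound.play(); });
  return sounds.length;
}
```

Muting instead of skipping the unchecked carollers means toggling a checkbox mid-song only has to flip the muted flag.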

Check out our sample site (best in Chrome).

Some Notes

First, synchronizing a whole bunch of HTML5 Audio streams with Flash audio playback is a little hit-or-miss. This sort of thing will get a lot easier to control precisely if and when the Audio Data API becomes standardized. For the time being, doing the audio mixing manually in Flash would probably be the way to go if my life (or startup) depended on it.

Secondly, since we're storing and transmitting raw WAV data in the recorder, you need to be wary of the length you allow users to record. The amount of storage is equal to:

Samples per second * bytes per sample * seconds recorded

Flash uses 8-byte floats to store the data, so at 22 kHz a 30-second clip would be:

22000 * 8 * 30 = 5,280,000 bytes (approx 5MB)

When the file is converted to a WAV we use 16 bits (2 bytes) per sample, so that comes down to a quarter of that, about 1.25 MB, for uploading.

5MB in memory and 1.25 MB to transmit isn't the end of the world, but if you try to use this for recording longer amounts of audio you're going to run into some problems. You'll want to use one of the existing solutions for streaming data that gets the bitrate way down.
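If you want to turn that arithmetic into a recording limit, a small sketch like the following does the job. The function names and the 10 MB budget in the usage note are assumptions picked for illustration.

```javascript
// Bytes needed for a recording: samples per second * bytes per sample * seconds.
function recordingBytes(samplesPerSecond, bytesPerSample, seconds) {
  return samplesPerSecond * bytesPerSample * seconds;
}

// Longest whole number of seconds that fits in a byte budget.
function maxSeconds(budgetBytes, samplesPerSecond, bytesPerSample) {
  return Math.floor(budgetBytes / (samplesPerSecond * bytesPerSample));
}

const inMemoryBytes = recordingBytes(22000, 8, 30); // 5,280,000 bytes in Flash
const wavBytes = recordingBytes(22000, 2, 30);      // 1,320,000 bytes as 16-bit WAV
```

For example, with a hypothetical 10 MB upload cap, maxSeconds(10 * 1024 * 1024, 22000, 2) works out to 238 seconds of 16-bit audio.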

Bonus: Fun with Sox

Audio recorded from user microphones is of drastically varying quality, so one tool that comes in handy is SoX - the (aptly) self-proclaimed Swiss Army knife of sound processing programs. SoX lets you remove noise based on a noise profile and trim audio to remove silences, along with a host of other operations. Since it works from the command line, it's relatively easy to integrate into your backend architecture.

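Here's a simple command for removing noise via the command line; it also normalizes the volume (norm) and trims silence from both ends (the vad reverse vad reverse chain). You can adjust the 0.3 reduction amount, and you'll need a noise profile file, which SoX's noiseprof effect can generate from the beginning of the sound file or from a separately recorded sample: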

sox filename new_filename noisered noise_profile_file 0.3 norm vad reverse vad reverse