Outsourcing FROM India

Several weeks ago, I accepted a job from an instructional media developer in Mumbai, India. He said that he had 2.5 hours of recording that needed to have background noise removed. He even provided a brief sample clip of audio. Easy peasy. I bit. It turned out that the audio was actually embedded in 23 separate video files. Whatcha gonna do?

So, I stocked up on Cheetos and Diet Coke, kissed the family goodbye and locked my door.

It would have been one thing to sample the ambient noise, turn the De-noise module loose and go pet the cat while I watched cartoons. Instead, I was going to have to extract the audio, process all 23 files separately, put everything back together, return them and, finally, find a distant mountaintop from which to contemplate the condition of my karma.
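That extract, process and reassemble loop can be sketched as a small script. This is a minimal sketch, assuming the ffmpeg command-line tool handles the extraction and remuxing; the post does not say which tool actually did that part. The file names, the `work` folder and the helper functions are all hypothetical. It only builds the command lines and does not run them. The video stream is copied untouched, so the cleaned audio stays in sync with the picture.

```python
from pathlib import Path


def extract_audio_cmd(video, wav):
    """ffmpeg command that drops the video (-vn) and writes 16-bit PCM audio."""
    return ["ffmpeg", "-i", str(video), "-vn", "-c:a", "pcm_s16le", str(wav)]


def remux_cmd(video, cleaned_wav, output):
    """ffmpeg command that pairs the original picture with the cleaned audio."""
    return [
        "ffmpeg",
        "-i", str(video),        # input 0: original video file
        "-i", str(cleaned_wav),  # input 1: audio after noise removal
        "-map", "0:v:0",         # take the picture from input 0
        "-map", "1:a:0",         # take the sound from input 1
        "-c:v", "copy",          # do not re-encode the video
        "-c:a", "aac",           # assumption: AAC audio is acceptable
        str(output),
    ]


def plan_batch(videos, work_dir):
    """Return one (extract, remux) command pair per video file."""
    work_dir = Path(work_dir)
    plan = []
    for video in map(Path, videos):
        wav = work_dir / (video.stem + ".wav")
        cleaned = work_dir / (video.stem + "_clean.wav")  # saved by the audio editor
        output = work_dir / (video.stem + "_clean" + video.suffix)
        plan.append((extract_audio_cmd(video, wav), remux_cmd(video, cleaned, output)))
    return plan


if __name__ == "__main__":
    # Hypothetical names standing in for the 23 delivered files.
    names = [f"lesson_{n:02d}.mp4" for n in range(1, 24)]
    for extract, remux in plan_batch(names, "work"):
        print(" ".join(extract))
        print(" ".join(remux))
```

The noise removal itself happens between the two commands, on the extracted WAV files, in whatever audio editor is at hand.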

I ended up consuming 82.5 GB of disk space on this sucker (and more hours than I care to admit). Note to self: If the requirements are ambiguous, double your quote.

Here's the first 20 seconds. What do you hear?


Okay, here are my comments for comparison (plus some details that you can't hear in this brief clip):

  • There is some kind of rotating machine making noise in the background. While its basic frequency is usually pretty steady, it sometimes ramps up or down a bit. I think it was a window fan and changing breezes created pressure differentials. I can't use a fixed filter as if it were power line noise. This will be a job for some combination of De-noise, Deconstruct and De-hum.

  • The room is not treated for recording. Besides a notable room "tone" there is a complex resonance. There is going to be more than one pass of De-reverb.

  • The instructor speaks very rapidly at some times and pauses interminably at others. There's nothing to be done about this; I can't touch the timing because it must remain synchronized to the video. I'm actually grateful to not have that monkey on my back. If this had been an audio book, I would have put a lot more effort into creating a smooth-sounding delivery.

  • The instructor often fades out at the end of sentences. That I can adjust with the Leveler -- just not too much or it will sound funny and residual noise will also be amplified.

  • The instructor uses a keyboard during the recording and sometimes whacks the keys pretty hard. I won't be able to treat this like background noise. If I decide to take the clicks out (and I didn't), the entire recording will have to be hand-edited. Nobody asked for that and I didn't volunteer.

Now, a drum-roll, please.

  • First, I hacked out a lot of the multi-spectral fan noise using six passes of adaptive De-hum. I used very sharp High-Q adaptive filtering to avoid cutting too much else and kept it up until "enough" of the horizontal lines faded out.

  • Next, on impulse, I cut out every signal frequency below 100 Hz. That seemed to help with the room "boominess."

  • Finally, I ran two passes of trained Dialogue De-noise. There's still some annoying fan noise buried in the dialogue, but I'm starting to make the dialogue sound muddy. It's time to quit, lick my wounds (and Cheeto dust) and move on.

  • Truth be told, I also sampled the keyboard clicks as noise and knocked them down a bit too.
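The 100 Hz cut in the second step is the easiest of these to show in code. What follows is a minimal sketch, not the editor's actual filter: a single second-order high-pass section using the well-known audio EQ cookbook formulas, in plain Python. The 48 kHz sample rate, the Butterworth-style Q and the test tones are assumptions for illustration. A filter like this rolls off gradually (about 12 dB per octave) rather than removing everything below the cutoff outright.

```python
import math


def highpass_coefficients(cutoff_hz, sample_rate, q=1 / math.sqrt(2)):
    """Normalized biquad high-pass coefficients (audio EQ cookbook form)."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = ((1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0)
    a = (-2 * cos_w0 / a0, (1 - alpha) / a0)
    return b, a


def highpass(samples, cutoff_hz=100.0, sample_rate=48000):
    """Run the samples through one second-order high-pass section."""
    b, a = highpass_coefficients(cutoff_hz, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, seconds=1.0, sample_rate=48000):
    """Test tone at the given frequency."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


if __name__ == "__main__":
    for freq in (30.0, 1000.0):  # low rumble vs. a tone in the speech range
        tone = sine(freq)
        filtered = highpass(tone)
        # Compare levels over the second half, after the filter has settled.
        ratio = rms(filtered[24000:]) / rms(tone[24000:])
        print(f"{freq:7.1f} Hz keeps {ratio:.3f} of its level")
```

The low rumble comes out at a small fraction of its original level while the 1 kHz tone passes almost unchanged, which is why a cut like this can tame boominess without thinning the voice much.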


In the final analysis, I've been signing on to too many of these badly-recorded gigs. On the bright side, testing the limits is a great way to establish the limits.