Sound classification using deep neural networks

Dataset: “50 environmental sound dataset”

Each fold contains 50 types of sound, and each type contains 8 files (400 files per fold).
- Folds 1, 2, 3 and 4 are used for training: 1600 training spectrograms in total
- Fold 5 is used for validation: 400 validation spectrograms in total
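
The fold-based split above can be sketched in a few lines of Python. This is only an illustration: it assumes file names start with the fold number (for example `1-000003-A-12.wav`), which may not match the actual layout of the dataset on disk, and the file list is synthetic. The helper names `fold_of` and `split_by_fold` are hypothetical.

```python
TRAIN_FOLDS = {1, 2, 3, 4}
VALID_FOLD = 5


def fold_of(filename: str) -> int:
    """Return the fold number, assuming names look like '<fold>-<clip>-<take>-<class>.wav'."""
    return int(filename.split("-")[0])


def split_by_fold(filenames):
    """Split file names into (train, valid) lists based on their fold number."""
    train = [f for f in filenames if fold_of(f) in TRAIN_FOLDS]
    valid = [f for f in filenames if fold_of(f) == VALID_FOLD]
    return train, valid


# Synthetic file list (not the real dataset): 5 folds x 50 classes x 8 clips
filenames = [
    f"{fold}-{clip:06d}-A-{cls}.wav"
    for fold in range(1, 6)
    for cls in range(50)
    for clip in range(8)
]

train, valid = split_by_fold(filenames)
print(len(train), len(valid))  # 4 * 50 * 8 = 1600 and 1 * 50 * 8 = 400
```

Holding out a whole fold keeps the validation clips completely separate from the training clips.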
In this project I built a classifier that distinguishes between 50 different types of sound.
First, I pre-processed the .wav audio files into spectrograms so that they could be used to train a deep learning model.
I used a pretrained ResNet-50 model, the architecture introduced in the paper “Deep Residual Learning for Image Recognition”.
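
As a rough illustration of the .wav to spectrogram step, here is a minimal sketch of a log-magnitude spectrogram. The notebook itself uses Matplotlib's specgram and librosa; this version uses plain NumPy instead so that the computation is visible. The function name `log_spectrogram`, the window size, the hop length, and the 44100 Hz sample rate are all assumptions chosen for the example, and the input is a synthetic tone rather than a real recording.

```python
import numpy as np


def log_spectrogram(signal, n_fft=1024, hop=512):
    """Short-time Fourier transform power in decibels.

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    The signal must contain at least n_fft samples.
    """
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Small constant avoids log(0); transpose so rows are frequency bins
    return 10.0 * np.log10(power + 1e-10).T


# Synthetic input: one second of a 1000 Hz tone at an assumed 44100 Hz sample rate
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2D array can be saved as an image, which is what lets an image model be trained on audio.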

I used a pretrained model because learned features often transfer to different data. For example, a model trained on a large dataset of bird images will have learned low-level features such as edges or horizontal lines, and those features are also useful for my spectrogram data.

Notebook sound to spectrogram pre-processing

spectrogram of clapping sound:
Matplotlib specgram

Clapping sound

librosa specgram

Engine sound

Spectrogram classification project notebook Link

Spectrogram Datablock:

Confusion matrix:

Most confused:

As the confusion matrix above shows, the model most often confuses the following pairs:
- clapping and rain: 3 times
- helicopter and airplane: 3 times
- wind and siren: 3 times
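
A "most confused" list like the one above can be derived from the true and predicted labels. The sketch below is a minimal pure-Python version, not the notebook's code: the helper name `most_confused` is hypothetical, and the labels are made-up toy data, not the project's actual predictions. The direction of each confusion (which class was the true one) is also an assumption, since the list above does not state it.

```python
from collections import Counter


def most_confused(y_true, y_pred, min_count=1):
    """Return (actual, predicted, count) for misclassified pairs, most frequent first."""
    pairs = Counter((a, p) for a, p in zip(y_true, y_pred) if a != p)
    return [(a, p, n) for (a, p), n in pairs.most_common() if n >= min_count]


# Toy labels for illustration only (not real results from this project)
y_true = ["clapping"] * 4 + ["helicopter"] * 4 + ["wind"] * 2
y_pred = [
    "rain", "rain", "rain", "clapping",
    "airplane", "helicopter", "helicopter", "helicopter",
    "wind", "siren",
]

result = most_confused(y_true, y_pred, min_count=2)
print(result)
```

Raising `min_count` filters out one-off mistakes and leaves only the pairs that are confused repeatedly.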

Final accuracy is 78%