Dataset: “50 environmental sound dataset” (ESC-50)
There are 50 types of sound in each fold, and each type contains 8 files (400 files per fold).
I used folds 1, 2, 3 and 4 for training: 1,600 training spectrograms in total.
Fold 5 is used for validation: 400 validation spectrograms in total.
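The fold-based split can be sketched as below. This is a minimal illustration, assuming a metadata table with `filename`, `fold` and `category` columns (ESC-50 ships a CSV of this kind, `meta/esc50.csv`); the helper names `load_metadata` and `split_by_fold` are hypothetical, and the rows are synthetic stand-ins built to match the counts above (5 folds × 50 classes × 8 clips).

```python
import csv
import io
from collections import Counter

def load_metadata(csv_text):
    """Parse metadata CSV text into a list of dicts, one per audio clip."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def split_by_fold(rows, valid_fold=5):
    """Hold out one fold for validation and train on all remaining folds."""
    train = [r for r in rows if int(r["fold"]) != valid_fold]
    valid = [r for r in rows if int(r["fold"]) == valid_fold]
    return train, valid

# Synthetic stand-in for the real metadata: 5 folds x 50 classes x 8 clips.
lines = ["filename,fold,category"]
for fold in range(1, 6):
    for target in range(50):
        for clip in range(8):
            lines.append(f"{fold}-{target}-{clip}.wav,{fold},class_{target}")
rows = load_metadata("\n".join(lines))

train_rows, valid_rows = split_by_fold(rows, valid_fold=5)
print(len(train_rows), len(valid_rows))                        # 1600 400
print(Counter(r["category"] for r in valid_rows)["class_0"])   # 8
```

Splitting by fold rather than shuffling individual files keeps every clip of a given fold on one side of the split.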
In this project I built a classifier that distinguishes between 50 different types of sound.
First, I pre-processed the .wav audio files into spectrograms, which were then used to train a deep learning model.
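Before a spectrogram can be computed, each .wav file has to be decoded into a sequence of samples. The sketch below uses only Python's standard `wave` module and assumes 16-bit PCM mono files; the helpers `read_wav_mono16` and `write_wav_mono16` are hypothetical names, and the round trip uses a synthetic tone instead of a real clip. In practice a library call such as `librosa.load` does this step and also handles resampling.

```python
import array
import io
import math
import sys
import wave

def read_wav_mono16(source):
    """Read a 16-bit PCM mono WAV file; return (sample_rate, samples scaled to [-1, 1))."""
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()          # WAV sample data is little-endian
    return sample_rate, [s / 32768.0 for s in pcm]

def write_wav_mono16(sample_rate, samples):
    """Encode float samples in [-1, 1] as an in-memory 16-bit mono WAV file."""
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    buf.seek(0)                 # rewind so the buffer can be read back
    return buf

# Round trip: a 0.1 s, 440 Hz test tone sampled at 8 kHz.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]
rate, samples = read_wav_mono16(write_wav_mono16(sr, tone))
print(rate, len(samples))       # 8000 800
```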
I used a pretrained ResNet50 model, the residual network architecture introduced in the paper "Deep Residual Learning for Image Recognition".
I used a pretrained model because learned features often transfer to different data. For example, a model trained on a large dataset of bird images learns general features such as edges and horizontal lines, and these features are also useful for classifying my spectrogram images.
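The mechanics of reusing a feature extractor can be sketched in NumPy: the backbone is kept frozen and only a new 50-way classification head is trained. In the standard ResNet50, the final layer maps 2048 pooled features to 1,000 ImageNet classes, and transfer learning replaces it with a layer sized for the new task. Important assumption: the "backbone" below is a fixed random projection plus ReLU, not a pretrained network, and the data are synthetic; the sketch only shows that the backbone weights never change while the head learns. All names (`extract_features`, `train_head`, `BACKBONE`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 50      # ESC-50 categories
FEATURE_DIM = 256     # ResNet50's pooled features are 2048-wide; kept small here

# Frozen "backbone": a fixed random projection + ReLU standing in for pretrained layers.
BACKBONE = rng.normal(size=(64, FEATURE_DIM)) / np.sqrt(64)

def extract_features(x):
    """Frozen feature extractor: its weights are never updated during training."""
    return np.maximum(x @ BACKBONE, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_head(features, labels, lr=0.05, epochs=300):
    """Train only a new linear head with softmax cross-entropy and gradient descent."""
    n, d = features.shape
    W = np.zeros((d, NUM_CLASSES))
    b = np.zeros(NUM_CLASSES)
    onehot = np.eye(NUM_CLASSES)[labels]
    for _ in range(epochs):
        grad = (softmax(features @ W + b) - onehot) / n   # gradient w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy data: 8 noisy samples around each of 50 random class prototypes.
prototypes = rng.normal(scale=3.0, size=(NUM_CLASSES, 64))
labels = np.repeat(np.arange(NUM_CLASSES), 8)
inputs = prototypes[labels] + rng.normal(scale=0.5, size=(len(labels), 64))

backbone_before = BACKBONE.copy()
feats = extract_features(inputs)
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)  # standardize features
W, b = train_head(feats, labels)
accuracy = np.mean(np.argmax(feats @ W + b, axis=1) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

With a real pretrained backbone, the useful features come from the original training data rather than from a random projection; that is the part this toy cannot reproduce.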
Notebook: sound-to-spectrogram pre-processing
Example spectrograms:
- Matplotlib specgram: clapping sound
- librosa spectrogram: engine sound
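Both Matplotlib's `specgram` and librosa build a spectrogram from a short-time Fourier transform: the signal is cut into overlapping windowed frames, each frame is Fourier-transformed, and the power is shown on a decibel scale. The NumPy sketch below illustrates that computation; the function name `spectrogram_db` and the values `n_fft=1024`, `hop_length=512` are assumptions for illustration, not the settings used in the notebook.

```python
import numpy as np

def spectrogram_db(signal, n_fft=1024, hop_length=512, eps=1e-10):
    """Power spectrogram in dB from a Hann-windowed short-time Fourier transform."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Each row of `frames` is one windowed slice of the signal.
    frames = np.stack([signal[i * hop_length : i * hop_length + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    db = 10.0 * np.log10(np.maximum(power, eps))
    db -= db.max()                                     # loudest bin becomes 0 dB
    return db.T                                        # (frequency bins, time frames)

sr = 22050
t = np.arange(sr) / sr                      # 1 second of audio
tone = np.sin(2 * np.pi * 1000.0 * t)       # 1 kHz test tone
spec = spectrogram_db(tone, n_fft=1024, hop_length=512)
peak_bin = int(np.argmax(spec[:, spec.shape[1] // 2]))
print(spec.shape, peak_bin * sr / 1024)     # (513, 42) 990.52734375
```

The brightest row of the resulting image sits at the frequency bin closest to 1 kHz (bin width about 21.5 Hz here), which is the kind of horizontal structure a CNN can pick up in spectrogram images.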
Notebook: spectrogram classification project
- Spectrogram Datablock:
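A data block needs two rules: how to label each spectrogram and which images go to the validation set. The sketch below assumes the spectrogram images keep ESC-50's file naming pattern `{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}` with a .png extension, so both the fold and the numeric class can be read from the file name. The helpers `parse_esc50_name`, `is_valid` and `get_target` are hypothetical; if the DataBlock is fastai's, functions like these could be passed as `splitter=FuncSplitter(is_valid)` and `get_y=get_target`.

```python
from pathlib import Path

def parse_esc50_name(path):
    """Split a file name of the form '{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.ext' into fields."""
    fold, clip_id, take, target = Path(path).stem.split("-")
    return {"fold": int(fold), "clip_id": clip_id, "take": take, "target": int(target)}

def is_valid(path, valid_fold=5):
    """Splitter rule: images from the held-out fold go to the validation set."""
    return parse_esc50_name(path)["fold"] == valid_fold

def get_target(path):
    """Labelling rule: the class index is the last dash-separated field of the name."""
    return parse_esc50_name(path)["target"]

print(parse_esc50_name("spectrograms/1-100032-A-0.png"))
# {'fold': 1, 'clip_id': '100032', 'take': 'A', 'target': 0}
print(is_valid("spectrograms/5-9032-A-0.png"), get_target("spectrograms/5-9032-A-0.png"))
# True 0
```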
Confusion matrix:
Most confused:
As the confusion matrix above shows, the model most often confuses the following pairs:
- clapping and rain (3 times)
- helicopter and airplane (3 times)
- wind and siren (3 times)
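A confusion matrix and a "most confused" list can both be computed from pairs of true and predicted labels. The sketch below is a minimal version with hypothetical helper names (`confusion_matrix`, `most_confused`) and made-up toy labels, not the project's predictions; fastai's `ClassificationInterpretation.most_confused` offers similar output if that library was used. Each direction is counted separately, so "clapping predicted as rain" and "rain predicted as clapping" are distinct entries.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def most_confused(actual, predicted, min_count=1):
    """Off-diagonal (true, predicted, count) triples, most frequent first."""
    counts = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    pairs = [(a, p, n) for (a, p), n in counts.items() if n >= min_count]
    return sorted(pairs, key=lambda t: t[2], reverse=True)

# Toy labels for illustration only.
classes = ["clapping", "rain", "siren", "wind"]
actual = ["clapping", "clapping", "clapping", "rain", "siren", "wind"]
predicted = ["rain", "rain", "clapping", "rain", "siren", "siren"]

for row in confusion_matrix(actual, predicted, classes):
    print(row)
print(most_confused(actual, predicted))
# [('clapping', 'rain', 2), ('wind', 'siren', 1)]
```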