Dataset: “50 environmental sound dataset”

  • There are 50 types of sound in each fold, and each type contains 8 files (400 files per fold)

    • Folds 1, 2, 3 and 4 are used for training; total training spectrograms: 1600

    • Fold 5 is used for validation; total validation spectrograms: 400

  • In this project I have built a classifier that distinguishes between 50 different types of sound.

  • First, I converted the .wav audio files to spectrograms, which are then used to train a deep learning model.

  • I have used a pretrained ResNet-50 model, the architecture introduced in “Deep Residual Learning for Image Recognition”.
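
The fold-based split in the list above can be sketched as follows. This is only an illustration of the arithmetic: the `records` catalogue and the `split_by_fold` helper are hypothetical stand-ins for the real spectrogram files, and the counts (5 folds, 50 classes, 8 files per class per fold) are taken from the description above.

```python
from collections import Counter

NUM_FOLDS = 5
NUM_CLASSES = 50
FILES_PER_CLASS_PER_FOLD = 8
VALID_FOLD = 5  # fold held out for validation

# Hypothetical catalogue of (fold, class_id, file_index) records that stands
# in for the real spectrogram files.
records = [
    (fold, class_id, i)
    for fold in range(1, NUM_FOLDS + 1)
    for class_id in range(NUM_CLASSES)
    for i in range(FILES_PER_CLASS_PER_FOLD)
]


def split_by_fold(records, valid_fold=VALID_FOLD):
    """Put every record from valid_fold in validation, the rest in training."""
    train = [r for r in records if r[0] != valid_fold]
    valid = [r for r in records if r[0] == valid_fold]
    return train, valid


train, valid = split_by_fold(records)
per_class = Counter(class_id for _, class_id, _ in valid)

print(len(train), len(valid))  # 1600 400
print(set(per_class.values()))  # {8}: every class has 8 validation files
```

Splitting on whole folds, rather than shuffling all files together, keeps every file of a given fold on the same side of the split.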

I have used a pretrained model because learned features are often transferable to different data. For example, a model trained on a large dataset of bird images will contain learned features like edges or horizontal lines that would be transferable to my data.

Notebook: sound to spectrogram pre-processing

  • Spectrogram of a clapping sound:

  • Matplotlib specgram

  • Clapping sound
  • librosa specgram
  • Engine sound
  • Spectrogram Datablock:

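
The sound-to-spectrogram step illustrated above can be sketched without any audio library. The example below is a NumPy-only stand-in, not the notebook's code: it uses a synthetic 1 kHz tone instead of a .wav file, and `log_spectrogram`, `n_fft` and `hop` are names and values chosen for illustration.

```python
import numpy as np


def log_spectrogram(signal, n_fft=256, hop=128):
    """Return a log-magnitude spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # FFT of each frame gives the frequency content at that point in time.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels; the small constant avoids log(0).
    return 20 * np.log10(magnitude.T + 1e-10)


sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)       # synthetic 1 kHz test tone

spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256    # bin index -> frequency in Hz

print(spec.shape, peak_hz)
```

On real recordings, `matplotlib.pyplot.specgram` and `librosa.stft` produce this kind of time-frequency image directly, and the resulting array can be saved as an image for the classifier.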

Confusion matrix:


Most confused:


  • As the confusion matrix above shows, the model most often confuses the following pairs, 3 times each:

    • clapping and rain
    • helicopter and airplane
    • wind and siren
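
A "most confused" list like the one above can be derived from the true and predicted labels. The sketch below shows one way to do it in plain Python; the `most_confused` and `accuracy` helpers are hypothetical, and the toy labels are invented for illustration only and are not the project's results.

```python
from collections import Counter


def most_confused(actual, predicted, min_count=1):
    """List (actual, predicted, count) for misclassified pairs, most frequent first."""
    errors = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return [(a, p, n) for (a, p), n in errors.most_common() if n >= min_count]


def accuracy(actual, predicted):
    """Fraction of predictions that match the true label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Toy labels, invented purely for illustration (not the project's results).
actual = ["rain", "rain", "rain", "wind", "wind", "siren"]
predicted = ["rain", "clapping", "clapping", "wind", "siren", "siren"]

print(most_confused(actual, predicted))
# [('rain', 'clapping', 2), ('wind', 'siren', 1)]
print(accuracy(actual, predicted))
# 0.5
```

Each tuple keeps the direction of the error (true label first, predicted label second), which a symmetric "between A and B" summary does not show.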

Final accuracy is 78%