AI Notebooks: analyze and classify sounds with AI
A guide to analyzing and classifying marine mammal sounds.
Since you’re reading a blog post from a tech company, I bet you’ve heard of AI, machine learning and deep learning many times before.
Audio or sound classification is a technique with multiple applications in the field of AI and data science.
- Classification of acoustic data:
– identifies the location
– differentiates environments
– plays a role in monitoring ecosystems
- Environmental sound classification:
– recognition of urban sounds
– used in security systems
– used in predictive maintenance
– used to differentiate animal sounds
- Music classification:
– categorizes music
– key role in: organization of audio libraries by genre, improvement of recommendation algorithms, discovery of trends and listener preferences through data analysis, …
- Natural language classification:
– classification of human speech
– common in: chatbots, virtual assistants, text-to-speech applications, …
In this article, we examine the classification of marine mammal sounds.
The goal is to explain how to train a model to classify audio files using AI Notebooks.
In this tutorial, the sounds in the dataset are in .wav format. To use them and obtain results, you first need to preprocess the data in several steps:
Analyze one of these audio recordings
Turn every sound file into a line of a .csv file
Train your model from the .csv file
USE CASE: Best of Watkins Marine Mammal Sounds Database
This dataset is composed of 55 folders, each corresponding to a marine mammal species. Each folder stores several sound files for that animal.
You can get more information about this dataset on this website.
The data distribution is as follows:
⚠️ For this example, we use only the first 45 classes (or folders).
Let’s follow the different steps!
1. Load an audio file with Librosa
Librosa is a Python package for audio signal analysis. With Librosa, you can extract key features from audio samples, such as tempo, Chroma Energy Normalized (CENS), Mel-Frequency Cepstral Coefficients (MFCCs), spectral centroid, spectral contrast, spectral rolloff and zero crossing rate. To learn more about this library, refer to the documentation.
Start by exploring your data, viewing different parameters with the Librosa library.
First, run a test on a single file.
Loads and decodes audio.
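For reference, `librosa.load(path)` returns the samples as a floating-point NumPy array together with the sample rate, resampling to 22,050 Hz and mixing down to mono by default. As a dependency-free sketch of what that call produces, the snippet below reads a 16-bit PCM .wav file with Python's standard `wave` module and NumPy. The helpers `load_wav_mono` and `write_sine_wav` are hypothetical, and unlike Librosa this sketch neither resamples nor handles other sample formats.

```python
import io
import wave

import numpy as np


def load_wav_mono(source):
    """Read a 16-bit PCM .wav file and return (samples, sample_rate).

    Samples are float32 in [-1.0, 1.0); multi-channel audio is averaged to mono.
    No resampling is done (roughly librosa.load(source, sr=None)).
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sr = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # Interleaved little-endian int16 samples -> array of shape (frames, channels)
    data = np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
    y = data.astype(np.float32).mean(axis=1) / 32768.0
    return y, sr


def write_sine_wav(target, freq=440.0, sr=22050, seconds=1.0):
    """Write a mono 16-bit sine tone, used here only to have a file to load."""
    t = np.arange(int(sr * seconds)) / sr
    pcm = (0.5 * np.sin(2 * np.pi * freq * t) * 32767).astype("<i2")
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sr)
        wf.writeframes(pcm.tobytes())


# Usage: round-trip a one-second 440 Hz tone through an in-memory file
buf = io.BytesIO()
write_sine_wav(buf)
buf.seek(0)
y, sr = load_wav_mono(buf)
print(y.shape, sr)  # (22050,) 22050
```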
2. Play audio with IPython.display.Audio
IPython.display.Audio lets you play audio directly in a Jupyter notebook.
Using IPython.display.Audio to play sound.
Waveforms are visual representations of sound, with time on the x-axis and amplitude on the y-axis. They allow a quick analysis of audio data.
You can plot the audio array with librosa.display.waveplot (replaced by librosa.display.waveshow in recent Librosa versions).
A spectrogram is a visual way to represent the intensity of a signal over time at different frequencies present in a particular waveform.
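To make the idea concrete, here is a minimal NumPy sketch of a magnitude spectrogram: the signal is cut into overlapping Hann-windowed frames, each frame is Fourier-transformed, and magnitudes are converted to decibels relative to the loudest bin. In Librosa you would usually write `librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)`. This sketch uses the same default frame size (2048) and hop (512) but skips Librosa's frame centering and 80 dB floor, so its output is similar in spirit rather than identical. `stft_magnitude_db` is a hypothetical name.

```python
import numpy as np


def stft_magnitude_db(y, n_fft=2048, hop_length=512):
    """Return a magnitude spectrogram in dB, shape (1 + n_fft // 2, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length  # frames are not centered/padded
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    mag = np.abs(np.fft.rfft(frames, axis=0))  # one column per frame
    # Decibels relative to the loudest bin (0 dB = maximum), floored to avoid log(0)
    return 20.0 * np.log10(np.maximum(mag, 1e-10) / mag.max())


# Usage: a one-second 1 kHz tone sampled at 22,050 Hz
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 1000.0 * t)
S_db = stft_magnitude_db(y)
freqs = np.fft.rfftfreq(2048, d=1.0 / sr)  # frequency of each row
print(S_db.shape)  # (1025, 40)
print(freqs[S_db[:, 0].argmax()])  # the bin closest to 1000 Hz
```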
3. Spectral rolloff
Spectral rolloff is the frequency below which a specified percentage of the total spectral energy lies (85% by default, set by the roll_percent parameter).
librosa.feature.spectral_rolloff calculates the rolloff frequency for each frame of a signal.
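The definition can be sketched in NumPy: for each frame, accumulate the magnitude spectrum from low to high frequencies and return the first frequency where the running total reaches `roll_percent` of the frame's total. The 0.85 default matches `librosa.feature.spectral_rolloff`; the frame parameters and the function name are assumptions, and Librosa's frame centering is not reproduced.

```python
import numpy as np


def spectral_rolloff(y, sr, n_fft=2048, hop_length=512, roll_percent=0.85):
    """Rolloff frequency (Hz) for each frame of the signal y."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    rolloff = np.empty(n_frames)
    for i in range(n_frames):
        frame = y[i * hop_length : i * hop_length + n_fft] * window
        mag = np.abs(np.fft.rfft(frame))
        cumulative = np.cumsum(mag)  # running total from the lowest frequency up
        threshold = roll_percent * cumulative[-1]
        # First bin where the running total reaches the threshold
        rolloff[i] = freqs[np.searchsorted(cumulative, threshold)]
    return rolloff


# Usage: for a pure 1 kHz tone, nearly all energy sits near 1 kHz
sr = 22050
t = np.arange(sr) / sr
r = spectral_rolloff(np.sin(2 * np.pi * 1000.0 * t), sr)
print(r.mean())
```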
4. Chroma features
Chroma features are well suited to analyzing music whose pitches can be meaningfully categorized and whose tuning is close to the equal-tempered scale: the spectrum is projected onto 12 bins, one for each semitone of the octave.
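A simplified chroma computation is sketched below, assuming a single frame and a nearest-semitone mapping: each FFT bin's frequency is converted to a MIDI note number (69 = A4 = 440 Hz), folded into one of 12 pitch classes, and its energy is added to that class. `librosa.feature.chroma_stft` uses a smoother chroma filter bank, so treat this as an illustration of the idea rather than a drop-in replacement. `chroma_from_frame` is a hypothetical helper.

```python
import numpy as np


def chroma_from_frame(frame, sr):
    """12-bin chroma vector (index 0 = C, ..., 9 = A, 11 = B), max-normalized."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    keep = freqs > 27.5  # skip DC and bins below A0
    # MIDI note number: 12 semitones per octave, A4 (440 Hz) = 69
    midi = 69 + 12 * np.log2(freqs[keep] / 440.0)
    pitch_class = np.round(midi).astype(int) % 12
    # Sum the energy of all bins that fall into each pitch class
    chroma = np.bincount(pitch_class, weights=mag[keep] ** 2, minlength=12)
    return chroma / chroma.max()


# Usage: a 440 Hz tone should light up pitch class A
sr = 22050
t = np.arange(4096) / sr
names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
chroma = chroma_from_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(names[int(chroma.argmax())])  # A
```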
5. Zero Crossing Rate
A zero crossing occurs if successive samples have different algebraic signs.
The rate at which zero crossings occur is a simple measure of the frequency content of a signal.
The number of zero crossings counts how many times, within a given time interval, the amplitude of the signal passes through zero.
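Because the definition is so simple, the zero crossing rate is easy to compute by hand. The sketch below counts sign changes between successive samples in each frame and divides by the frame length, which is close to what `librosa.feature.zero_crossing_rate` reports (it also defaults to 2048-sample frames with a hop of 512, but pads and centers the frames). For a pure tone of frequency f sampled at rate sr, the rate is about 2f/sr, since each period crosses zero twice. The function name is hypothetical.

```python
import numpy as np


def zero_crossing_rate(y, frame_length=2048, hop_length=512):
    """Fraction of sign changes between successive samples, per frame."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    rates = np.empty(n_frames)
    for i in range(n_frames):
        frame = y[i * hop_length : i * hop_length + frame_length]
        negative = np.signbit(frame)  # True where the sample is negative
        crossings = np.count_nonzero(negative[1:] != negative[:-1])
        rates[i] = crossings / frame_length
    return rates


# Usage: a 1 kHz tone at 22,050 Hz gives roughly 2 * 1000 / 22050 ≈ 0.091
sr = 22050
t = np.arange(sr) / sr
zcr = zero_crossing_rate(np.sin(2 * np.pi * 1000.0 * t))
print(zcr.mean())
```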
1. Data transformation
To train your model, some data preprocessing is required. First, you need to convert the .wav files into a .csv file.
Create the data.csv file:
Define the list of marine mammal names (45 in total):
There are 45 different sea animals, or 45 classes.
Transform every .wav file into a line of the .csv file:
Show the data.csv file:
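A hypothetical sketch of the .wav-to-.csv step follows. It assumes the dataset is unpacked into one sub-folder per animal under a root directory, that files are 16-bit PCM, and that the class list is the first 45 folder names in alphabetical order. It writes one line per file (file name, features, label) with two illustrative features, zero crossing rate and RMS energy. The column names and helpers are assumptions; a real pipeline would add more features, such as the Librosa features explored earlier.

```python
import csv
import os
import wave

import numpy as np


def read_wav(path):
    """Minimal 16-bit PCM reader returning (mono float samples, sample rate)."""
    with wave.open(path, "rb") as wf:
        sr, ch = wf.getframerate(), wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    data = np.frombuffer(raw, dtype="<i2").reshape(-1, ch)
    return data.mean(axis=1) / 32768.0, sr


def wav_features(y):
    """Two illustrative whole-file features: zero crossing rate and RMS energy."""
    negative = np.signbit(y)
    zcr = np.count_nonzero(negative[1:] != negative[:-1]) / len(y)
    rms = float(np.sqrt(np.mean(y ** 2)))
    return [zcr, rms]


def write_feature_csv(root, class_names, out_path):
    """Write one CSV line per .wav file: filename, features..., label (folder name)."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "zcr", "rms", "label"])
        for label in class_names:
            folder = os.path.join(root, label)
            for name in sorted(os.listdir(folder)):
                if not name.lower().endswith(".wav"):
                    continue
                y, _sr = read_wav(os.path.join(folder, name))
                writer.writerow([name, *wav_features(y), label])


# Usage (hypothetical paths):
# class_names = sorted(os.listdir("watkins"))[:45]
# write_feature_csv("watkins", class_names, "data.csv")
```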
2. Feature extraction
During data preprocessing, feature extraction is required before training. The goal is to define the inputs and outputs of the neural network.
- OUTPUT (y): the last column, which contains the label.
You cannot use text directly for training, so you will encode these labels with the LabelEncoder class from sklearn.preprocessing.
Before running a model, you must convert this kind of categorical text data into numerical data that the model can understand.
- INPUTS (X): all the other columns are input parameters of the neural network.
Drop the first column (the file name), which provides no information for training, and the last column, which corresponds to the output.
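Both steps can be sketched with NumPy on a few made-up rows laid out like data.csv. `np.unique(..., return_inverse=True)` assigns each label the index of its position in the sorted list of classes, which is the same encoding that sklearn's `LabelEncoder().fit_transform(labels)` produces. The rows and column names here are hypothetical.

```python
import numpy as np

# Hypothetical rows in the layout of data.csv: filename, features..., label
header = ["filename", "zcr", "rms", "label"]
rows = [
    ["a.wav", 0.09, 0.35, "Killer_Whale"],
    ["b.wav", 0.02, 0.10, "Bowhead_Whale"],
    ["c.wav", 0.08, 0.33, "Killer_Whale"],
]

# Output y: encode the text labels as integers (index into the sorted classes)
labels = np.array([r[-1] for r in rows])
classes, y = np.unique(labels, return_inverse=True)

# Inputs X: drop the first column (file name) and the last one (label)
X = np.array([r[1:-1] for r in rows], dtype=np.float32)

print(classes.tolist(), y.tolist())  # ['Bowhead_Whale', 'Killer_Whale'] [1, 0, 1]
print(X.shape)  # (3, 2)
```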
3. Split the dataset for training
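The split ratio is not stated here. As a sketch, assuming an 80/20 train/test split with a fixed seed, shuffling the row indices once and holding out the first 20% gives the same kind of result as `sklearn.model_selection.train_test_split(X, y, test_size=0.2)` without stratification. `train_test_split_np` is a hypothetical name.

```python
import numpy as np


def train_test_split_np(X, y, test_size=0.2, seed=42):
    """Shuffle the rows once, then hold out a `test_size` fraction for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(np.ceil(test_size * len(X)))  # round the test set size up
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Usage with toy data: 50 rows, 2 features
X = np.arange(100).reshape(50, 2)
y = np.arange(50)
X_train, X_test, y_train, y_test = train_test_split_np(X, y)
print(X_train.shape, X_test.shape)  # (40, 2) (10, 2)
```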
Build the model
The first step is to create the model and display the summary.
For the CNN model, all hidden layers use a ReLU activation function, the output layer uses a Softmax function, and Dropout is used to avoid overfitting.
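To show what these layers compute, here is a NumPy sketch of the forward pass of a small network: a hidden dense layer with ReLU, Dropout, and a Softmax output over 45 classes. The layer sizes and the 0.3 dropout rate are assumptions, not the notebook's values. Dropout follows the inverted scheme used by the Keras `Dropout` layer: during training it zeroes a fraction of activations and scales the rest by 1 / (1 - rate), and at inference it leaves them unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dropout(a, rate, training):
    """Inverted dropout: zero a fraction `rate` of activations, rescale the rest."""
    if not training or rate == 0.0:
        return a
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)


def forward(X, params, rate=0.3, training=False):
    """Dense + ReLU -> Dropout -> Dense + Softmax; returns class probabilities."""
    h = relu(X @ params["W1"] + params["b1"])
    h = dropout(h, rate, training)
    return softmax(h @ params["W2"] + params["b2"])


# Usage with hypothetical sizes: 2 input features, 8 hidden units, 45 classes
params = {
    "W1": rng.normal(scale=0.1, size=(2, 8)), "b1": np.zeros(8),
    "W2": rng.normal(scale=0.1, size=(8, 45)), "b2": np.zeros(45),
}
X = rng.normal(size=(5, 2))
probs = forward(X, params)
print(probs.shape)  # (5, 45)
```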
Model training and evaluation
The Adam optimizer is used to train the model for 100 epochs. This choice was made because it gave better results.
The loss is calculated with the sparse_categorical_crossentropy function.
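Both pieces can be sketched in NumPy on a linear softmax classifier, a deliberately simpler model than the notebook's network. `sparse_categorical_crossentropy` takes integer class ids rather than one-hot vectors and averages -log(p[label]) over the batch; the Adam update keeps running averages of the gradient and of its square, with bias correction. beta1, beta2 and epsilon match the Keras Adam defaults, while the 0.01 learning rate and the toy data are assumptions.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(p[label]) over the batch; labels are integer class ids."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def train_softmax_regression(X, y, n_classes, epochs=100, lr=0.01,
                             beta1=0.9, beta2=0.999, eps=1e-7):
    """Full-batch Adam on a linear softmax classifier; returns (W, b, losses)."""
    n, d = X.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    m = [np.zeros_like(W), np.zeros_like(b)]  # first-moment estimates
    v = [np.zeros_like(W), np.zeros_like(b)]  # second-moment estimates
    onehot = np.eye(n_classes)[y]
    losses = []
    for t in range(1, epochs + 1):
        probs = softmax(X @ W + b)
        losses.append(sparse_categorical_crossentropy(probs, y))
        grad_z = (probs - onehot) / n  # gradient of the mean loss w.r.t. the logits
        grads = [X.T @ grad_z, grad_z.sum(axis=0)]
        params = [W, b]
        for k in range(2):
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)  # bias correction
            v_hat = v[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)  # in-place update of W or b
    return W, b, losses


# Usage: three small clusters, one per class
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [-5.0, 5.0], [-5.0, 6.0]])
y = np.array([0, 0, 1, 1, 2, 2])
W, b, losses = train_softmax_regression(X, y, n_classes=3)
print(losses[0], losses[-1])
```

With all-zero initial weights, the first loss equals ln(3) ≈ 1.099, because the three classes start with equal probability.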
Now start training!
Save the model for future inference
1. Save and store the model in an OVHcloud Object Container
You can check your model directory.
The saved model directory contains an assets folder, a saved_model.pb file and a variables folder.
Then you can load this model.
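As a small sketch, the standard-library check below verifies that a directory has the layout just described. `check_saved_model_dir` is a hypothetical helper. Loading itself goes through TensorFlow, for example `tf.keras.models.load_model(path)` with TensorFlow 2.x and Keras 2; Keras 3's `load_model` no longer reads this format directly, so check which version your notebook uses.

```python
import os


def check_saved_model_dir(path):
    """Return the expected SavedModel entries missing from `path` (empty list = OK)."""
    expected = {
        "saved_model.pb": os.path.isfile,  # serialized graph and signatures
        "variables": os.path.isdir,        # trained weights
        "assets": os.path.isdir,           # extra files (may be empty)
    }
    return [name for name, exists in expected.items()
            if not exists(os.path.join(path, name))]


# Usage (hypothetical path):
# print(check_saved_model_dir("saved_model"))
```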
Do you want to use this model in a Streamlit application? Check out our GitHub repository.
Model accuracy can be improved by increasing the number of epochs, but beyond a certain point it stops improving, so choose the number of epochs accordingly.
The accuracy obtained on the test set is 93.71%, which is a satisfactory result.
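Test-set accuracy is the fraction of samples whose most probable predicted class equals the true label, which is what Keras reports when a model is compiled with `metrics=["accuracy"]`. A minimal NumPy sketch, using made-up predictions:

```python
import numpy as np


def accuracy(probs, y_true):
    """Share of samples whose highest-probability class matches the true label."""
    return float(np.mean(np.argmax(probs, axis=1) == y_true))


# Usage with made-up predicted probabilities for 4 samples and 2 classes
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
y_true = np.array([0, 1, 1, 1])
print(accuracy(probs, y_true))  # 0.75
```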
Do you want to know more?
If you want to access the notebook, refer to the GitHub repository.
To launch and test this notebook with AI Notebooks, please refer to our documentation.
We hope you enjoyed this article. Try it for yourself!