
Working with pyfar Audio objects#

Audio data are at the core of pyfar. Three classes can be used for storing, processing, and visualizing such data. These are:

  • pyfar.Signal can be used to store equidistant and complete signals that can be converted between the time and frequency domain.

  • pyfar.TimeData and pyfar.FrequencyData are intended for data that cannot be converted between the time and frequency domain. Examples for this are incomplete time data and non-equispaced or sparse frequency data, e.g., third-octave data.

The following examples introduce fundamental concepts behind audio classes and show how the data inside audio objects can be accessed. See the pyfar audio class documentation for a complete overview.

[1]:
import pyfar as pf

pyfar Signals#

Signals are the most versatile and frequently used audio objects and are thus covered first. In addition, many concepts of Signals carry over to TimeData and FrequencyData.

Creating Signals#

A Signal can be created in the time domain by providing the time data and sampling rate in Hz

[2]:
signal = pf.Signal([1, 0, 0, 0], 4)

Creating a Signal in the frequency domain additionally requires specifying the domain and the length of the corresponding time signal in samples

[3]:
signal2 = pf.Signal([1, 1, 1], 4, n_samples=4, domain='freq')
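The length has to be given because a single-sided spectrum does not determine it uniquely: three frequency bins can belong to a time signal with 4 or with 5 samples. The following is a minimal sketch of this ambiguity. It uses plain `numpy.fft` calls for illustration and is not meant to show how pyfar converts the data internally.

```python
import numpy as np

# single-sided spectrum with 3 bins, as passed to the Signal above
spectrum = np.array([1, 1, 1], dtype=complex)

# the same 3 bins are consistent with two different time signal lengths
time_even = np.fft.irfft(spectrum, n=4)  # 4 samples
time_odd = np.fft.irfft(spectrum, n=5)   # 5 samples

print(time_even.shape, time_odd.shape)
```

Both results are valid time signals, so the intended number of samples cannot be guessed from the spectrum alone.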

Accessing data in Signals#

The time domain data stored in audio objects can be accessed through their time property

[4]:
signal.time
[4]:
array([[1., 0., 0., 0.]])

and the times in seconds at which the data was sampled are contained in the times property

[5]:
signal.times
[5]:
array([0.  , 0.25, 0.5 , 0.75])

Similarly, the frequency domain data is stored in the freq property

[6]:
signal.freq
[6]:
array([[1.+0.j, 1.+0.j, 1.+0.j]])

and the corresponding frequencies in Hz in the frequencies property

[7]:
signal.frequencies
[7]:
array([0., 1., 2.])
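The values of times and frequencies follow directly from the number of samples and the sampling rate. As a rough sketch, assuming the usual conventions for equidistant sampling and for the single-sided spectrum of a real-valued signal, they can be reproduced with NumPy alone:

```python
import numpy as np

n_samples = 4
sampling_rate = 4  # in Hz

# sampling times in seconds: sample index divided by the sampling rate
times = np.arange(n_samples) / sampling_rate

# frequencies in Hz of the single-sided spectrum of a real-valued signal
frequencies = np.fft.rfftfreq(n_samples, d=1 / sampling_rate)

print(times)        # [0.   0.25 0.5  0.75]
print(frequencies)  # [0. 1. 2.]
```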

There are a few important things to note right away.

  • Data inside audio objects are stored in numpy arrays with at least two dimensions. In this example, signal is an audio object with a single channel of audio data and the shape of signal.time is (1, 4). In pyfar, the samples of time data (4 in this example) and the frequency bins of frequency data are always stored in the last dimension of signal.time and signal.freq.

  • The conversion between the time and frequency domain happens automatically upon request and is handled by the underlying Signal class using the Fast Fourier Transform. Internally, the data is always stored in the domain that was last accessed.
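The two points above can be illustrated with NumPy. This is only a sketch of the conventions (at least two dimensions, samples and bins in the last dimension) and assumes a real-valued time signal. It is not pyfar's actual implementation.

```python
import numpy as np

# a single channel given as a flat list becomes a 2D array of shape (1, 4)
time = np.atleast_2d(np.asarray([1, 0, 0, 0], dtype=float))
print(time.shape)  # (1, 4)

# transform along the last axis, which holds the samples
freq = np.fft.rfft(time, axis=-1)
print(freq.shape)  # (1, 3)

# transforming back recovers the original time data
time_again = np.fft.irfft(freq, n=time.shape[-1], axis=-1)
print(np.allclose(time, time_again))  # True
```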

It is also important to note that the data returned by the freq property can depend on the normalization of the Discrete Fourier Transform as explained in more detail here. The frequency data without normalization can be accessed using the freq_raw property. However, for this example the two return the same results

[8]:
signal.freq_raw
[8]:
array([[1.+0.j, 1.+0.j, 1.+0.j]])

Read on to learn how to find out which normalization of the Discrete Fourier Transform is used for a signal.

The indices of a certain frequency or frequency range can be found using

[9]:
index = signal.find_nearest_frequency(1)
print(index)
1

and in analogy signal.find_nearest_time can be used to find a certain point in time.
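Conceptually, such a search returns the index of the entry that has the smallest distance to the requested value. A minimal sketch of this idea is given below. The helper `nearest_index` is hypothetical and only serves as illustration; it is not part of pyfar.

```python
import numpy as np

# frequencies in Hz, as returned by signal.frequencies above
frequencies = np.array([0., 1., 2.])


def nearest_index(values, target):
    """Return the index of the entry in `values` closest to `target`."""
    return int(np.argmin(np.abs(np.asarray(values) - target)))


print(nearest_index(frequencies, 1))    # 1
print(nearest_index(frequencies, 1.8))  # 2
```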

Write data to Signals#

The same properties can be used to set the data inside audio objects.

[10]:
signal.time = [2, 0, 0, 0]
print(f'signal.time = {signal.time}')
signal.time = [[2. 0. 0. 0.]]

[11]:
signal2.freq = [2, 2, 2]
print(f'signal2.freq = {signal2.freq}')
signal2.freq = [[2.+0.j 2.+0.j 2.+0.j]]

Additional information stored in Signals#

Converting between time and frequency data is already useful, but audio objects can do more. They contain additional information about their data that can be helpful. Some basic information is already shown when an audio object is evaluated at the prompt

[12]:
signal
[12]:
time domain energy Signal:
(1,) channels with 4 samples @ 4 Hz sampling rate and none FFT normalization

Let's break down the information given by the Signal one by one.

Domain: The signal is in the time domain because we last accessed its time data above. This property can be read and set through the signal.domain attribute, but it is of little interest when working with Signal objects because they are automatically converted between the time and frequency domain when required.

[13]:
signal.domain
[13]:
'time'

Signal Type: It is an energy Signal, which is important to note and understand. Energy signals have a finite duration and finite energy; impulse responses are a typical example. Power signals, on the other hand, have an infinite duration and infinite energy. Examples are noise or sine signals, of which we typically observe a block of finite size. The signal type is a read-only property that is set by the DFT normalization introduced below.

[14]:
signal.signal_type
[14]:
'energy'
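The difference between the two signal types can be made tangible with a small numerical experiment. The sketch below uses made-up example signals (a decaying exponential as impulse response and a 50 Hz sine) and plain NumPy. All names and values are chosen for illustration only.

```python
import numpy as np

sampling_rate = 1000  # in Hz, chosen for illustration

# energy signal: a decaying impulse response
n = np.arange(200)
impulse_response = 0.9 ** n
energy_short = np.sum(impulse_response[:100] ** 2)
energy_long = np.sum(impulse_response ** 2)


# power signal: blocks of a 50 Hz sine with 1 and 2 seconds duration
def sine_block(n_samples):
    t = np.arange(n_samples) / sampling_rate
    return np.sin(2 * np.pi * 50 * t)


block_1s, block_2s = sine_block(1000), sine_block(2000)

# the energy of the impulse response barely changes with the observed length
print(energy_short, energy_long)
# the energy of the sine doubles with the block length
print(np.sum(block_1s ** 2), np.sum(block_2s ** 2))
# the power (mean square) of the sine is independent of the block length
print(np.mean(block_1s ** 2), np.mean(block_2s ** 2))
```

For the sine, the energy grows without bound as the block gets longer, which is why power is the meaningful quantity for this kind of signal.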

DFT Normalization: For power Signals, different DFT normalizations can be used, which scale the values stored in signal.freq differently. For energy signals, the DFT normalization is always ‘none’. For more information refer to the Fast Fourier Transform examples.

[15]:
signal.fft_norm
[15]:
'none'
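To get an intuition for why a normalization is useful for power signals, consider the spectrum of a sine. Without normalization, its magnitude depends on the block length. The sketch below applies one generic amplitude-type scaling with NumPy. It is an assumption made for illustration and not necessarily identical to any of the normalizations offered by pyfar.

```python
import numpy as np

n_samples = 8
# sine with an amplitude of 1 and exactly one period per block
x = np.sin(2 * np.pi * np.arange(n_samples) / n_samples)

# spectrum without normalization: the magnitude depends on the block length
raw = np.fft.rfft(x)
print(np.abs(raw[1]))  # approximately n_samples / 2 = 4

# illustrative scaling: divide by the number of samples and double all bins
# except 0 Hz and Nyquist to account for the discarded negative frequencies
normalized = raw / n_samples
normalized[1:-1] *= 2  # valid for even n_samples, where the last bin is Nyquist
print(np.abs(normalized[1]))  # approximately 1, the amplitude of the sine
```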

Signal cshape, length, and caxis#

Signals can contain multiple channels of audio data and the shape of the data inside the audio object is often important. There are multiple attributes describing this.

Channel shape: The channel shape, or cshape for short, gives the shape of the data inside a Signal but ignores the number of samples or frequency bins. For example, the first signal created above has one channel and 4 samples, hence the data is of shape (1, 4). Because the cshape ignores the samples, it is (1,). There are two ideas behind this. First, digital signal processing methods often work on channels, and second, the length of the signal might be different in the time and frequency domain.

[16]:
signal.cshape
[16]:
(1,)

Length: The signal length in samples and bins is stored in the properties

[17]:
signal.n_samples
[17]:
4

and

[18]:
signal.n_bins
[18]:
3

Note that only the half-sided spectrum is stored for real-valued time signals. Hence, the length differs in the time and frequency domain in this case.
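For real-valued time signals, the relation between the two lengths is n_bins = n_samples // 2 + 1. The following sketch checks this against NumPy's real-valued FFT for even and odd lengths. The helper `n_bins_single_sided` is hypothetical and only used here.

```python
import numpy as np


def n_bins_single_sided(n_samples):
    """Number of bins in the single-sided spectrum of a real-valued signal."""
    return n_samples // 2 + 1


for n_samples in (4, 5, 6):
    spectrum = np.fft.rfft(np.zeros(n_samples))
    print(n_samples, spectrum.size, n_bins_single_sided(n_samples))
```

Note that 4 and 5 samples both result in 3 bins, which is the reason why the number of samples is required when creating a Signal from frequency data.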

Channel axis: Some functions in pyfar operate along one or more axes of the data. In analogy to the channel shape, these axes are referred to as channel axes, or caxis for short. Consider a signal with cshape=(3, 4) and n_samples=128. In this case, caxis=0 refers to the first dimension of size 3 and caxis=-1 refers to the last channel dimension of size 4, not to the dimension containing the 128 samples.
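One way to picture the channel axis is as an index into the cshape that has to be translated before it can be used on the full data array. The sketch below shows this translation with NumPy. The helper `caxis_to_axis` is hypothetical and only illustrates the idea; pyfar functions handle this internally.

```python
import numpy as np

# data of a signal with cshape = (3, 4) and n_samples = 128
data = np.zeros((3, 4, 128))
cshape = data.shape[:-1]


def caxis_to_axis(caxis, cshape):
    """Map a channel axis to the corresponding axis of the full data array."""
    # make the channel axis non-negative so that it never hits the sample axis
    return caxis % len(cshape)


print(caxis_to_axis(0, cshape))   # 0, the dimension of size 3
print(caxis_to_axis(-1, cshape))  # 1, the dimension of size 4

# averaging across caxis=-1 keeps the samples and removes the size-4 dimension
averaged = np.mean(data, axis=caxis_to_axis(-1, cshape))
print(averaged.shape)  # (3, 128)
```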

Multi-Channel Signals#

Signals may contain data in multiple channels and pyfar offers some functionality to ease handling such signals. Let the following signal of cshape=(2, 3) symbolize data obtained for 2 loudspeaker and 3 microphone positions

[19]:
signal3 = pf.Signal(
    [[[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]],
     [[0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]]],
    6)

signal3
[19]:
time domain energy Signal:
(2, 3) channels with 6 samples @ 6 Hz sampling rate and none FFT normalization

A subset of the channels can be accessed through slicing. For example, the first channel in both dimensions can be obtained by

[20]:
channel = signal3[0, 0]
channel
[20]:
time domain energy Signal:
(1,) channels with 6 samples @ 6 Hz sampling rate and none FFT normalization

The output shows that the slice is a Signal as well, which is often useful because most pyfar functions require Signals as input. This also works for obtaining larger slices, for example

[21]:
channels = signal3[0]
channels
[21]:
time domain energy Signal:
(3,) channels with 6 samples @ 6 Hz sampling rate and none FFT normalization

returns the data for the first loudspeaker and all three microphone positions.
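The effect of slicing on the channel shape can be mimicked with a plain NumPy array. This is a sketch of the underlying idea only: the indices act on the channel dimensions, the samples are always kept, and a single channel is stored with at least two dimensions as described above.

```python
import numpy as np

# time data with cshape = (2, 3) and 6 samples, identical to signal3
data = np.eye(6).reshape(2, 3, 6)

# index the channel dimensions only, the sample axis is always kept
first_channel = np.atleast_2d(data[0, 0])
first_loudspeaker = data[0]

print(first_channel.shape[:-1])      # (1,)
print(first_loudspeaker.shape[:-1])  # (3,)
```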

pyfar TimeData and FrequencyData#

TimeData and FrequencyData objects only store data but never change it. This is different from data stored in Signal objects, which can be converted between the time and frequency domain or scaled according to different FFT normalizations. To create TimeData, specify the data and the times at which the data was sampled

[22]:
# time data with non-equidistant sampling times of 0, 1, and 3 seconds
time = pf.TimeData([1, 0, -1], [0, 1, 3])

and creating FrequencyData requires the specification of the frequencies that belong to the data

[23]:
# frequency data containing values at 400, 800, and 1600 Hz
frequency = pf.FrequencyData([1, .8, .7], [400, 800, 1600])

This shows that creating TimeData and FrequencyData is different from creating Signals. The differences between the audio objects are also reflected in their printed representation:

[24]:
time
[24]:
TimeData:
(1,) channels with 3 samples

[25]:
frequency

[25]:
FrequencyData:
(1,) channels with 3 frequencies

The output shows that TimeData and FrequencyData have neither a sampling rate nor an FFT normalization. Apart from this, the different audio objects behave very similarly. In fact, the pyfar.Signal class is derived from pyfar.TimeData and pyfar.FrequencyData. Hence, the time property can be used to access TimeData

[26]:
time.time
[26]:
array([[ 1.,  0., -1.]])

and the freq property can be used to access FrequencyData

[27]:
frequency.freq
[27]:
array([[1. , 0.8, 0.7]])

In general, all time domain functionality of Signals that was introduced above is available for TimeData and all frequency domain functionality is available for FrequencyData.
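The example data used above could not be stored in a Signal because neither the times nor the frequencies are equidistant, so there is no single sampling rate that describes them. A quick way to check this with NumPy is sketched below. The helper `is_equidistant` is hypothetical and not part of pyfar.

```python
import numpy as np

times = np.array([0, 1, 3])               # in seconds
frequencies = np.array([400, 800, 1600])  # in Hz


def is_equidistant(values):
    """Check whether neighboring values all have the same spacing."""
    spacing = np.diff(values)
    return bool(np.allclose(spacing, spacing[0]))


print(is_equidistant(times))             # False
print(is_equidistant(frequencies))       # False
print(is_equidistant(np.arange(4) / 4))  # True, the times of the first Signal
```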

Further reading#

After getting used to adding and accessing data in audio objects, you are ready for discovering ways of inspecting and working with audio objects. Good next steps might be learning how to graphically display (plot) data from audio objects as detailed here and here. Check out how to apply simple arithmetic operations with audio objects as described here.

License notice#

This notebook © 2024 by the pyfar developers is licensed under CC BY 4.0


Watermark#

[28]:
%load_ext watermark
%watermark -v -m -iv
Python implementation: CPython
Python version       : 3.10.13
IPython version      : 8.23.0

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 5.19.0-1028-aws
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit

pyfar: 0.6.5