STIINTE COSPAR capacity
building workshop
Sinaia, June 2007
Multiscale
analysis
Thierry Dudok de Wit
LPCE, CNRS and University of
Orléans
ddwit@cnrs-orleans.fr
The lecture and the computer session on multiscale analysis focus on some examples of multiscale (i.e. wavelet) techniques for space plasma data, here time series. These techniques are particularly appropriate for handling non-stationarity, intermittent features and for denoising.
In the computer session, you’ll be able to address three aspects of multiscale analysis, using Matlab and some functions and data sets that will be provided. The three topics are
These three assignments can be done even if you’re not familiar with Matlab. Text in courier font corresponds to Matlab commands, to be written directly in the Matlab command window. I recommend you first create a directory, from which you will lauch Matlab, and in which you will store all the additional files that are needed by Matlab. Then launch Matlab by typing matlab
|
cd
mkdir Matlab cd Matlab matlab |
To run the different examples below, you will need a few additional scripts or data files that must be downloaded and stored in your local Matlab directory. Get them from the link given on the first page.
Matlab is a metalanguage that is quite similar to IDL and almost identical to Scilab. This language is very modular, since it relies on libraries of functions. There is no need to compile the functions before running them. In Matlab, all variables are by default matrices. A scalar is just a 1x1 matrix. This means that mathematical operations are interpreted as matrix operations
To access the help of Matlab, type
|
help
|
and to understand what a specific function does, type for example
|
help plot
|
To create a row array, do
|
x = [1 2 6 7 8]
|
and to make a column array
|
x = [1 2 6 7 8]’ or x = [1; 2; 6; 7; 8];
|
To select elements 2 to 4 from any of these arrays, do
|
x(2:4)
|
All commands ending with “;” are executed but no output is shown on the screen. With no “;” the results are displayed (this may take a while).
Most functions require input arguments and provide output arguments.
Matlab uses three different files. Most common functions like plot are directly provided as executables. Data files are stored in a proprietary binary ****.mat format. Scripts like the ones I provide are in ascii format and have the syntax ****.m
In this example, we perform an exploratory analysis of magnetic field data recorded by the AMPTE spacecraft in 1984 just upstream the Earth’s quasiparallel bow shock. This data set shows a few bursts of coherent wave packets. One way to find them, is by scanning the data set visually. We shall use instead a multiscale decomposition.
The multiscale analysis we perform here is based on a continuous wavelet transform known as the “à trous” (with holes) algorithm, which is widely used in astronomical image analysis. See the end of this section for references. This wavelet transform is easy to perform, but computationally slow.
|
load ampte
|
The two variables are: magnetic field B in nT (9600x3 matrix) and time in seconds (9600x1 array)
|
y = B(:,2);
y = y-mean(y); clf plot(time,y) |
|
a = 2.^[0:10]
for i=1:11, wy(:,i) = smooth(y,a(i)); end plot(time,wy) |
wy is a 9600x10 matrix in which each column contains a smoothed version of y. The scales are stored in a. For a=1, no smoothing is done (i.e. the time series remains unchanged).
, we recover the original time
series
. This algorithm is widely
appreciated for the easiness by which the original data
can be recovered.
|
my = [wy(:,1:10)-wy(:,2:11) wy(:,11)];
plot(time,my) |
If you want a clearer plot, with each component shifted and the original time series on top, download the function multiplot and do
|
multiplot(time,[y my])
xlabel(’time’) ylabel(’amplitude’) |
We can already identify visually some bursts of activity in the four smallest scales (a=1-8). By zooming in (use the zoom tool on top of the graphics window), you can see them much better. Otherwise, try
|
index = 5501:6500;
multiplot(time(index),[y(index) my(index,:)]) |
These short bursts of coherent waves are whistler-type waves that are generated by nonlinear magnetic structures known as SLAMS.
has a non Gaussian statistics,
i.e. the probability density is non Gaussian. We check
this here by computing the kurtosis or normalised fourth
order moment
is the average of the variable
x(t). For a Gaussian process, the expectation of the
kurtosis is 3. For a variable with large deviations
(heavy tails), m4
>> 3 whereas for a uniform distribution
m4 < 3. We
estimate the kurtosis as follows (note that this estimate
becomes inaccurate for large scales because of a lack of
statistics)
|
k = zeros(11,1);
for i=1:11 z = my(:,i) - mean(my(:,i)); k(i) = mean(z.^4)/(mean(z.^2)^2); end plot(k) |
We readily see that scales a1 to a4 are the ones of interest (and to a lesser degree a5). The characteristic scales of the whistlers thus approximately range from 1 to 16 sampling periods.
The intermittent wave packets you see in this data set are actually dispersive wave trains that are generated by nonlinear structures called SLAMS. This has been discussed in : T. Dudok de Wit et al., Identifying nonlinear wave interactions in space plasmas using two-point measurements: a case study of SLAMS, Journal of Geophysical Research 104 (1999) 17079-17090. Can be downloaded from here.
The AE index is a proxy for the intensity of the Auroral Electrojet, and is computed from the level of geomagnetic fluctuations at auroral latitudes. The power spectral density of the AE index follows a power law (p(f) = A f- B) and this has frequently been studied in relation with a possible evidence for driven self-organised criticality or a signature from solar wind fluctuations.
In all these studies, it is important to get a reliable estimate of the spectral exponent B(or spectral index) over a wide range of scales. If we do this by Fourier transform (i.e. the periodogram method), then we immediately face the problem of choosing a proper time window. This window must be both large (to have access to the lowest frequencies) and small (to lower the variance of the estimates).
The discrete wavelet transform circumvents this problem in a natural way. It provides an estimate of the power density at fixed scales, which are spaced logarithmically: a=1,2,4,8, …. The largest scale is set by the length of the data set.
We consider here 1096 days of AE index data measured at a 1 minute cadence.
|
load AEindex
plot(time, AE) datetick(’x’,’yyyy’) |
In what follows, we’ll also need the sampling interval dt
|
dt = mean(diff(time));
|
|
[p,freq] = spect(AE,dt,8192);
|
The smallest non-zero frequency (f(2)) is 1.2207e-04, which corresponds to 5.7 days. One would obviously like to have longer periods. Try this and see how the variance increases
|
[p,freq] = spect(AE,dt,8192*2);
[p,freq] = spect(AE,dt,8192*4); ... |
Scales formally are not equivalent to inverse frequencies. Nevertheless, one can reasonably well convert one into another by taking
|
[pw,scale] = wavepower(AE,dt);
freqw = 1./scale; |
Now plot the two estimates together on the same plot, with logarithmic axes
|
loglog(freq,p,’r-’,freqw,pw,’b.-’)
legend(’Fourier’,’wavelet’) grid on |
You can detect two different slopes, with a transition around a timescale of about 6 hours. For large frequencies, the self-similarity is believed to associated with the internal dynamics of the auroral electrojet, whereas for frequencies below the transition, we have a signature of the solar wind driving.
If you want to estimate the spectral index without averaging, simply do
|
ratio = pw(1:end-1)./pw(2:end);
index = log(ratio)/log(2); semilogx(freqw(1:end-1)*sqrt(2),index); grid on xlabel(’frequency’) |
Of course, you also need to know the confidence intervals, but this is actually rather easy.
Zoltan Vörös, On multifractality of high-latitude geomagnetic fluctuations, Annales Geophysicae 18 (2000) 1273-1282. Can be downloaded from http://www.copernicus.org/EGU/annales/anngeo.html.
VARIANT is a ukrainian satellite that was launched in 2005 on a low Earth orbit. We consider here electric field measurements that were made in the equatorial region. This data set shows several whistler wave packets. Unfortunately, it is also quite noisy, with narrowband interference noise, broadband instrumental noise, etc. These noises cover the whole spectrum, so Fourier filtering is of little use. However, since the whistler waves are bursty, they can easily be extracted by wavelet filtering.
The filtering strategy is simple:
The user has to take three decisions
Let us now start the denoising
|
load variantE
plot(time, E) |
|
spectrogram(E,time,512,50);
|
or try other window sizes, such as spectrogram(E,time,2048,50);. We can detect at least three whistlers, with their characteristic dispersion. But there also features we would like to eliminate, such as pickup noise between 5 and 7 kHz, narrowband interference noise at 3 kHz and 6 kHz, broadband noise between 3.5 and 5.5 kHz, and some low frequency bursts.
|
[C,L] = wavedec(E,10,’db4’);
plot(C) |
But there is not much we can learn from C because the wavelet coefficients are stored in such a compact way.
|
Ef = wavefilt(E,time,10,’db4’);
|
Look if you see any outstanding wavelet coefficients. Then use the cursor in the graphic window to select (in the lower plot) the threshold below which you want to set all coefficients to zero. Click on the left mouse button to confirm the threshold. Proceed in the same way for each scale (note that the larger the scale is, the less coefficients there are). For the first level, a typical choice for the threshold would be the one suggested in the figure below.
|
figure(1)
spectrogram(E,time,1024,50); figure(2) spectrogram(Ef,time,1024,50); |
Be careful with the interpretation of these two plots, since the spectral density ranges are not the same (spectrogram adapts the range automatically to the data). It would be more meaningful to plot both spectrograms with the same vertical scale. In that case, the result of the denoising becomes even more impressive.

The procedure for selecting the thresholds may look quite subjective, but it can actually easily be automated by using for example Bayesian criteria. To do so, however, you must know what kind of signal you’re looking for, or what kind of noise you want to reject.
|
wavemenu
|
and you will see the following menu.
Select Wavelet 1-D and a new window will emerge. You now need to load the data set before you can proceed. In the new window, select at the top File -> Load -> Signal. After you’ve loaded the data, you can analyse and denoise them. In the top right corner, select the wavelet type. Change from haar to anything else, for example sym. Select at least 7 levels and hit Analyze. You should now get a plot in which the dispersion of the whistlers appears quite evidently.