STIINTE COSPAR capacity building workshop
Sinaia, June 2007




Multiscale analysis
Thierry Dudok de Wit
LPCE, CNRS and University of Orléans
ddwit@cnrs-orleans.fr



The lecture and the computer session on multiscale analysis focus on some examples of multiscale (i.e. wavelet) techniques for space plasma data, here time series. These techniques are particularly appropriate for handling non-stationarity, intermittent features and for denoising.

In the computer session, you’ll be able to address three aspects of multiscale analysis, using Matlab and some functions and data sets that will be provided. The three topics are

  1. Exploratory multiscale analysis of AMPTE magnetic field data with intermittent bursts of wave activity
  2. Estimation of the spectral exponent of the AE index
  3. Denoising of electric field data from VARIANT

Documents for download

 

Further reading on wavelets

 



To start the computer session

These three assignments can be done even if you’re not familiar with Matlab. Text in courier font corresponds to Matlab commands, to be written directly in the Matlab command window. I recommend you first create a directory, from which you will lauch Matlab, and in which you will store all the additional files that are needed by Matlab. Then launch Matlab by typing matlab

        cd  
        mkdir Matlab  
        cd Matlab  
        matlab

To run the different examples below, you will need a few additional scripts or data files that must be downloaded and stored in your local Matlab directory. Get them from the link given on the first page.

 

For those who don’t know Matlab

Matlab is a metalanguage that is quite similar to IDL and almost identical to Scilab. This language is very modular, since it relies on libraries of functions. There is no need to compile the functions before running them. In Matlab, all variables are by default matrices. A scalar is just a 1x1 matrix. This means that mathematical operations are interpreted as matrix operations

 

To access the help of Matlab, type

        help

and to understand what a specific function does, type for example

        help plot

To create a row array, do

        x = [1 2 6 7 8]

and to make a column array

        x = [1 2 6 7 8]’   or    x = [1; 2; 6; 7; 8];

To select elements 2 to 4 from any of these arrays, do

        x(2:4)

All commands ending with “;” are executed but no output is shown on the screen. With no “;” the results are displayed (this may take a while).

Most functions require input arguments and provide output arguments.

Matlab uses three different files. Most common functions like plot are directly provided as executables. Data files are stored in a proprietary binary ****.mat format. Scripts like the ones I provide are in ascii format and have the syntax ****.m

 



1 Exploratory analysis of magnetic field data

In this example, we perform an exploratory analysis of magnetic field data recorded by the AMPTE spacecraft in 1984 just upstream the Earth’s quasiparallel bow shock. This data set shows a few bursts of coherent wave packets. One way to find them, is by scanning the data set visually. We shall use instead a multiscale decomposition.

The multiscale analysis we perform here is based on a continuous wavelet transform known as the “à trous” (with holes) algorithm, which is widely used in astronomical image analysis. See the end of this section for references. This wavelet transform is easy to perform, but computationally slow.

 

  1. First download the following files : ampte.mat, smooth.m, multiplot.m (see link given at the beginning) and store them in your Matlab directory.

  2. In your Matlab session, load the data
            load ampte

    The two variables are: magnetic field B in nT (9600x3 matrix) and time in seconds (9600x1 array)

  3. select the By component only, subtract its mean and plot it
            y = B(:,2);  
            y = y-mean(y);  
            clf  
            plot(time,y)

     

  4. We now perform a Gaussian smoothing, using a series of Gaussian filters whose width (i.e. scale) increases logarithmically. The largest scale should not exceed 9600, which is the length of the time series (try length(y) )
    where c is a normalisation factor. To do this smoothing, you need to download the smooth function first. Then create the array of scales a and smooth the data
            a = 2.^[0:10]  
            for i=1:11, wy(:,i) = smooth(y,a(i)); end  
            plot(time,wy)

    wy is a 9600x10 matrix in which each column contains a smoothed version of y. The scales are stored in a. For a=1, no smoothing is done (i.e. the time series remains unchanged).

  5. The multiscale “à trous” decomposition of y(t) is simply defined as the differences
    By adding all columns of , we recover the original time series $\mathcal {W}y_{a_1} = \mathcal {W}y_{a=1} = y$. This algorithm is widely appreciated for the easiness by which the original data can be recovered.
            my = [wy(:,1:10)-wy(:,2:11) wy(:,11)];  
            plot(time,my)

    If you want a clearer plot, with each component shifted and the original time series on top, download the function multiplot and do

            multiplot(time,[y my])  
            xlabel(’time’)  
            ylabel(’amplitude’)

    We can already identify visually some bursts of activity in the four smallest scales (a=1-8). By zooming in (use the zoom tool on top of the graphics window), you can see them much better. Otherwise, try

            index = 5501:6500;  
            multiplot(time(index),[y(index) my(index,:)])

    These short bursts of coherent waves are whistler-type waves that are generated by nonlinear magnetic structures known as SLAMS.

  6. How do we know which scales are interesting ? There is no unique answer to that question. Quite often, however, the scales of interest are those for which the $\mathcal {M}y_a$ has a non Gaussian statistics, i.e. the probability density is non Gaussian. We check this here by computing the kurtosis or normalised fourth order moment
    where $\bar {x} = \langle x \rangle $ is the average of the variable x(t). For a Gaussian process, the expectation of the kurtosis is 3. For a variable with large deviations (heavy tails), m4 >> 3 whereas for a uniform distribution m4 < 3. We estimate the kurtosis as follows (note that this estimate becomes inaccurate for large scales because of a lack of statistics)
            k = zeros(11,1);  
            for i=1:11  
            z = my(:,i) - mean(my(:,i));  
            k(i) = mean(z.^4)/(mean(z.^2)^2);  
            end  
            plot(k)

    We readily see that scales a1 to a4 are the ones of interest (and to a lesser degree a5). The characteristic scales of the whistlers thus approximately range from 1 to 16 sampling periods.

 

Further reading

The intermittent wave packets you see in this data set are actually dispersive wave trains that are generated by nonlinear structures called SLAMS. This has been discussed in : T. Dudok de Wit et al., Identifying nonlinear wave interactions in space plasmas using two-point measurements: a case study of SLAMS, Journal of Geophysical Research 104 (1999) 17079-17090. Can be downloaded from here.

 



2 What is the spectral exponent of the AE index ?

The AE index is a proxy for the intensity of the Auroral Electrojet, and is computed from the level of geomagnetic fluctuations at auroral latitudes. The power spectral density of the AE index follows a power law (p(f) = A f- B) and this has frequently been studied in relation with a possible evidence for driven self-organised criticality or a signature from solar wind fluctuations.

In all these studies, it is important to get a reliable estimate of the spectral exponent B(or spectral index) over a wide range of scales. If we do this by Fourier transform (i.e. the periodogram method), then we immediately face the problem of choosing a proper time window. This window must be both large (to have access to the lowest frequencies) and small (to lower the variance of the estimates).

The discrete wavelet transform circumvents this problem in a natural way. It provides an estimate of the power density at fixed scales, which are spaced logarithmically: a=1,2,4,8, …. The largest scale is set by the length of the data set.

We consider here 1096 days of AE index data measured at a 1 minute cadence.

 

  1. First download the following files: AEindex.mat, wavepower.m, spect.m (see link given at the beginning) and store them in your Matlab directory.
  2. In your Matlab session, load and plot the data. The variable AE contains the AE index (in nT) and time is expressed in days since the year 0.
            load AEindex  
            plot(time, AE)  
            datetick(’x’,’yyyy’)

    In what follows, we’ll also need the sampling interval dt

            dt = mean(diff(time));

     

  3. Compute the power spectral density, using the classical periodogram method. First try it with windows of size 8192 (must be a power of 2)
            [p,freq] = spect(AE,dt,8192);

    The smallest non-zero frequency (f(2)) is 1.2207e-04, which corresponds to 5.7 days. One would obviously like to have longer periods. Try this and see how the variance increases

            [p,freq] = spect(AE,dt,8192*2);  
            [p,freq] = spect(AE,dt,8192*4);  
            ...

     

  4. The power density is easy to estimate from the discrete wavelet transform. If the wavelet coefficients at the i’th scale ai are [ci1, ci2, ci3, ... cin], then the power density at that scale is

    Scales formally are not equivalent to inverse frequencies. Nevertheless, one can reasonably well convert one into another by taking

    The function that computes the power density and does all the rest of the work is wavepower
            [pw,scale] = wavepower(AE,dt);  
            freqw = 1./scale;

    Now plot the two estimates together on the same plot, with logarithmic axes

            loglog(freq,p,’r-’,freqw,pw,’b.-’)  
            legend(’Fourier’,’wavelet’)  
            grid on

     

    Figure 1 Power spectral density as estimated by the periodogram method and with the discrete wavelet transform


  5. The agreement between the two curves is remarkably good, but the discrete wavelet transform provides better estimates (lower variance) and also allows lower frequencies to be investigated (although the analysis of these very low frequencies requires more care by lack of statistics).

    You can detect two different slopes, with a transition around a timescale of about 6 hours. For large frequencies, the self-similarity is believed to associated with the internal dynamics of the auroral electrojet, whereas for frequencies below the transition, we have a signature of the solar wind driving.

    If you want to estimate the spectral index without averaging, simply do

            ratio = pw(1:end-1)./pw(2:end);  
            index = log(ratio)/log(2);  
            semilogx(freqw(1:end-1)*sqrt(2),index);  
            grid on  
            xlabel(’frequency’)

    Of course, you also need to know the confidence intervals, but this is actually rather easy.

 

Further reading

Zoltan Vörös, On multifractality of high-latitude geomagnetic fluctuations, Annales Geophysicae 18 (2000) 1273-1282. Can be downloaded from http://www.copernicus.org/EGU/annales/anngeo.html.

 



3 Denoising with wavelets

VARIANT is a ukrainian satellite that was launched in 2005 on a low Earth orbit. We consider here electric field measurements that were made in the equatorial region. This data set shows several whistler wave packets. Unfortunately, it is also quite noisy, with narrowband interference noise, broadband instrumental noise, etc. These noises cover the whole spectrum, so Fourier filtering is of little use. However, since the whistler waves are bursty, they can easily be extracted by wavelet filtering.

The filtering strategy is simple:

The user has to take three decisions

What wavelet should I choose ?
This is a rather technical issue that does not affect the outcome so much. As a rule of thumb, the wavelet should resemble the type of feature you want to extract. Daubechies wavelets are widely used, but there are many other families (coiflets, biorthogonal, meyer, …).
What should the order N of the wavelet be ?
An N’th order wavelet is invariant to any trend that can be reproduced with a N’th order polynomial. Low order (N ¡ 3) wavelets are good for fitting sharp discontinuities, whereas high order (N¿4) wavelets are better suited for smooth features. Unless the data set shows spikes and discontinuities, it is preferable to avoid low orders.
Into how many levels M should I decompose de data ?
The data are decomposed into scales ranging from 1, 2, 4, 8, …, 2M. Choose a small M if you want to process small scales only. Be sure that M << length of the record.

Let us now start the denoising

  1. First download the following files: variantE.mat, spectrogram.m, wavefilt.m, spect.m (see link given at the beginning) and store them in your Matlab directory.
  2. In your Matlab session, load and plot the data. The variable E contains the electric field (in V) and time is given in seconds.
            load variantE  
            plot(time, E)

     

  3. Let's have a look at the spectrogram, to see if there are specific waves. Use the function spectrogram for that, which computes the Fourier transform in windows of 512 samples, with 50% overlap between adjacent windows.
            spectrogram(E,time,512,50);

    or try other window sizes, such as spectrogram(E,time,2048,50);. We can detect at least three whistlers, with their characteristic dispersion. But there also features we would like to eliminate, such as pickup noise between 5 and 7 kHz, narrowband interference noise at 3 kHz and 6 kHz, broadband noise between 3.5 and 5.5 kHz, and some low frequency bursts.

  4. Do the wavelet decomposition. The wavelet coefficients are stored in the variables C and L. We use here 4’th order Daubechies wavelets and M=10 levels.
            [C,L] = wavedec(E,10,’db4’);  
            plot(C)

    But there is not much we can learn from C because the wavelet coefficients are stored in such a compact way.

  5. The function wavefilt does this in a more user-friendly way; it plots for each scale the wavelet coefficients together with the electric field.
            Ef = wavefilt(E,time,10,’db4’);

    Look if you see any outstanding wavelet coefficients. Then use the cursor in the graphic window to select (in the lower plot) the threshold below which you want to set all coefficients to zero. Click on the left mouse button to confirm the threshold. Proceed in the same way for each scale (note that the larger the scale is, the less coefficients there are). For the first level, a typical choice for the threshold would be the one suggested in the figure below.

     

    Figure 2 Typical choise for a threshold that provides a good discrimination between wavelet coefficients.


  6. Repeat this procedure with other types of wavelets: db1 or Haar wavelet, which is very discontinuous, or db12 wavelet, which is on the contrary very regular. Try also different levels: M=2, 3, 4, …. Do you manage to extract the weak whistler at t=35.5 [sec] ?
  7. To check the effect of the filtering, compare the original spectrogram with the denoised one
            figure(1)  
            spectrogram(E,time,1024,50);  
            figure(2)  
            spectrogram(Ef,time,1024,50);

    Be careful with the interpretation of these two plots, since the spectral density ranges are not the same (spectrogram adapts the range automatically to the data). It would be more meaningful to plot both spectrograms with the same vertical scale. In that case, the result of the denoising becomes even more impressive.

     


    Figure 3: Spectrogram of the original electric field data (above) and the denoised one (below), using the same amplitude scale.

    The procedure for selecting the thresholds may look quite subjective, but it can actually easily be automated by using for example Bayesian criteria. To do so, however, you must know what kind of signal you’re looking for, or what kind of noise you want to reject.

  8. If you want to play more with wavelets, launch the wavelet toolbox, which has plenty of GUIs
            wavemenu

    and you will see the following menu.

    Select Wavelet 1-D and a new window will emerge. You now need to load the data set before you can proceed. In the new window, select at the top File -> Load -> Signal. After you’ve loaded the data, you can analyse and denoise them. In the top right corner, select the wavelet type. Change from haar to anything else, for example sym. Select at least 7 levels and hit Analyze. You should now get a plot in which the dispersion of the whistlers appears quite evidently.




Last update : June 6, 2007