Preprocessing - Reading continuous EEG data

A convenient use of the ft_preprocessing is to read the continuous data fully in memory. This is feasible if your data set is relatively small and if your computer has enough memory to hold all data in memory at once.

Using this approach, you can read all data from the file into memory, apply filters, and subsequently cut the data into interesting segments.

The following steps are taken to read data, to apply filters and to reference the data (in case of EEG), and optionally to select interesting segments of data around events or triggers or by cutting the continuous data into convenient constant-length segments.

  • read the data for the EEG channels using ft_preprocessing, apply a filter and re-reference to linked mastoids
  • read the data for the horizontal and vertical EOG channels using ft_preprocessing, and compute the horizontal and vertical bipolar EOG derivations
  • combine the EEG and EOG into a single data representation using ft_appenddata
  • determine interesting pieces of data based on the trigger events using ft_definetrial
  • segment the continuous data into trials using ft_redefinetrial
  • segment the continuous data into one-second pieces using ft_redefinetrial

In this tutorial we will be using an EEG data set that was acquired by Irina Siminova in a study investigating semantic processing of stimuli presented as pictures, visually displayed text or as auditorily presented words. Data was acquired with a 64-channel BrainProducts Brainamp EEG amplifier from 60 scalp electrodes placed in an electrode cap, one electrode placed under the right eye; signals “EOGv” and “EOGh” are computed after acquisition using re-referencing. During acquisition all channels were referenced to the left mastoid and an electrode placed at the earlobe was used as the ground. Channels 1-60 correspond to electrodes that are located on the head, except for channel 53 which is located at the right mastoid. Channels 61, 62, 63 are not connected to an electrode at all. Channel 64 is connected to an electrode placed below the left eye. Hence we have 62 channels of interest: 60 from the head + eogh + eogv.

Furthermore, the standard CTF MEG tutorial dataset (Subject01.ds) will be used for one of the examples.

The EEG data used below is available from the ftp server at ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/SubjectEEG.zip and the MEG data from ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/Subject01.zip

The simplest method for preprocessing and reading the data into memory is by calling the ft_preprocessing function with only the dataset as configuration argument.

cfg = [];
cfg.dataset     = 'subj2.vhdr';
data_org        = ft_preprocessing(cfg)
>> data_org                             
data_org = 
      hdr: [1x1 struct]
    label: {64x1 cell}
    trial: {[64x1974550 double]}
     time: {[1x1974550 double]}
  fsample: 500
      cfg: [1x1 struct]

This reads the data from file as one long continuous segment without any additional filtering. The resulting data is represented as one very long trial. To plot the potential in one of the channels, you can simply use the MATLAB plot function.

chansel  = 1; 
plot(data_org.time{1}, data_org.trial{1}(chansel, :))
xlabel('time (s)')
ylabel('channel amplitude (uV)')
legend(data_org.label(chansel))

If the data on disk is stored in a segmented or epoched format, i.e. where the file format already reflects the trials in the experiment, a call to ft_preprocessing will return in the data being read and segmented into the original trials.

cfg = [];
cfg.dataset     = 'Subject01.ds';
data_meg        = ft_preprocessing(cfg);
>> data_meg
data_meg = 
      hdr: [1x1 struct]
    label: {187x1 cell}
    trial: {1x266 cell}
     time: {1x266 cell}
  fsample: 300
     grad: [1x1 struct]
      cfg: [1x1 struct]

This segmented MEG data dataset contains 266 trials. The following example shows how you can plot the data in a subset of the trials.

for trialsel=1:10
  chansel = 1; % this is the STIM channel that contains the trigger
  figure
  plot(data_meg.time{trialsel}, data_meg.trial{trialsel}(chansel, :))
  xlabel('time (s)')
  ylabel('channel amplitude (a.u.)')
  title(sprintf('trial %d', trialsel));
end

If you want to force epoched data to be interpreted as continuous data, you can use the cfg.continuous option, like this:

cfg = [];
cfg.dataset     = 'Subject01.ds';
cfg.continuous  = 'yes';
data_meg        = ft_preprocessing(cfg);

chansel = 1;
plot(data_meg.time{1}, data_meg.trial{1}(chansel, :))
xlabel('time (s)')
ylabel('channel amplitude (a.u.)')

For preprocessing this EEG data set, the choice of the reference has to be considered. During acquisition the reference channel of the EEG amplifier was attached to the left mastoid. We would like to analyze this data with a linked-mastoid reference (also known as an average-mastoid reference). Furthermore, the detection of eye movement and blink artifacts is facilitated by computing bipolar derivation for the electrodes that were placed horizontally and vertically around the eyes.

The channel names that were configured in the Brainamp Recorder software correspond to the labels of the locations in the electrode cap. These electrode locations are numbered 1 to 60, and the corresponding channel names as ascii strings are '1', '2', … '60'. Channel 53 correspond to the right mastoid. Since the left mastoid was used as reference in the acquisition, it is not represented in the data file (because the Voltage at that electrode is zero by definition).

cfg = [];
cfg.dataset     = 'subj2.vhdr';
cfg.reref       = 'yes';
cfg.channel     = 'all';
cfg.implicitref = 'M1';            % the implicit (non-recorded) reference channel is added to the data representation
cfg.refchannel     = {'M1', '53'}; % the average of these channels is used as the new reference, note that channel '53' corresponds to the right mastoid (M2)
data_eeg        = ft_preprocessing(cfg);

For consistency we will rename the channel with the name '53' located at the right mastoid to 'M2'

chanindx = find(strcmp(data_eeg.label, '53'));
data_eeg.label{chanindx} = 'M2';

To discard channels that we do not need any more we can do:

cfg = [];
cfg.channel     = [1:61 65];                      % keep channels 1 to 61 and the newly inserted M1 channel
data_eeg        = ft_preprocessing(cfg, data_eeg);

If you look at the data, you will see that it contains a single trial. That single trial represents the complete continuous recording and therefore it is approximately one hour long.

plot(data_eeg.time{1}, data_eeg.trial{1}(1:3,:));
legend(data_eeg.label(1:3));

Subsequently we read the data for the horizontal EOG

cfg = [];
cfg.dataset    = 'subj2.vhdr';
cfg.channel    = {'51', '60'};
cfg.reref      = 'yes';
cfg.refchannel = '51';
data_eogh      = ft_preprocessing(cfg);

The resulting channel 51 in this representation of the data is referenced to itself, which means that it contains zero values. This can be checked by

figure
plot(data_eogh.time{1}, data_eogh.trial{1}(1,:));
hold on
plot(data_eogh.time{1}, data_eogh.trial{1}(2,:),'g'); 
legend({'51' '60'}); 

For convenience we rename channel 60 into EOGH and use the ft_preprocessing function once more to select the horizontal EOG channel and discard the dummy channel.

data_eogh.label{2} = 'EOGH';

cfg = [];
cfg.channel = 'EOGH';
data_eogh   = ft_preprocessing(cfg, data_eogh); % nothing will be done, only the selection of the interesting channel

The processing of the vertical EOG is done similar, using the difference between channel 50 and 64 as the bipolar EOG

cfg = [];
cfg.dataset    = 'subj2.vhdr';
cfg.channel    = {'50', '64'};
cfg.reref      = 'yes';
cfg.refchannel = '50'
data_eogv      = ft_preprocessing(cfg);

data_eogv.label{2} = 'EOGV';

cfg = [];
cfg.channel = 'EOGV';
data_eogv   = ft_preprocessing(cfg, data_eogv); % nothing will be done, only the selection of the interesting channel

Now that we have the EEG data rereferenced to linked mastoids and the horizontal and vertical bipolar EOG, we can combine the three raw data structures into a single representation of using

cfg = [];
data_all = ft_appenddata(cfg, data_eeg, data_eogh, data_eogv);

In the example above, no filters were applied to the data. It is possible to apply filters to the data during the initial preprocessing/reading. It is also possible to apply filters afterwards by calling the ft_preprocessing function with the data as second input argument. If you want to apply different preprocessing options (such as filters for EEG channels, rectification of EMG channels, rereferencing) to different channels, you should call ft_preprocessing with the desired options for each of the channel types and subsequently append the data for the different channels types into one raw data structure.

Following the reading and channel specific preprocessing, you can identify interesting pieces of data based on the trigger codes and segment the continuous data into trials. Let's first look at the different trigger codes present in the data set:

cfg = [];
cfg.dataset             = 'subj2.vhdr';
cfg.trialdef.eventtype = '?';
dummy                   = ft_definetrial(cfg);

This will display the event types and values on screen.

evaluating trialfunction 'trialfun_general'
the following events were found in the datafile
event type: 'New Segment' with event values: 
event type: 'Response' with event values: 'R  8' 
event type: 'Stimulus' with event values: 'S  1' 'S 12' 'S 13' 'S 21' 'S 27' 'S111' 
   'S112' 'S113' 'S121' 'S122' 'S123' 'S131' 'S132' 'S133' 'S141' 'S142' 'S143'  
   'S151' 'S152' 'S153' 'S161' 'S162' 'S163' 'S171' 'S172' 'S173' 'S181' 'S182' 
   'S183' 'S211' 'S212' 'S213' 'S221' 'S222' 'S223' 'S231' 'S232' 'S233' 'S241' 
   'S242' 'S243' 
no trials have been defined yet, see FT_DEFINETRIAL for further help
found 1570 events
created 0 trials

The trigger codes S111, S121, S131, S141 correspond to the presented pictures of 4 different animals respectively. The trigger codes S151, S161, S171, S181 correspond to the presented pictures of 4 different tools. We can select the data for the animals and tools category with

cfg = [];
cfg.dataset             = 'subj2.vhdr';
cfg.trialdef.eventtype = 'Stimulus';
cfg.trialdef.eventvalue = {'S111', 'S121', 'S131', 'S141'};
cfg_vis_animal          = ft_definetrial(cfg);

cfg.trialdef.eventvalue = {'S151', 'S161', 'S171', 'S181'};
cfg_vis_tool            = ft_definetrial(cfg);

The output configuration resulting from ft_definetrial contains the trial definition as a Nx3 matrix with the begin sample, the end sample and the offset of each trial. In principle you could now use this configuration to read these segments from the original data file on disk, but since we already have the complete continuous data in memory, we'll use ft_redefinetrial to cut these trials out of the continuous data segment.

data_vis_animal = ft_redefinetrial(cfg_vis_animal, data_all);
data_vis_tool   = ft_redefinetrial(cfg_vis_tool,   data_all);

Subsequently we could do artifact detection with ft_rejectvisual to remove trials with artifacts and average the trials using ft_timelockanalysis to get the ERP, or use ft_freqanalysis to obtain an averaged time-frequency representation of the data in both conditions. If you use ft_timelockanalysis or ft_frequencyanalysis with the option cfg.keeptrials='yes', you subsequently could use ft_timelockstatistics or ft_freqstatistics for statistical comparison of the animal-tool contrast in these stimuli.

For processing of continuous data without triggers it is convenient to cut the data into constant-length pieces. This can be done while reading the data from disk, or it can be done after the complete continuous data is in memory.

The following example shows how to read and segment the data in one go.

cfg = [];
cfg.dataset              = 'subj2.vhdr';
cfg.trialfun             = 'ft_trialfun_general';
cfg.trialdef.triallength = 1;                      % duration in seconds
cfg.trialdef.ntrials     = inf;                    % number of trials, inf results in as many as possible
cfg                      = ft_definetrial(cfg);

% read the data from disk and segment it into 1-second pieces
data_seg                 = ft_preprocessing(cfg);

The following example shows how to read the data as a single continuous segment, and subsequently cut it into one second pieces.

% read it from disk as a single continuous segment
cfg = [];
cfg.dataset              = 'subj2.vhdr';
data_org                 = ft_preprocessing(cfg);

% segment it into 1-second pieces
cfg = [];
cfg.length               = 1;
data_ref                 = ft_redefinetrial(cfg, data_org);

After having finished this tutorial on preprocessing, you can continue with the tutorial on Preprocessing of EEG data and compute ERPs, with the event related averaging or with the time-frequency analysis tutorial.

If you have more questions about preprocessing, you can also read the following faq-s:

2017/09/20 17:54 Robert Oostenveld
2010/04/06 15:58 Robert Oostenveld
2009/11/11 15:59  
2013/03/17 12:22 Robert Oostenveld
2012/12/24 11:39 Robert Oostenveld
2014/10/30 17:37  
2009/02/17 15:13 Robert Oostenveld
2010/09/29 12:27 Cristiano Micheli
2011/08/25 21:39 Robert Oostenveld
2010/03/31 15:59 Robert Oostenveld
2009/06/29 12:23 Robert Oostenveld
2015/06/25 10:53 Robert Oostenveld
2014/09/25 19:55  
2009/02/17 15:15 Robert Oostenveld
2010/06/25 13:40 Jan-Mathijs Schoffelen
2009/03/25 09:23  
2009/02/17 15:17 Robert Oostenveld
2009/02/17 15:13 Robert Oostenveld
2015/03/28 14:16  
2015/02/18 15:36 Robert Oostenveld
2010/07/27 10:27 Jan-Mathijs Schoffelen
2010/12/21 21:50 Jan-Mathijs Schoffelen
2010/12/03 13:50 Robert Oostenveld

Or you can also read the example scripts:


This tutorial was last tested with version 20130617 of FieldTrip using MATLAB R2012b on a 64-bit Windows platform.