Confused EEG Classification
- Caleb Habtegebriel
- Kyser Lim
- Ola Oke
- Ziyan Prasla
- Nick Riveira
- Landon Watson
Electroencephalographic (EEG) signals measure the brain activity of a subject and can correlate with their mental state. Here, we approach an open-source dataset of EEG signals recorded from subjects at different levels of confusion (referenced below) as a binary classification problem: is the subject confused or not? An added level of complexity beyond a typical class project is that the time dependence of the signal is a significant factor. We plan to use deep learning frameworks (such as CNNs) and other time-series techniques to approach this problem. Additionally, there are some preprocessing steps innate to EEG signals that could improve the performance of our classifier.
EEG recordings measure voltage changes that correspond to neurons firing during different behaviors. These “behavioral states” contain oscillatory patterns that can characterize different behaviors based on the power spectra of the EEG signal. For example, delta (1–3 Hz) power is generally higher during sleep and during decision making, theta (4–7 Hz) is generally more prevalent during navigation, alpha (8–11 Hz) during visual processing, and gamma during other complex cognitive tasks. For confusion, it has been suggested that an increase in theta rhythms characterizes the confused state [1].
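Band powers like these are typically estimated from the power spectral density of the raw trace. The sketch below illustrates the idea on a synthetic signal with Welch's method; the sampling rate, band edges, and the signal itself are illustrative (in this project the band powers are already precomputed by the headset).

```python
import numpy as np
from scipy.signal import welch

fs = 256  # assumed sampling rate in Hz (illustrative)
rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / fs)
# Synthetic trace: a 6 Hz (theta-band) oscillation buried in noise
eeg = np.sin(2 * np.pi * 6 * t) + 0.5 * rng.standard_normal(t.size)

# Welch power spectral density estimate
freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)

def band_power(freqs, psd, lo, hi):
    """Approximate total power in [lo, hi] Hz by summing PSD bins."""
    mask = (freqs >= lo) & (freqs <= hi)
    return psd[mask].sum() * (freqs[1] - freqs[0])

theta = band_power(freqs, psd, 4, 7)
alpha = band_power(freqs, psd, 8, 11)
print(theta > alpha)  # the injected 6 Hz rhythm dominates the theta band
```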
The dataset consists of two parts: EEG data and demographic data. The EEG data tracks the confusion level of ten students as each watched ten video clips. While a student watches a video, the wireless headset samples the signal and predicts a confusion level every half second. This yields roughly 120+ rows per video per student, so 1,200+ samples per student over ten videos and 12,000+ samples in total.
Several columns in the dataset are important to our model. Two useful starting points are “SubjectID” and “VideoID”: “SubjectID” identifies each student (0–9) and “VideoID” identifies each video (0–9).
It is also important to understand the candidate target columns of this dataset. The “predefinedlabel” column is the confusion value computed by the headset at each sample, while the “userdefinedlabel” column is the confusion level reported by the student watching the video. The “userdefinedlabel” is what our model will predict.
The demographic data given for this project contains the age, gender, and ethnicity of each student.
One very interesting aspect of the dataset is that it already contains the analyzed brain-signal band powers. This reduced the amount of preprocessing we needed to perform on the raw signals, and reduced the preprocessing computation, at the cost of a reduced sample size.
Given the two datasets, we first merged them into one large data frame, attaching each student's gender, age, and ethnicity to every EEG sample. Having all features in one place makes it easier to identify correlations between them.
To identify strongly correlated features, we started by creating a correlation matrix of all features, which informs our feature analysis and engineering.
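The merge and the correlation matrix can be sketched as follows. The frames are toy stand-ins with made-up values; the column names mirror the dataset's (“SubjectID”, band powers, labels, demographics).

```python
import pandas as pd

# Toy stand-in for the EEG samples (values are invented)
eeg = pd.DataFrame({
    "SubjectID": [0, 0, 1, 1],
    "VideoID":   [0, 0, 0, 0],
    "Theta":     [90.0, 120.0, 80.0, 60.0],
    "Alpha1":    [30.0, 25.0, 40.0, 35.0],
    "userdefinedlabel": [0, 1, 0, 1],
})
# Toy stand-in for the demographic file (one row per student)
demo = pd.DataFrame({
    "SubjectID": [0, 1],
    "Age":       [24, 25],
    "Gender":    ["M", "F"],
    "Ethnicity": ["Han Chinese", "Bengali"],
})

# One row per EEG sample, demographic columns repeated for each student
merged = eeg.merge(demo, on="SubjectID", how="left")

# Correlation matrix over the numeric columns
corr = merged.select_dtypes("number").corr()
print(corr.shape)
```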
As a good practice, we converted all categorical features to numerical values. These were the “Ethnicity” and “Gender” features.
Another sanity check is verifying that the target classes are not heavily imbalanced.
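Both steps are short in pandas. The frame below is a toy stand-in; one sketch of the encoding is integer category codes, and the balance check is a normalized value count on the target.

```python
import pandas as pd

# Toy stand-in for the merged frame (values are invented)
df = pd.DataFrame({
    "Gender":    ["M", "F", "M", "F"],
    "Ethnicity": ["Han Chinese", "Bengali", "English", "Han Chinese"],
    "userdefinedlabel": [1, 0, 1, 1],
})

# Convert the categorical features to integer codes
for col in ["Gender", "Ethnicity"]:
    df[col] = df[col].astype("category").cat.codes

# Class-balance sanity check on the target
print(df["userdefinedlabel"].value_counts(normalize=True))
```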
Once the general sanity checks were done, our next step was to identify general relationships between the features and the target variable (“userdefinedlabel”).
Based on this analysis, there are no strong conclusions we can draw from the data, since most variables, such as Alpha, Beta, Gamma, and Mediation, were not strongly correlated with the outcome of the sample.
For a successful model, effective feature engineering is important. One technique we used to narrow down our features is the mutual information score available through scikit-learn. This is a non-negative value that measures the level of dependency between a feature and a target variable; a value of zero means the two variables are independent. We computed these scores to see which features had the highest dependency with our target column.
The results show that columns such as VideoID, Alpha1, Alpha2, Gamma1, and Gamma2 have higher dependencies with the target column. We selected the top 14 features and proceeded to the modeling stages.
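A sketch of the scoring and top-14 selection, using a synthetic stand-in for the merged frame (real code would pass `X = df[features]` and `y = df["userdefinedlabel"]`):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in for the processed data
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Mutual information between each feature and the binary target;
# 0 means independent, larger means stronger dependency
mi = mutual_info_classif(X, y, random_state=0)

# Keep the 14 highest-scoring features, as in the write-up
selector = SelectKBest(mutual_info_classif, k=14).fit(X, y)
X_top = selector.transform(X)
print(X_top.shape)  # (500, 14)
```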
Finally, we used StandardScaler to scale the data so that every feature has zero mean and unit variance.
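The scaling step, on a tiny made-up matrix:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Fit on training data, then transform; each column ends up
# with zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```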
Implementation of solutions
This section explains the logic behind our implementation and the results we were able to achieve. Multiple models were used; some gave strong results while others did not. We will discuss the level of improvement gained from each one and how we arrived at the final solution.
Our first model was a replication of the model used in [2], with the same bidirectional LSTM (Bi-LSTM) architecture. Much of our effort went into understanding this architecture and why it works on the dataset. It was then interesting to try other architectures that seemed to work better. Another Kaggle post used a more complex Bi-LSTM with additional dense and dropout layers that outperformed the paper's model. These model replications gave us a target to reach using the methods developed in this class and by stacking and blending other models.
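A minimal Keras sketch of the Bi-LSTM setup: the layer width, sequence length, and feature count below are illustrative assumptions, not the paper's exact hyperparameters.

```python
from tensorflow.keras import layers, models

# Assumed shapes: one sequence of half-second samples x 14 selected features
n_timesteps, n_features = 112, 14

model = models.Sequential([
    layers.Input(shape=(n_timesteps, n_features)),
    layers.Bidirectional(layers.LSTM(64)),   # reads the sequence in both directions
    layers.Dense(1, activation="sigmoid"),   # binary confused / not-confused output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 1)
```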
When creating the models, we took multiple approaches toward our goal of beating 75% accuracy on this binary classification dataset. Using the feature engineering tactics listed above, we trained four different models: XGBoost, a Keras Sequential neural network, the Keras Sequential neural network blended with XGBoost, and a CatBoost model. All of the models did surprisingly well, scoring AUC and accuracy values well above our initial expectations.
The XGBoost model performed very well on this dataset. Using a basic XGBoost configuration, the initial accuracy was 89.23% and the AUC was 95.32%.
We then looked at the Keras Sequential neural network, which consisted of 26 hidden layers. The model's performance was 97.8% accuracy and 98.7% AUC.
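A much shallower sketch of the dense network (the project's model used 26 hidden layers; three are shown here for brevity, and all widths and the dropout rate are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(14,)),            # 14 selected features
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),                  # regularization between dense layers
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
print(model.output_shape)  # (None, 1)
```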
Since both of these models performed well, we blended the two to see whether the combination would yield a significant difference in accuracy and AUC. After training the Keras Sequential model, we used its predictions as additional features in the XGBoost training and test data. This blend produced a significant increase in performance: the Keras Sequential Neural Network + XGBoost reached 99.92% accuracy and 99.95% AUC.
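The blending idea can be sketched with lightweight scikit-learn stand-ins: an MLPClassifier plays the Keras network and GradientBoostingClassifier plays XGBoost (both are swapped in purely to keep the sketch small; the data is synthetic). The neural network's predicted probability is appended as an extra feature before the booster is trained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=14, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Stage 1: the neural network
nn = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
nn.fit(X_tr, y_tr)

# Append the NN's probability as an extra feature for the booster
X_tr_blend = np.column_stack([X_tr, nn.predict_proba(X_tr)[:, 1]])
X_te_blend = np.column_stack([X_te, nn.predict_proba(X_te)[:, 1]])

# Stage 2: the gradient-boosted model trained on the augmented features
gbm = GradientBoostingClassifier(random_state=0)
gbm.fit(X_tr_blend, y_tr)
auc = roc_auc_score(y_te, gbm.predict_proba(X_te_blend)[:, 1])
print(auc)
```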
[1] E. Başar, C. Başar-Eroğlu, S. Karakaş, and M. Schürmann, “Brain oscillations in perception and memory,” International Journal of Psychophysiology, vol. 35, no. 2–3, pp. 95–124, Mar. 2000, doi: 10.1016/S0167-8760(99)00047-1.
[2] Z. Ni, A. C. Yuksel, X. Ni, M. I. Mandel, and L. Xie, “Confused or not Confused?: Disentangling Brain Activity from EEG Data Using Bidirectional LSTM Recurrent Neural Networks,” in Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Boston, MA, USA, Aug. 2017, pp. 241–246, doi: 10.1145/3107411.3107513.