CMSE 381 Final Project

Neural Decoding

Group members: Mia Dagati, Lindsey Myers

Section_002

November 26th, 2025

Background and Motivation

Understanding how the brain encodes visual information is one of the central goals of neuroscience. A particularly interesting question is whether patterns of neural activity can be used to determine exactly what an individual is seeing. This process, known as neural decoding, attempts to interpret sensory or cognitive information directly from recorded brain activity. Successful decoding provides evidence that meaningful stimulus information is explicitly represented in neural firing patterns.

For this project, we used the Freiwald-Tsao Face Views AM dataset, which contains neural activity recordings from the anterior medial (AM) face patch in macaque monkeys. The AM face patch is a region of the primate visual system known to respond selectively to faces and plays an important role in facial identity recognition. In the original experiment, monkeys passively viewed images of 25 different individuals, each shown from 8 different head orientations, while neural activity was recorded at millisecond resolution.

The dataset captures whether individual neurons fired (represented as 1) or remained inactive (represented as 0) across time, producing raster-style spike train data. Although recordings extended for up to 800 milliseconds, previous neuroscience literature indicates that the first 400 milliseconds contain the most relevant stimulus-driven responses related to visual recognition. Therefore, our analysis focused specifically on this time window.

This dataset presents an ideal opportunity to explore how identity information is represented in neural populations. If machine learning models can reliably decode face identity from these neural firing patterns, it would suggest that AM neurons contain explicit and structured information about facial identity. Beyond neuroscience, this has broader implications for understanding biological pattern recognition and for improving artificial intelligence systems inspired by brain computation.

Research Questions

The primary research question for this project was:

Can face identity be accurately decoded from neural activity recorded in the macaque AM face patch?

To explore this question further, we compared the performance of two different machine learning models: logistic regression and random forest classification.

More specifically, we wanted to investigate:

  • How accurately can neural firing patterns predict which face identity was shown?
  • Does a simple linear model perform well enough to decode identity information?
  • Does a more complex nonlinear model improve classification accuracy?
  • What do the results suggest about how identity information is represented in the brain?

Because the dataset includes 25 different identities, chance-level performance would be only 4%, meaning any significantly higher performance would indicate successful decoding.

Methodology

To determine whether neural activity in the AM face patch contains information about face identity, we began by loading the Freiwald-Tsao dataset, which is organized into multiple raster .csv files, each representing spike activity from a single neuron across many stimulus presentations. All files were combined into a single dataframe so that population-level responses could be analyzed jointly. Consistent with the dataset description, only the first 400 ms following stimulus onset were used, as this is the period in which stimulus-driven firing is strongest and most informative for face decoding.

Raw spike trains consist of 1 ms time bin values (1=spike, 0=no spike). To convert these into usable features, we summed spike activity across the 0-400 ms window for each neuron. This produces one spike count per neuron per presentation. Averaging over repeated presentations of the same stimulus (labels.stimID) then yields a stimulus x neuron matrix of size 200 x 193 (25 identities x 8 orientations, 193 neurons). The corresponding identity label for each row was taken from labels.person, which served as the prediction target (y) for classification.

The dataset was then divided using an 80/20 stratified train-test split to preserve class balance across the 25 identities. We implemented two decoding methods:

  1. Multinomial Logistic Regression: a linear decoder used to test whether identity is linearly separable from population firing rates.
  2. Random Forest Classifier: a nonlinear model used to test whether additional identity information exists in higher order neuron interactions.

Both models were trained using Stratified 5-Fold Cross-Validation on the training set for hyperparameter tuning (regularization strength C for logistic regression, number of trees and depth for random forest). After selecting the best model configuration, final accuracy was evaluated on the held-out test set. In addition to overall accuracy, we examined classification reports and confusion matrices to understand which identities were most easily or most frequently confused.

This methodological pipeline directly tests whether AM population activity carries enough information to distinguish between individual faces, and allows comparison between linear and nonlinear decoding strategies.

In [1]:
#import modules 

from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Data

The dataset consists of neural recordings from the anterior medial (AM) face patch in macaque monkeys, collected during passive viewing of faces. Each file represents spike activity from a single neuron across many stimulus presentations, and the files were concatenated into one matrix so that population responses could be analyzed jointly.

The final dataset contains 2685 trials × 400 columns. The first several columns describe metadata about each stimulus:

  • site_info.monkey: Name of animal recorded from
  • site_info.region: Brain region (always AM)
  • labels.stimID: Unique ID for each stimulus image
  • labels.person: Identity of the face shown
  • labels.orientation: Orientation of the head (front/profile/etc.)
  • labels.orient_person_combo: Combined identity x orientation label
  • The remaining columns (time.1_2 -> time.385_386...) represent neural spike activity over time. Each column corresponds to a 1 ms time bin (for example, time.1_2 covers 1-2 ms), beginning when the image appeared on the screen. Because identity-related information is most prominent during the early visual response, only the first 400 ms of activity were used for decoding.

For classification, we used labels.person as the target variable (y) since our goal is to predict which face identity was shown. The neural spike values across time bins formed the input features (X), allowing the classifier to learn which neurons and time windows differentiate individual faces.

This dataset is well suited for neural decoding because it captures time resolved spiking responses to faces, enabling us to test whether identity information is present in the AM neural population.
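
As a minimal sketch of how the time-bin columns could be selected by their bin start rather than by position, the helper below parses names of the form time.<start>_<end>. The helper name select_time_columns and the regular expression are hypothetical and assume every time column follows the naming pattern visible in the dataframe output.

```python
import re

# Hypothetical pattern: assumes time columns are named "time.<start>_<end>" (ms).
TIME_COL = re.compile(r"^time\.(\d+)_(\d+)$")

def select_time_columns(columns, max_ms=400):
    """Return time columns whose bin starts at or before max_ms, in temporal order."""
    selected = []
    for name in columns:
        match = TIME_COL.match(name)
        if match and int(match.group(1)) <= max_ms:
            selected.append((int(match.group(1)), name))
    return [name for _, name in sorted(selected)]

# Toy column list mixing a metadata column with time bins (not the full dataset).
example_columns = ["labels.person", "time.2_3", "time.1_2", "time.400_401", "time.401_402"]
first_400 = select_time_columns(example_columns)
print(first_400)
```

Selecting by parsed bin start does not depend on the column order in the .csv files, which positional slicing silently assumes.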

In [8]:
#adjust the format/ organization of the data 

#multiple files in this folder
data_dir = Path("Freiwald_Tsao_faceviews_AM_data_csv")

#get all csv files in that folder
csv_files = list(data_dir.glob("*.csv"))
print(f"Found {len(csv_files)} files")

#read each file and store in list
dfs = []
for f in csv_files:
    df = pd.read_csv(f)
    df["neuron_id"] = f.stem 
    dfs.append(df)

#combine into one dataframe
combined = pd.concat(dfs, ignore_index=True)

df = combined

#find all time columns (named like time.1_2, time.2_3, ...)
time_cols = [c for c in df.columns if c.startswith('time.')]

#keep only the first 400 ms
time_cols_400 = time_cols[:400]

#keep labels 
label_cols_raw = [
    'labels.person',             #identity (target)
    'labels.orientation',        #view (front, profile, etc.)
    'labels.stimID',             #stimulus ID
    'labels.orient_person_combo',
    'site_info.monkey',
    'site_info.region',
    'neuron_id'                 
]

label_cols = [c for c in label_cols_raw if c in df.columns]  #keep only those that exist

df_400 = df[label_cols + time_cols_400]
Found 193 files
In [9]:
#see what data looks like
df_400.head()
Out[9]:
labels.person labels.orientation labels.stimID labels.orient_person_combo site_info.monkey site_info.region neuron_id time.1_2 time.2_3 time.3_4 ... time.391_392 time.392_393 time.393_394 time.394_395 time.395_396 time.396_397 time.397_398 time.398_399 time.399_400 time.400_401
0 1 front 1 front 1 bert am raster_data_bert_am_site013 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 1 front 1 front 1 bert am raster_data_bert_am_site013 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
2 1 front 1 front 1 bert am raster_data_bert_am_site013 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 2 front 2 front 2 bert am raster_data_bert_am_site013 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 2 front 2 front 2 bert am raster_data_bert_am_site013 0 1 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 407 columns

Other methods used

Before fitting any classification models, we performed several preprocessing steps to convert the raw neural spike rasters into a usable feature matrix.

  1. Selected the first 400 ms of recording time for each neuron: The dataset contains spike activity for up to 800 ms, but prior work indicated that only the first 400 ms contain stimulus-driven responses relevant for decoding identity. Therefore, we restricted all analysis to time columns time.1_2 through time.400_401.
  2. Collapsed spike rasters into total spike counts per neuron: Each neuron has 400 time bins (1 ms each). We summed these to produce a single numeric feature, the total spike count in the 0-400 ms window, which is proportional to the mean firing rate. This reduces dimensionality dramatically and creates an interpretable feature.
  3. Constructed a stimulus x neuron feature matrix: The raw data has one row per neuron per stimulus presentation. We pivoted the data so that each row corresponds to one stimulus (labels.stimID) and each column to one neuron, averaging spike counts over repeated presentations of the same stimulus. The resulting matrix has shape (200 stimuli x 193 neurons) and serves as the input to all models.
  4. Extracted identity labels for classification: We grouped rows by labels.stimID and assigned the corresponding labels.person value as the target output variable y. This produces 25 possible identity classes for supervised learning.
  5. Train/Test split with stratification: We split the data 80%/20% into training and test sets while preserving class proportions (stratify = y).
  • X_train = (160 x 193)
  • X_test = (40 x 193)

These preprocessing steps convert thousands of raw spike timepoints into a compact neural population representation suitable for machine learning, and ensure that the model evaluates identity decoding performance fairly.
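
To make the pivot step concrete, here is a small sketch on toy data. The neuron names siteA and siteB and the spike counts are invented for illustration. It shows that pandas pivot_table averages repeated presentations of the same stimulus by default, which is why the resulting matrix has one row per stimulus rather than one row per presentation.

```python
import pandas as pd

# Toy long-format table: stimulus 1 is presented twice and stimulus 2 once,
# for two hypothetical neurons ("siteA", "siteB").
toy = pd.DataFrame({
    "labels.stimID": [1, 1, 2, 1, 1, 2],
    "neuron_id":     ["siteA", "siteA", "siteA", "siteB", "siteB", "siteB"],
    "spike_count":   [4, 6, 1, 0, 2, 3],
})

# pivot_table uses aggfunc="mean" by default, so repeated presentations of the
# same stimulus collapse into one averaged value per (stimulus, neuron) cell.
toy_matrix = toy.pivot_table(
    index="labels.stimID", columns="neuron_id", values="spike_count", fill_value=0
)
print(toy_matrix)
```

If per-presentation rows were wanted instead, the index would need a column that identifies each presentation, which this sketch does not assume the dataset provides.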

In [24]:
#preprocessing steps
# collapse time into a spike count for first 400 ms
df_400 = df_400.copy()  
df_400['spike_count_0_400'] = df_400[time_cols_400].sum(axis=1)

# keep only ID variables + spike count
mini = df_400[['labels.stimID', 'neuron_id', 'labels.person']].copy()
mini['spike_count'] = df_400['spike_count_0_400']

# pivot: 1 row per trial, 1 column per neuron
trial_neuron = mini.pivot_table(
    index='labels.stimID',
    columns='neuron_id',
    values='spike_count',
    fill_value=0
)

# target labels per trial
y = mini.groupby('labels.stimID')['labels.person'].first()

X = trial_neuron.values
y = y.values

print("X shape:", X.shape)
print("y shape:", y.shape)
X shape: (200, 193)
y shape: (200,)
In [25]:
#split train/test sets for modeling
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=0,
    stratify=y
)

print(X_train.shape, X_test.shape)
(160, 193) (40, 193)

Models for classification

For this project, we use two classification models:

  1. Multinomial Logistic Regression (linear classifier)
  2. Random Forest classifier (non-linear, tree-based ensemble)

Why these Models?

  • Logistic regression is a standard, interpretable baseline for multiclass classification. In our setting it acts as a simple linear decoder of face identity from population spike counts, and its coefficients can be inspected to see which neurons contribute most strongly to particular identities. Random forests, on the other hand, can capture nonlinear interactions and feature combinations between neurons that a linear model might miss, so they provide a natural comparison to see whether modeling nonlinear population interactions improves decoding performance.

Questions we want to answer with them:

  • Decoding: How accurately can we predict which of the 25 individuals was shown based only on AM spike activity in the first 400 ms?
  • Model comparison: Does a nonlinear random forest decoder outperform the simpler linear logistic regression model, or is the information about identity mostly linearly decodable?
  • Interpretation: For logistic regression, which neurons have the largest weights, and for random forest, which neurons have the highest feature importance, i.e., which cells are most informative about identity?
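
The interpretation question could be approached along the following lines. This is a sketch on synthetic data, not the analysis run in this notebook: the arrays X_demo and y_demo are invented, and only the first synthetic neuron is constructed to carry identity information.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the spike-count matrix: 90 samples, 5 "neurons", 3 identities.
# Only neuron 0 depends on identity; the other columns are Poisson noise.
y_demo = np.repeat([0, 1, 2], 30)
X_demo = rng.poisson(5, size=(90, 5)).astype(float)
X_demo[:, 0] += 10 * y_demo

logit_demo = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X_demo, y_demo)
rf_demo = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

# Linear decoder: mean absolute standardized weight per neuron across classes.
logit_weight = np.abs(logit_demo.named_steps["clf"].coef_).mean(axis=0)
# Random forest: impurity-based importance per neuron (sums to 1).
rf_importance = rf_demo.feature_importances_

top_logit = int(np.argmax(logit_weight))
top_rf = int(np.argmax(rf_importance))
print("Most informative neuron (logit):", top_logit)
print("Most informative neuron (RF):", top_rf)
```

In this notebook the analogous quantities would come from logit_best.named_steps['clf'].coef_ and rf_best.feature_importances_, with columns in the order of trial_neuron.columns.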

How do we evaluate each model?

For both models we report:

  • Cross-validation (CV) accuracy on the training set (average over folds)
  • Test accuracy on held out test set (20% of the data)
  • Misclassification rate: 1 - accuracy
  • Classification report (per class precision, recall, and F1-score)
  • Confusion matrix on the test set, to visualize which identities are most often confused with each other

These metrics let us compare overall performance (accuracy) and also see where the models fail (confusion structure across identities).
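
As a worked illustration of how these metrics relate to the confusion matrix, the sketch below computes them by hand with NumPy on a toy example with three hypothetical identities. The labels are invented and are not taken from the dataset.

```python
import numpy as np

# Toy true/predicted labels for three hypothetical identities.
y_true = np.array([1, 1, 2, 2, 3, 3])
y_pred = np.array([1, 1, 2, 3, 3, 3])

labels = np.unique(y_true)
# confusion[i, j] counts samples of true class labels[i] predicted as labels[j].
confusion = np.array(
    [[np.sum((y_true == t) & (y_pred == p)) for p in labels] for t in labels]
)

accuracy = np.trace(confusion) / confusion.sum()
misclassification = 1 - accuracy
recall = np.diag(confusion) / confusion.sum(axis=1)     # correct / true count per class
precision = np.diag(confusion) / confusion.sum(axis=0)  # correct / predicted count per class

print(confusion)
print("accuracy:", accuracy, "misclassification:", misclassification)
print("recall:", recall, "precision:", precision)
```

Here identity 2 is missed once (recall 0.5) and identity 3 absorbs that error (precision 2/3), which is the kind of off-diagonal structure the confusion matrices below are meant to expose.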

Cross-validation method:

For both models we used stratified 5-fold cross-validation on the training set. The data are split into 5 folds while preserving the class proportions. For each candidate hyperparameter setting:

  • We train on 4 folds and validate on the remaining fold
  • Repeat this for all 5 folds
  • Average the validation accuracy across folds

For logistic regression we tune the regularization strength C, and for random forest we tune the number of trees and the maximum tree depth. The hyperparameters with the highest mean CV accuracy are selected, and then the final model is refit on the full training set and evaluated once on the held-out test set.
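
The procedure above is what GridSearchCV automates. A minimal hand-written version, using synthetic data and a reduced, hypothetical grid of two C values, might look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
# Synthetic data: 3 classes, 20 samples each, 4 features (stand-ins for neurons).
y_demo = np.repeat([0, 1, 2], 20)
X_demo = rng.normal(size=(60, 4)) + y_demo[:, None]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_cv_acc = {}
for C in [0.01, 1.0]:
    fold_acc = []
    # Train on 4 folds, validate on the remaining fold, repeat for all 5 folds.
    for train_idx, val_idx in skf.split(X_demo, y_demo):
        model = LogisticRegression(C=C, max_iter=1000)
        model.fit(X_demo[train_idx], y_demo[train_idx])
        fold_acc.append(model.score(X_demo[val_idx], y_demo[val_idx]))
    mean_cv_acc[C] = float(np.mean(fold_acc))

# Select the setting with the highest mean validation accuracy.
best_C = max(mean_cv_acc, key=mean_cv_acc.get)
print(mean_cv_acc, "best C:", best_C)
```

The sketch omits the scaling step. In the notebook the scaler sits inside the pipeline, so it is refit on the training folds of each split and the validation fold never influences the scaling.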

Logistic Regression:

In [36]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import mean_squared_error

# pipeline: scale -> logistic regression
logit_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(
        max_iter=500,
        n_jobs=-1
    ))
])

param_grid = {
    'clf__C': [0.01, 0.1, 1.0, 10.0]
}

# stratified k-fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# grid
logit_grid = GridSearchCV(
    logit_pipe,
    param_grid,
    cv=cv,
    scoring='accuracy',
    n_jobs=-1
)

logit_grid.fit(X_train, y_train)

print("Best C:", logit_grid.best_params_['clf__C'])
print("CV accuracy (logit):", logit_grid.best_score_)

# best estimator
logit_best = logit_grid.best_estimator_

# test on test set
y_test_pred_logit = logit_best.predict(X_test)

print("Test accuracy (logit):", accuracy_score(y_test, y_test_pred_logit))
print(classification_report(y_test, y_test_pred_logit, zero_division=0))

# test accuracy + misclassification
logit_test_acc = accuracy_score(y_test, y_test_pred_logit)
logit_test_misclass = 1 - logit_test_acc

print(f"Test accuracy (logit): {logit_test_acc:.3f}")
print(f"Test misclassification rate (logit): {logit_test_misclass:.3f}")
Best C: 1.0
CV accuracy (logit): 0.8125
Test accuracy (logit): 0.775
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         2
           2       1.00      0.50      0.67         2
           3       1.00      1.00      1.00         1
           4       0.50      0.50      0.50         2
           5       0.50      0.50      0.50         2
           6       1.00      1.00      1.00         1
           7       1.00      0.50      0.67         2
           8       0.33      0.50      0.40         2
           9       0.50      0.50      0.50         2
          10       1.00      1.00      1.00         2
          11       1.00      1.00      1.00         2
          12       0.33      1.00      0.50         1
          13       1.00      1.00      1.00         2
          14       1.00      1.00      1.00         2
          15       0.00      0.00      0.00         1
          16       0.67      1.00      0.80         2
          17       0.00      0.00      0.00         1
          18       1.00      1.00      1.00         1
          19       1.00      1.00      1.00         2
          20       1.00      1.00      1.00         1
          21       1.00      0.50      0.67         2
          22       1.00      1.00      1.00         2
          23       1.00      1.00      1.00         1
          24       1.00      1.00      1.00         1
          25       1.00      1.00      1.00         1

    accuracy                           0.78        40
   macro avg       0.79      0.78      0.77        40
weighted avg       0.81      0.78      0.77        40

Test accuracy (logit): 0.775
Test misclassification rate (logit): 0.225
In [27]:
#confusion matrix of logistic regression
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

fig, ax = plt.subplots(figsize=(6, 6))
ConfusionMatrixDisplay.from_predictions(y_test, y_test_pred_logit, ax=ax)
ax.set_title("Logistic Regression – Test Confusion Matrix")
plt.tight_layout()
plt.show()
[Figure: test confusion matrix for logistic regression]

Random Forest:

In [28]:
# Random Forest with 5-fold CV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

rf = RandomForestClassifier(random_state=0, n_jobs=-1)

rf_param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [None, 10, 20]
}

# GridSearch
rf_grid = GridSearchCV(
    rf,
    rf_param_grid,
    cv=cv,
    scoring='accuracy',
    n_jobs=-1
)

# Fit on train set
rf_grid.fit(X_train, y_train)

print("Best RF params:", rf_grid.best_params_)
print("CV accuracy (RF):", rf_grid.best_score_)

# Best estimator
rf_best = rf_grid.best_estimator_

# Predict on test data
y_test_pred_rf = rf_best.predict(X_test)

print("Test accuracy (RF):", accuracy_score(y_test, y_test_pred_rf))
print(classification_report(y_test, y_test_pred_rf, zero_division=0))

# Test accuracy + misclassification
rf_test_acc = accuracy_score(y_test, y_test_pred_rf)
rf_test_misclass = 1 - rf_test_acc

print(f"Test accuracy (RF): {rf_test_acc:.3f}")
print(f"Test misclassification rate (RF): {rf_test_misclass:.3f}")
Best RF params: {'max_depth': 20, 'n_estimators': 100}
CV accuracy (RF): 0.75625
Test accuracy (RF): 0.7
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         2
           2       1.00      0.50      0.67         2
           3       0.50      1.00      0.67         1
           4       0.50      0.50      0.50         2
           5       0.33      0.50      0.40         2
           6       1.00      1.00      1.00         1
           7       1.00      0.50      0.67         2
           8       0.00      0.00      0.00         2
           9       0.50      0.50      0.50         2
          10       1.00      1.00      1.00         2
          11       1.00      1.00      1.00         2
          12       0.50      1.00      0.67         1
          13       1.00      1.00      1.00         2
          14       1.00      1.00      1.00         2
          15       0.00      0.00      0.00         1
          16       1.00      1.00      1.00         2
          17       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         1
          19       1.00      1.00      1.00         2
          20       0.00      0.00      0.00         1
          21       1.00      0.50      0.67         2
          22       1.00      1.00      1.00         2
          23       1.00      1.00      1.00         1
          24       0.33      1.00      0.50         1
          25       1.00      1.00      1.00         1

    accuracy                           0.70        40
   macro avg       0.67      0.68      0.65        40
weighted avg       0.72      0.70      0.69        40

Test accuracy (RF): 0.700
Test misclassification rate (RF): 0.300
In [31]:
from sklearn.metrics import accuracy_score, classification_report, mean_squared_error, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

#confusion matrix plot
fig, ax = plt.subplots(figsize=(6, 6))
ConfusionMatrixDisplay.from_predictions(y_test, y_test_pred_rf, ax=ax)
ax.set_title("Random Forest – Test Confusion Matrix")
plt.tight_layout()
plt.show()
[Figure: test confusion matrix for random forest]

Results

Our results showed that face identity can be decoded from AM neural population activity with high accuracy, far above chance performance.

Because there were 25 possible face identities, random guessing would result in only 4% accuracy.

The logistic regression model achieved approximately 78% accuracy, demonstrating that identity information is strongly represented in the neural activity. This suggests that the underlying neural code for identity is largely linearly separable, meaning relatively simple models can successfully decode the information.

The random forest model achieved approximately 70% test accuracy, which also represents strong decoding performance well above chance. However, it performed worse than logistic regression.

This difference suggests that introducing nonlinear complexity did not improve decoding for this dataset. Instead, the neural representation of identity in the AM face patch may already be organized in a way that makes it easily accessible to simpler linear classifiers.

Results are further quantified below.

Classification results

Here we evaluate decoding performance and demonstrate how well neural population activity predicts stimulus identity.

What did we do?

We trained two decoding models, multinomial logistic regression and random forest classification, to determine whether neural activity in the AM face patch could be used to predict which face identity was shown. Both models were trained on spike count vectors (one feature per neuron) extracted from the first 400 ms of activity, and evaluated using 5-fold cross-validation along with a held out test set.

We first performed stratified 5-fold cross-validation on the training data to tune model hyperparameters, and then evaluated final performance on the unseen test set to measure generalization.

What did we find?

In [48]:
import pandas as pd
# Extract cross-validation accuracy values
logit_cv_acc = logit_grid.best_score_
rf_cv_acc = rf_grid.best_score_

results_table = pd.DataFrame({
    "Model": ["Logistic Regression", "Random Forest"],
    "CV Accuracy": [logit_cv_acc, rf_cv_acc],
    "Test Accuracy": [logit_test_acc, rf_test_acc]
})

results_table
Out[48]:
Model CV Accuracy Test Accuracy
0 Logistic Regression 0.81250 0.775
1 Random Forest 0.75625 0.700
In [40]:
print(classification_report(y_test, y_test_pred_logit, zero_division=0))
print(classification_report(y_test, y_test_pred_rf, zero_division=0))
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         2
           2       1.00      0.50      0.67         2
           3       1.00      1.00      1.00         1
           4       0.50      0.50      0.50         2
           5       0.50      0.50      0.50         2
           6       1.00      1.00      1.00         1
           7       1.00      0.50      0.67         2
           8       0.33      0.50      0.40         2
           9       0.50      0.50      0.50         2
          10       1.00      1.00      1.00         2
          11       1.00      1.00      1.00         2
          12       0.33      1.00      0.50         1
          13       1.00      1.00      1.00         2
          14       1.00      1.00      1.00         2
          15       0.00      0.00      0.00         1
          16       0.67      1.00      0.80         2
          17       0.00      0.00      0.00         1
          18       1.00      1.00      1.00         1
          19       1.00      1.00      1.00         2
          20       1.00      1.00      1.00         1
          21       1.00      0.50      0.67         2
          22       1.00      1.00      1.00         2
          23       1.00      1.00      1.00         1
          24       1.00      1.00      1.00         1
          25       1.00      1.00      1.00         1

    accuracy                           0.78        40
   macro avg       0.79      0.78      0.77        40
weighted avg       0.81      0.78      0.77        40

              precision    recall  f1-score   support

           1       1.00      1.00      1.00         2
           2       1.00      0.50      0.67         2
           3       0.50      1.00      0.67         1
           4       0.50      0.50      0.50         2
           5       0.33      0.50      0.40         2
           6       1.00      1.00      1.00         1
           7       1.00      0.50      0.67         2
           8       0.00      0.00      0.00         2
           9       0.50      0.50      0.50         2
          10       1.00      1.00      1.00         2
          11       1.00      1.00      1.00         2
          12       0.50      1.00      0.67         1
          13       1.00      1.00      1.00         2
          14       1.00      1.00      1.00         2
          15       0.00      0.00      0.00         1
          16       1.00      1.00      1.00         2
          17       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         1
          19       1.00      1.00      1.00         2
          20       0.00      0.00      0.00         1
          21       1.00      0.50      0.67         2
          22       1.00      1.00      1.00         2
          23       1.00      1.00      1.00         1
          24       0.33      1.00      0.50         1
          25       1.00      1.00      1.00         1

    accuracy                           0.70        40
   macro avg       0.67      0.68      0.65        40
weighted avg       0.72      0.70      0.69        40

Above is a breakdown of precision/recall for logistic regression (first report) and random forest (second report).

For multinomial logistic regression, many identities were classified perfectly on the test set. In the report, identities 1, 3, 6, 10, 11, 13, 14, 18–20, and 22–25 all have precision and recall of 1.00, meaning every test sample for those people was correctly decoded and no other identity was mistaken for them. A few identities were more difficult. For example, identities 8, 15, and 17 have lower recall (0.50, 0.00, 0.00), indicating that the model often confused those faces with others. Overall, logistic regression achieved an accuracy of 0.78 with macro-averaged precision/recall/F1 around 0.79/0.78/0.77, showing strong but not uniform decoding across all 25 individuals.

For the random forest, the same pattern appears but weaker. Several identities are still decoded perfectly (1, 6, 10, 11, 13, 14, 16, 19, 22, 23, 25), but others show noticeably reduced recall or precision (for example, identities 3, 5, 8, 12, 18, 20, and 24). This yields a lower overall test accuracy of 0.70 and lower macro/weighted F1 scores (0.65/0.69) compared to logistic regression (0.77/0.77).

In [42]:
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

#logistic regression confusion matrix
ConfusionMatrixDisplay.from_predictions(
    y_test, y_test_pred_logit,
    ax=axes[0],
    colorbar=False
)
axes[0].set_title("Logistic Regression – Test Confusion Matrix")

#random forest confusion matrix
ConfusionMatrixDisplay.from_predictions(
    y_test, y_test_pred_rf,
    ax=axes[1],
    colorbar=False
)
axes[1].set_title("Random Forest – Test Confusion Matrix")

plt.tight_layout()
plt.show()
[Figure: side-by-side test confusion matrices for logistic regression and random forest]

The confusion matrices confirm this pattern visually. Strong diagonal values show identities that were classified correctly every time, indicating clear and distinct neural responses. In contrast, identities with off-diagonal errors were misclassified more frequently, suggesting they share more similar firing patterns and are therefore harder for the model to distinguish.

In [44]:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)      # Reduce to 2D for plotting
X_pca = pca.fit_transform(X)   

plt.figure(figsize=(8,6))
scatter = plt.scatter(X_pca[:,0], X_pca[:,1], c=y, cmap='tab20', s=50)
plt.title("Neural Response PCA Embedding")
plt.xlabel("PC1"); plt.ylabel("PC2")
plt.colorbar(scatter, label="Identity")
plt.show()
[Figure: scatter plot of the first two principal components, colored by identity]

PCA projection of spike-count vectors. Identities form distinguishable clusters, indicating that AM activity encodes identity information in population space.

In [38]:
models      = ["Logistic", "Random Forest"]
cv_scores   = [logit_cv_acc, rf_cv_acc]
test_scores = [logit_test_acc, rf_test_acc]

x = np.arange(len(models))
width = 0.35

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(x - width/2, cv_scores,   width, label="CV accuracy")
ax.bar(x + width/2, test_scores, width, label="Test accuracy")

ax.set_xticks(x)
ax.set_xticklabels(models)
ax.set_ylim(0, 1)
ax.set_ylabel("Accuracy")
ax.set_title("Model Comparison – CV vs Test Accuracy")
ax.legend()
plt.tight_layout()
plt.show()
[Figure: bar chart comparing CV and test accuracy for logistic regression and random forest]
  • Logistic regression has a CV accuracy of 81% and a test accuracy of 77.5%, meaning identity is highly linearly decodable.
  • Random forest has a CV accuracy of 76% and a test accuracy of 70%, meaning that modeling nonlinear interactions did not add decoding power beyond the linear features.

Both models performed far above chance level (4% for 25 identities), showing that face identity is strongly represented in AM neural activity. Logistic regression outperformed random forest on both CV and test accuracy, suggesting that identity information is largely linearly separable in population firing space.

How do we interpret this?

  • Neural population responses in AM reliably encode who is being viewed.
  • A simple linear decoder performs better than the nonlinear one we tested (identity is linearly extractable).
  • Confusion patterns reveal some identities cluster closer in neural space than others.
  • Overall test accuracy of 70-78% demonstrates strong identity selectivity in this brain area.

Discussion and Conclusion

Discussion on the classification results

Our classification analysis demonstrates that AM neural activity contains strong information about face identity. Logistic regression achieved a cross-validated accuracy of 81% and a held-out test accuracy of 77.5%, while random forest reached 70% test accuracy. Considering that chance performance for 25-way identity classification is only 4%, our decoding accuracy is roughly 17 to 19 times chance, providing clear quantitative evidence that AM neurons encode identity information robustly.
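
The comparison to chance can be checked with a short calculation. The sketch below uses only the figures reported above (25 classes, 40 test samples, test accuracies of 0.775 and 0.700). It additionally assumes, as a simplification, that test samples are independent draws, so the number correct under pure guessing follows a binomial distribution. The function binom_tail is a helper written for this example.

```python
from math import comb

n_classes = 25
n_test = 40                # size of the held-out test set
chance = 1 / n_classes     # expected accuracy from random guessing

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

summary = {}
for name, acc in [("logistic regression", 0.775), ("random forest", 0.700)]:
    n_correct = round(acc * n_test)
    summary[name] = {
        "n_correct": n_correct,
        "times_chance": round(acc / chance, 1),
        "p_value": binom_tail(n_correct, n_test, chance),
    }
    print(name, summary[name])
```

Under this assumption the tail probabilities are vanishingly small for both models, so the gap from chance cannot plausibly be attributed to luck even with only 40 test samples.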

The per-identity classification reports and confusion matrices reveal that several identities (for example, 1, 6, 10, 14, 19, and 25) were decoded perfectly by both models, with precision and recall of 1.00. Meanwhile, a small subset of identities (for example, 8, 15, and 17) showed low recall in both models, suggesting that these individuals evoke more similar or overlapping neural firing patterns. This indicates that while AM identity coding is strong overall, it is not uniformly distinct across all faces. Because each identity contributes only one or two samples to the test set, these per-identity values are coarse and should be read as indicative rather than precise.

One obstacle was managing large data files and reshaping the raster format into a trialwise feature matrix. Summarizing spike counts and restructuring the dataset required memory efficient processing and careful column handling to avoid kernel crashes. Another challenge was evaluating performance across 25 classes rather than binary classification, which made per-class precision/recall especially important for interpretation.

Conclusion and future steps

Our results show that face identity can be decoded from AM neural population activity with high accuracy (70-78%), far above the 4% chance baseline. Logistic regression outperformed random forest, demonstrating that the underlying neural code for identity is mostly linearly separable. This suggests that AM neurons reliably represent who is being viewed, and that this information can be extracted without requiring highly nonlinear models.

In future work, we could:

  • Use SVMs or neural networks to test whether performance can be pushed beyond 80% accuracy.
  • Analyze feature importance or neuron by neuron contributions to identify which cells carry the most identity information.
  • Repeat decoding across different time windows (0-200 ms, vs 200-400 ms) to determine when identity information emerges strongest.
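
For the time-window idea, one possible starting point is to compute a separate spike count per window instead of a single 0-400 ms count. The sketch below uses a synthetic raster and hypothetical window boundaries, and is not run on the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic raster: 6 presentations x 400 one-millisecond bins of 0/1 spikes.
raster = (rng.random((6, 400)) < 0.05).astype(int)

# Hypothetical windows in ms, written as half-open column ranges [start, stop).
windows = {"0-200 ms": (0, 200), "200-400 ms": (200, 400)}

# One spike count per presentation and window.
window_counts = {
    name: raster[:, start:stop].sum(axis=1) for name, (start, stop) in windows.items()
}
for name, counts in window_counts.items():
    print(name, counts)
```

Each window's counts could then be pivoted and decoded with the same pipeline used above, giving one accuracy value per window to compare.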

Overall, this project demonstrates that the AM face patch encodes identity strongly and in a format accessible to simple linear decoders, a key insight into how the brain distinguishes individuals.

Author contribution

Lindsey: I did the background/motivation section and the preprocessing steps. I also did the results section for logistic regression and the conclusion and future steps.

Mia: I did the methodology/data section as well as the logistic regression model and the random forest model. I also did the results section for random forest and the discussion and conclusion.

We both contributed to the slides for our presentation.

References

Freiwald, W. A., & Tsao, D. Y. (2010). Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science, 330(6005), 845-851.

Meyers, E. M., Borzello, M., Freiwald, W. A., & Tsao, D. (2015). Intelligent information loss: The coding of facial identity, head pose, and non-face information in the macaque face patch system. Journal of Neuroscience, 35(18).
