LING 582 (FA 2025)

You can download the dataset via the competition page (see the assignment repo for the invitation link).

Dataset

The data is provided in CSV format with three columns (ID, TEXT, LABEL) and two partitions (train and test).

Files

  • train.csv - the training set
  • test.csv - the test set
  • sample_submission.csv - a sample submission file in the correct format

Columns

Summary of dataset columns.
COLUMN    DESCRIPTION
ID        Unique ID for this datapoint
TEXT      The text of the document. Use this to derive features/your X.
LABEL     The label for this datapoint (see below)
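
The columns above could be read roughly as follows. This is a minimal sketch using only the Python standard library. It assumes the CSV header uses the column names exactly as listed (ID, TEXT, LABEL), which has not been checked against the actual files. `load_split` is a hypothetical helper, not part of the course materials, and the sample rows are made up for illustration.

```python
import csv
import io


def load_split(handle):
    """Read one CSV partition into parallel lists of ids, texts, and labels.

    Assumes header names ID, TEXT, LABEL (an assumption based on the table above).
    A row with a missing or empty LABEL gets None instead of an integer.
    """
    ids, texts, labels = [], [], []
    for row in csv.DictReader(handle):
        ids.append(row["ID"])
        texts.append(row["TEXT"])
        raw = row.get("LABEL")
        labels.append(int(raw) if raw not in (None, "") else None)
    return ids, texts, labels


# Tiny in-memory stand-in for train.csv (made-up rows, not real data).
sample = io.StringIO(
    'ID,TEXT,LABEL\n'
    '1,"first, made-up text",0\n'
    '2,second made-up text,1\n'
)
ids, texts, labels = load_split(sample)
```

With the real file, the same helper could be called as `with open("train.csv", newline="", encoding="utf-8") as f: ids, texts, labels = load_split(f)`. The `texts` list is what features (your X) would be derived from, and `labels` is the target.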

Labels

Summary of data labels.
LABEL    DESCRIPTION
0        Not the same author
1        Same author
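
Since the labels are binary, it may help to check how balanced the two classes are before training. The sketch below is one way to do that with the standard library. `LABEL_NAMES` and `label_distribution` are hypothetical names introduced here, and the example list is made up rather than taken from the dataset.

```python
from collections import Counter

# Mapping taken from the label table above.
LABEL_NAMES = {0: "Not the same author", 1: "Same author"}


def label_distribution(labels):
    """Return the fraction of datapoints per label, keyed by label description."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {LABEL_NAMES[k]: counts[k] / total for k in sorted(counts)}


# Made-up labels for illustration only.
example = [0, 1, 1, 0, 1]
dist = label_distribution(example)
```

A heavily skewed distribution would suggest reporting a metric other than plain accuracy, though which metric the competition uses is not stated here.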