You can download the dataset via the competition page (see the assignment repo for the invitation link).
Dataset
The data is provided in CSV format, with three columns (ID, TEXT, LABEL) and two partitions (training and test).
Files
- train.csv - the training set
- test.csv - the test set
- sample_submission.csv - a sample submission file in the correct format
Columns
| COLUMN | DESCRIPTION |
|---|---|
| ID | Unique ID for this datapoint |
| TEXT | The text of the document. Use this to derive your features (your X). |
| LABEL | The label for this datapoint (see below) |
Labels
| LABEL | DESCRIPTION |
|---|---|
| 0 | Not the same author |
| 1 | Same author |
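
As a rough illustration of the layout described above, the sketch below reads rows with the ID, TEXT and LABEL columns into three parallel lists using Python's standard `csv` module. The helper name `read_rows` is hypothetical, the sample rows are made up for illustration and are not from the real dataset, and the code assumes the CSV header uses the column names exactly as written in the table. Handling a missing LABEL column is also an assumption, not something the description confirms.

```python
import csv
import io


def read_rows(handle):
    """Parse CSV rows with ID, TEXT, LABEL columns into (ids, texts, labels)."""
    reader = csv.DictReader(handle)
    ids, texts, labels = [], [], []
    for row in reader:
        ids.append(row["ID"])
        texts.append(row["TEXT"])
        # Assumption: LABEL might be missing or empty in some files,
        # so store None instead of failing in that case.
        label = row.get("LABEL")
        labels.append(int(label) if label not in (None, "") else None)
    return ids, texts, labels


# Made-up rows for illustration only; not taken from the real dataset.
sample = io.StringIO(
    "ID,TEXT,LABEL\n"
    '0,"first example document, with a comma",1\n'
    "1,second example document,0\n"
)
ids, texts, labels = read_rows(sample)

# With the real file, the call would look like this (encoding is an assumption):
# with open("train.csv", newline="", encoding="utf-8") as f:
#     ids, texts, labels = read_rows(f)
```

Here `texts` would serve as the raw input for feature extraction (your X) and `labels` as the targets, with 1 meaning same author and 0 meaning not the same author.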