You can download the dataset via the competition page (see the assignment repo for the invitation link).

Dataset

The data is provided as CSV files with three columns, split into two partitions.

Files

Summary of dataset columns.
COLUMN	DESCRIPTION
`ID`	Unique ID for this datapoint
`TEXT`	The text of the document. Use this to derive your features (your X).
`LABEL`	The label for this datapoint (see below)
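Here is a minimal loading sketch that uses Python's standard `csv` module. It assumes each partition is a CSV file whose header row names the `ID`, `TEXT` and `LABEL` columns exactly as above, and that every row holds an integer label. The function name `load_split` is hypothetical. It returns the IDs, the texts (your X) and the labels (your y) as parallel lists.

```python
import csv
from typing import Iterable, List, Tuple


def load_split(lines: Iterable[str]) -> Tuple[List[str], List[str], List[int]]:
    """Read one partition and return parallel lists (ids, texts, labels).

    Assumes (not confirmed by the dataset page) a header row with the column
    names ID, TEXT and LABEL, and that every row has an integer LABEL.
    `lines` can be an open file handle or any iterable of CSV lines.
    """
    ids: List[str] = []
    texts: List[str] = []
    labels: List[int] = []
    for row in csv.DictReader(lines):
        ids.append(row["ID"])              # keep IDs as strings; their format is unspecified
        texts.append(row["TEXT"])          # raw document text, the source of your features
        labels.append(int(row["LABEL"]))   # 0 or 1, see the label table
    return ids, texts, labels
```

Pass the function an open file handle such as `open(path, newline="", encoding="utf-8")`, where `path` is whichever partition file you downloaded. Opening with `newline=""` lets the `csv` module correctly read documents that contain line breaks inside quoted fields. The UTF-8 encoding is an assumption.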

Summary of data labels.
LABEL	DESCRIPTION
`0`	Not the same author
`1`	Same author
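`LABEL` is binary, so a useful sanity check after loading is to confirm that only these two values appear and to see how balanced the classes are. The sketch below is minimal. `LABEL_NAMES` and `label_summary` are hypothetical helpers, and the descriptions are taken from the table above.

```python
from collections import Counter
from typing import Dict, Iterable

# Human-readable names for the two label values in the table above.
LABEL_NAMES: Dict[int, str] = {0: "not the same author", 1: "same author"}


def label_summary(labels: Iterable[int]) -> Dict[str, int]:
    """Count how many datapoints carry each label, keyed by its description.

    Raises ValueError on any value other than 0 or 1, which catches
    parsing mistakes early.
    """
    counts: Counter = Counter()
    for value in labels:
        if value not in LABEL_NAMES:
            raise ValueError(f"unexpected label: {value!r}")
        counts[LABEL_NAMES[value]] += 1
    # Report both classes, including any with zero examples.
    return {name: counts[name] for name in LABEL_NAMES.values()}
```

For example, `label_summary([0, 1, 1])` returns `{'not the same author': 1, 'same author': 2}`.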