bert-toxic-comments-multilabel
Wrong number of labels
The function get_labels is used to read the labels from the source CSV files, and the length of the resulting list sets the size of the last layer in the model. However, the first column is the ID of the document, not a label, so counting it causes a size mismatch in the model, which is then unable to train.
if self.labels == None:
    self.labels = list(pd.read_csv(os.path.join(self.data_dir, "classes.txt"),header=None)[0].values)
Removing the first value from the list (or setting num_labels = len(labels) - 1) fixes this problem.
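A minimal sketch of the first option, assuming the list starts with a single ID entry followed by the label names. The helper `label_names` and the example column names are hypothetical and only illustrate the idea; they are not taken from the repository.

```python
def label_names(columns, id_column_count=1):
    """Return only the label names, skipping the leading non-label columns.

    Assumption: the first `id_column_count` entries are identifiers
    (for example a document ID), not labels.
    """
    return list(columns)[id_column_count:]


# Hypothetical column list: one ID column followed by six label columns.
columns = ["id", "toxic", "severe_toxic", "obscene",
           "threat", "insult", "identity_hate"]

labels = label_names(columns)

# The size of the model's last layer should match the number of real labels.
num_labels = len(labels)
```

Dropping the ID entry once, where the labels are loaded, keeps `len(labels)` correct everywhere it is used, whereas subtracting 1 at a single call site leaves the ID in the list for any other code that iterates over it.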