fast-bert
DtypeWarning: Columns (0,1) have mixed types. Specify dtype option on import or set low_memory=False
Hello, I am quite new to this topic, so apologies if this is not a real issue.
When loading with BertDataBunch, I got this warning:
lib/python3.9/site-packages/fast_bert/data_cls.py:231: DtypeWarning: Columns (0,1) have mixed types. Specify dtype option on import or set low_memory=False.
data_df = pd.read_csv(os.path.join(self.data_dir, filename))
I have already run into this kind of warning with pandas in my own code, but with BertDataBunch I can't find a way to set the dtype option. I installed fast-bert yesterday, so I assume it is the latest version.
databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
tokenizer='camembert-base',
train_file='train_set.csv',
val_file='val_set.csv',
label_file='labels.txt',
text_col='source_clean',
label_col=['aaa', 'bbb', 'ccc','ddd', 'eee'],
batch_size_per_gpu=16,
max_seq_length=512,
multi_gpu=False,
multi_label=True,
model_type='camembert-base')
A second warning appears during the same run, at another line (248):
lib/python3.9/site-packages/fast_bert/data_cls.py:248: DtypeWarning: Columns (0,1) have mixed types. Specify dtype option on import or set low_memory=False.
data_df = pd.read_csv(os.path.join(self.data_dir, filename))
This is related to the format of your data files, which can cause problems when pandas reads a CSV into a DataFrame. I might submit a pull request to allow xlsx files instead, since they handle rows and columns better, but for now one workaround is to make sure all the text in your CSV is surrounded by double quotes: "
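The quoting workaround can be applied once, before the files are handed to BertDataBunch. The following is a minimal sketch, not part of fast-bert: the helper name `requote_csv` and the output file names are hypothetical. It assumes the files are plain comma-separated CSVs that pandas can parse. It reads every column as text with `dtype=str`, so pandas does not have to infer a type at this step, and writes the file back with every field wrapped in double quotes.

```python
import csv

import pandas as pd


def requote_csv(src_path, dst_path):
    """Rewrite a CSV so that every field is wrapped in double quotes.

    Hypothetical helper for illustration; it is not part of fast-bert.
    """
    # dtype=str loads every column as text, so pandas does not guess
    # column types while reading the original file.
    # keep_default_na=False keeps empty cells as "" instead of NaN.
    df = pd.read_csv(src_path, dtype=str, keep_default_na=False)

    # csv.QUOTE_ALL puts double quotes around every field, header included.
    df.to_csv(dst_path, index=False, quoting=csv.QUOTE_ALL)
    return df
```

Usage would look something like `requote_csv('train_set.csv', 'train_set_quoted.csv')`, repeated for the validation file, with the new file names then passed as `train_file` and `val_file`. Whether the warning goes away depends on what columns 0 and 1 actually contain. If a column really does mix numbers and text, looking at `df.iloc[:, :2]` from the returned DataFrame may help find the offending rows.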