alzheimer
Cannot find '*_cookie_simb_off.csv' for BERT training.
Hi, thanks for sharing this great work. I've run into an issue: I cannot find the files 'train_cookie_simb_off.csv' and 'test_cookie_simb_off.csv' referenced in Cookie_Bert*.py.

I guess the df['text'] content is similar to sentences_clean or clean in pitt-cookie-complete.csv. However, when I run the line train_dataset = BERTDataset("Cookie_Text_for_finetuning.txt", tokenizer, seq_len=max_seq_length, corpus_lines=None, on_memory=True) in Cookie_Bert_Lm_finetuning.py, I get IndexError: list index out of range at the line if self.all_docs[-1] != doc: ... of BERTDataset.

I'm confused about the format of the df['text'] content. Could you share the files again?
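
One possible cause of the IndexError, sketched under an assumption that has not been checked against this repository: BERTDataset may follow the common LM fine-tuning loader pattern in which a blank line closes a document, so a text file with no blank line at all leaves all_docs empty when all_docs[-1] is evaluated. The functions load_docs and to_corpus_text below are hypothetical stand-ins written only for illustration, not code from the repository.

```python
def load_docs(lines):
    """Mimic the ASSUMED loading loop: a blank line closes the current document."""
    all_docs, doc = [], []
    for line in lines:
        line = line.strip()
        if line == "":
            all_docs.append(doc)
            doc = []
        else:
            doc.append(line)
    # Assumed final check, matching the line quoted in the traceback.
    # If no blank line was ever seen, all_docs is empty and this raises IndexError.
    if all_docs[-1] != doc:
        all_docs.append(doc)
    return all_docs


def to_corpus_text(transcripts):
    """Hypothetical helper: one sentence per line, a blank line between transcripts."""
    return "\n\n".join("\n".join(sentences) for sentences in transcripts) + "\n"


# A file without any blank line reproduces the error under this assumption.
try:
    load_docs(["the boy is on the stool", "the sink is overflowing"])
except IndexError as err:
    print("IndexError:", err)

# With a blank line between transcripts, the documents load.
text = to_corpus_text([["the boy is on the stool", "he takes a cookie"],
                       ["the sink is overflowing"]])
print(load_docs(text.splitlines()))
```

If that assumption holds, the text file would need at least two transcripts separated by a blank line, with one sentence per line inside each transcript. Whether df['text'] was prepared that way is exactly what the missing CSV files would confirm.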