
Cannot find '*_cookie_simb_off.csv' for BERT training.

Open JinchaoLove opened this issue 4 years ago • 0 comments

Hi, thanks for sharing this great work. I cannot find the files 'train_cookie_simb_off.csv' and 'test_cookie_simb_off.csv' that are referenced in Cookie_Bert*.py. My guess is that the df['text'] content is like the sentences_clean or clean in pitt-cookie-complete.csv. However, when I run the line train_dataset = BERTDataset("Cookie_Text_for_finetuning.txt", tokenizer, seq_len=max_seq_length, corpus_lines=None, on_memory=True) in Cookie_Bert_Lm_finetuning.py, I get IndexError: list index out of range at the line if self.all_docs[-1] != doc: ... of BERTDataset. I'm confused about the expected format of the df['text'] content. Could you share the files again?
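
For anyone hitting the same error, here is a minimal sketch of one way to build the text file. It is an assumption, not the author's confirmed format: it presumes BERTDataset follows the loader from the pytorch-pretrained-BERT LM fine-tuning example, which expects one sentence per line and a blank line between documents. With that loader, a file containing no blank line leaves all_docs empty, so all_docs[-1] would raise this IndexError. The helper names split_sentences and write_corpus are hypothetical, and the sentence splitter is deliberately naive.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '?' or '!' followed by whitespace.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def write_corpus(texts, path):
    """Write one sentence per line, with a blank line after each document.

    Documents with fewer than two sentences are skipped, on the assumption
    that the next-sentence task needs at least one sentence pair per document.
    Returns the number of documents written.
    """
    n_docs = 0
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            sentences = split_sentences(text)
            if len(sentences) < 2:
                continue
            # The trailing empty line marks the end of the document.
            f.write("\n".join(sentences) + "\n\n")
            n_docs += 1
    return n_docs
```

If the guess about df['text'] is right, the call would look like write_corpus(df['text'].tolist(), "Cookie_Text_for_finetuning.txt"). Whether this matches the original 'train_cookie_simb_off.csv' content is still unknown.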

JinchaoLove • Aug 25 '20 14:08