
No such file or directory: '../bert_data.train.pt'

Open kurodenjiro opened this issue 5 years ago • 14 comments

File "train.py", line 122, in train_abs(args, device_id) File "E:\project\PreSumm\src\train_abstractive.py", line 273, in train_abs train_abs_single(args, device_id) File "E:\project\PreSumm\src\train_abstractive.py", line 334, in train_abs_single trainer.train(train_iter_fct, args.train_steps) File "E:\project\PreSumm\src\models\trainer.py", line 133, in train train_iter = train_iter_fct() File "E:\project\PreSumm\src\train_abstractive.py", line 313, in train_iter_fct shuffle=True, is_test=False) File "E:\project\PreSumm\src\models\data_loader.py", line 136, in init self.cur_iter = self._next_dataset_iterator(datasets) File "E:\project\PreSumm\src\models\data_loader.py", line 156, in _next_dataset_iterator self.cur_dataset = next(dataset_iter) File "E:\project\PreSumm\src\models\data_loader.py", line 94, in load_dataset yield _lazy_dataset_loader(pt, corpus_type) File "E:\project\PreSumm\src\models\data_loader.py", line 78, in _lazy_dataset_loader dataset = torch.load(pt_file) File "C:\Users\Admin\Anaconda3\lib\site-packages\torch\serialization.py", line 419, in load f = open(f, 'rb') FileNotFoundError: [Errno 2] No such file or directory: '../bert_data.train.pt'

I don't know why; the bert_data folder only contains cnndm.test.0.bert.pt, not train.pt. How do I fix it?

kurodenjiro avatar Nov 26 '19 10:11 kurodenjiro

Can you provide the command you used? Also, I think you should check the directory where you store your processed data.

cuthbertjohnkarawa avatar Nov 27 '19 02:11 cuthbertjohnkarawa

Did you use -bert_data_path ../bert_data ?

The -bert_data_path option should also contain the prefix of your data.

So instead of using -bert_data_path ../bert_data, try:

-bert_data_path ../bert_data/cnndm
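
For context, the loader builds the file names by string-concatenating -bert_data_path with the rest of the pattern, roughly like this (a simplified sketch of load_dataset in data_loader.py, not the exact repo code):

    import glob
    import torch

    def load_dataset(args, corpus_type):
        # bert_data_path is used as a plain string prefix, so it must already
        # include the dataset name, e.g. '../bert_data/cnndm'.
        pts = sorted(glob.glob(args.bert_data_path + '.' + corpus_type + '.[0-9]*.pt'))
        if pts:
            for pt in pts:
                yield torch.load(pt)
        else:
            # Single-file fallback: with -bert_data_path ../bert_data this
            # becomes '../bert_data.train.pt', which is exactly the missing
            # file reported in the traceback above.
            yield torch.load(args.bert_data_path + '.' + corpus_type + '.pt')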

astariul avatar Nov 27 '19 04:11 astariul

Can you try this?

Add these two lines to the format_to_bert function in data_builder.py (line 276):

      if not os.path.isdir(args.save_path):
          os.mkdir(args.save_path)
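
If the save path is nested, os.makedirs with exist_ok=True is a slightly more robust alternative (my own variant, not part of the repo):

    import os

    # Creates any missing intermediate directories and does not raise if the
    # directory already exists.
    os.makedirs(args.save_path, exist_ok=True)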

robinsongh381 avatar Nov 27 '19 05:11 robinsongh381

My command is:

    python3 train.py -task ext -mode train -bert_data_path ../bert_data/bert_data_cnndm_final/ -ext_dropout 0.1 -model_path ../models -lr 2e-3 -visible_gpus -1 -report_every 50 -save_checkpoint_steps 1000 -batch_size 3000 -train_steps 50000 -accum_count 2 -log_file ../logs/ext_bert_cnndm -use_interval true -warmup_steps 10000 -max_pos 512

I have just downloaded the dataset file bert_data_cnndm_final.zip and unzipped it into ./bert_data/bert_data_cnndm_final, but I still see the error:

No such file or directory: '../bert_data/bert_data_cnndm_final/.train.pt'

Any idea? Thanks.

JackXueIndiana avatar Feb 25 '20 20:02 JackXueIndiana

For testing purposes, I kept only one training file in -bert_data_path and named it .train.pt. The command runs without any problem (but it only has about 2k examples in it).

JackXueIndiana avatar Feb 25 '20 23:02 JackXueIndiana

For testing purposes, I kept only one training file in -bert_data_path and named it .train.pt. The command runs without any problem (but it only has about 2k examples in it).

You should rewrite line 84 in data_loader.py to something like: args.bert_data_path + 'cnndm.' + corpus_type + '.[0-9]*.bert.pt'. This works for me on XSum data: args.bert_data_path + 'xsum.' + corpus_type + '.[0-9]*.bert.pt'
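
For reference, this is roughly how the changed line reads in load_dataset (a sketch; with the hard-coded 'cnndm.' prefix, -bert_data_path should point at the directory itself, e.g. ../bert_data/):

    import glob

    # data_loader.py, around line 84: match the chunked files that ship with
    # the preprocessed data, e.g. cnndm.train.0.bert.pt, cnndm.train.1.bert.pt, ...
    pts = sorted(glob.glob(args.bert_data_path + 'cnndm.' + corpus_type + '.[0-9]*.bert.pt'))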

LuJunru avatar Feb 26 '20 10:02 LuJunru

LuJunru, you are right. I changed line 84 in data_loader.py to pts = sorted(glob.glob(args.bert_data_path + '/[a-z].' + corpus_type + '.[0-9].bert.pt')). Thanks for your hint.

JackXueIndiana avatar Feb 28 '20 00:02 JackXueIndiana

Somehow the copy-and-paste removed the "*" in my code. The change I actually made was: line 84 in data_loader.py became pts = sorted(glob.glob(args.bert_data_path + '/[a-z].' + corpus_type + '.[0-9]*.bert.pt'))

JackXueIndiana avatar Mar 04 '20 04:03 JackXueIndiana

@JackXueIndiana, I made a minor fix in your code: pts = sorted(glob.glob(args.bert_data_path + '/[a-z]*.' + corpus_type + '.[0-9]*.bert.pt')) It works properly this way.
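
As a quick sanity check before retraining, you can run the pattern on its own and confirm it actually lists your files (a standalone snippet with a hypothetical data directory, adjust the path to yours):

    import glob

    bert_data_path = '../bert_data/bert_data_cnndm_final'  # hypothetical path
    corpus_type = 'train'
    pattern = bert_data_path + '/[a-z]*.' + corpus_type + '.[0-9]*.bert.pt'
    # Should print cnndm.train.0.bert.pt, cnndm.train.1.bert.pt, ... if the
    # files are really in that directory.
    print(sorted(glob.glob(pattern)))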

onrmrt avatar Mar 23 '20 09:03 onrmrt

Even after applying the fixes above, I have yet to get it to run. My data is in the ~/PreSumm/bert_data/bert_data_cnndm_final directory, and yet I still get No such file or directory: '/home/mmcmahon/PreSumm/bert_data/bert_data_cnndm_final/cnndm.test.pt'

mmcmahon13 avatar Apr 08 '20 20:04 mmcmahon13

Same here, none of the fixes above worked for me:

File "src/train.py", line 122, in train_abs(args, device_id) File "C:\Users\Ghani\Desktop\PreSumm\src\train_abstractive.py", line 273, in train_abs train_abs_single(args, device_id) File "C:\Users\Ghani\Desktop\PreSumm\src\train_abstractive.py", line 334, in train_abs_single trainer.train(train_iter_fct, args.train_steps) File "C:\Users\Ghani\Desktop\PreSumm\src\models\trainer.py", line 133, in train train_iter = train_iter_fct() File "C:\Users\Ghani\Desktop\PreSumm\src\train_abstractive.py", line 313, in train_iter_fct shuffle=True, is_test=False) File "C:\Users\Ghani\Desktop\PreSumm\src\models\data_loader.py", line 136, in init self.cur_iter = self._next_dataset_iterator(datasets) File "C:\Users\Ghani\Desktop\PreSumm\src\models\data_loader.py", line 156, in _next_dataset_iterator self.cur_dataset = next(dataset_iter) File "C:\Users\Ghani\Desktop\PreSumm\src\models\data_loader.py", line 94, in load_dataset yield _lazy_dataset_loader(pt, corpus_type) File "C:\Users\Ghani\Desktop\PreSumm\src\models\data_loader.py", line 78, in _lazy_dataset_loader dataset = torch.load(pt_file) File "C:\Users\Ghani\Anaconda3\lib\site-packages\torch\serialization.py", line 381, in load f = open(f, 'rb') FileNotFoundError: [Errno 2] No such file or directory: 'bert_data/news.train.pt'

Ghani-25 avatar Apr 13 '20 14:04 Ghani-25

Same for me. I tried to run the repo for the GPT-2 detector model, but it threw the following error:

    Loading checkpoint from detector-base.pt
    Traceback (most recent call last):
      File "C:\Users\xinxin\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "C:\Users\xinxin\Anaconda3\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\xinxin\Documents\gpt-2-output-dataset\detector\server.py", line 120, in <module>
        fire.Fire(main)
      File "C:\Users\xinxin\Anaconda3\lib\site-packages\fire\core.py", line 138, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "C:\Users\xinxin\Anaconda3\lib\site-packages\fire\core.py", line 468, in _Fire
        target=component.__name__)
      File "C:\Users\xinxin\Anaconda3\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "C:\Users\xinxin\Documents\gpt-2-output-dataset\detector\server.py", line 83, in main
        data = torch.load(checkpoint, map_location='cpu')
      File "C:\Users\xinxin\Anaconda3\lib\site-packages\torch\serialization.py", line 525, in load
        with _open_file_like(f, 'rb') as opened_file:
      File "C:\Users\xinxin\Anaconda3\lib\site-packages\torch\serialization.py", line 212, in _open_file_like
        return _open_file(name_or_buffer, mode)
      File "C:\Users\xinxin\Anaconda3\lib\site-packages\torch\serialization.py", line 193, in __init__
        super(_open_file, self).__init__(open(name, mode))
    FileNotFoundError: [Errno 2] No such file or directory: 'detector-base.pt'

Xinxin-Lai avatar May 30 '20 16:05 Xinxin-Lai

^ same issue as @mmcmahon13 and @Ghani-25

germanenik avatar Feb 28 '21 01:02 germanenik

@JackXueIndiana, I made a minor fix in your code: pts = sorted(glob.glob(args.bert_data_path + '/[a-z]*.' + corpus_type + '.[0-9]*.bert.pt')) It works properly this way.

Thanks, it worked.

kush-2418 avatar Jul 11 '22 11:07 kush-2418