
colab: preprocessing (RE-DDI)

Open runfish5 opened this issue 2 years ago • 3 comments

Please see the public GitHub gist I created. I am trying to run !bash preprocess.sh from within %cd /content/BioGPT/examples/RE-DDI.

I followed the instructions in this repository's README.md, but running !bash preprocess.sh produces a very strange output:

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-preprocess", line 8, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 389, in cli_main
    main(args)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 372, in main
    _make_all(args.source_lang, src_dict, args)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 185, in _make_all
    _make_dataset(
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 178, in _make_dataset
    _make_binary_dataset(
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 119, in _make_binary_dataset
    final_summary = FileBinarizer.multiprocess_dataset(
  File "/usr/local/lib/python3.8/dist-packages/fairseq/binarizer.py", line 100, in multiprocess_dataset
    offsets = find_offsets(input_file, num_workers)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/file_chunker_utils.py", line 25, in find_offsets
    with open(filename, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../../data/DDI/raw/relis_train.tok.bpe.x'
Full output (expanded):

Following PMID in ../../data/DDI/raw/train.json has no extracted triples:
DDI-DrugBank.d519 DDI-MedLine.d18 DDI-DrugBank.d491 DDI-MedLine.d4 DDI-DrugBank.d134 DDI-DrugBank.d230 DDI-DrugBank.d259 DDI-DrugBank.d293 DDI-MedLine.d64 DDI-MedLine.d100 DDI-DrugBank.d295 DDI-DrugBank.d402 DDI-MedLine.d101 DDI-DrugBank.d190 DDI-MedLine.d140 DDI-MedLine.d112 DDI-MedLine.d9 DDI-DrugBank.d301 DDI-DrugBank.d128 DDI-DrugBank.d101 DDI-DrugBank.d28 DDI-DrugBank.d376 DDI-MedLine.d28 DDI-DrugBank.d93 DDI-MedLine.d88 DDI-DrugBank.d539 DDI-DrugBank.d525 DDI-DrugBank.d540 DDI-DrugBank.d461 DDI-MedLine.d132 DDI-DrugBank.d360 DDI-MedLine.d43 DDI-MedLine.d121 DDI-DrugBank.d262 DDI-DrugBank.d164 DDI-DrugBank.d534 DDI-DrugBank.d385 DDI-DrugBank.d408 DDI-MedLine.d96 DDI-DrugBank.d285 DDI-DrugBank.d473 DDI-MedLine.d57 DDI-DrugBank.d557 DDI-DrugBank.d161 DDI-DrugBank.d24 DDI-DrugBank.d67 DDI-DrugBank.d490 DDI-DrugBank.d421 DDI-MedLine.d65 DDI-DrugBank.d342 DDI-DrugBank.d264 DDI-MedLine.d10 DDI-DrugBank.d312 DDI-MedLine.d117 DDI-MedLine.d135 DDI-DrugBank.d255 DDI-DrugBank.d390 DDI-DrugBank.d68 DDI-MedLine.d11 DDI-MedLine.d14 DDI-MedLine.d75 DDI-DrugBank.d541 DDI-DrugBank.d118 DDI-MedLine.d50 DDI-DrugBank.d218 DDI-DrugBank.d370 DDI-DrugBank.d201 DDI-DrugBank.d244 DDI-MedLine.d138 DDI-MedLine.d33 DDI-DrugBank.d553 DDI-DrugBank.d125 DDI-DrugBank.d366 DDI-DrugBank.d147 DDI-MedLine.d71 DDI-DrugBank.d363 DDI-MedLine.d32 DDI-MedLine.d76 DDI-DrugBank.d290 DDI-MedLine.d38 DDI-MedLine.d77 DDI-DrugBank.d80 DDI-DrugBank.d27 DDI-MedLine.d120 DDI-DrugBank.d52 DDI-DrugBank.d302 DDI-DrugBank.d486 DDI-DrugBank.d472 DDI-MedLine.d6 DDI-MedLine.d123 DDI-DrugBank.d173 DDI-DrugBank.d570 DDI-DrugBank.d126 DDI-DrugBank.d156 DDI-MedLine.d13 DDI-MedLine.d91 DDI-DrugBank.d349 DDI-DrugBank.d436 DDI-DrugBank.d300 DDI-DrugBank.d432 DDI-MedLine.d52 DDI-DrugBank.d554 DDI-MedLine.d19 DDI-DrugBank.d109 DDI-DrugBank.d63 DDI-DrugBank.d168 DDI-DrugBank.d37 DDI-DrugBank.d50 DDI-DrugBank.d455 DDI-DrugBank.d70 DDI-MedLine.d48 DDI-DrugBank.d515 DDI-DrugBank.d406 DDI-MedLine.d127 DDI-MedLine.d22 
DDI-DrugBank.d418 DDI-MedLine.d78 DDI-MedLine.d80 DDI-MedLine.d129 DDI-DrugBank.d61 DDI-DrugBank.d524 DDI-DrugBank.d189 DDI-MedLine.d92 DDI-DrugBank.d6 DDI-DrugBank.d278 DDI-MedLine.d66 DDI-DrugBank.d383 DDI-MedLine.d15 DDI-MedLine.d60 DDI-MedLine.d31 DDI-MedLine.d58 DDI-MedLine.d137 DDI-DrugBank.d555 DDI-DrugBank.d58 DDI-DrugBank.d433 DDI-DrugBank.d375 DDI-DrugBank.d102 DDI-DrugBank.d268 DDI-DrugBank.d391 DDI-MedLine.d83 DDI-DrugBank.d243 DDI-DrugBank.d119 DDI-DrugBank.d49 DDI-MedLine.d139 DDI-DrugBank.d513 DDI-DrugBank.d451 DDI-DrugBank.d38 DDI-DrugBank.d182 DDI-MedLine.d118 DDI-DrugBank.d319 DDI-MedLine.d141 DDI-MedLine.d70 DDI-MedLine.d109 DDI-MedLine.d98 DDI-DrugBank.d214 DDI-DrugBank.d193 DDI-DrugBank.d152 DDI-MedLine.d40 DDI-DrugBank.d535 DDI-DrugBank.d167 DDI-MedLine.d108 DDI-DrugBank.d445 DDI-DrugBank.d235 DDI-DrugBank.d317 DDI-DrugBank.d251 DDI-DrugBank.d496 DDI-DrugBank.d117 DDI-DrugBank.d203 DDI-DrugBank.d532 DDI-DrugBank.d361 DDI-DrugBank.d294 DDI-MedLine.d37 DDI-MedLine.d72 DDI-MedLine.d95 DDI-DrugBank.d280 DDI-MedLine.d26 DDI-MedLine.d74 DDI-DrugBank.d407 DDI-DrugBank.d343 DDI-DrugBank.d209 DDI-DrugBank.d159 DDI-DrugBank.d239 DDI-DrugBank.d155 DDI-DrugBank.d474 DDI-DrugBank.d271 DDI-DrugBank.d403 DDI-DrugBank.d447 DDI-MedLine.d136 DDI-DrugBank.d90 DDI-DrugBank.d136 DDI-MedLine.d41 DDI-DrugBank.d292 DDI-DrugBank.d1 DDI-DrugBank.d92 DDI-DrugBank.d127 
664 samples in ../../data/DDI/raw/train.json has been processed with 195 samples has no triples extracted.
Following PMID in ../../data/DDI/raw/valid.json has no extracted triples:
DDI-DrugBank.d348 DDI-DrugBank.d520 DDI-DrugBank.d248 DDI-MedLine.d122 DDI-MedLine.d103 DDI-MedLine.d35 DDI-MedLine.d24 DDI-DrugBank.d169 DDI-DrugBank.d221 
50 samples in ../../data/DDI/raw/valid.json has been processed with 9 samples has no triples extracted.
191 samples in ../../data/DDI/raw/test.json has been processed with 0 samples has no triples extracted.
Preprocessing train
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
preprocess.sh: line 27: /fast: No such file or directory
preprocess.sh: line 28: /fast: No such file or directory
Preprocessing valid
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
preprocess.sh: line 27: /fast: No such file or directory
preprocess.sh: line 28: /fast: No such file or directory
Preprocessing test
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
preprocess.sh: line 27: /fast: No such file or directory
preprocess.sh: line 28: /fast: No such file or directory
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:497: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
2023-02-14 04:07:49 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-14 04:07:49 | INFO | fairseq_cli.preprocess | Namespace(aim_repo=None, aim_run_hash=None, align_suffix=None, alignfile=None, all_gather_list_size=16384, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='../../data/DDI/relis-bin', dict_only=False, empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=True, log_file=None, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, on_cpu_convert_precision=False, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, source_lang='x', srcdict='../../data/DDI/raw/dict.txt', suppress_crashes=False, target_lang='y', task='translation', tensorboard_logdir=None, testpref='../../data/DDI/raw/relis_test.tok.bpe', tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='../../data/DDI/raw/relis_train.tok.bpe', use_plasma_view=False, user_dir=None, validpref='../../data/DDI/raw/relis_valid.tok.bpe', wandb_project=None, workers=8)
2023-02-14 04:07:50 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-preprocess", line 8, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 389, in cli_main
    main(args)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 372, in main
    _make_all(args.source_lang, src_dict, args)
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 185, in _make_all
    _make_dataset(
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 178, in _make_dataset
    _make_binary_dataset(
  File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 119, in _make_binary_dataset
    final_summary = FileBinarizer.multiprocess_dataset(
  File "/usr/local/lib/python3.8/dist-packages/fairseq/binarizer.py", line 100, in multiprocess_dataset
    offsets = find_offsets(input_file, num_workers)
  File "/usr/local/lib/python3.8/dist-packages/fairseq/file_chunker_utils.py", line 25, in find_offsets
    with open(filename, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../../data/DDI/raw/relis_train.tok.bpe.x'

I suspect a very small error is blocking the execution, but I have not been able to solve it myself.

runfish5 • Feb 14 '23 04:02

@raven44099 I was getting the same error. Please make sure all the environment variables are set properly, and verify them once you have declared them.

karkeranikitha • Feb 14 '23 04:02

Yes. Run echo at the shell prompt to check whether the variables are set:

echo $MOSES
echo $FASTBPE

You could also put the paths in .bashrc, like this:

export MOSES="/home/ubuntu/BioGPT/mosesdecoder"
export FASTBPE="/home/ubuntu/BioGPT/fastBPE"
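
The same check can be scripted in a notebook cell. The sketch below is only an illustration: the helper name missing_tool_paths is made up here, and the two relative paths are inferred from the error messages in the log ("/scripts/tokenizer/tokenizer.perl" and "/fast" are what those paths collapse to when the variables are empty), not read from preprocess.sh itself.

```python
import os


def missing_tool_paths(env=None):
    """Return a list of problems with the MOSES / FASTBPE settings.

    Hypothetical helper: the relative paths are inferred from the
    "No such file or directory" messages in the failing log.
    """
    env = os.environ if env is None else env
    expected = {
        "MOSES": os.path.join("scripts", "tokenizer", "tokenizer.perl"),
        "FASTBPE": "fast",
    }
    problems = []
    for var, rel in expected.items():
        root = env.get(var, "")
        if not root:
            # An empty variable is what turns $MOSES/scripts/... into /scripts/...
            problems.append(f"{var} is not set")
            continue
        path = os.path.join(root, rel)
        if not os.path.exists(path):
            problems.append(f"{var} is set, but {path} does not exist")
    return problems


# Check the current process environment and report anything suspicious.
print(missing_tool_paths() or "MOSES and FASTBPE look fine")
```

An empty list means both variables are set and point at directories containing the expected files.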

ShilpaSangappa • Feb 17 '23 05:02

Yes, the variables were not set!

Thank you very much! I solved it by adding:

%env MOSES=/content/BioGPT/mosesdecoder
%env FASTBPE=/content/BioGPT/fastBPE
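
For reference, %env sets the variable in the notebook's own process, and commands started with ! inherit it. A separate !export cell would not help, because each ! command runs in its own subshell. A minimal plain-Python equivalent, assuming the same clone locations as above, might look like this:

```python
import os
import subprocess

# Paths taken from the %env lines above; adjust if the repositories live elsewhere.
os.environ["MOSES"] = "/content/BioGPT/mosesdecoder"
os.environ["FASTBPE"] = "/content/BioGPT/fastBPE"

# A child shell inherits os.environ, so this prints what preprocess.sh would see.
seen = subprocess.run(
    ["sh", "-c", 'echo "$MOSES"; echo "$FASTBPE"'],
    capture_output=True,
    text=True,
    check=True,
).stdout.split()
print(seen)
```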

Now the output looks different, and I hope it's correct. I'm not sure, because I haven't been able to run BioGPT on the RE-DDI task yet.

New output

hard_match_evaluation.py  postprocess.py  README.md	   train.sh
infer.sh		  preprocess.sh   rebuild_data.py
Following PMID in ../../data/DDI/raw/train.json has no extracted triples:
DDI-DrugBank.d519 DDI-MedLine.d18 DDI-DrugBank.d491 DDI-MedLine.d4 DDI-DrugBank.d134 DDI-DrugBank.d230 DDI-DrugBank.d259 DDI-DrugBank.d293 DDI-MedLine.d64 DDI-MedLine.d100 DDI-DrugBank.d295 DDI-DrugBank.d402 DDI-MedLine.d101 DDI-DrugBank.d190 DDI-MedLine.d140 DDI-MedLine.d112 DDI-MedLine.d9 DDI-DrugBank.d301 DDI-DrugBank.d128 DDI-DrugBank.d101 DDI-DrugBank.d28 DDI-DrugBank.d376 DDI-MedLine.d28 DDI-DrugBank.d93 DDI-MedLine.d88 DDI-DrugBank.d539 DDI-DrugBank.d525 DDI-DrugBank.d540 DDI-DrugBank.d461 DDI-MedLine.d132 DDI-DrugBank.d360 DDI-MedLine.d43 DDI-MedLine.d121 DDI-DrugBank.d262 DDI-DrugBank.d164 DDI-DrugBank.d534 DDI-DrugBank.d385 DDI-DrugBank.d408 DDI-MedLine.d96 DDI-DrugBank.d285 DDI-DrugBank.d473 DDI-MedLine.d57 DDI-DrugBank.d557 DDI-DrugBank.d161 DDI-DrugBank.d24 DDI-DrugBank.d67 DDI-DrugBank.d490 DDI-DrugBank.d421 DDI-MedLine.d65 DDI-DrugBank.d342 DDI-DrugBank.d264 DDI-MedLine.d10 DDI-DrugBank.d312 DDI-MedLine.d117 DDI-MedLine.d135 DDI-DrugBank.d255 DDI-DrugBank.d390 DDI-DrugBank.d68 DDI-MedLine.d11 DDI-MedLine.d14 DDI-MedLine.d75 DDI-DrugBank.d541 DDI-DrugBank.d118 DDI-MedLine.d50 DDI-DrugBank.d218 DDI-DrugBank.d370 DDI-DrugBank.d201 DDI-DrugBank.d244 DDI-MedLine.d138 DDI-MedLine.d33 DDI-DrugBank.d553 DDI-DrugBank.d125 DDI-DrugBank.d366 DDI-DrugBank.d147 DDI-MedLine.d71 DDI-DrugBank.d363 DDI-MedLine.d32 DDI-MedLine.d76 DDI-DrugBank.d290 DDI-MedLine.d38 DDI-MedLine.d77 DDI-DrugBank.d80 DDI-DrugBank.d27 DDI-MedLine.d120 DDI-DrugBank.d52 DDI-DrugBank.d302 DDI-DrugBank.d486 DDI-DrugBank.d472 DDI-MedLine.d6 DDI-MedLine.d123 DDI-DrugBank.d173 DDI-DrugBank.d570 DDI-DrugBank.d126 DDI-DrugBank.d156 DDI-MedLine.d13 DDI-MedLine.d91 DDI-DrugBank.d349 DDI-DrugBank.d436 DDI-DrugBank.d300 DDI-DrugBank.d432 DDI-MedLine.d52 DDI-DrugBank.d554 DDI-MedLine.d19 DDI-DrugBank.d109 DDI-DrugBank.d63 DDI-DrugBank.d168 DDI-DrugBank.d37 DDI-DrugBank.d50 DDI-DrugBank.d455 DDI-DrugBank.d70 DDI-MedLine.d48 DDI-DrugBank.d515 DDI-DrugBank.d406 DDI-MedLine.d127 DDI-MedLine.d22 
DDI-DrugBank.d418 DDI-MedLine.d78 DDI-MedLine.d80 DDI-MedLine.d129 DDI-DrugBank.d61 DDI-DrugBank.d524 DDI-DrugBank.d189 DDI-MedLine.d92 DDI-DrugBank.d6 DDI-DrugBank.d278 DDI-MedLine.d66 DDI-DrugBank.d383 DDI-MedLine.d15 DDI-MedLine.d60 DDI-MedLine.d31 DDI-MedLine.d58 DDI-MedLine.d137 DDI-DrugBank.d555 DDI-DrugBank.d58 DDI-DrugBank.d433 DDI-DrugBank.d375 DDI-DrugBank.d102 DDI-DrugBank.d268 DDI-DrugBank.d391 DDI-MedLine.d83 DDI-DrugBank.d243 DDI-DrugBank.d119 DDI-DrugBank.d49 DDI-MedLine.d139 DDI-DrugBank.d513 DDI-DrugBank.d451 DDI-DrugBank.d38 DDI-DrugBank.d182 DDI-MedLine.d118 DDI-DrugBank.d319 DDI-MedLine.d141 DDI-MedLine.d70 DDI-MedLine.d109 DDI-MedLine.d98 DDI-DrugBank.d214 DDI-DrugBank.d193 DDI-DrugBank.d152 DDI-MedLine.d40 DDI-DrugBank.d535 DDI-DrugBank.d167 DDI-MedLine.d108 DDI-DrugBank.d445 DDI-DrugBank.d235 DDI-DrugBank.d317 DDI-DrugBank.d251 DDI-DrugBank.d496 DDI-DrugBank.d117 DDI-DrugBank.d203 DDI-DrugBank.d532 DDI-DrugBank.d361 DDI-DrugBank.d294 DDI-MedLine.d37 DDI-MedLine.d72 DDI-MedLine.d95 DDI-DrugBank.d280 DDI-MedLine.d26 DDI-MedLine.d74 DDI-DrugBank.d407 DDI-DrugBank.d343 DDI-DrugBank.d209 DDI-DrugBank.d159 DDI-DrugBank.d239 DDI-DrugBank.d155 DDI-DrugBank.d474 DDI-DrugBank.d271 DDI-DrugBank.d403 DDI-DrugBank.d447 DDI-MedLine.d136 DDI-DrugBank.d90 DDI-DrugBank.d136 DDI-MedLine.d41 DDI-DrugBank.d292 DDI-DrugBank.d1 DDI-DrugBank.d92 DDI-DrugBank.d127 
664 samples in ../../data/DDI/raw/train.json has been processed with 195 samples has no triples extracted.
Following PMID in ../../data/DDI/raw/valid.json has no extracted triples:
DDI-DrugBank.d348 DDI-DrugBank.d520 DDI-DrugBank.d248 DDI-MedLine.d122 DDI-MedLine.d103 DDI-MedLine.d35 DDI-MedLine.d24 DDI-DrugBank.d169 DDI-DrugBank.d221 
50 samples in ../../data/DDI/raw/valid.json has been processed with 9 samples has no triples extracted.
191 samples in ../../data/DDI/raw/test.json has been processed with 0 samples has no triples extracted.
Preprocessing train
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: en
Number of threads: 8
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_train.tok.x ...
Read 116252 words (7707 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_train.tok.x ...
Modified 116252 words from text file.
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_train.tok.y ...
Read 34391 words (1364 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_train.tok.y ...
Modified 34391 words from text file.
Preprocessing valid
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: en
Number of threads: 8
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_valid.tok.x ...
Read 10902 words (1974 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_valid.tok.x ...
Modified 10902 words from text file.
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_valid.tok.y ...
Read 2976 words (266 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_valid.tok.y ...
Modified 2976 words from text file.
Preprocessing test
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: en
Number of threads: 8
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_test.tok.x ...
Read 30412 words (4124 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_test.tok.x ...
Modified 30412 words from text file.
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_test.tok.y ...
Read 9094 words (703 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_test.tok.y ...
Modified 9094 words from text file.
2023-02-17 08:08:05 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-17 08:08:05 | INFO | fairseq_cli.preprocess | Namespace(aim_repo=None, aim_run_hash=None, align_suffix=None, alignfile=None, all_gather_list_size=16384, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='../../data/DDI/relis-bin', dict_only=False, empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=True, log_file=None, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, on_cpu_convert_precision=False, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, source_lang='x', srcdict='../../data/DDI/raw/dict.txt', suppress_crashes=False, target_lang='y', task='translation', tensorboard_logdir=None, testpref='../../data/DDI/raw/relis_test.tok.bpe', tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='../../data/DDI/raw/relis_train.tok.bpe', use_plasma_view=False, user_dir=None, validpref='../../data/DDI/raw/relis_valid.tok.bpe', wandb_project=None, workers=8)
2023-02-17 08:08:05 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] ../../data/DDI/raw/relis_train.tok.bpe.x: 469 sents, 139695 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] ../../data/DDI/raw/relis_valid.tok.bpe.x: 41 sents, 12789 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] ../../data/DDI/raw/relis_test.tok.bpe.x: 191 sents, 36514 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [y] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [y] ../../data/DDI/raw/relis_train.tok.bpe.y: 469 sents, 41376 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [y] Dictionary: 42384 types
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | [y] ../../data/DDI/raw/relis_valid.tok.bpe.y: 41 sents, 3472 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | [y] Dictionary: 42384 types
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | [y] ../../data/DDI/raw/relis_test.tok.bpe.y: 191 sents, 11107 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | Wrote preprocessed data to ../../data/DDI/relis-bin
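
One rough way to sanity-check a log like the one above is to confirm that it ends with the "Wrote preprocessed data" line, contains no traceback or missing-file message, and that the sentence counts are consistent. In this log, the samples processed minus the samples without triples (664 - 195, 50 - 9, 191 - 0) equal the 469, 41 and 191 "sents" that fairseq reports. The sketch below is a heuristic under that assumption, namely that samples without triples are dropped; the function names are made up for illustration and are not part of BioGPT or fairseq.

```python
import re


def check_preprocess_log(log):
    """Heuristic: True if the log shows a completed run with no obvious failure."""
    failed = "Traceback" in log or "No such file or directory" in log
    wrote = "Wrote preprocessed data to" in log
    return wrote and not failed


def expected_sentences(log):
    """Map each split to (samples processed) - (samples without triples)."""
    pattern = re.compile(
        r"(\d+) samples in \S*?(\w+)\.json has been processed with (\d+) samples"
    )
    return {
        split: int(total) - int(empty)
        for total, split, empty in pattern.findall(log)
    }


# Excerpt of the relevant lines from the log above.
excerpt = """\
664 samples in ../../data/DDI/raw/train.json has been processed with 195 samples has no triples extracted.
50 samples in ../../data/DDI/raw/valid.json has been processed with 9 samples has no triples extracted.
191 samples in ../../data/DDI/raw/test.json has been processed with 0 samples has no triples extracted.
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | Wrote preprocessed data to ../../data/DDI/relis-bin
"""
print(check_preprocess_log(excerpt))  # True
print(expected_sentences(excerpt))    # {'train': 469, 'valid': 41, 'test': 191}
```

Passing this check only says that preprocessing finished cleanly; it does not say anything about the quality of the later training run.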

runfish5 • Feb 17 '23 08:02