BioGPT
colab: preprocessing (RE-DDI)
Please see the public GitHub gist I created. I am trying to run !bash preprocess.sh
from within /content/BioGPT/examples/RE-DDI (after %cd /content/BioGPT/examples/RE-DDI).
I followed the instructions in this repository's README.md, but running !bash preprocess.sh
shows a very strange output:
Following PMID in ../../data/DDI/raw/train.json has no extracted triples:
DDI-DrugBank.d519 DDI-MedLine.d18 DDI-DrugBank.d491 DDI-MedLine.d4 DDI-DrugBank.d134 DDI-DrugBank.d230 DDI-DrugBank.d259 DDI-DrugBank.d293 DDI-MedLine.d64 DDI-MedLine.d100 DDI-DrugBank.d295 DDI-DrugBank.d402 DDI-MedLine.d101 DDI-DrugBank.d190 DDI-MedLine.d140 DDI-MedLine.d112 DDI-MedLine.d9 DDI-DrugBank.d301 DDI-DrugBank.d128 DDI-DrugBank.d101 DDI-DrugBank.d28 DDI-DrugBank.d376 DDI-MedLine.d28 DDI-DrugBank.d93 DDI-MedLine.d88 DDI-DrugBank.d539 DDI-DrugBank.d525 DDI-DrugBank.d540 DDI-DrugBank.d461 DDI-MedLine.d132 DDI-DrugBank.d360 DDI-MedLine.d43 DDI-MedLine.d121 DDI-DrugBank.d262 DDI-DrugBank.d164 DDI-DrugBank.d534 DDI-DrugBank.d385 DDI-DrugBank.d408 DDI-MedLine.d96 DDI-DrugBank.d285 DDI-DrugBank.d473 DDI-MedLine.d57 DDI-DrugBank.d557 DDI-DrugBank.d161 DDI-DrugBank.d24 DDI-DrugBank.d67 DDI-DrugBank.d490 DDI-DrugBank.d421 DDI-MedLine.d65 DDI-DrugBank.d342 DDI-DrugBank.d264 DDI-MedLine.d10 DDI-DrugBank.d312 DDI-MedLine.d117 DDI-MedLine.d135 DDI-DrugBank.d255 DDI-DrugBank.d390 DDI-DrugBank.d68 DDI-MedLine.d11 DDI-MedLine.d14 DDI-MedLine.d75 DDI-DrugBank.d541 DDI-DrugBank.d118 DDI-MedLine.d50 DDI-DrugBank.d218 DDI-DrugBank.d370 DDI-DrugBank.d201 DDI-DrugBank.d244 DDI-MedLine.d138 DDI-MedLine.d33 DDI-DrugBank.d553 DDI-DrugBank.d125 DDI-DrugBank.d366 DDI-DrugBank.d147 DDI-MedLine.d71 DDI-DrugBank.d363 DDI-MedLine.d32 DDI-MedLine.d76 DDI-DrugBank.d290 DDI-MedLine.d38 DDI-MedLine.d77 DDI-DrugBank.d80 DDI-DrugBank.d27 DDI-MedLine.d120 DDI-DrugBank.d52 DDI-DrugBank.d302 DDI-DrugBank.d486 DDI-DrugBank.d472 DDI-MedLine.d6 DDI-MedLine.d123 DDI-DrugBank.d173 DDI-DrugBank.d570 DDI-DrugBank.d126 DDI-DrugBank.d156 DDI-MedLine.d13 DDI-MedLine.d91 DDI-DrugBank.d349 DDI-DrugBank.d436 DDI-DrugBank.d300 DDI-DrugBank.d432 DDI-MedLine.d52 DDI-DrugBank.d554 DDI-MedLine.d19 DDI-DrugBank.d109 DDI-DrugBank.d63 DDI-DrugBank.d168 DDI-DrugBank.d37 DDI-DrugBank.d50 DDI-DrugBank.d455 DDI-DrugBank.d70 DDI-MedLine.d48 DDI-DrugBank.d515 DDI-DrugBank.d406 DDI-MedLine.d127 DDI-MedLine.d22 
DDI-DrugBank.d418 DDI-MedLine.d78 DDI-MedLine.d80 DDI-MedLine.d129 DDI-DrugBank.d61 DDI-DrugBank.d524 DDI-DrugBank.d189 DDI-MedLine.d92 DDI-DrugBank.d6 DDI-DrugBank.d278 DDI-MedLine.d66 DDI-DrugBank.d383 DDI-MedLine.d15 DDI-MedLine.d60 DDI-MedLine.d31 DDI-MedLine.d58 DDI-MedLine.d137 DDI-DrugBank.d555 DDI-DrugBank.d58 DDI-DrugBank.d433 DDI-DrugBank.d375 DDI-DrugBank.d102 DDI-DrugBank.d268 DDI-DrugBank.d391 DDI-MedLine.d83 DDI-DrugBank.d243 DDI-DrugBank.d119 DDI-DrugBank.d49 DDI-MedLine.d139 DDI-DrugBank.d513 DDI-DrugBank.d451 DDI-DrugBank.d38 DDI-DrugBank.d182 DDI-MedLine.d118 DDI-DrugBank.d319 DDI-MedLine.d141 DDI-MedLine.d70 DDI-MedLine.d109 DDI-MedLine.d98 DDI-DrugBank.d214 DDI-DrugBank.d193 DDI-DrugBank.d152 DDI-MedLine.d40 DDI-DrugBank.d535 DDI-DrugBank.d167 DDI-MedLine.d108 DDI-DrugBank.d445 DDI-DrugBank.d235 DDI-DrugBank.d317 DDI-DrugBank.d251 DDI-DrugBank.d496 DDI-DrugBank.d117 DDI-DrugBank.d203 DDI-DrugBank.d532 DDI-DrugBank.d361 DDI-DrugBank.d294 DDI-MedLine.d37 DDI-MedLine.d72 DDI-MedLine.d95 DDI-DrugBank.d280 DDI-MedLine.d26 DDI-MedLine.d74 DDI-DrugBank.d407 DDI-DrugBank.d343 DDI-DrugBank.d209 DDI-DrugBank.d159 DDI-DrugBank.d239 DDI-DrugBank.d155 DDI-DrugBank.d474 DDI-DrugBank.d271 DDI-DrugBank.d403 DDI-DrugBank.d447 DDI-MedLine.d136 DDI-DrugBank.d90 DDI-DrugBank.d136 DDI-MedLine.d41 DDI-DrugBank.d292 DDI-DrugBank.d1 DDI-DrugBank.d92 DDI-DrugBank.d127
664 samples in ../../data/DDI/raw/train.json has been processed with 195 samples has no triples extracted.
Following PMID in ../../data/DDI/raw/valid.json has no extracted triples:
DDI-DrugBank.d348 DDI-DrugBank.d520 DDI-DrugBank.d248 DDI-MedLine.d122 DDI-MedLine.d103 DDI-MedLine.d35 DDI-MedLine.d24 DDI-DrugBank.d169 DDI-DrugBank.d221
50 samples in ../../data/DDI/raw/valid.json has been processed with 9 samples has no triples extracted.
191 samples in ../../data/DDI/raw/test.json has been processed with 0 samples has no triples extracted.
Preprocessing train
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
preprocess.sh: line 27: /fast: No such file or directory
preprocess.sh: line 28: /fast: No such file or directory
Preprocessing valid
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
preprocess.sh: line 27: /fast: No such file or directory
preprocess.sh: line 28: /fast: No such file or directory
Preprocessing test
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
Can't open perl script "/scripts/tokenizer/tokenizer.perl": No such file or directory
preprocess.sh: line 27: /fast: No such file or directory
preprocess.sh: line 28: /fast: No such file or directory
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:497: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
2023-02-14 04:07:49 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-14 04:07:49 | INFO | fairseq_cli.preprocess | Namespace(aim_repo=None, aim_run_hash=None, align_suffix=None, alignfile=None, all_gather_list_size=16384, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='../../data/DDI/relis-bin', dict_only=False, empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=True, log_file=None, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, on_cpu_convert_precision=False, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, source_lang='x', srcdict='../../data/DDI/raw/dict.txt', suppress_crashes=False, target_lang='y', task='translation', tensorboard_logdir=None, testpref='../../data/DDI/raw/relis_test.tok.bpe', tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='../../data/DDI/raw/relis_train.tok.bpe', use_plasma_view=False, user_dir=None, validpref='../../data/DDI/raw/relis_valid.tok.bpe', wandb_project=None, workers=8)
2023-02-14 04:07:50 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
Traceback (most recent call last):
File "/usr/local/bin/fairseq-preprocess", line 8, in <module>
sys.exit(cli_main())
File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 389, in cli_main
main(args)
File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 372, in main
_make_all(args.source_lang, src_dict, args)
File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 185, in _make_all
_make_dataset(
File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 178, in _make_dataset
_make_binary_dataset(
File "/usr/local/lib/python3.8/dist-packages/fairseq_cli/preprocess.py", line 119, in _make_binary_dataset
final_summary = FileBinarizer.multiprocess_dataset(
File "/usr/local/lib/python3.8/dist-packages/fairseq/binarizer.py", line 100, in multiprocess_dataset
offsets = find_offsets(input_file, num_workers)
File "/usr/local/lib/python3.8/dist-packages/fairseq/file_chunker_utils.py", line 25, in find_offsets
with open(filename, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../../data/DDI/raw/relis_train.tok.bpe.x'
I feel like this must be a very small error that blocks the execution, but it seems difficult for me to solve.
@raven44099 I was getting the same error. Please make sure all the environment variables are set properly, and verify them once you have declared them.
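To illustrate why unset variables would produce exactly these messages: the log shows the script looking for "/scripts/tokenizer/tokenizer.perl" and "/fast", which is what you get when a shell expands something like $MOSES/scripts/tokenizer/tokenizer.perl and $FASTBPE/fast with both variables empty. That preprocess.sh builds its paths this way is an assumption based on the log, not something checked against the script. The helper name expand_in_bash below is made up for this sketch.

```python
import os
import subprocess

def expand_in_bash(template, env):
    """Let bash expand `template` using only the variables present in `env`."""
    result = subprocess.run(
        ["bash", "-c", f'echo "{template}"'],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# A minimal environment that deliberately lacks MOSES and FASTBPE.
base_env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}

# Unset variables expand to empty strings, leaving only the path suffix.
print(expand_in_bash("$MOSES/scripts/tokenizer/tokenizer.perl", base_env))
print(expand_in_bash("$FASTBPE/fast", base_env))
```

The two printed paths match the ones in the "Can't open perl script" and "/fast: No such file or directory" lines. Because tokenization and BPE never ran, the .tok.bpe files were never written, which would explain the final FileNotFoundError from fairseq-preprocess.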
Yes. Do an echo on the shell prompt to check if the variables are set:
echo $MOSES
echo $FASTBPE
You could also put the paths in .bashrc, like this:
export MOSES="/home/ubuntu/BioGPT/mosesdecoder"
export FASTBPE="/home/ubuntu/BioGPT/fastBPE"
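In a notebook, a check along the following lines might be handier than echo, since it also confirms that the files exist. This is only a sketch: the two relative paths are taken from the error messages in the log above, and the function name check_tool_paths is hypothetical.

```python
import os

# Files the failing log shows being looked up, relative to each variable.
REQUIRED = {
    "MOSES": "scripts/tokenizer/tokenizer.perl",
    "FASTBPE": "fast",
}

def check_tool_paths(env=None):
    """Return a list of problems; an empty list means nothing was found wrong."""
    env = os.environ if env is None else env
    problems = []
    for var, rel_path in REQUIRED.items():
        root = env.get(var, "")
        if not root:
            problems.append(f"{var} is not set")
            continue
        target = os.path.join(root, rel_path)
        if not os.path.exists(target):
            problems.append(f"{var} is set, but {target} does not exist")
    return problems

# Report anything that looks wrong in the current environment.
for problem in check_tool_paths():
    print(problem)
```

If nothing is printed, both variables are set and point at directories containing the expected files.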
Yes, the variables were not set!
Thank you very much! I solved it by adding:
%env MOSES=/content/BioGPT/mosesdecoder
%env FASTBPE=/content/BioGPT/fastBPE
Now the output looks different; I hope it's correct. I'm not sure, because I haven't been able to run BioGPT on the RE-DDI task yet.
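For anyone who prefers plain Python over the %env magic: as far as I understand, %env writes to the kernel's os.environ, and shell commands started with ! inherit that environment. A line such as !export MOSES=... would not persist, because each ! command runs in its own shell. The sketch below sets the same two paths through os.environ and checks that a child bash process sees them; the paths are the ones from the %env lines above and assume the tools were cloned there.

```python
import os
import subprocess

# Same values as the %env lines; adjust if the tools live elsewhere.
os.environ["MOSES"] = "/content/BioGPT/mosesdecoder"
os.environ["FASTBPE"] = "/content/BioGPT/fastBPE"

# A child shell inherits the kernel's environment, like `!bash preprocess.sh` does.
seen = subprocess.run(
    ["bash", "-c", 'echo "$MOSES $FASTBPE"'],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(seen)
```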
New output
hard_match_evaluation.py postprocess.py README.md train.sh
infer.sh preprocess.sh rebuild_data.py
Following PMID in ../../data/DDI/raw/train.json has no extracted triples:
DDI-DrugBank.d519 DDI-MedLine.d18 DDI-DrugBank.d491 DDI-MedLine.d4 DDI-DrugBank.d134 DDI-DrugBank.d230 DDI-DrugBank.d259 DDI-DrugBank.d293 DDI-MedLine.d64 DDI-MedLine.d100 DDI-DrugBank.d295 DDI-DrugBank.d402 DDI-MedLine.d101 DDI-DrugBank.d190 DDI-MedLine.d140 DDI-MedLine.d112 DDI-MedLine.d9 DDI-DrugBank.d301 DDI-DrugBank.d128 DDI-DrugBank.d101 DDI-DrugBank.d28 DDI-DrugBank.d376 DDI-MedLine.d28 DDI-DrugBank.d93 DDI-MedLine.d88 DDI-DrugBank.d539 DDI-DrugBank.d525 DDI-DrugBank.d540 DDI-DrugBank.d461 DDI-MedLine.d132 DDI-DrugBank.d360 DDI-MedLine.d43 DDI-MedLine.d121 DDI-DrugBank.d262 DDI-DrugBank.d164 DDI-DrugBank.d534 DDI-DrugBank.d385 DDI-DrugBank.d408 DDI-MedLine.d96 DDI-DrugBank.d285 DDI-DrugBank.d473 DDI-MedLine.d57 DDI-DrugBank.d557 DDI-DrugBank.d161 DDI-DrugBank.d24 DDI-DrugBank.d67 DDI-DrugBank.d490 DDI-DrugBank.d421 DDI-MedLine.d65 DDI-DrugBank.d342 DDI-DrugBank.d264 DDI-MedLine.d10 DDI-DrugBank.d312 DDI-MedLine.d117 DDI-MedLine.d135 DDI-DrugBank.d255 DDI-DrugBank.d390 DDI-DrugBank.d68 DDI-MedLine.d11 DDI-MedLine.d14 DDI-MedLine.d75 DDI-DrugBank.d541 DDI-DrugBank.d118 DDI-MedLine.d50 DDI-DrugBank.d218 DDI-DrugBank.d370 DDI-DrugBank.d201 DDI-DrugBank.d244 DDI-MedLine.d138 DDI-MedLine.d33 DDI-DrugBank.d553 DDI-DrugBank.d125 DDI-DrugBank.d366 DDI-DrugBank.d147 DDI-MedLine.d71 DDI-DrugBank.d363 DDI-MedLine.d32 DDI-MedLine.d76 DDI-DrugBank.d290 DDI-MedLine.d38 DDI-MedLine.d77 DDI-DrugBank.d80 DDI-DrugBank.d27 DDI-MedLine.d120 DDI-DrugBank.d52 DDI-DrugBank.d302 DDI-DrugBank.d486 DDI-DrugBank.d472 DDI-MedLine.d6 DDI-MedLine.d123 DDI-DrugBank.d173 DDI-DrugBank.d570 DDI-DrugBank.d126 DDI-DrugBank.d156 DDI-MedLine.d13 DDI-MedLine.d91 DDI-DrugBank.d349 DDI-DrugBank.d436 DDI-DrugBank.d300 DDI-DrugBank.d432 DDI-MedLine.d52 DDI-DrugBank.d554 DDI-MedLine.d19 DDI-DrugBank.d109 DDI-DrugBank.d63 DDI-DrugBank.d168 DDI-DrugBank.d37 DDI-DrugBank.d50 DDI-DrugBank.d455 DDI-DrugBank.d70 DDI-MedLine.d48 DDI-DrugBank.d515 DDI-DrugBank.d406 DDI-MedLine.d127 DDI-MedLine.d22 
DDI-DrugBank.d418 DDI-MedLine.d78 DDI-MedLine.d80 DDI-MedLine.d129 DDI-DrugBank.d61 DDI-DrugBank.d524 DDI-DrugBank.d189 DDI-MedLine.d92 DDI-DrugBank.d6 DDI-DrugBank.d278 DDI-MedLine.d66 DDI-DrugBank.d383 DDI-MedLine.d15 DDI-MedLine.d60 DDI-MedLine.d31 DDI-MedLine.d58 DDI-MedLine.d137 DDI-DrugBank.d555 DDI-DrugBank.d58 DDI-DrugBank.d433 DDI-DrugBank.d375 DDI-DrugBank.d102 DDI-DrugBank.d268 DDI-DrugBank.d391 DDI-MedLine.d83 DDI-DrugBank.d243 DDI-DrugBank.d119 DDI-DrugBank.d49 DDI-MedLine.d139 DDI-DrugBank.d513 DDI-DrugBank.d451 DDI-DrugBank.d38 DDI-DrugBank.d182 DDI-MedLine.d118 DDI-DrugBank.d319 DDI-MedLine.d141 DDI-MedLine.d70 DDI-MedLine.d109 DDI-MedLine.d98 DDI-DrugBank.d214 DDI-DrugBank.d193 DDI-DrugBank.d152 DDI-MedLine.d40 DDI-DrugBank.d535 DDI-DrugBank.d167 DDI-MedLine.d108 DDI-DrugBank.d445 DDI-DrugBank.d235 DDI-DrugBank.d317 DDI-DrugBank.d251 DDI-DrugBank.d496 DDI-DrugBank.d117 DDI-DrugBank.d203 DDI-DrugBank.d532 DDI-DrugBank.d361 DDI-DrugBank.d294 DDI-MedLine.d37 DDI-MedLine.d72 DDI-MedLine.d95 DDI-DrugBank.d280 DDI-MedLine.d26 DDI-MedLine.d74 DDI-DrugBank.d407 DDI-DrugBank.d343 DDI-DrugBank.d209 DDI-DrugBank.d159 DDI-DrugBank.d239 DDI-DrugBank.d155 DDI-DrugBank.d474 DDI-DrugBank.d271 DDI-DrugBank.d403 DDI-DrugBank.d447 DDI-MedLine.d136 DDI-DrugBank.d90 DDI-DrugBank.d136 DDI-MedLine.d41 DDI-DrugBank.d292 DDI-DrugBank.d1 DDI-DrugBank.d92 DDI-DrugBank.d127
664 samples in ../../data/DDI/raw/train.json has been processed with 195 samples has no triples extracted.
Following PMID in ../../data/DDI/raw/valid.json has no extracted triples:
DDI-DrugBank.d348 DDI-DrugBank.d520 DDI-DrugBank.d248 DDI-MedLine.d122 DDI-MedLine.d103 DDI-MedLine.d35 DDI-MedLine.d24 DDI-DrugBank.d169 DDI-DrugBank.d221
50 samples in ../../data/DDI/raw/valid.json has been processed with 9 samples has no triples extracted.
191 samples in ../../data/DDI/raw/test.json has been processed with 0 samples has no triples extracted.
Preprocessing train
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: en
Number of threads: 8
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_train.tok.x ...
Read 116252 words (7707 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_train.tok.x ...
Modified 116252 words from text file.
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_train.tok.y ...
Read 34391 words (1364 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_train.tok.y ...
Modified 34391 words from text file.
Preprocessing valid
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: en
Number of threads: 8
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_valid.tok.x ...
Read 10902 words (1974 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_valid.tok.x ...
Modified 10902 words from text file.
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_valid.tok.y ...
Read 2976 words (266 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_valid.tok.y ...
Modified 2976 words from text file.
Preprocessing test
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: en
Number of threads: 8
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_test.tok.x ...
Read 30412 words (4124 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_test.tok.x ...
Modified 30412 words from text file.
Loading codes from ../../data/DDI/raw/bpecodes ...
Read 40000 codes from the codes file.
Loading vocabulary from ../../data/DDI/raw/relis_test.tok.y ...
Read 9094 words (703 unique) from text file.
Applying BPE to ../../data/DDI/raw/relis_test.tok.y ...
Modified 9094 words from text file.
2023-02-17 08:08:05 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-17 08:08:05 | INFO | fairseq_cli.preprocess | Namespace(aim_repo=None, aim_run_hash=None, align_suffix=None, alignfile=None, all_gather_list_size=16384, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='../../data/DDI/relis-bin', dict_only=False, empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=True, log_file=None, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, on_cpu_convert_precision=False, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, source_lang='x', srcdict='../../data/DDI/raw/dict.txt', suppress_crashes=False, target_lang='y', task='translation', tensorboard_logdir=None, testpref='../../data/DDI/raw/relis_test.tok.bpe', tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='../../data/DDI/raw/relis_train.tok.bpe', use_plasma_view=False, user_dir=None, validpref='../../data/DDI/raw/relis_valid.tok.bpe', wandb_project=None, workers=8)
2023-02-17 08:08:05 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] ../../data/DDI/raw/relis_train.tok.bpe.x: 469 sents, 139695 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] ../../data/DDI/raw/relis_valid.tok.bpe.x: 41 sents, 12789 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [x] ../../data/DDI/raw/relis_test.tok.bpe.x: 191 sents, 36514 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [y] Dictionary: 42384 types
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [y] ../../data/DDI/raw/relis_train.tok.bpe.y: 469 sents, 41376 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:06 | INFO | fairseq_cli.preprocess | [y] Dictionary: 42384 types
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | [y] ../../data/DDI/raw/relis_valid.tok.bpe.y: 41 sents, 3472 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | [y] Dictionary: 42384 types
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | [y] ../../data/DDI/raw/relis_test.tok.bpe.y: 191 sents, 11107 tokens, 0.0% replaced (by <unk>)
2023-02-17 08:08:07 | INFO | fairseq_cli.preprocess | Wrote preprocessed data to ../../data/DDI/relis-bin