CodeGen
Lang embeddings loading
The preprocessing command is:
!python -m codegen_sources.preprocessing.preprocess data/test_dataset/ --langs cpp java python --mode=monolingual --local=True --fastbpe_vocab_path=/content/CodeGen/data/bpe/cpp-java-python/vocab --fastbpe_code_path=/content/CodeGen/data/bpe/cpp-java-python/codes --bpe_mode=fast --train_splits=1 --percent_test_valid=10
When you resume training TransCoder from a previous checkpoint, the log contains lines like these:
INFO - 03/01/23 08:40:48 - 0:00:09 - ============ Model Reloading
INFO - 03/01/23 08:40:48 - 0:00:09 - Reloading encoder from /content/drive/MyDrive/transcoder/transcoder/l2hpmxrljh/checkpoint.pth ...
WARNING - 03/01/23 08:41:13 - 0:00:33 - No match found for lang cpp_sa cpp in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
WARNING - 03/01/23 08:41:13 - 0:00:33 - No match found for lang java_sa java in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
WARNING - 03/01/23 08:41:13 - 0:00:33 - No match found for lang python_sa python in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
INFO - 03/01/23 08:41:13 - 0:00:33 - Reloading decoders from /content/drive/MyDrive/transcoder/transcoder/l2hpmxrljh/checkpoint.pth ...
WARNING - 03/01/23 08:41:28 - 0:00:49 - No match found for lang cpp_sa cpp in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
WARNING - 03/01/23 08:41:28 - 0:00:49 - No match found for lang java_sa java in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
WARNING - 03/01/23 08:41:28 - 0:00:49 - No match found for lang python_sa python in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
I guess this is not the desired behavior. It looks like a consequence of this code:
https://github.com/facebookresearch/CodeGen/blob/6e93aca63e7bc77287c9965a5080456326651237/codegen_sources/model/src/model/init.py#L414
if lang in lang_mapping:
    lang_ = lang_mapping[lang]
else:
    lang_ = lang
Would simply using lang_ = lang let the previous embeddings be reused, or is something else wrong?
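To make the question concrete, here is a minimal sketch of the mismatch as I understand it from the warnings. It is not the actual CodeGen code: the contents of lang_mapping are only inferred from the log (cpp_sa being looked up as cpp), the resolve helper is hypothetical, and I am assuming the lookup is an exact match against the checkpoint's language keys.

```python
# Hypothetical mapping, inferred only from the warnings above
# ("No match found for lang cpp_sa cpp ..."); the real one may differ.
lang_mapping = {"cpp_sa": "cpp", "java_sa": "java", "python_sa": "python"}

# Language keys stored in the checkpoint, as printed in the warnings.
checkpoint_langs = ["cpp_sa", "java_sa", "python_sa"]


def resolve(lang, use_mapping):
    """Return the checkpoint key to reuse for `lang`, or None if none matches.

    Assumes an exact-match lookup, which is a simplification.
    """
    lang_ = lang_mapping.get(lang, lang) if use_mapping else lang
    return lang_ if lang_ in checkpoint_langs else None


# With the mapping, "cpp_sa" becomes "cpp", which is not a checkpoint key.
with_mapping = {lang: resolve(lang, True) for lang in checkpoint_langs}
# With lang_ = lang, every language finds its own key.
without_mapping = {lang: resolve(lang, False) for lang in checkpoint_langs}

print(with_mapping)
print(without_mapping)
```

If this picture is right, the mapping step is what turns a checkpoint that already contains cpp_sa, java_sa and python_sa into three "No match found" warnings, and skipping it would let each language find its own key.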