BioGPT icon indicating copy to clipboard operation
BioGPT copied to clipboard

Cannot load the model TransformerLanguageModelPrompt.from_pretrained

Open yuhangjiang22 opened this issue 2 years ago • 6 comments

Hi there,

I cannot load the model RE-DTI-BioGPT by running

import torch
from src.transformer_lm_prompt import TransformerLanguageModelPrompt
m = TransformerLanguageModelPrompt.from_pretrained(
        "checkpoints/RE-BC5CDR-BioGPT", 
        "checkpoint_avg.pt", 
        "data/BC5CDR/relis-bin",
        tokenizer='moses', 
        bpe='fastbpe', 
        bpe_codes="data/bpecodes",
        max_len_b=1024,
        beam=1)

and got the error

2023-02-14 15:08:06 | INFO | fairseq.file_utils | loading archive file checkpoints/RE-BC5CDR-BioGPT
2023-02-14 15:08:06 | INFO | fairseq.file_utils | loading archive file data/BC5CDR/relis-bin
2023-02-14 15:08:10 | INFO | src.language_modeling_prompt | dictionary: 42384 types
2023-02-14 15:08:13 | INFO | fairseq.models.fairseq_model | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'user_dir': '../../src', 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': None, 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_algorithm': 'LocalSGD', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'distributed_num_procs': 1}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': True, 'max_tokens': 1024, 'batch_size': None, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': 1024, 'batch_size_valid': None, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'optimization': {'_name': None, 'max_epoch': 100, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [32], 'lr': [1e-05], 'stop_min_lr': -1.0, 'use_bmuf': False}, 'checkpoint': {'_name': None, 'save_dir': '../../checkpoints/RE-BC5CDR-BioGPT', 'restore_file': 'checkpoint_last.pt', 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 1, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 1024, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'transformer_lm_prompt_biogpt', 'activation_fn': 'gelu', 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'relu_dropout': 0.0, 'decoder_embed_dim': 1024, 'decoder_output_dim': 1024, 'decoder_input_dim': 1024, 'decoder_ffn_embed_dim': 4096, 'decoder_layers': 24, 'decoder_attention_heads': 16, 'decoder_normalize_before': True, 'no_decoder_final_norm': False, 'adaptive_softmax_cutoff': None, 'adaptive_softmax_dropout': 0.0, 'adaptive_softmax_factor': 4.0, 'no_token_positional_embeddings': False, 'share_decoder_input_output_embed': True, 'character_embeddings': False, 'character_filters': '[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]', 'character_embedding_dim': 4, 'char_embedder_highway_layers': 2, 'adaptive_input': False, 'adaptive_input_factor': 4.0, 'adaptive_input_cutoff': None, 'tie_adaptive_weights': False, 'tie_adaptive_proj': False, 'decoder_learned_pos': True, 'layernorm_embedding': False, 'no_scale_embedding': False, 'checkpoint_activations': False, 'offload_activations': False, 'decoder_layerdrop': 0.0, 'decoder_layers_to_keep': None, 'quant_noise_pq': 0.0, 'quant_noise_pq_block_size': 8, 'quant_noise_scalar': 0.0, 'min_params_to_wrap': 100000000, 'base_layers': 0, 'base_sublayers': 1, 'base_shuffle': 0, 'add_bos_token': False, 'tokens_per_sample': 1024, 'max_target_positions': 1024, 'tpu': False, 'scale_fc': False, 'scale_attn': False, 'scale_heads': False, 'scale_resids': False}, 'task': {'_name': 'language_modeling_prompt', 'data': 'data/BC5CDR/relis-bin', 'sample_break_mode': 'none', 'tokens_per_sample': 1024, 'output_dictionary_size': -1, 'self_target': False, 'future_target': False, 'past_target': False, 'add_bos_token': False, 'max_target_positions': 1024, 'shorten_method': 'none', 'shorten_data_split_list': '', 'pad_to_fixed_length': False, 'pad_to_fixed_bsz': False, 'seed': 1, 'batch_size': None, 'batch_size_valid': None, 'dataset_impl': None, 'data_buffer_size': 10, 'tpu': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma', 'source_lang': None, 'target_lang': None, 'max_source_positions': 640, 'manual_prompt': None, 'learned_prompt': 9, 'learned_prompt_pattern': 'learned', 'prefix': False, 'sep_token': '<seqsep>'}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': False}, 'optimizer': {'_name': 'adam', 'adam_betas': '(0.9, 0.98)', 'adam_eps': 1e-08, 'weight_decay': 0.01, 'use_old_adam': False, 'tpu': False, 'lr': [1e-05]}, 'lr_scheduler': {'_name': 'inverse_sqrt', 'warmup_updates': 100, 'warmup_init_lr': 1e-07, 'lr': [1e-05]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': {'_name': 'fastbpe', 'bpe_codes': 'data/bpecodes'}, 'tokenizer': {'_name': 'moses', 'source_lang': 'en', 'target_lang': 'en', 'moses_no_dash_splits': False, 'moses_no_escape': False}}
Traceback (most recent call last):
  File "/Users/kaka/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-25-639ebf247e35>", line 1, in <module>
    m = TransformerLanguageModelPrompt.from_pretrained(
  File "/Users/kaka/lib/python3.8/site-packages/fairseq/models/fairseq_model.py", line 275, in from_pretrained
    return hub_utils.GeneratorHubInterface(x["args"], x["task"], x["models"])
  File "/Users/kaka/lib/python3.8/site-packages/fairseq/hub_utils.py", line 108, in __init__
    self.bpe = encoders.build_bpe(cfg.bpe)
  File "/Users/kaka/lib/python3.8/site-packages/fairseq/registry.py", line 61, in build_x
    return builder(cfg, *extra_args, **extra_kwargs)
  File "/Users/kaka/lib/python3.8/site-packages/fairseq/data/encoders/fastbpe.py", line 27, in __init__
    self.bpe = fastBPE.fastBPE(codes)
AttributeError: module 'fastBPE' has no attribute 'fastBPE'

Is there any idea why this is happening?

yuhangjiang22 avatar Feb 14 '23 20:02 yuhangjiang22

you need to install fastBPE with g++ compiler: https://github.com/glample/fastBPE.git probably fastBPE doesn't work on windows.

Husseinfadhel avatar Feb 15 '23 18:02 Husseinfadhel

@Husseinfadhel does fastBPE work with WSL ???

tushar20121907 avatar Feb 23 '23 08:02 tushar20121907

fastBPE works fine under WSL/Ubuntu, but additional dependencies are either missing or need to be generated using other scripts. For example, data/BC5CDR/relis-bin doesn't exist in the repo and I'm not sure how to create it. It would be nice if the authors would provide instructions for building everything that's needed in order to successfully run the examples.

bmazeraski avatar Feb 27 '23 21:02 bmazeraski

I'm running Pycharm on Windows 10 using a conda environment configured under WSL/Ubuntu and this example from the BioGPT github page works fine for me. I haven't been able to run the second example though.

import torch from fairseq.models.transformer_lm import TransformerLanguageModel m = TransformerLanguageModel.from_pretrained( "checkpoints/Pre-trained-BioGPT", "checkpoint.pt", "data", tokenizer='moses', bpe='fastbpe', bpe_codes="data/bpecodes", min_len=100, max_len_b=1024) m.cuda() src_tokens = m.encode("COVID-19 is") generate = m.generate([src_tokens], beam=5)[0] output = m.decode(generate[0]["tokens"]) print(output)

bmazeraski avatar Feb 27 '23 21:02 bmazeraski

It looks like the relis-bin is the raw training data and may need to be regenerated via preprocess.sh which rebuilds the source data files from train/test/validate. I'm stuck on the FASTBPE/fast requirement (linux only I believe)

panamantis avatar Feb 28 '23 00:02 panamantis

Yes... it appears that fastBPE needs to run under Linux. My attempts to build it under Windows were not successful. My workaround was to install the Windows Subsystem for Linux (WSL) on my Windows 10 box and then create a BioGPT virtual environment on WSL/Ubuntu. All of the BioGPT dependencies are installed in that environment. I develop all of my python code on Windows, but run the code under WSL/Ubuntu by simply setting the virtual environment as my active python interpreter.

bmazeraski avatar Feb 28 '23 11:02 bmazeraski