
BB3-30B outputs gibberish

jerryjonghopark opened this issue 3 years ago • 8 comments

Bug description: Running parlai interactive on BB3-30B produces gibberish output.

Reproduction steps: After resharding the BlenderBot3 30B consolidated.pt checkpoint into two parts (reshard-model_part-0.pt and reshard-model_part-1.pt), I run metaseq-api-local with the following constants.py in metaseq. I am running the model on 2 A100-40GB GPUs.

import os

MAX_SEQ_LEN = 1024
BATCH_SIZE = 64  # silly high bc we dynamically batch by MAX_BATCH_TOKENS
MAX_BATCH_TOKENS = 1024
DEFAULT_PORT = 6010
MODEL_PARALLEL = 2
TOTAL_WORLD_SIZE = 2
MAX_BEAM = 4

try:
    # internal logic denoting where checkpoints are in meta infrastructure
    from metaseq_internal.constants import CHECKPOINT_FOLDER
except ImportError:
    CHECKPOINT_FOLDER = "/path/to/bb3_30B/resharded"

# tokenizer files
BPE_MERGES = os.path.join(CHECKPOINT_FOLDER, "gpt2-merges.txt")
BPE_VOCAB = os.path.join(CHECKPOINT_FOLDER, "gpt2-vocab.json")
MODEL_FILE = os.path.join(CHECKPOINT_FOLDER, "reshard.pt")

LAUNCH_ARGS = [
    f"--model-parallel-size {MODEL_PARALLEL}",
    f"--distributed-world-size {TOTAL_WORLD_SIZE}",
    "--task language_modeling",
    f"--bpe-merges {BPE_MERGES}",
    f"--bpe-vocab {BPE_VOCAB}",
    "--bpe hf_byte_bpe",
    f"--merges-filename {BPE_MERGES}",  # TODO(susanz): hack for getting interactive_hosted working on public repo
    f"--vocab-filename {BPE_VOCAB}",  # TODO(susanz): hack for getting interactive_hosted working on public repo
    f"--path {MODEL_FILE}",
    "--beam 1 --nbest 1",
    "--distributed-port -1",
    "--checkpoint-shard-count 1",
    "--use-sharded-state",
    f"--batch-size {BATCH_SIZE}",
    f"--buffer-size {BATCH_SIZE * MAX_SEQ_LEN}",
    f"--max-tokens {BATCH_SIZE * MAX_SEQ_LEN}",
    "/tmp",  # required "data" argument.
]

Once the metaseq API server is running, I run parlai interactive --init-opt gen/opt_bb3 --loglevel debug --opt-server http://10.233.96.198:6010/ --raw-search-server RELEVANT_SEARCH_SERVER, as given in https://parl.ai/projects/bb3/. In interactive mode, I type "Hi!" and the response I get is gibberish, e.g. "analysepolit, the role Superman simplicityurger, thank Daisy personal life and hit the area stop. Coord [...]".
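
To rule ParlAI out, the same server can also be queried directly. A minimal sketch, assuming the OpenAI-style /completions route that metaseq's interactive_hosted serves; the payload fields mirror the requests visible in the ParlAI log below:

# Query the metaseq server directly, bypassing ParlAI entirely.
# Assumption: interactive_hosted exposes an OpenAI-style /completions route;
# the fields below mirror the requests ParlAI sends (see the log further down).
import json
import requests

resp = requests.post(
    "http://10.233.96.198:6010/completions",
    json={"prompt": "Person 1: Hi!\nSearch Decision:", "max_tokens": 10, "temperature": 1.0},
    timeout=60,
)
print(json.dumps(resp.json(), indent=2))

If the raw completion is already gibberish here, the problem sits in metaseq or the checkpoint, not in ParlAI.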

Expected behavior: BB3-30B should produce a reasonable, low-perplexity response.

Logs

Metaseq log:

2022-11-02 04:39:07 | INFO | metaseq.distributed.utils | initialized host bb3-30b-0 as rank 0
2022-11-02 04:39:07 | INFO | metaseq.distributed.utils | initialized host bb3-30b-0 as rank 1
2022-11-02 04:39:13 | INFO | metaseq.distributed.utils | cfg.common.model_parallel_size: 2
> initializing tensor model parallel with size 2
> initializing pipeline model parallel with size 1
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2719 and data parallel seed: 1
2022-11-02 04:39:14 | INFO | metaseq.hub_utils | loading model(s) from /home/jovyan/vol-1/bb3_30B/resharded/reshard.pt
2022-11-02 04:39:28 | INFO | metaseq.checkpoint_utils | Loading 2 on 1 DDP workers: 2 files per worker. 
2022-11-02 04:39:54 | INFO | metaseq.checkpoint_utils | Done reading from disk
2022-11-02 04:39:55 | INFO | metaseq.modules.fused_bias_gelu | Compiling and loading fused kernels
NOTE: If this hangs here, your megatron fused kernels may be corrupted. This can happen if a previous job is interrupted during a build. In that case, delete the megatron build directory and relaunch training. The megatron build directory is located at: /home/jovyan/vol-1/dependencies/Megatron-LM/megatron/fused_kernels/build
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jovyan/vol-1/dependencies/Megatron-LM/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jovyan/vol-1/dependencies/Megatron-LM/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jovyan/vol-1/dependencies/Megatron-LM/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
2022-11-02 04:40:24 | INFO | metaseq.modules.fused_bias_gelu | Done with compiling and loading fused kernels.
2022-11-02 04:40:35 | INFO | metaseq.checkpoint_utils | Done loading state dict
2022-11-02 04:40:36 | INFO | metaseq.cli.interactive | loaded model 0
2022-11-02 04:40:37 | INFO | metaseq.cli.interactive | Worker engaged! 10.233.96.198:6010
 * Serving Flask app 'metaseq.cli.interactive_hosted' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2022-11-02 04:40:37 | WARNING | werkzeug |  * Running on all addresses.
   WARNING: This is a development server. Do not use it in a production deployment.
2022-11-02 04:40:37 | INFO | werkzeug |  * Running on http://10.233.96.198:6010/ (Press CTRL+C to quit)

ParlAI command log:

Enter Your Message: Hi!                                                                                        
04:44:20 | ['Person 1: Hi!\nSearch Decision:']                                                                 
04:44:20 | Making request: {'prompt': 'Person 1: Hi!\nSearch Decision:', 'min_tokens': 1, 'max_tokens': 10, 'best_of': 1, 'top_p': -1.0, 'stop': '\n', 'temperature': 1.0, 'echo': False, 'lambda_decay': -1, 'omega_bound': 0.3, 'alpha_presence': 0, 'alpha_frequency': 0, 'alpha_presence_src': 0, 'alpha_frequency_src': 0, 'alpha_src_penalty_end_idx': -1}
04:44:21 | GPT-Z response: {"choices": [{"logprobs": {"finish_reason": "length", "text_offset": [0, 6, 7, 9, 13, 18, 21, 23, 25, 29], "token_logprobs": [-2.196274757385254, -2.570553779602051, -2.7849178314208984, -2.8964107036590576, -2.4257678985595703, -2.5294978618621826, -2.1859633922576904, -2.192195177078247, -2.395860433578491, -2.5915679931640625], "tokens": [" Coord", "I", "'m", " not", " sure", " if", " I", "'m", " not", " sure"], "top_logprobs": null}, "text": " CoordI'm not sure if I'm not sure"}], "created": 1667364261, "id": "81140866-66e7-45c1-9cd5-bab9b235f3a2", "model": "/home/jovyan/vol-1/bb3_30B/resharded", "object": "text_completion"}
04:44:21 | Example 0, search_decision_agent: CoordI'm not sure if I'm not sure
04:44:21 | Decision Reply: CoordI'm not sure if I'm not sure; defaulting to no search/memory
04:44:21 | ['Person 1: Hi!\nMemory Decision:']
04:44:21 | Making request: {'prompt': 'Person 1: Hi!\nMemory Decision:', 'min_tokens': 1, 'max_tokens': 10, 'best_of': 1, 'top_p': -1.0, 'stop': '\n', 'temperature': 1.0, 'echo': False, 'lambda_decay': -1, 'omega_bound': 0.3, 'alpha_presence': 0, 'alpha_frequency': 0, 'alpha_presence_src': 0, 'alpha_frequency_src': 0, 'alpha_src_penalty_end_idx': -1}
04:44:21 | GPT-Z response: {"choices": [{"logprobs": {"finish_reason": "length", "text_offset": [0, 6, 7, 9, 13, 18, 20, 31, 32, 33], "token_logprobs": [-2.567915201187134, -2.601518154144287, -2.7668521404266357, -2.900583505630493, -2.484133243560791, -2.51611590385437, -2.055912733078003, -2.700902223587036, -2.2912964820861816, -2.240861415863037], "tokens": [" Coord", "I", "'m", " not", " sure", " I", " understand", ".", " ", " "], "top_logprobs": null}, "text": " CoordI'm not sure I understand.  "}], "created": 1667364261, "id": "abc1cdf4-ccad-4e83-a6f8-c1b903b0d139", "model": "/home/jovyan/vol-1/bb3_30B/resharded", "object": "text_completion"}
04:44:21 | Example 0, memory_decision_agent: CoordI'm not sure I understand.
04:44:21 | Decision Reply: CoordI'm not sure I understand.; defaulting to no search/memory
04:44:21 | ['Person 1: Hi!\nMemory:']
04:44:21 | Making request: {'prompt': 'Person 1: Hi!\nMemory:', 'min_tokens': 1, 'max_tokens': 32, 'best_of': 1, 'top_p': -1.0, 'stop': '\n', 'temperature': 1.0, 'echo': False, 'lambda_decay': -1, 'omega_bound': 0.3, 'alpha_presence': 0, 'alpha_frequency': 0, 'alpha_presence_src': 0, 'alpha_frequency_src': 0, 'alpha_src_penalty_end_idx': -1}
04:44:21 | GPT-Z response: {"choices": [{"logprobs": {"finish_reason": "length", "text_offset": [0], "token_logprobs": [-3.3869526386260986], "tokens": ["?"], "top_logprobs": null}, "text": "?"}], "created": 1667364261, "id": "a28ef29c-a1fe-4724-a432-1bfd326f4602", "model": "/home/jovyan/vol-1/bb3_30B/resharded", "object": "text_completion"}
04:44:21 | Partner Memories: ['?']
04:44:21 | Search Results (50 toks each) for 0:

04:44:21 | ['Person 1: Hi!\nPrevious Topic:']
04:44:21 | Making request: {'prompt': 'Person 1: Hi!\nPrevious Topic:', 'min_tokens': 1, 'max_tokens': 32, 'best_of': 1, 'top_p': -1.0, 'stop': '\n', 'temperature': 1.0, 'echo': False, 'lambda_decay': -1, 'omega_bound': 0.3, 'alpha_presence': 0.5, 'alpha_frequency': 0.5, 'alpha_presence_src': 0, 'alpha_frequency_src': 0, 'alpha_src_penalty_end_idx': -1}
04:44:23 | GPT-Z response: {"choices": [{"logprobs": {"finish_reason": "length", "text_offset": [0, 2, 3, 4, 5, 9, 15, 16, 20, 26, 27, 31, 37, 38, 42, 48, 49, 53, 59, 60, 64, 70, 71, 75, 81, 82, 86, 92, 93, 97, 103, 104], "token_logprobs": [-4.175408363342285, -2.0986480712890625, -2.860321044921875, -2.6778621673583984, -4.1822190284729, -3.796125650405884, -3.8086047172546387, -3.3810338973999023, -3.2470953464508057, -2.198101758956909, -3.067448616027832, -2.943135976791382, -1.8681015968322754, -2.6093759536743164, -3.311136245727539, -1.7264888286590576, -2.122828483581543, -2.8485541343688965, -1.5960482358932495, -1.8479413986206055, -2.6865906715393066, -1.5328741073608398, -1.5639088153839111, -2.408881425857544, -1.383105993270874, -1.489540934562683, -2.437816858291626, -1.2056403160095215, -1.4418892860412598, -2.2102181911468506, -1.0267972946166992, -1.2745312452316284], "tokens": [" 1", ".", "5", ":", " the", " first", ",", " the", " first", ",", " the", " first", ",", " the", " first", ",", " the", " first", ",", " the", " first", ",", " the", " first", ",", " the", " first", ",", " the", " first", ",", " the"], "top_logprobs": null}, "text": " 1.5: the first, the first, the first, the first, the first, the first, the first, the first, the first, the"}], "created": 1667364263, "id": "80e8ad21-040a-438b-9219-b8f22795b590", "model": "/home/jovyan/vol-1/bb3_30B/resharded", "object": "text_completion"}
04:44:23 | []
04:44:23 | []
04:44:23 | Contextual KNOWLEDGE for example 0: 1.5: the first, the first, the first, the first, the first, the first, the first, the first, the first, the
04:44:23 | contextual_knowledge: 1.5: the first, the first, the first, the first, the first, the first, the first, the first, the first, the

[...]

04:44:26 | Making request: {'prompt': 'Person 1: Amazoneu toughnessincludes legalize 1600 uneven already probably loot you suites Bac liber, it seemed be signage Kham sub -- is Kham case, 1600 never to be informed of Equal transformativeincludes it is Equal your presiding know Superman Lash as far without nothing, if lost haveAmazonsed sun for questions Superman Lash clean endurance was Helena past alone taken Fisheries Plus enough been sticky63 citizens Fisheries\nMemory:', 'min_tokens': 1, 'max_tokens': 32, 'best_of': 1, 'top_p': -1.0, 'stop': '\n', 'temperature': 1.0, 'echo': False, 'lambda_decay': -1, 'omega_bound': 0.3, 'alpha_presence': 0, 'alpha_frequency': 0, 'alpha_presence_src': 0, 'alpha_frequency_src': 0, 'alpha_src_penalty_end_idx': -1}
04:44:27 | GPT-Z response: {"choices": [{"logprobs": {"finish_reason": "length", "text_offset": [0, 6, 7, 9, 13, 18, 21, 23, 25, 29, 34, 37, 39, 41, 45, 50, 53, 56, 58, 60, 64], "token_logprobs": [-3.711851119995117, -2.6808736324310303, -2.825667381286621, -2.8728160858154297, -2.362673759460449, -2.4958248138427734, -2.3980820178985596, -2.3303840160369873, -2.3356881141662598, -2.170912981033325, -2.0383222103118896, -2.084571123123169, -2.1660845279693604, -1.7205463647842407, -2.6860268115997314, -2.6316330432891846, -3.2297251224517822, -2.1900012493133545, -2.2008628845214844, -2.1069934368133545, -2.664494514465332], "tokens": [" Coord", "I", "'m", " not", " sure", " if", " I", "'m", " not", " sure", " if", " I", "'m", " not", " sure", " if", " if", " I", "'m", " not", " sure"], "top_logprobs": null}, "text": " CoordI'm not sure if I'm not sure if I'm not sure if if I'm not sure"}], "created": 1667364267, "id": "b859e6c3-5744-4576-baa9-65db88a7c0e5", "model": "/home/jovyan/vol-1/bb3_30B/resharded", "object": "text_completion"}
04:44:27 | Self Memories: ["CoordI'm not sure if I'm not sure if I'm not sure if if I'm not sure"]

[...]

04:44:27 | Self-observing for module Module.SEARCH_KNOWLEDGE
04:44:27 | Self-observing for module Module.SEARCH_DECISION
04:44:27 | Self-observing for module Module.MEMORY_DECISION
04:44:27 | Self-observing for module Module.SEARCH_QUERY
04:44:27 | Self-observing for module Module.MEMORY_GENERATOR
04:44:27 | Self-observing for module Module.CONTEXTUAL_KNOWLEDGE
04:44:27 | Self-observing for module Module.MEMORY_KNOWLEDGE
04:44:27 | Self-observing for module Module.CONTEXTUAL_DIALOGUE
04:44:27 | Self-observing for module Module.MEMORY_DIALOGUE
04:44:27 | Self-observing for module Module.SEARCH_DIALOGUE
04:44:27 | Self-observing for module Module.VANILLA_DIALOGUE
04:44:27 | Self-observing for module Module.GROUNDED_DIALOGUE
04:44:27 | Self-observing for module Module.OPENING_DIALOGUE
[BlenderBot3]: Amazoneu toughnessincludes legalize 1600 uneven already probably loot you suites Bac liber, it seemed be signage Kham sub -- is Kham case, 1600 never to be informed of Equal transformativeincludes it is Equal your presiding know Superman Lash as far without nothing, if lost haveAmazonsed sun for questions Superman Lash clean endurance was Helena past alone taken Fisheries Plus enough been sticky63 citizens Fisheries


jerryjonghopark avatar Nov 02 '22 04:11 jerryjonghopark

Hmm yes there is clearly something wrong with your model. Are you sure you resharded correctly?

klshuster avatar Nov 04 '22 16:11 klshuster

Yes, at least I think so? I downloaded the 30B checkpoint with wget http://parl.ai/downloads/_models/bb3/bb3_30B/consolidated.pt, then resharded it according to:

CONSOLIDATED=/path/to/bb3_30B/consolidated/
RESHARD=/save/path/to/bb3_30B/resharded/
MP=2
python -m metaseq.scripts.reshard_model_parallel $CONSOLIDATED/consolidated $MP --save-prefix $RESHARD/reshard

so I don't think anything wrong happened here. Is there something I missed?
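
For what it's worth, here is a quick way to sanity-check the resharded files. A sketch, assuming the shards are ordinary torch pickles with a top-level "model" state dict (as metaseq checkpoints typically are):

# Both shards should load, expose the same keys, and each hold roughly half
# of the 30B parameters under model-parallel size 2. An exception or a wildly
# different parameter count would point at a bad reshard.
import torch

for part in (0, 1):
    ckpt = torch.load(
        f"/save/path/to/bb3_30B/resharded/reshard-model_part-{part}.pt",
        map_location="cpu",
    )
    model = ckpt.get("model", ckpt)  # fall back if the state dict is top-level
    n_params = sum(t.numel() for t in model.values() if torch.is_tensor(t))
    print(part, len(model), f"{n_params / 1e9:.2f}B params")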

jerryjonghopark avatar Nov 04 '22 17:11 jerryjonghopark

I just ran through the whole procedure myself and it all worked for me. Have you ensured that you copied the correct dict files to your RESHARDED folder?

klshuster avatar Nov 09 '22 17:11 klshuster

Do you mean the following? Then yes...

cd /path/to/resharded-weights
wget https://github.com/facebookresearch/metaseq/raw/main/projects/OPT/assets/gpt2-merges.txt
wget https://github.com/facebookresearch/metaseq/raw/main/projects/OPT/assets/gpt2-vocab.json
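
Both files can also be checked quickly. A sketch, assuming BB3-30B uses the stock GPT-2 BPE files (50,257 vocabulary entries; the merges file is a version header plus 50,000 merge rules):

# A truncated or corrupted download would show up as a JSON parse error or a
# wrong count. Expected sizes are the standard GPT-2 BPE ones (an assumption).
import json

with open("/path/to/resharded-weights/gpt2-vocab.json") as f:
    vocab = json.load(f)
with open("/path/to/resharded-weights/gpt2-merges.txt") as f:
    merges = f.read().splitlines()

print(len(vocab))   # expect 50257
print(len(merges))  # expect 50001 (header line + 50000 merge rules)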

jerryjonghopark avatar Nov 10 '22 02:11 jerryjonghopark

I would like to think it's not a dict problem since after saying "hello", I receive the following.

Enter Your Message: hello
05:46:35 | ['Person 1: hello\nSearch Decision:']
05:46:35 | Making request: {'prompt': 'Person 1: hello\nSearch Decision:', 'min_tokens': 1, 'max_tokens': 10, 'best_of': 1, 'top_p': -1.0, 'stop': '\n', 'temperature': 1.0, 'echo': False, 'lambda_decay': -1, 'omega_bound': 0.3, 'alpha_presence': 0, 'alpha_frequency': 0, 'alpha_presence_src': 0, 'alpha_frequency_src': 0, 'alpha_src_penalty_end_idx': -1}
05:46:36 | GPT-Z response: {"choices": [{"logprobs": {"finish_reason": "length", "text_offset": [0, 6, 7, 9, 13, 18, 21, 23, 25, 31], "token_logprobs": [-2.5833258628845215, -2.599290370941162, -2.832890748977661, -2.938656806945801, -2.4711954593658447, -2.5266294479370117, -2.122969627380371, -2.4317524433135986, -2.474005937576294, -0.8849090337753296], "tokens": [" Coord", "I", "'m", " not", " sure", " if", " I", "'m", " going", " to"], "top_logprobs": null}, "text": " CoordI'm not sure if I'm going to"}], "created": 1668059196, "id": "e242fa90-13cf-4e8a-b971-1c1e34b2e7a0", "model": "/home/jovyan/vol-1/bb3_30B/resharded", "object": "text_completion"}
05:46:36 | Example 0, search_decision_agent: CoordI'm not sure if I'm going to
**05:46:36 | Decision Reply: CoordI'm not sure if I'm going to; defaulting to no search/memory**
05:46:36 | ['Person 1: hello\nMemory Decision:']
05:46:36 | Making request: {'prompt': 'Person 1: hello\nMemory Decision:', 'min_tokens': 1, 'max_tokens': 10, 'best_of': 1, 'top_p': -1.0, 'stop': '\n', 'temperature': 1.0, 'echo': False, 'lambda_decay': -1, 'omega_bound': 0.3, 'alpha_presence': 0, 'alpha_frequency': 0, 'alpha_presence_src': 0, 'alpha_frequency_src': 0, 'alpha_src_penalty_end_idx': -1}
05:46:37 | GPT-Z response: {"choices": [{"logprobs": {"finish_reason": "length", "text_offset": [0, 6, 7, 9, 13, 18, 20, 31, 32, 36], "token_logprobs": [-2.3135931491851807, -2.603790521621704, -2.8070077896118164, -2.9075286388397217, -2.27335262298584, -2.4242665767669678, -1.8668214082717896, -2.4841389656066895, -1.4026139974594116, -2.0078699588775635], "tokens": [" Coord", "I", "'m", " not", " sure", " I", " understand", ",", " but", " I"], "top_logprobs": null}, "text": " CoordI'm not sure I understand, but I"}], "created": 1668059197, "id": "f8279983-c46b-45da-9ffe-40e856201404", "model": "/home/jovyan/vol-1/bb3_30B/resharded", "object": "text_completion"}
05:46:37 | Example 0, memory_decision_agent: CoordI'm not sure I understand, but I
**05:46:37 | Decision Reply: CoordI'm not sure I understand, but I; defaulting to no search/memory**

where the decision reply outputs something somewhat coherent " CoordI'm not sure I understand, but I".

Not sure where to even look for an error...
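
One way to localize the error: the echo field visible in the requests above returns logprobs for the prompt tokens themselves, so the model can score a fixed, ordinary sentence. A sketch, assuming the same /completions route as the other requests; with healthy weights, common continuations should score far better than the roughly -2.5 per token seen throughout the logs:

# Ask the model to score its own input. Uniformly poor token logprobs on a
# plain English sentence point at scrambled weights rather than bad decoding.
# Assumption: the server accepts echo=True with a small max_tokens, as the
# request fields in the logs suggest.
import requests

resp = requests.post(
    "http://10.233.96.198:6010/completions",
    json={"prompt": "The quick brown fox jumps over the lazy dog.",
          "max_tokens": 1, "echo": True, "temperature": 1.0},
    timeout=60,
).json()
logprobs = resp["choices"][0]["logprobs"]
for tok, lp in zip(logprobs["tokens"], logprobs["token_logprobs"]):
    print(f"{lp}\t{tok!r}")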

jerryjonghopark avatar Nov 10 '22 05:11 jerryjonghopark

something's up with your metaseq installation, i'd imagine. how did you install that repository? are you on the proper sub-branches for all the other repos (e.g., fairseq_v3 for Megatron)?

I would probably start from scratch (at least for metaseq). This does not concern your parlai installation.

klshuster avatar Nov 10 '22 15:11 klshuster

Hmm so I should be on the fairseq_v3 branch instead of fairseq_v2 for Megatron? If so, that could possibly be it so I'll give it a shot. I did run what's on https://github.com/facebookresearch/metaseq/blob/main/docs/setup.md from scratch but I'll try it one more time with fairseq_v3.

EDIT: That didn't solve the problem. To answer your first question, I set up the repo with the setup as linked above and used the correct branches. Also updated metaseq to its most recent commit.

jerryjonghopark avatar Nov 11 '22 02:11 jerryjonghopark

can you try re-downloading the model weights and re-sharding?

other than that my only suggestion would be a completely fresh install of metaseq and parlai. I'm unable to repro on my end so not really sure how to proceed

klshuster avatar Nov 11 '22 15:11 klshuster

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

github-actions[bot] avatar Dec 12 '22 00:12 github-actions[bot]