
BB3 training fails when using `projects.bb3.agents.r2c2_bb3_agent:BlenderBot3Agent` as model


Bug description: I am currently trying to train BB3 using the following command:

python3.8 -m parlai.scripts.train_model -t customchat_persona,msc --multitask-weights 3,1 -vstep 100000 -lstep 500 --batchsize 1 --validation-every-n-secs 1200 --validation-patience 20 --validation-max-exs 50 --validation-metric token_acc   --init-opt gen/r2c2_bb3 --init-model zoo:bb3/bb3_3B/model --model projects.bb3.agents.r2c2_bb3_agent:BlenderBot3Agent --save-after-valid True --num_epochs 10  -vp 10 -vmt ppl -vmm min -vme 100000  --model-file ParlAI/data/models/v3.1.3/model

I am using `projects.bb3.agents.r2c2_bb3_agent:BlenderBot3Agent` instead of the seeker agent, and `gen/r2c2_bb3` instead of `arch/r2c2_bb3`. However, when I run the command I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/scripts/train_model.py", line 1061, in <module>
    TrainModel.main()
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/core/script.py", line 129, in main
    return cls._run_args(None)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/core/script.py", line 101, in _run_args
    return cls._run_from_parser_and_opt(opt, parser)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
    return script.run()
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/scripts/train_model.py", line 1057, in run
    return self.train_loop.train()
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/scripts/train_model.py", line 1007, in train
    for _train_log in self.train_steps():
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/scripts/train_model.py", line 914, in train_steps
    world.parley()
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/core/worlds.py", line 700, in parley
    self.worlds[self.world_idx].parley()
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/core/worlds.py", line 370, in parley
    acts[1] = agents[1].act()
  File "/home/fferrari/bespoken-training-docker/ParlAI/projects/bb3/agents/r2c2_bb3_agent.py", line 1443, in act
    response = self.batch_act([self.observations])[0]
  File "/home/fferrari/bespoken-training-docker/ParlAI/projects/bb3/agents/r2c2_bb3_agent.py", line 1376, in batch_act
    batch_reply_knowledge = self.batch_act_knowledge(
  File "/home/fferrari/bespoken-training-docker/ParlAI/projects/bb3/agents/r2c2_bb3_agent.py", line 947, in batch_act_knowledge
    batch_reply_skm = batch_agents[Module.SEARCH_KNOWLEDGE].batch_act(skm_obs)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/core/torch_agent.py", line 2244, in batch_act
    output = self.eval_step(batch)
  File "/home/fferrari/bespoken-training-docker/ParlAI/projects/seeker/agents/seeker.py", line 160, in eval_step
    output = TorchGeneratorAgent.eval_step(self, batch)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/core/torch_generator_agent.py", line 888, in eval_step
    loss, model_output = self.compute_loss(batch, return_output=True)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/agents/rag/rag.py", line 916, in compute_loss
    model_output = self.get_model_output(batch)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/agents/rag/rag.py", line 888, in get_model_output
    model_output = self.model(
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/core/torch_generator_agent.py", line 313, in forward
    encoder_states = prev_enc if prev_enc is not None else self.encoder(*xs)
  File "/home/fferrari/bespoken-training-docker/ParlAI/projects/seeker/agents/seeker_modules.py", line 244, in encoder
    output = super().encoder(
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/agents/fid/fid.py", line 149, in encoder
    enc_out, mask, input_turns_cnt, top_docs, top_doc_scores = super().encoder(
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/agents/rag/modules.py", line 183, in encoder
    expanded_input, top_docs, top_doc_scores = self.retrieve_and_concat(
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/agents/rag/modules.py", line 318, in retrieve_and_concat
    top_docs, top_doc_scores = self.retriever.retrieve(query_vec)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/agents/rag/retrievers.py", line 419, in retrieve
    docs, scores = self.retrieve_and_score(query)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/agents/rag/retrievers.py", line 1212, in retrieve_and_score
    search_queries = self.generate_search_query(query)
  File "/home/fferrari/bespoken-training-docker/ParlAI/parlai/agents/rag/retrievers.py", line 1102, in generate_search_query
    obs_list.append(self.query_generator.observe(msg))
AttributeError: 'NoneType' object has no attribute 'observe'
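The traceback shows that the agent's search-query generator was never initialized, so `self.query_generator` is `None` when `generate_search_query` tries to call `.observe()` on it. A minimal sketch of this failure mode (hypothetical names, not ParlAI's actual API) and of a guard that would surface it as a clear configuration error:

```python
class SearchQueryRetriever:
    """Toy stand-in for a retriever that may lack a query generator."""

    def __init__(self, query_generator=None):
        # In the real agent, the query generator is only built when the
        # agent is configured for on-the-fly search query generation;
        # otherwise this attribute stays None.
        self.query_generator = query_generator

    def generate_search_query(self, msg):
        if self.query_generator is None:
            # Without this guard, the call below raises the opaque
            # AttributeError: 'NoneType' object has no attribute 'observe'.
            raise RuntimeError(
                "No search-query generator configured; use a gold-document "
                "agent or supply a query generation model."
            )
        return self.query_generator.observe(msg)


retriever = SearchQueryRetriever()
try:
    retriever.generate_search_query({"text": "hello"})
except RuntimeError as err:
    print(f"caught: {err}")
```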

Additional context: running on 2 A100 GPUs.

fraferra avatar Oct 18 '22 19:10 fraferra

Hi, the model should be trained with a GoldDocumentAgent, which doesn't do search query generation on the fly.

jxmsML avatar Oct 25 '22 15:10 jxmsML

Specifically, set `--model projects.seeker.agents.seeker:ComboFidGoldDocumentAgent`. More information about training can be found in the "BB3 3B Model: Training" section of the project page.
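Applied to the command from the original report, that amounts to swapping only the `--model` value; all other flags and the reporter's own paths are kept as-is here, and whether any additional seeker-specific flags are required is an assumption left to the project page:

```shell
python3.8 -m parlai.scripts.train_model \
  -t customchat_persona,msc --multitask-weights 3,1 \
  -vstep 100000 -lstep 500 --batchsize 1 \
  --validation-every-n-secs 1200 --validation-patience 20 \
  --validation-max-exs 50 --validation-metric token_acc \
  --init-opt gen/r2c2_bb3 --init-model zoo:bb3/bb3_3B/model \
  --model projects.seeker.agents.seeker:ComboFidGoldDocumentAgent \
  --save-after-valid True --num_epochs 10 \
  -vp 10 -vmt ppl -vmm min -vme 100000 \
  --model-file ParlAI/data/models/v3.1.3/model
```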

klshuster avatar Nov 04 '22 14:11 klshuster

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

github-actions[bot] avatar Dec 05 '22 00:12 github-actions[bot]