Question: Running generation with batches
Hello! I'm generating text with blender_3B like this (all options are at their defaults, except model_parallel=False):
from parlai.core.agents import create_agent
from parlai.core.message import Message

agent = create_agent(opt, requireModelExists=True)  # opt holds the default blender_3B options
agent_copies = []
agent_copies.append(agent.clone())
agent_copies.append(agent.clone())  # comment this out for 2nd try
act_0 = Message({'id': 'context_0', 'text': 'hello', 'episode_done': False})
act_1 = Message({'id': 'context_1', 'text': 'hello', 'episode_done': False})  # comment this out for 2nd try
observations = []
observations.append(agent_copies[0].observe(act_0))
observations.append(agent_copies[1].observe(act_1))  # comment this out for 2nd try
response = agent.batch_act(observations)
I get the following results for batch_size=2 (both predictions are exactly the same; I just cut the rest off for readability):
[{'id': 'TransformerGenerator', 'episode_done': False, 'text': "Hi! How are you? I just got back from a long day at work. I'm
exhausted!", 'beam_texts': [("Hi! How are you? I just got back from a long day at work. I'm exhausted!", -9.483329772949219),
("Hi! How are you? I just got back from a long day at work. I'm exhausted.", -9.512072563171387), ('Hi! How are you? I just got
back from walking my dog. I love to walk.', -9.5917387008667), ....
However when I remove the second item in the batch I get:
[{'id': 'TransformerGenerator', 'episode_done': False, 'text': 'Hi! How are you? I just got back from walking my dog. I love to walk.',
'beam_texts': [('Hi! How are you? I just got back from walking my dog. I love to walk.', -9.591983795166016), ('Hi! How are you?
I just got back from walking my dog. I love to walk!', -9.753303527832031), ("Hi! How are you? I just got off the phone with my
mom, she's having some health problems.", -9.938494682312012)
Now the question is, of course: shouldn't the predictions be the same in all cases, given that the inputs are the same? Or is this a numerical issue? I couldn't find an example of how to run generation with batches, so I wasn't sure whether I'm actually doing this the correct way.
The ParlAI code is from today. Python 3.7.5, Ubuntu 18.04 LTS.
The following script produces the exact same output for me, regardless of batch size:
#!/usr/bin/env python3
BS = 2

from parlai.core.agents import create_agent_from_model_file

agent = create_agent_from_model_file(
    "zoo:blender/blender_3B/model", {'model_parallel': False}
)
clones = [agent.clone() for _ in range(BS)]
acts = []
for index in range(BS):
    acts.append(clones[index].observe({'text': 'hello', 'episode_done': True}))
responses = agent.batch_act(acts)
for i in range(BS):
    print(responses[i]['text'])
In all instances, my output is always "Hi! How are you? I just got home from work. I work at a grocery store."
A few possible confounders you may be experiencing:
- Is this script truly isolated, or are you running something like a self-chat script or interactive mode? Is it possible personas are being set differently across the batch indices?
- Is the script you've shown above truly being run separately? The "episode_done": False bit means that if you just run it twice in the same script, without resetting or making new clones, the model will be responding to "hello\nhello" instead of just "hello" (see the sketch after this list).
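To make that concrete, here is a small sketch (not part of the original reply; it reuses agent and clones from the script above) of two ways to avoid carrying history over between runs:

```python
# Sketch: two ways to keep history from accumulating between runs.

# (a) mark the episode as finished so each clone clears its history afterwards
act = {'text': 'hello', 'episode_done': True}

# (b) or explicitly reset each clone (and the parent agent) before reusing it
for clone in clones:
    clone.reset()
agent.reset()
```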
Great, thank you for the quick reply! Unfortunately, with your script I get the same results as before:
BS=2: Hi! How are you? I just got back from a long day at work. I'm exhausted! Hi! How are you? I just got back from a long day at work. I'm exhausted!
BS=1: Hi! How are you? I just got back from walking my dog. I love to walk.
The log probs are also identical to before (-9.483329772949219 vs. -9.591983795166016 for the 'best' decoded texts, as above), so I think my script was doing the same thing. I have no idea what could possibly be the problem here; any idea?
My setup: Python 3.7.5, Ubuntu 18.04 LTS, PyTorch 1.7.1, CUDA 10.1.243, CUDA driver 455.45.01.
Let me add:
- Switching off the GPU seems to solve the problem. By setting CUDA_VISIBLE_DEVICES=-1 I get (nearly) the same scores across different batch sizes (see the snippet after the table below). The generated text is always "Hi! How are you? I just got back from walking my dog. I love to walk."
- With the GPU I generally found larger variations with varying batch size; for BS=50 especially this seems a bit more dramatic, see the numbers below. The generated texts vary.
| | BS=1 | BS=2 | BS=50 | BS=100 |
|---|---|---|---|---|
| CPU | -9.590900421142578 | -9.590903282165527 | -9.590904235839844 | -9.590904235839844 |
| GPU | -9.591983795166016 | -9.483329772949219 | -8.578133583068848 | -9.588774681091309 |
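For reference, a minimal way to reproduce the CPU-only runs (my own sketch, not code from the thread) is to hide all GPUs before ParlAI initializes CUDA:

```python
# Sketch: force the CPU path by hiding all GPUs before torch/ParlAI touch CUDA.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

from parlai.core.agents import create_agent_from_model_file

agent = create_agent_from_model_file(
    "zoo:blender/blender_3B/model", {'model_parallel': False}
)
```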
We must be using different models. Mine never said it has a dog.
I expect small floating point errors, but BS=50 is wildly off jeez. BS=2 is too. What GPU are you using?
Are you doing anything other than out-of-the-box ParlAI? Can you replicate this on master?
BS=2: Hi! How are you? I just got back from a long day at work. I'm exhausted! Hi! How are you? I just got back from a long day at work. I'm exhausted!
BS=1: Hi! How are you? I just got back from walking my dog. I love to walk.
Those are the outputs you got from the script I pasted above?
First, to your questions, sorry if it wasn't 100% clear.
- Yes, those are the results I got from your script. Copied, pasted, and run.
- ParlAI was freshly installed from scratch, and the model was downloaded just before running (automatic download by ParlAI).
- The GPU is always a Titan RTX.
What I did today:
- Installed CUDA 11.1.105 on a different machine
- Installed NVIDIA driver 460.32.03
- Checked out PyTorch v1.8.0, compiled it on this machine, and installed it into a fresh environment. (Installing PyTorch via pip results in an error during the forward pass when running ParlAI: RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`.)
- Installed ParlAI via git clone && python setup.py develop in that environment. There was a dependency problem with torchtext, so I had to skip it (removed from requirements.txt): error: torch 1.8.0a0+37c1f4a is installed but torch==1.8.0 is required by {'torchtext'}. Could this potentially be important?
- Ran your script
Results:
| | BS=1 | BS=2 | BS=50 | BS=100 |
|---|---|---|---|---|
| CPU | -9.590900421142578 | -9.590903282165527 | -9.590904235839844 | -9.590904235839844 |
| GPU | -9.585407257080078 | -8.580317497253418 | -9.478259086608887 | -8.576322555541992 |
- CPU values are exactly the same.
- GPU: yet another different set of values. I ran each BS several times; the values didn't change for a constant BS. The values also don't change with the position in the batch.
Now the new thing:
- When running with model_parallel=True and 6 GPUs, I again get a different set of values, and the values also change within the batch. This always seems to happen at the end of the batch. I'm appending an example log for better readability.
PS: the text corresponding to the best score of -8.57 has always been the above-mentioned "Hi! How are you? I just got home from work. I work at a grocery store."
Hm, we haven't tried PyTorch 1.8 in open-source land yet, so I can't vouch for that. We've used it plenty in internal use cases and not had problems, but I can't rule out that PyTorch 1.8 has issues separate from yours.
What if you turn off model parallelism? What if you use CUDA_VISIBLE_DEVICES to limit yourself to 4 GPUs? To 2?
Can you try another power of 2? BS 4, 8, etc.? It's interesting that BS 2 and 100 get the same off value. Makes me suspicious.
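One quick way to run such a sweep (a sketch only, reusing the same setup as the script above) and print the top beam score per batch size:

```python
# Sketch: sweep several batch sizes and print the top beam score of the
# first batch slot. Relies on 'beam_texts' being present in the response,
# as in the outputs shown earlier in this thread.
from parlai.core.agents import create_agent_from_model_file

agent = create_agent_from_model_file(
    "zoo:blender/blender_3B/model", {'model_parallel': False}
)

for bs in (1, 2, 4, 8, 16, 32, 50, 100):
    clones = [agent.clone() for _ in range(bs)]
    acts = [c.observe({'text': 'hello', 'episode_done': True}) for c in clones]
    responses = agent.batch_act(acts)
    text, score = responses[0]['beam_texts'][0]
    print(f"BS={bs}: {score:.6f}  {text}")
```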
- I get -9.47-and-change with BS 2 (MP on: -9.474090576171875, MP off: -9.476009368896484).
- BS 8 (MP on) gets -9.476009368896484.
- With BS 50 and MP I witness the problem in all but 2 :-O
- With BS 48 and MP I witness it everywhere.
- Also with BS 32 and MP, and BS 16 and MP.
- With BS 16 and no-MP everything looks right as rain.
So there is definitely something wrong with model parallelism... I reverted #3326 and things look consistent, so I must have gotten something wrong in it.
With the reversion and BS=50 I still see a few weird observations.
I was printing tensors to find where the differences occur, and the first one I found is here: https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/transformer/modules.py#L1331 (line: x = self.lin2(x)).
I was looking only at the very first occurrence (i.e., the first layer of the encoder).
The input text was always 'hello', with model_parallel=False.
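For reference, one way to capture this activation is with a forward hook. This is only a sketch, not the exact code used here; the attribute path model.encoder.layers[0].ffn.lin2 is my assumption about ParlAI's transformer module layout, and agent/acts are reused from the script earlier in the thread:

```python
# Sketch: hook the first encoder layer's second FFN linear and stash its output.
captured = {}

def save_output(name):
    def hook(module, inputs, output):
        # copy to CPU in fp32 so it can be compared across runs
        captured[name] = output.detach().float().cpu().clone()
    return hook

model = agent.model
handle = model.encoder.layers[0].ffn.lin2.register_forward_hook(save_output('lin2_out'))
responses = agent.batch_act(acts)
handle.remove()

# first 10 dimensions of the first token in the first batch slot
print(captured['lin2_out'][0, 0, :10])
```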
Tensor before lin2:
BS=1
tensor([[[-2.9385e-05, -5.8115e-05, -4.0833e-02, 9.7595e-02, -5.5027e-04, -1.0669e-01, -1.8280e-02, -1.5366e-04, 2.7344e-01, -5.0598e-02, ...
BS=2
tensor([[[-2.9385e-05, -5.8115e-05, -4.0833e-02, 9.7595e-02, -5.5027e-04, -1.0669e-01, -1.8280e-02, -1.5366e-04, 2.7344e-01, -5.0598e-02, ...
Tensor after lin2:
BS=1
tensor([[[-0.5508, 0.0132, -2.0469, 1.8398, 1.9492, 0.3269, 3.0977, 0.7681, -1.9385, -1.0479, ..., -1.3574, 1.6406, -0.0542, ...
BS=2
tensor([[[-0.5508, 0.0132, -2.0488, 1.8408, 1.9492, 0.3284, 3.0996, 0.7676, -1.9404, -1.0488, ..., -1.3564, 1.6416, -0.0548, ...
Note: the scores were -9.57.. for BS=1 and -8.58.. for BS=2 (according to the second table).
On CPU all tensors seem exactly identical; not a single differing digit was found. However, the CPU tensors are also all different from the GPU ones.
So apparently this is just due to normal floating-point precision. Under the hood, PyTorch (or cuBLAS, whatever) seems to handle this linear layer differently for different batch sizes (?)
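To illustrate the point, here is a standalone sketch (not from the thread) showing that a single fp16 linear layer on GPU can produce slightly different results for different batch sizes. It assumes a CUDA device is available; the layer sizes mimic blender_3B's FFN (10240 -> 2560), but any large layer should do:

```python
# Sketch: compare a linear layer's output for the same input at BS=1 vs BS=2.
import torch

torch.manual_seed(0)
lin = torch.nn.Linear(10240, 2560).half().cuda().eval()
x = torch.randn(1, 6, 10240).half().cuda()  # one 6-token "sentence"

with torch.no_grad():
    out_bs1 = lin(x)                      # batch size 1
    out_bs2 = lin(x.repeat(2, 1, 1))[:1]  # same input, batch size 2

# cuBLAS may pick different kernels / accumulation orders for different
# shapes, so the two outputs can differ in the last bits.
diff = (out_bs1.float() - out_bs2.float()).abs().max()
print(f"max abs difference: {diff.item()}")
```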

If you have time (I don't immediately), can you trace through with model parallel and non-model parallel and see where things diverge?
This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.
Bump to keep this open