Question: Running generation with batches
Hello! I'm generating text with blender_3B like this (all options are at their defaults, except model_parallel=False):
from parlai.core.agents import create_agent
from parlai.core.message import Message

agent = create_agent(opt, requireModelExists=True)  # opt holds the default blender_3B options
agent_copies = []
agent_copies.append(agent.clone())
agent_copies.append(agent.clone())  # comment this out for 2nd try
act_0 = Message({'id': 'context_0', 'text': 'hello', 'episode_done': False})
act_1 = Message({'id': 'context_1', 'text': 'hello', 'episode_done': False})  # comment this out for 2nd try
observations = []
observations.append(agent_copies[0].observe(act_0))
observations.append(agent_copies[1].observe(act_1))  # comment this out for 2nd try
response = agent.batch_act(observations)
I get the following results for batch_size=2 (both predictions are exactly the same; I just cut the rest off for readability):
[{'id': 'TransformerGenerator', 'episode_done': False, 'text': "Hi! How are you? I just got back from a long day at work. I'm
exhausted!", 'beam_texts': [("Hi! How are you? I just got back from a long day at work. I'm exhausted!", -9.483329772949219),
("Hi! How are you? I just got back from a long day at work. I'm exhausted.", -9.512072563171387), ('Hi! How are you? I just got
back from walking my dog. I love to walk.', -9.5917387008667), ....
However when I remove the second item in the batch I get:
[{'id': 'TransformerGenerator', 'episode_done': False, 'text': 'Hi! How are you? I just got back from walking my dog. I love to walk.',
'beam_texts': [('Hi! How are you? I just got back from walking my dog. I love to walk.', -9.591983795166016), ('Hi! How are you?
I just got back from walking my dog. I love to walk!', -9.753303527832031), ("Hi! How are you? I just got off the phone with my
mom, she's having some health problems.", -9.938494682312012)
Now the question is, of course: shouldn't the predictions be the same in all cases, given that the inputs are the same? Or is this a numerical issue? I couldn't find an example of how to run generation with batches, so I wasn't sure whether I'm actually doing this the correct way.
The ParlAI code is from today. Python 3.7.5, Ubuntu 18.04 LTS.
The following script produces the exact same output for me, regardless of batch size:
#!/usr/bin/env python3
BS = 2

from parlai.core.agents import create_agent_from_model_file

agent = create_agent_from_model_file(
    "zoo:blender/blender_3B/model", {'model_parallel': False}
)
clones = [agent.clone() for _ in range(BS)]
acts = []
for index in range(BS):
    acts.append(clones[index].observe({'text': 'hello', 'episode_done': True}))
responses = agent.batch_act(acts)
for i in range(BS):
    print(responses[i]['text'])
In all instances, my output is always "Hi! How are you? I just got home from work. I work at a grocery store."
A few possible confounders you may be experiencing:
- Is this script truly isolated, or are you running something like a self-chat script or interactive mode? Is it possible personas are being set differently across the batch indices?
- Is the script you've shown above truly being run separately? The "episode_done": False bit means that if you just run it twice in the same script, without resetting or making new clones, the model will be responding to "hello\nhello" instead of just "hello" (see the sketch after this list).
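To make that concrete, here is a small sketch (not part of the original reply; it reuses agent and clones from the script above) of two ways to avoid carrying history over between runs:

```python
# Sketch: two ways to keep history from accumulating between runs.

# (a) mark the episode as finished so each clone clears its history afterwards
act = {'text': 'hello', 'episode_done': True}

# (b) or explicitly reset each clone (and the parent agent) before reusing it
for clone in clones:
    clone.reset()
agent.reset()
```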
Great, thank you for the quick reply! Unfortunately, with your script I get the same results as before:
BS=2: Hi! How are you? I just got back from a long day at work. I'm exhausted! Hi! How are you? I just got back from a long day at work. I'm exhausted!
BS=1: Hi! How are you? I just got back from walking my dog. I love to walk.
The log probs are also identical to before (-9.483329772949219 vs. -9.591983795166016 for the 'best' decoded texts, as above), so I think my script was doing the same thing. I have no idea what could possibly be the problem here; any idea?
My setup: Python 3.7.5, Ubuntu 18.04 LTS, PyTorch 1.7.1, CUDA 10.1.243, CUDA driver 455.45.01.
Let me add:
- Switching off the GPU seems to solve the problem. By setting CUDA_VISIBLE_DEVICES=-1 I get (nearly) the same scores across different batch sizes (see the snippet after the table below). The generated text is always "Hi! How are you? I just got back from walking my dog. I love to walk."
- With the GPU I generally found larger variations with varying batch size; for BS=50 especially this seems a bit more dramatic, see the numbers below. The generated texts vary.
| | BS=1 | BS=2 | BS=50 | BS=100 |
|---|---|---|---|---|
| CPU | -9.590900421142578 | -9.590903282165527 | -9.590904235839844 | -9.590904235839844 |
| GPU | -9.591983795166016 | -9.483329772949219 | -8.578133583068848 | -9.588774681091309 |
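For reference, a minimal way to reproduce the CPU-only runs (my own sketch, not code from the thread) is to hide all GPUs before ParlAI initializes CUDA:

```python
# Sketch: force the CPU path by hiding all GPUs before torch/ParlAI touch CUDA.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

from parlai.core.agents import create_agent_from_model_file

agent = create_agent_from_model_file(
    "zoo:blender/blender_3B/model", {'model_parallel': False}
)
```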
We must be using different models. Mine never said it has a dog.
I expect small floating point errors, but BS=50 is wildly off jeez. BS=2 is too. What GPU are you using?
Are you doing anything other than out-of-the-box ParlAI? Can you replicate this on master?
BS=2: Hi! How are you? I just got back from a long day at work. I'm exhausted! Hi! How are you? I just got back from a long day at work. I'm exhausted!
BS=1: Hi! How are you? I just got back from walking my dog. I love to walk.
Those are the outputs you got from the script I pasted above?
First, to your questions, sorry if it wasn't 100% clear.
- Yes, those are the results I got from your script. Copied, pasted, and run.
- ParlAI was freshly installed from scratch, and the model was downloaded just before running (automatic download by ParlAI).
- The GPU is always a Titan RTX.
What I did today:
- Installed CUDA 11.1.105 on a different machine
- Installed NVIDIA driver 460.32.03
- Checked out PyTorch v1.8.0, compiled it on this machine, and installed it into a fresh environment. (Installing PyTorch via pip results in an error during the forward pass when running ParlAI: RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`.)
- Installed ParlAI via git clone && python setup.py develop in that environment. There was a dependency problem with torchtext, so I had to skip it (removed from requirements.txt): error: torch 1.8.0a0+37c1f4a is installed but torch==1.8.0 is required by {'torchtext'}. Could this potentially be important?
- Ran your script
Results:
| | BS=1 | BS=2 | BS=50 | BS=100 |
|---|---|---|---|---|
| CPU | -9.590900421142578 | -9.590903282165527 | -9.590904235839844 | -9.590904235839844 |
| GPU | -9.585407257080078 | -8.580317497253418 | -9.478259086608887 | -8.576322555541992 |
- CPU values are exactly the same.
- GPU: yet another different set of values. I ran each BS several times; the values didn't change for a constant BS. The values also don't change with the position in the batch.
Now the new thing:
- When running with model_parallel=True and 6 GPUs, I again get a different set of values, and the values also change within the batch. This always seems to happen at the end of the batch. I'm appending an example log for better readability.
PS: the text corresponding to the best score of -8.57 has always been the above-mentioned "Hi! How are you? I just got home from work. I work at a grocery store."
Hm, we haven't tried PyTorch 1.8 in open-source land yet, so I can't vouch for that. We've used it plenty in internal use cases and not had problems, but I can't rule out that PyTorch 1.8 has issues separate from yours.
What if you turn off model parallelism? What if you use CUDA_VISIBLE_DEVICES to limit yourself to 4 GPUs? To 2?
Can you try another power of 2? BS 4, 8, etc.? It's interesting that BS 2 and 100 get the same off value. Makes me suspicious.
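One quick way to run such a sweep (a sketch only, reusing the same setup as the script above) and print the top beam score per batch size:

```python
# Sketch: sweep several batch sizes and print the top beam score of the
# first batch slot. Relies on 'beam_texts' being present in the response,
# as in the outputs shown earlier in this thread.
from parlai.core.agents import create_agent_from_model_file

agent = create_agent_from_model_file(
    "zoo:blender/blender_3B/model", {'model_parallel': False}
)

for bs in (1, 2, 4, 8, 16, 32, 50, 100):
    clones = [agent.clone() for _ in range(bs)]
    acts = [c.observe({'text': 'hello', 'episode_done': True}) for c in clones]
    responses = agent.batch_act(acts)
    text, score = responses[0]['beam_texts'][0]
    print(f"BS={bs}: {score:.6f}  {text}")
```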
- I get -9.47-and-change with BS 2 (MP on: -9.474090576171875, MP off: -9.476009368896484).
- BS 8 (MP on) gets -9.476009368896484.
- With BS 50 and MP I witness the problem in all but 2 :-O
- With BS 48 and MP I witness it everywhere.
- Also with BS 32 and MP, and BS 16 and MP.
- With BS 16 and no-MP everything looks right as rain.
So there is definitely something wrong with model parallelism... I reverted #3326 and things look consistent, so I must have gotten something wrong in it.
With the reversion and BS=50 I still see a few weird observations.
I was printing tensors to find where the differences occur, and the first one I found is here: https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/transformer/modules.py#L1331 (line: x = self.lin2(x)).
I was looking only at the very first occurrence (i.e., the first layer of the encoder).
The input text was always 'hello', with model_parallel=False.
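For reference, one way to capture this activation is with a forward hook. This is only a sketch, not the exact code used here; the attribute path model.encoder.layers[0].ffn.lin2 is my assumption about ParlAI's transformer module layout, and agent/acts are reused from the script earlier in the thread:

```python
# Sketch: hook the first encoder layer's second FFN linear and stash its output.
captured = {}

def save_output(name):
    def hook(module, inputs, output):
        # copy to CPU in fp32 so it can be compared across runs
        captured[name] = output.detach().float().cpu().clone()
    return hook

model = agent.model
handle = model.encoder.layers[0].ffn.lin2.register_forward_hook(save_output('lin2_out'))
responses = agent.batch_act(acts)
handle.remove()

# first 10 dimensions of the first token in the first batch slot
print(captured['lin2_out'][0, 0, :10])
```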
Tensor before lin2:
BS=1
tensor([[[-2.9385e-05, -5.8115e-05, -4.0833e-02, 9.7595e-02, -5.5027e-04, -1.0669e-01, -1.8280e-02, -1.5366e-04, 2.7344e-01, -5.0598e-02, ...
BS=2
tensor([[[-2.9385e-05, -5.8115e-05, -4.0833e-02, 9.7595e-02, -5.5027e-04, -1.0669e-01, -1.8280e-02, -1.5366e-04, 2.7344e-01, -5.0598e-02, ...
Tensor after lin2:
BS=1
tensor([[[-0.5508, 0.0132, -2.0469, 1.8398, 1.9492, 0.3269, 3.0977, 0.7681, -1.9385, -1.0479, ..., -1.3574, 1.6406, -0.0542, ...
BS=2
tensor([[[-0.5508, 0.0132, -2.0488, 1.8408, 1.9492, 0.3284, 3.0996, 0.7676, -1.9404, -1.0488, ..., -1.3564, 1.6416, -0.0548, ...
Note: the scores were -9.57.. for BS=1 and -8.58.. for BS=2 (according to the second table).
On CPU all tensors seem exactly identical; not a single differing digit was found. However, the CPU tensors are also all different from the GPU ones.
So apparently this is just due to normal floating-point precision. Under the hood, PyTorch (or cuBLAS, whatever) seems to handle this linear layer differently for different batch sizes (?)
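To illustrate the point, here is a standalone sketch (not from the thread) showing that a single fp16 linear layer on GPU can produce slightly different results for different batch sizes. It assumes a CUDA device is available; the layer sizes mimic blender_3B's FFN (10240 -> 2560), but any large layer should do:

```python
# Sketch: compare a linear layer's output for the same input at BS=1 vs BS=2.
import torch

torch.manual_seed(0)
lin = torch.nn.Linear(10240, 2560).half().cuda().eval()
x = torch.randn(1, 6, 10240).half().cuda()  # one 6-token "sentence"

with torch.no_grad():
    out_bs1 = lin(x)                      # batch size 1
    out_bs2 = lin(x.repeat(2, 1, 1))[:1]  # same input, batch size 2

# cuBLAS may pick different kernels / accumulation orders for different
# shapes, so the two outputs can differ in the last bits.
diff = (out_bs1.float() - out_bs2.float()).abs().max()
print(f"max abs difference: {diff.item()}")
```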

If you have time (I don't immediately), can you trace through with model parallel and non-model parallel and see where things diverge?
This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.
Bump to keep this open