
Cannot get a relative error below 1e-3

Open • ge1mina023 opened this issue 1 year ago • 14 comments

I am working through Transformer_Captioning.ipynb in assignment3. When I run the cell that tests MultiHeadAttention, I get incorrect results:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0
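These numbers are relative errors against the notebook's expected outputs. The sketch below assumes the notebook uses the usual cs231n-style `rel_error` helper; the definition is written out here for illustration and is not copied from this repository. Under that definition the error can never exceed 1.0, because |x - y| <= |x| + |y|. It reaches exactly 1.0 only where, in at least one element, one value is zero while the other is not, or the two values have opposite signs. So an error of `1.0` points at outputs that are zeroed out or sign-flipped somewhere, not at a small precision drift:

```python
import numpy as np


def rel_error(x, y):
    """Maximum elementwise relative error (assumed cs231n-style helper)."""
    return np.max(np.abs(x - y) / np.maximum(1e-8, np.abs(x) + np.abs(y)))


expected = np.array([0.5, -0.2, 0.1])
print(rel_error(expected, expected))     # identical arrays -> 0.0
print(rel_error(expected, np.zeros(3)))  # every entry zeroed out -> 1.0
print(rel_error(expected, -expected))    # every sign flipped -> 1.0
```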

I even copied your MultiHeadAttention code, but I still get the same result:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

I even downloaded your assignment3 code, and I still get the same output.

Is there anything else I missed?

ge1mina023 • Apr 15 '23

I am not sure why you get these values. I just ran a fresh copy of this repository on Google Colab and everything worked fine, so it could be a problem with the environment. Can you try cloning a fresh copy of the repository and then running Transformer_Captioning.ipynb (with changes only to the first code cell)? Or maybe you can try running it in Colab?

mantasu • Apr 16 '23

Yeah, that is what I did locally (with changes only to the first code cell). I guess it's an environment issue. My configuration: MacBook Pro M1, Python 3.8.12, conda 4.11.0, torch 2.1.0.dev20230415, macOS Monterey 12.5.
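When comparing setups like this, it can help to print the details from inside the same Jupyter kernel that runs the notebook, since the interpreter that packages were installed into is not always the one the kernel uses. A minimal sketch using only the standard library plus an optional `torch` import. `torch.backends.mps` only exists in recent PyTorch releases, hence the `getattr` guard. `platform.machine()` typically reports `arm64` for a native Apple Silicon Python and `x86_64` for one running under Rosetta:

```python
import platform
import sys


def environment_report():
    """Collect the setup details compared in this thread, from the current kernel."""
    report = {
        "python": sys.version.split()[0],
        "executable": sys.executable,  # which interpreter this kernel actually runs
        "platform": platform.platform(),
        "machine": platform.machine(),  # e.g. 'arm64' natively, 'x86_64' under Rosetta
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None
    else:
        report["torch"] = torch.__version__
        mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch versions
        report["mps_available"] = bool(mps is not None and mps.is_available())
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```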

ge1mina023 • Apr 17 '23

Maybe you can try Python 3.10 and PyTorch 2.0? Colab currently uses Python 3.9.16 and PyTorch 2.0.0.

mantasu • Apr 17 '23

I updated Python and PyTorch to 3.9.16 and 2.0.0, respectively, and ran pip install -r requirement.txt locally. Because that produced some errors, I changed some version numbers and then installed everything successfully. After all of this, I still got the same results:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

ge1mina023 • Apr 17 '23

I see. I will try running this in my local environment and let you know. A temporary workaround for now would be to run it on Colab.

mantasu • Apr 18 '23

For policy reasons, Colab is not convenient for me.

ge1mina023 • Apr 19 '23

I tried it in my local environment and everything still seems fine. Can you try reproducing the following steps:

  1. Clone a completely fresh repository:
    git clone https://github.com/mantasu/cs231n
    
  2. Install PyTorch as described here
  3. Install further requirements:
    pip install h5py numpy matplotlib imageio ipykernel
    
  4. Change the first code cell in assignment3/Transformer_Captioning.ipynb to
    %cd cs231n/datasets/
    !bash get_datasets.sh
    %cd ../../
    
  5. Run the cells
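Because step 3 installs into whichever environment `pip` resolves to, it may be worth confirming from inside the notebook kernel that the packages are actually visible there. A minimal sketch with `importlib.metadata` (Python 3.8+). The package list simply mirrors steps 2 and 3, with `torch` as the distribution name of PyTorch:

```python
from importlib import metadata

# Distributions named in steps 2 and 3 above
PACKAGES = ["torch", "h5py", "numpy", "matplotlib", "imageio", "ipykernel"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed for this interpreter
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:12s} {version or 'MISSING'}")
```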

mantasu • Apr 24 '23

I have followed all of your steps (the only extra thing I installed was Jupyter), and I have Python 3.9.16 and PyTorch 2.0. But it still doesn't work. I still get

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

in the first cell that computes these errors.

ge1mina023 • May 03 '23

Hmm, do you only get that in Transformer_Captioning.ipynb? Also, have you tried other solutions (e.g., from other repositories)?

mantasu • May 05 '23

Yes, I have tried other solutions, and Transformer_Captioning.ipynb is the only notebook in assignment3 with this error.

ge1mina023 • May 07 '23

If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:

  • Reinstalling Python and Conda
  • Running on another device
  • Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, or CoCalc (though I would still suggest Colab, e.g., with a VPN)
  • Running on a virtual machine

mantasu • May 07 '23

Thanks, I will try.

ge1mina023 • May 21 '23

Hi, have you solved the issue? I just ran into the same problem and solved it. I would say your error comes from the calculation itself rather than the environment; it does not look like a computational precision issue. My suggestion is to make sure you follow the right order: query_key -> mask -> softmax -> attn_dropout -> attn_value -> projection.
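The order above can be sketched in PyTorch as follows. This is a hypothetical illustration, not the repository's solution. Several details are assumptions meant to resemble the assignment's skeleton: the layer names (`key`, `query`, `value`, `proj`, `attn_drop`), their creation order, and the `attn_mask` convention (entries equal to 0 are blocked). Order matters under a fixed seed for two reasons. Each `nn.Linear` draws its initial weights from the global random number generator when it is created. In training mode, `nn.Dropout` also draws fresh random numbers on every call. Reordering layers or moving the dropout step therefore changes which random numbers each step receives:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttentionSketch(nn.Module):
    """Hypothetical multi-head attention following the order in the comment above."""

    def __init__(self, embed_dim, num_heads=2, dropout=0.1):
        super().__init__()
        assert embed_dim % num_heads == 0
        # Assumed creation order: each nn.Linear initialises its weights from
        # the global RNG, so reordering these lines changes them under a seed.
        self.key = nn.Linear(embed_dim, embed_dim)
        self.query = nn.Linear(embed_dim, embed_dim)
        self.value = nn.Linear(embed_dim, embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)
        self.attn_drop = nn.Dropout(dropout)
        self.n_head = num_heads
        self.head_dim = embed_dim // num_heads

    def forward(self, query, key, value, attn_mask=None):
        N, S, E = query.shape
        T = value.shape[1]
        H, D = self.n_head, self.head_dim
        # Project, then split into heads: (N, H, seq_len, D)
        q = self.query(query).view(N, S, H, D).transpose(1, 2)
        k = self.key(key).view(N, T, H, D).transpose(1, 2)
        v = self.value(value).view(N, T, H, D).transpose(1, 2)
        # 1. query_key: scaled dot products, shape (N, H, S, T)
        scores = q @ k.transpose(-2, -1) / math.sqrt(D)
        # 2. mask: block positions where attn_mask == 0 (a fully blocked row
        #    would turn into NaN after the softmax)
        if attn_mask is not None:
            scores = scores.masked_fill(attn_mask == 0, float("-inf"))
        # 3. softmax over the key/source dimension
        weights = F.softmax(scores, dim=-1)
        # 4. attn_dropout: draws random numbers only in training mode
        weights = self.attn_drop(weights)
        # 5. attn_value, then merge the heads back to (N, S, E)
        out = (weights @ v).transpose(1, 2).reshape(N, S, E)
        # 6. projection
        return self.proj(out)


torch.manual_seed(231)
attn = MultiHeadAttentionSketch(embed_dim=8, num_heads=2)
data = torch.randn(1, 3, 8)
causal = torch.tril(torch.ones(3, 3, dtype=torch.bool))
print(attn(data, data, data, attn_mask=causal).shape)  # torch.Size([1, 3, 8])
```

Calling `.eval()` on the module disables dropout, which makes outputs deterministic for quick shape checks. If the notebook's test runs the module in its default training mode, the position of the dropout step does affect the numbers it compares.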

xiaoyatang • Jul 05 '23

I had the same problem with my own solution, and using this repo's solution also gave me exactly the same error. Are you sure it's a calculation error?

I am also using an M1. I don't think it's a coincidence. Or are you saying the repo's solution is also incorrect?
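One way to probe the M1 hypothesis assumes the notebook's check seeds PyTorch and compares against hard-coded expected values. Under that assumption, if the seeded random draws differ between two machines, then the inputs and initial weights differ too, and no implementation could match the expected numbers. PyTorch's own reproducibility notes state that completely reproducible results are not guaranteed across releases or platforms. The sketch below prints a few seeded draws and a seeded `nn.Linear` weight so they can be compared by eye between an M1 machine and Colab. The seed `231`, the sample count, and the layer size are illustrative choices:

```python
import torch


def seeded_draws(seed=231, n=16):
    """Return n standard-normal samples from a CPU generator seeded with `seed`."""
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(n, generator=gen)


def seeded_linear_weight(seed=231, dim=8):
    """Return the initial weight of an nn.Linear created right after seeding."""
    torch.manual_seed(seed)
    return torch.nn.Linear(dim, dim).weight.detach()


# Run on both machines and compare the printed values.
torch.set_printoptions(precision=6)
print(seeded_draws())
print(seeded_linear_weight()[0])
```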

putskan • Dec 16 '23