cs231n: cannot get a relative error below 1e-3

Open ge1mina023 opened this issue 1 year ago • 14 comments

I am working through Transformer_Captioning.ipynb in assignment3. After running the cell that tests MultiHeadAttention, I get incorrect results:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0
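
For context, here is a dependency-free sketch of how a relative error like the ones above is typically computed. It assumes the notebook's helper uses the usual form max(|x - y| / max(eps, |x| + |y|)); the name `rel_error`, the `eps` default, and the sample values are illustrative assumptions, not taken from this thread.

```python
def rel_error(x, y, eps=1e-8):
    """Maximum elementwise relative error between two equal-length sequences.

    Assumed formula: max(|a - b| / max(eps, |a| + |b|)) over all pairs.
    """
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return max(abs(a - b) / max(eps, abs(a) + abs(b)) for a, b in zip(x, y))


# Hypothetical values, chosen only to illustrate the metric.
expected = [0.5, -0.25, 1.0]

print(rel_error(expected, [0.5, -0.25, 1.0]))  # identical outputs
print(rel_error(expected, [0.5, -0.25, 0.0]))  # one entry is zero
print(rel_error(expected, [0.5, 0.25, 1.0]))   # one entry has a flipped sign
```

Under this formula, and for finite values, the error can never exceed 1.0, and it reaches exactly 1.0 only when some pair of entries has opposite signs or one of the two entries is zero. So an error of 1.0 suggests the outputs differ wholesale rather than by floating-point rounding.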

I even copied your MultiHeadAttention code, but I still get the same result:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

I even downloaded your assignment3 code, and I still get the same output.

Is there anything else I missed?

Apr 15 '23 15:04 ge1mina023

I am not sure why you get these values. I just tried running a fresh copy of this repository on Google Colab and everything worked fine. It could be a problem with the environment. Can you try cloning a fresh repository and then running Transformer_Captioning.ipynb (with changes only to the first code cell)? Or maybe you can try running it in Colab?

Apr 16 '23 17:04 mantasu

Yeah, that is what I did locally (with changes only to the first code cell). I guess it's an environment issue. My configuration: MacBook Pro M1, Python 3.8.12, conda 4.11.0, torch 2.1.0.dev20230415, macOS Monterey Version 12.5.
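
A configuration like the one above can be collected programmatically, which makes it easier to compare a local machine against Colab. The following is a minimal sketch using only the standard library; the function name `describe_environment` and the default package list are assumptions made for illustration.

```python
import platform
from importlib import metadata


def describe_environment(packages=("torch", "numpy", "h5py")):
    """Collect interpreter, platform, and installed-package versions."""
    info = {
        "python": platform.python_version(),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),  # CPU architecture as reported by the OS
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running the same snippet in both environments and comparing the output line by line shows at a glance whether the Python, PyTorch, or architecture values differ.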

Apr 17 '23 03:04 ge1mina023

Maybe you can try Python 3.10 and PyTorch 2.0? Colab currently uses Python 3.9.16 and PyTorch 2.0.0.

Apr 17 '23 06:04 mantasu

I updated Python and PyTorch to 3.9.16 and 2.0.0 respectively, and ran pip install -r requirement.txt locally. Because that produced some errors, I changed some version numbers and then installed everything successfully. After all of this, I still got the same results:

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

Apr 17 '23 09:04 ge1mina023

I see. I will try running this in my local environment and let you know. For now, a temporary workaround would be to run it on Colab.

Apr 18 '23 21:04 mantasu

For policy reasons, Colab is not convenient for me.

Apr 19 '23 14:04 ge1mina023

I tried in my local environment and everything still seems fine. Can you reproduce the following:

Clone a completely fresh repository:

git clone https://github.com/mantasu/cs231n

Install PyTorch as described here

Install further requirements:

pip install h5py numpy matplotlib imageio ipykernel

Change the first code cell in assignment3/Transformer_Captioning.ipynb to
```
%cd cs231n/datasets/
!bash get_datasets.sh
%cd ../../
```
Run the cells

Apr 24 '23 12:04 mantasu

I have completed all of your steps (the only extra thing I installed was Jupyter), and I have Python 3.9.16 and PyTorch 2.0. But it still doesn't work. I still get

self_attn_output error:  0.449382070034207
masked_self_attn_output error:  1.0
attn_output error:  1.0

in the first cell that performs a computation.

May 03 '23 03:05 ge1mina023

Hmm, do you only get that in Transformer_Captioning.ipynb? Also, have you tried other solutions (e.g., from other repositories)?

May 05 '23 20:05 mantasu

Yeah, I have tried other solutions, and only Transformer_Captioning.ipynb has this error in assignment3.

May 07 '23 16:05 ge1mina023

If even other solutions don't work, it's most likely a problem with the environment. I would suggest trying the following:

Reinstalling Python and Conda
Running on another device
Running on a server, e.g., Kaggle, Azure Notebook, SageMaker, CoCalc. I would still suggest using Colab, e.g., with a VPN
Running on a virtual machine

May 07 '23 20:05 mantasu

Thanks, I will try.

May 21 '23 11:05 ge1mina023

Hi, have you solved the issue? I just ran into the same one and solved it. I would say your error comes from the calculation itself rather than the environment; it does not look like a numerical-precision issue. My suggestion is to make sure you follow the right order: query_key -> mask -> softmax -> attn_dropout -> attn_value -> projection.
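
The order described above can be sketched for a single attention head as follows. This is a NumPy illustration under stated assumptions, not the assignment's PyTorch module: the function names, the mask convention (True means "may attend"), and the omission of the input projections and head splitting are all simplifications made here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_head(q, k, v, w_proj, mask=None, drop_p=0.0, rng=None):
    """Single-head attention in the order given in the comment above.

    q: (S, D); k, v: (T, D); w_proj: (D, D); mask: (S, T) boolean.
    Assumption: every mask row keeps at least one position.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # 1. query_key
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)    # 2. mask (before softmax)
    attn = softmax(scores, axis=-1)                 # 3. softmax
    if drop_p > 0.0:                                # 4. attn_dropout (inverted)
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(attn.shape) >= drop_p
        attn = attn * keep / (1.0 - drop_p)
    out = attn @ v                                  # 5. attn_value
    return out @ w_proj                             # 6. projection


rng = np.random.default_rng(0)
S, D = 4, 8
x = rng.standard_normal((S, D))
w = rng.standard_normal((D, D))
causal = np.tril(np.ones((S, S), dtype=bool))       # token i sees tokens <= i
out = attention_head(x, x, x, w, mask=causal)
print(out.shape)
```

A quick sanity check for the ordering: with a causal mask, the first token can attend only to itself, so the first output row should equal `x[0] @ w`. If the mask is applied after the softmax instead, the attention rows no longer sum to one and a check like this stops holding in general.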

Jul 05 '23 23:07 xiaoyatang

Had the same problem with my solution, and using this repo's solution also gave me the exact same error. Are you sure it's a calculation error?

Also using M1. I don't think it's a coincidence. Or are you saying the repo's solution is also inaccurate?

Dec 16 '23 12:12 putskan
