RuntimeError!

Open SomnusQue opened this issue 1 year ago • 12 comments

I ran auto_100weight_inherit_100to75.sh and hit this problem. I think I have set up everything this project needs, but there are still some problems I can't solve. Could somebody please help?

SomnusQue avatar Jan 21 '24 11:01 SomnusQue

[Screenshot attached; image not preserved]

SomnusQue avatar Jan 21 '24 11:01 SomnusQue

Hi @SomnusQue, thanks for your interest in our work!

Is your copy of the TinyCLIP code up to date?

This bug is triggered on PyTorch 2.x. We fixed it by adding this line: https://github.com/microsoft/Cream/blob/main/TinyCLIP/src/open_clip/model.py#L28

checkpoint = functools.partial(checkpoint, use_reentrant=False)
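
For readers unfamiliar with the pattern, here is a minimal sketch of what that line does. It assumes nothing beyond the standard library: the `checkpoint` function below is a hypothetical stand-in that only mimics the keyword argument of `torch.utils.checkpoint.checkpoint`, so the snippet runs without PyTorch installed. The stand-in's default value is an assumption made for illustration.

```python
import functools


def checkpoint(function, *args, use_reentrant=True, **kwargs):
    """Hypothetical stand-in for torch.utils.checkpoint.checkpoint.

    It just calls the function and reports which flag value it received.
    """
    return {"output": function(*args, **kwargs), "use_reentrant": use_reentrant}


# The fix from the thread: rebind the name so that every later call
# passes use_reentrant=False unless the caller overrides it.
checkpoint = functools.partial(checkpoint, use_reentrant=False)

result = checkpoint(lambda x: x * 2, 21)
overridden = checkpoint(lambda x: x * 2, 21, use_reentrant=True)
```

Because the name is rebound at module level, existing call sites pick up the new default without being edited; a call that passes the keyword explicitly still wins over the bound value.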

wkcn avatar Jan 21 '24 12:01 wkcn

OMG! The author answered my question! My copy of the code really doesn't have that line. Thanks for your patience! When was the code updated?

SomnusQue avatar Jan 21 '24 12:01 SomnusQue

Furthermore, is this loss normal? [Screenshot attached; image not preserved]

SomnusQue avatar Jan 21 '24 12:01 SomnusQue

@SomnusQue I fixed the bug on Jan. 11, 2024 (https://github.com/microsoft/Cream/pull/218/files#diff-2c756c8b8b99609dee1b59ce4dcfaf773aa9afbc84e093e03e3e0de653fa0124R28).

You can visualize the loss curve in wandb. The loss is normal if it is decreasing : )

wkcn avatar Jan 21 '24 12:01 wkcn

Thanks for your patience! I can't use wandb on our cluster (I think because it needs network access?), so I changed '--report-to wandb' to '--report-to tensorboard' in the .sh file. Does anything else in the code need to change?

SomnusQue avatar Jan 21 '24 13:01 SomnusQue

@SomnusQue No code change is required. Alternatively, you can set the environment variable WANDB_MODE=offline. The wandb log will then be saved to a file, and you can run wandb sync <file path> later to upload it.
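
The offline workflow could be scripted roughly as follows. This is only a sketch: the two helper functions are hypothetical names introduced here, the log path is a placeholder, and the location of the training script is assumed. Only `WANDB_MODE=offline` and `wandb sync <file path>` come from the reply above.

```python
import os


def offline_wandb_env(base_env=None):
    """Return a copy of the environment with wandb switched to offline mode."""
    env = dict(os.environ if base_env is None else base_env)
    env["WANDB_MODE"] = "offline"
    return env


def sync_command(log_path):
    """Build the `wandb sync <file path>` command for a saved offline log."""
    return ["wandb", "sync", log_path]


env = offline_wandb_env({"PATH": "/usr/bin"})

# Script name taken from this thread; its location is an assumption.
train_cmd = ["bash", "auto_100weight_inherit_100to75.sh"]
# Placeholder path; use the log file or directory that wandb reports.
upload_cmd = sync_command("path/to/offline-log")

# On the cluster (no network):   subprocess.run(train_cmd, env=env, check=True)
# Later, on a networked machine: subprocess.run(upload_cmd, check=True)
```

The `subprocess` calls are left as comments so the snippet has no side effects when run as-is.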

wkcn avatar Jan 21 '24 15:01 wkcn

Sorry to bother you again... [Screenshot attached; image not preserved] The results in TensorBoard look like something went wrong... [Screenshot attached; image not preserved] This is the final epoch of my training run.

SomnusQue avatar Jan 22 '24 03:01 SomnusQue

[Two screenshots attached; images not preserved] This is our bash file. Is there something wrong with it?

SomnusQue avatar Jan 22 '24 08:01 SomnusQue

Sorry, I have not tested TensorBoard yet.

The training data in the provided script is synthetic. Replace those settings with the following arguments:

 --train-data <your yfcc_path or laion_path/> \
 --dataset-type webdataset \
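
As a rough sketch of that substitution, assuming the script passes its options as `--flag value` pairs: the `set_flag` helper, the starting argument values, and the shard path below are all hypothetical placeholders, not taken from the TinyCLIP scripts.

```python
def set_flag(args, flag, value):
    """Return a copy of args with `flag` set to `value`.

    Replaces the value that follows an existing flag, or appends the
    flag and value when the flag is absent or has no value after it.
    """
    out = list(args)
    if flag in out and out.index(flag) + 1 < len(out):
        out[out.index(flag) + 1] = value
    else:
        out += [flag, value]
    return out


# Hypothetical starting arguments standing in for the provided script.
args = ["--train-data", "synthetic", "--dataset-type", "synthetic"]

# Placeholder path; substitute your own yfcc or laion location.
args = set_flag(args, "--train-data", "/path/to/your/shards")
args = set_flag(args, "--dataset-type", "webdataset")
```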

wkcn avatar Jan 22 '24 12:01 wkcn

I downloaded the LAION files and put them in '/.cache/clip/'. Is this the path I should pass?

SomnusQue avatar Jan 22 '24 12:01 SomnusQue

@SomnusQue Please refer to the documentation: https://github.com/mlfoundations/open_clip?tab=readme-ov-file#data

wkcn avatar Jan 23 '24 01:01 wkcn