bingjie3216

Results 10 comments of bingjie3216

Met the same issue (deduplicated traceback; each frame was printed twice, likely once per process):

```
smart_tokenizer_and_embedding_resize(
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/x-gpu1/code/Users/x/code/ColossalAI/applications/Chat/coati/utils/tokenizer_utils.py", line 68, in smart_tokenizer_and_embedding_resize
    model.resize_token_embeddings(len(tokenizer))
  File "/anaconda/envs/coati/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
...
```
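For context, the line that fails here, `model.resize_token_embeddings(len(tokenizer))`, grows the embedding matrix after special tokens are added to the tokenizer; helpers like `smart_tokenizer_and_embedding_resize` commonly initialize the new rows to the mean of the existing rows so the new tokens start in-distribution. A minimal pure-Python sketch of that idea (illustrative only, with a hypothetical `resize_embeddings` helper, not the actual ColossalAI or transformers code):

```python
def resize_embeddings(embeddings, num_new_tokens):
    """Append rows for newly added tokens, initialized to the mean of
    the existing rows. `embeddings` is a list of equal-length rows.
    Sketch of the mean-init trick only, not the real implementation."""
    if num_new_tokens <= 0:
        return embeddings
    dim = len(embeddings[0])
    vocab = len(embeddings)
    # Column-wise mean over the current vocabulary.
    mean_row = [sum(row[d] for row in embeddings) / vocab for d in range(dim)]
    return embeddings + [list(mean_row) for _ in range(num_new_tokens)]

old = [[1.0, 2.0], [3.0, 4.0]]
new = resize_embeddings(old, 2)
# new has 4 rows; each appended row equals the mean [2.0, 3.0]
```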

I also created a WeChat group, since my QQ was lost for a while: ![image](https://user-images.githubusercontent.com/1332679/227734622-2a007a13-ee4d-4b9a-93dc-7fa97a992998.png)

Thanks a lot for sharing the knowledge. I have tried a bunch of things; let's see how it goes: I lowered the batch size to 1 and also modified...

An update on V100: I am running on eight V100-32G GPUs on one node. The last step shows: {'eval_loss': 1.3447265625, 'eval_runtime': 25.5031, 'eval_samples_per_second': 39.211, 'eval_steps_per_second': 2.47, 'epoch': 1.0}...

I am not sure whether it is a bug in the code. The log says: `Deleting older checkpoint [/home/azureuser/cloudfiles/code/Users/jbing/code/dolly/local_output_dir_0325/checkpoint-1400] due to args.save_total_limit`, but my latest checkpoint is checkpoint-1400, and the one that should be...

I am retrying after changing the following parameters in the trainer code: `save_total_limit=3, load_best_model_at_end=False`. I think it might be a bug in transformers. @yinwangsong Do you mean you met the...
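For reference, `save_total_limit` triggers checkpoint rotation: the trainer keeps only the newest N `checkpoint-<step>` directories and deletes the oldest. A minimal stdlib sketch of that rotation, with a hypothetical `rotate_checkpoints` helper (not the transformers implementation), including the existence guard that would avoid the "could not find the file to delete" failure discussed here:

```python
import os
import re
import shutil

def rotate_checkpoints(output_dir, save_total_limit):
    """Keep only the newest `save_total_limit` checkpoint-* directories,
    deleting the oldest ones. Illustrative sketch only."""
    ckpts = []
    for name in os.listdir(output_dir):
        m = re.fullmatch(r"checkpoint-(\d+)", name)
        if m:
            ckpts.append((int(m.group(1)), os.path.join(output_dir, name)))
    ckpts.sort()  # lowest step number (oldest checkpoint) first
    excess = max(0, len(ckpts) - save_total_limit)
    for _, path in ckpts[:excess]:
        if os.path.isdir(path):  # guard: skip dirs already gone
            shutil.rmtree(path)
```

With five checkpoints and `save_total_limit=3`, the two lowest-numbered ones are removed and the newest three survive.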

@srowen it is not about permissions; the failure it reports is that it could not find the file it wants to delete. I believe it is a transformers bug....

Update: both 1 and 2 could solve the problem I met earlier.

Same here, please see my attached screenshot: ![image](https://user-images.githubusercontent.com/1332679/233803399-4f981fb9-388c-4d66-a83e-54aec4b45a72.png)