Jiaxin Shan

Results: 742 comments of Jiaxin Shan

Is there a way to support distributed training? We do not have that many 80G cards.

@davidearlyoung Thanks for all the details. Do you know whether distributed inference works or not? We have some A100-40G cards and would rather not have to fall back to quantization. We are...

@davidearlyoung I really appreciate your informative analysis! Thanks a lot!

> I personally do not know if distributed inference works for grok-1 in pytorch.

Yes! That's my question to the...
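
For anyone following along, this is the kind of multi-GPU, non-quantized inference I have in mind: a minimal sketch using Hugging Face Accelerate's `device_map="auto"` to shard the weights across several 40G cards. The model path is a placeholder, and I have not verified that grok-1 in particular loads this way.

```python
# Minimal sketch: shard a causal LM across all visible GPUs with
# device_map="auto" (requires `accelerate` to be installed).
# The model path is a placeholder, not a real checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/model"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # keep bf16 weights, no quantization
    device_map="auto",           # split layers across the available GPUs
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```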

@martindurant Yes, that's exactly what I want. Do you have any suggestions for those custom protocols? Should we do it downstream or upstream?

> I had this too and fixed the issue by deleting the npx directory (`~/.npm/_npx/$SOME_ID_HERE`).
>
> You should see the path in the error for the relevant directory.

Deleting...

```
ubuntu@192-9-155-93:~/alpaca-lora$ cp /home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda117.so /home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so
```

I met the exact same problem and this works for me.
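
If I understand the workaround correctly, bitsandbytes falls back to loading `libbitsandbytes_cpu.so` when it cannot detect the CUDA runtime, so overwriting the CPU library with the CUDA 11.7 build makes it load the GPU kernels instead.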

@danwei1992 Not yet. Did you encounter the same problem?

@VVNMA Not yet. I will give it another try tomorrow.

@VVNMA and other users who have the exact same issue as me: here's the update.

> Note: OOM issue could be a separate issue, let's talk about it in new threads...

@SeungyounShin How long does it take? Can you also share the training logs? I am blocked at this step:

```
root@5d83a2b86756:~/stanford_alpaca# torchrun --nproc_per_node=4 --master_port=3192 train.py --model_name_or_path /root/models/llama_7B --data_path ./alpaca_data.json --bf16...
```