Brian Qu comments


Why did you use sigmoid in classification head?

In the paper, the authors say that they use a sigmoid function for classification. They also tried softmax, but sigmoid probably works better.
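To make the difference concrete, here is a minimal sketch (not the repository's code) comparing the two activations on one set of logits. The logit values are made up for illustration. Sigmoid scores each class independently, so the scores need not sum to 1, while softmax forces the classes to compete.

```python
import math

def sigmoid_scores(logits):
    """Independent per-class probabilities; they need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax_scores(logits):
    """Mutually exclusive class probabilities; they always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one anchor and three classes.
logits = [2.0, 1.5, -3.0]
sig = sigmoid_scores(logits)
soft = softmax_scores(logits)

print("sigmoid:", [round(s, 3) for s in sig])
print("softmax:", [round(s, 3) for s in soft])
```

With sigmoid, an anchor that matches no class can get a low score for every class, which is one plausible reason it suits detection heads where most anchors are background.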

score threshold

You can read `visualize_single_image.py`: it keeps only the results whose score is larger than 0.5.
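The filtering step can be sketched like this. It is an illustration only, not the actual contents of the script; the function name `filter_by_score` and the detection dictionaries are hypothetical.

```python
def filter_by_score(detections, threshold=0.5):
    """Keep only detections whose score is strictly larger than the threshold."""
    return [d for d in detections if d["score"] > threshold]

# Hypothetical detections; the labels and scores are made up.
detections = [
    {"label": "person", "score": 0.91},
    {"label": "dog", "score": 0.50},
    {"label": "car", "score": 0.12},
]
kept = filter_by_score(detections)
print([d["label"] for d in kept])
```

Lowering `threshold` shows more (and noisier) boxes; raising it keeps only the most confident ones.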

Some puzzles about dataset processing

You can put all the scans, both train-val and test, into the data-dir, because the code distinguishes them automatically. If you process test scans, the code will...

[WIP] add OpenAI's CLIP weights

Hello! Thanks for your great work! I am also working on converting CLIP. In my implementation, the activation in OpenAI's CLIP is a little different: it uses `QuickGELU` instead of `GELU`....
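For reference, here is a small sketch of the two activations in plain Python. `QuickGELU` is the sigmoid approximation `x * sigmoid(1.702 * x)`, while the exact `GELU` uses the Gaussian CDF. The function names and the evaluation grid below are my own choices for illustration.

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def quick_gelu(x):
    """QuickGELU: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))

# Largest absolute gap between the two on a grid over [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(gelu(x) - quick_gelu(x)) for x in grid)
print(f"max |GELU - QuickGELU| on [-6, 6]: {max_gap:.4f}")
```

The gap is small per activation, but it accumulates over many layers, so converted weights should be run with the activation they were trained with.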

token_id is not matching with the config(InstructBLIP-Vicuna-7b-v1.1)

It's been a long time. I remember that I modified the token_id setting in the `generate` function of InstructBLIP. Actually, this doesn't have much impact at all.

[BUG] Zero3: Gather the params for inference(huggingface_language_model.generate) in the end of 1 epoch and re-partition it for next epoch training

Hi, I've tried this before, but the program gets stuck. How can I debug this? I also want to know whether it is because I use a 30B+ LLM and zero3...

[BUG] Zero3: Gather the params for inference(huggingface_language_model.generate) in the end of 1 epoch and re-partition it for next epoch training

Sorry, it is inconvenient to share the whole code, but I will try my best to provide more information. It is a dense model. I've tried the script on my ~9B...

[BUG] Zero3: Gather the params for inference(huggingface_language_model.generate) in the end of 1 epoch and re-partition it for next epoch training

After double-checking, I found another error message on one worker, as follows (probably a time-out error): ``` [E ProcessGroupNCCL.cpp:475] [Rank 15] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=96383, OpType=_ALLGATHER_BASE, NumelIn=88200, NumelOut=5644800, Timeout(ms)=7200000)...

[BUG] Zero3: Gather the params for inference(huggingface_language_model.generate) in the end of 1 epoch and re-partition it for next epoch training

Hi, I also tested this on one node (8 x A100) with a 9B model. It got stuck too. TAT

[BUG] Zero3: Gather the params for inference(huggingface_language_model.generate) in the end of 1 epoch and re-partition it for next epoch training

Oh, thanks, I get it. Do you have any suggestions about this? I think I've done left-padding. How can I ensure the output length?
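For clarity, this is what I mean by left-padding, as a minimal sketch in plain Python. The helper `left_pad` and the token ids are hypothetical, and this is not the tokenizer's own implementation. It only shows the padding layout; it does not by itself make every rank generate the same number of tokens.

```python
def left_pad(batch, pad_token_id):
    """Pad every sequence on the left to the longest length in the batch."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append([pad_token_id] * n_pad + list(ids))
        # 0 marks padding positions, 1 marks real tokens.
        attention_mask.append([0] * n_pad + [1] * len(ids))
    return input_ids, attention_mask

# Hypothetical token ids; 0 is assumed to be the pad token here.
input_ids, attention_mask = left_pad([[5, 6, 7], [8]], pad_token_id=0)
print(input_ids)
print(attention_mask)
```

With this layout the last position of every row is a real token, so a decoder-only model continues from the prompt rather than from padding.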