xDAN-AI
`Fetching 31 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 31/31 [00:00
### Describe the issue Issue: Command: ``` bash pretrain.sh ``` (run on my fine-tuned Llama2 model). Log: ``` You should probably TRAIN this model on a down-stream task to be able...
### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports. ### Expected Behavior ### Training the Mixtral model...
Is it supposed to take this long to quantize a model?
` During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/workspace/LLaMA-Factory/src/train_bash.py", line 14, in main() File "/workspace/LLaMA-Factory/src/train_bash.py", line 5, in main run_exp() File "/workspace/LLaMA-Factory/src/llmtuner/train/tuner.py",...
Training env: LLaMA-Factory `01/24/2024 01:53:50 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit. Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/transformers/utils/versions.py", line 102, in require_version got_ver = importlib.metadata.version(pkg) File...
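For context, the failing frame is transformers' `require_version` helper, which raises when an optional dependency needed for 4-bit quantization is missing or too old. A minimal sketch of what that check does is below; the `bitsandbytes` spec is only an assumed example (the actual package name is cut off in the truncated log above):

```python
# Hedged illustration: transformers' dependency check that appears in the traceback.
# The package spec is an assumption (bitsandbytes is the usual 4-bit dependency),
# not read from the truncated log.
from transformers.utils.versions import require_version

require_version(
    "bitsandbytes>=0.39.0",            # raises ImportError/version error if unsatisfied
    "To fix: pip install -U bitsandbytes",
)
```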
raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
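This ValueError is raised by accelerate while it dispatches weights onto devices and finds a parameter still on the "meta" device, i.e. no real value was loaded for it. A minimal sketch of the usual loading path that exercises this check is below; the checkpoint name is a placeholder and not taken from the report:

```python
# Hedged sketch of the device_map loading path where accelerate assigns each tensor
# to a device; weights missing from the checkpoint remain on "meta" and trigger the
# error quoted above. The model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder checkpoint
    torch_dtype=torch.float16,
    device_map="auto",            # accelerate dispatches tensors across devices
    low_cpu_mem_usage=True,
)
```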
checkpoint shards: 73%|███████▎ | 11/15 [00:02
I found an sglang worker inside; is it supported?
### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction 1. After merging the models, I found that two parameters in token.config were not...
vLLM already supports AWQ-quantized models. Please add one more parameter to set --quantization awq
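For reference, a hedged sketch of how vLLM's offline API takes the same setting that the requested `--quantization awq` flag would pass through; the model name here is only an example AWQ checkpoint, not one from this issue:

```python
# Minimal sketch: loading an AWQ-quantized checkpoint with vLLM's offline API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # assumed example AWQ checkpoint
    quantization="awq",               # same setting the request asks to expose as --quantization awq
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```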