syskn comments

Results: 8 comments of syskn

Cannot load the checkpoint

_RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory_

When this message shows up, it usually implies that one of the checkpoint files is incomplete (e.g. broken during transfer)....
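A quick way to test the "incomplete file" explanation is to check whether each checkpoint file is still a readable zip archive, since the error above refers to the zip central directory. The following is a minimal sketch using only the Python standard library; the helper name `checkpoint_zip_problem` is hypothetical, and it assumes the checkpoint was saved in PyTorch's zip-based format (a file in the older non-zip format would also be reported as a problem here).

```python
import zipfile


def checkpoint_zip_problem(path):
    """Return None if `path` looks like an intact zip archive, else a short reason.

    Hypothetical helper: it only inspects the zip container and does not load
    any tensors, so it cannot detect problems inside otherwise valid members.
    """
    # is_zipfile() looks for the end-of-central-directory record, which is
    # exactly what goes missing when a transfer is cut short.
    if not zipfile.is_zipfile(path):
        return "no zip central directory found (truncated or not a zip-format file)"

    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and checks its CRC;
        # it returns the first bad member name, or None if all pass.
        bad_member = archive.testzip()

    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return None
```

Running this over every shard of a checkpoint and re-downloading any file that returns a reason is usually cheaper than retrying the full load. Comparing file sizes or checksums against the source, where available, is a reasonable complement.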

vLLM stops all processing when CPU KV cache is used, has to be shut down and restarted.

I too can confirm that this issue persists with the default setting of 4GB swap space, both in the first release version and in the most recent versions.

vLLM stops all processing when CPU KV cache is used, has to be shut down and restarted.

Might be related: https://github.com/vllm-project/vllm/issues/667

[BUG] Failed to checkpoint with deepspeed 0.12.4

I'm having the same issue - checkpointing simply hangs on multiple GPUs once training passes 100 steps with ZeRO stage 1. I tried various batch sizes and allgather bucket sizes.

3X slow inference on GeForce RTX 3060 after 4bit-128g quantization

I noticed this with a NeoX model I quantized (Pythia 128g, desc_act = False, CUDA). Inference is at least 4x slower than usual on an A100-80GB with both CPU (single core) and...

3X slow inference on GeForce RTX 3060 after 4bit-128g quantization

@TheBloke Interesting, so a severe performance dip **should not** happen unless desc_act is True. It's strange because I had it explicitly set to False and still experienced a severe slowdown. H100 means you...

3X slow inference on GeForce RTX 3060 after 4bit-128g quantization

@Ph0rk0z I had to quantize it with a very large number of examples (3072) to get the final avg loss on attn_out/attention.dense below 40 and the qkv loss below 100. cache_examples_on_gpu must...

Long context will cause the vLLM stop

Probably this: https://github.com/vllm-project/vllm/issues/546 For the record, I wasn't able to fix this particular issue.