Casper

Results: 293 comments by Casper

Hi @kartikayk, I wonder whether Mixtral (or Mixture-of-Experts models in general) is on the roadmap and whether we can expect support for this new type of model?

I see there may be an issue here - adding CUDA 12.1 to the GitHub workflows should be easy. Fingers crossed that all the CUDA kernels work on CUDA 12. I just...

I have updated my build to use CUDA 12.1 and Torch 2.1.0, and I am looking to publish the wheels for the next release within a week.

0.1.6 has been released with CUDA 12.1.1 and Torch 2.1.0 on PyPI. Additionally, wheels with CUDA 11.8.0 and Torch 2.0.1 are available on the GitHub release if need be.

This is an environment problem. I would encourage you to reset your environment, install AutoAWQ from source, and use the latest versions of the various libraries.

This is an environment issue. Please install Torch 2.2.0 built against CUDA 11.8 or CUDA 12.1.
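
A minimal sketch of how to confirm the environment matches that requirement; it assumes only a standard PyTorch install, and the expected version strings in the comments are illustrative:

```python
# Quick environment check: verify the installed Torch version and the
# CUDA toolkit it was built against before debugging further.
import torch

print(torch.__version__)          # expect something like "2.2.0+cu118" or "2.2.0+cu121"
print(torch.version.cuda)         # expect "11.8" or "12.1"
print(torch.cuda.is_available())  # should be True if the GPU driver is set up correctly
```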

A 5% throughput improvement from optimizing all-reduce with custom kernels is quite impressive. Well done!

@codestar12 As part of this pull request, do you want me to extend the same implementation to the `convert_finetuning_dataset.py` script and remove the `build_dataloader` from the `convert_dataset_json.py` file, since it...

@codestar12 Thanks. I have now implemented what we agreed on. Additionally, I have updated the tests to include the `num_workers` argument. Note that in `convert_finetuning_dataset.py`, the implementation is the...
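
For context, a hypothetical sketch of what threading a `num_workers` argument through a dataloader builder might look like; `build_dataloader` and its signature here are illustrative, not the actual implementation in the conversion scripts:

```python
# Illustrative only: expose num_workers so the conversion scripts can
# parallelize data loading instead of being pinned to a single worker.
from torch.utils.data import DataLoader, Dataset

def build_dataloader(dataset: Dataset, batch_size: int, num_workers: int = 0) -> DataLoader:
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,  # passed through from the script's CLI argument
    )
```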

Hi @Alexei-V-Ivanov-AMD, this is a nice script to have at hand. Other packages like `llama.cpp` run perplexity tests in their CI, which I think the vLLM maintainers should consider in order to avoid...
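
A rough sketch of the kind of perplexity check such a CI job could run; the model name, evaluation text, and threshold below are placeholders, not anything from the actual script:

```python
# Compute perplexity as exp(cross-entropy loss) on a fixed text snippet
# and fail the job if it regresses past a placeholder threshold.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "The quick brown fox jumps over the lazy dog."
enc = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])  # loss is the mean token cross-entropy

ppl = math.exp(out.loss.item())
print(f"perplexity: {ppl:.2f}")
assert ppl < 1000, "perplexity regression"  # placeholder threshold
```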