Casper

Results: 293 comments by Casper

Hi @kartikayk, I wonder whether Mixtral (or Mixture-of-Experts models in general) is on the roadmap and whether we can expect support for this new type of model?

I see there may be an issue here - adding CUDA 12.1 to the GitHub workflows should be easy. Fingers crossed that all the CUDA kernels work on CUDA 12. I just...

I have updated my build to use CUDA 12.1 and Torch 2.1.0, and I am looking to publish the wheels for the next release within a week.

0.1.6 has been released with CUDA 12.1.1 and Torch 2.1.0 on PyPI. Additionally, wheels with CUDA 11.8.0 and Torch 2.0.1 are available on the GitHub release if need be.

This is an environment problem. I would encourage you to reset your environment, install AutoAWQ from source, and use the latest versions of the various libraries.

This is an environment issue. Please install Torch 2.2.0 built against CUDA 11.8 or CUDA 12.1.
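
A minimal sketch of how to confirm the environment matches that requirement; it assumes only a standard PyTorch install, and the expected version strings in the comments are illustrative:

```python
# Quick environment check: verify the installed Torch version and the
# CUDA toolkit it was built against before debugging further.
import torch

print(torch.__version__)          # expect something like "2.2.0+cu118" or "2.2.0+cu121"
print(torch.version.cuda)         # expect "11.8" or "12.1"
print(torch.cuda.is_available())  # should be True if the GPU driver is set up correctly
```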

A 5% throughput improvement from optimizing all-reduce with custom kernels is quite impressive. Well done!

@codestar12 As part of this pull request, do you want me to extend the same implementation to the `convert_finetuning_dataset.py` script and remove the `build_dataloader` from the `convert_dataset_json.py` file, since it...

@codestar12 Thanks. I have now implemented what we agreed on. Additionally, I have updated the tests to include the `num_workers` argument. Note that in `convert_finetuning_dataset.py`, the implementation is the...
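
For context, a hypothetical sketch of what threading a `num_workers` argument through a dataloader builder might look like; `build_dataloader` and its signature here are illustrative, not the actual implementation in the conversion scripts:

```python
# Illustrative only: expose num_workers so the conversion scripts can
# parallelize data loading instead of being pinned to a single worker.
from torch.utils.data import DataLoader, Dataset

def build_dataloader(dataset: Dataset, batch_size: int, num_workers: int = 0) -> DataLoader:
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,  # passed through from the script's CLI argument
    )
```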

Hi @Alexei-V-Ivanov-AMD, this is a nice script to have at hand. Other packages like `llama.cpp` run perplexity tests in their CI, which I think the vLLM maintainers should consider in order to avoid...
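
A rough sketch of the kind of perplexity check such a CI job could run; the model name, evaluation text, and threshold below are placeholders, not anything from the actual script:

```python
# Compute perplexity as exp(cross-entropy loss) on a fixed text snippet
# and fail the job if it regresses past a placeholder threshold.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "The quick brown fox jumps over the lazy dog."
enc = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])  # loss is the mean token cross-entropy

ppl = math.exp(out.loss.item())
print(f"perplexity: {ppl:.2f}")
assert ppl < 1000, "perplexity regression"  # placeholder threshold
```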