Simon Mo comments

Results: 313 comments of Simon Mo

[Usage]: Flash Attention not working any more

Any error message or repro?

File Limit Request: vllm - 400 MiB

Hi @cmaureir, I would like to inquire about the current total usage of vLLM packages and whether we can increase the project limit of 10GB. We have made quite some progress...

RISECamp Tutorial Final Feedback and Todo

Simon - [ ] Simon enable docker network in a fork using docker `campnet_clipper`

[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model

I made a pass. I think once this PR adds unit tests for both the Triton and PagedAttention kernels it should be good to go. You might also need to...

[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model

I have tested the PR locally as well.

[CI/Build] use setuptools-scm to set version

Hi @dtrifiro, here's the problem I ran into when releasing v0.6.2 yesterday:
* The commit was https://github.com/vllm-project/vllm/commit/7193774b1ff8603ad5bf4598e5efba0d9a39b436, which is tagged with v0.6.2.
* However, the buildkite job that is supposed...

[CI/Build] use setuptools-scm to set version

Hmmm this build still produced a dev version. https://buildkite.com/vllm/release/builds/1373#01928cb1-0918-4cf1-862e-f708c516b203

[CI/Build] use setuptools-scm to set version

And my local build using `DOCKER_BUILDKIT=1 docker build . --network=host --target vllm-openai --tag vllm/vllm-openai --build-arg max_jobs=32 --build-arg RUN_WHEEL_CHECK=false --build-arg USE_SCCACHE=1 --build-arg SCCACHE_S3_NO_CREDENTIALS=1` produced `v0.6.4.dev....`
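
The symptom described in this thread (a release build that still reports a dev version) can be caught with a small check on the version string. The following is a minimal sketch, not code from the PR: the helper name `is_dev_version` is hypothetical, and it assumes the build emits PEP 440 style versions in which a development release carries a `.devN` segment before any `+local` suffix.

```python
import re

# Matches a PEP 440 development-release segment such as ".dev12".
_DEV_SEGMENT = re.compile(r"\.dev\d*")


def is_dev_version(version: str) -> bool:
    """Return True if the public part of `version` has a .devN segment.

    Hypothetical helper for illustration; not part of vLLM or setuptools-scm.
    """
    # Drop the local version label (everything after "+") so that a
    # ".dev" inside the local part is not mistaken for a dev release.
    public = version.split("+", 1)[0]
    return _DEV_SEGMENT.search(public) is not None


if __name__ == "__main__":
    for candidate in ("0.6.2", "0.6.4.dev12+gabc1234"):
        print(candidate, is_dev_version(candidate))
```

A release pipeline could run a check like this on the built wheel's version and fail early when a tagged commit unexpectedly yields a dev version.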

[CI/Build] use setuptools-scm to set version

Indeed the wheel is published manually by me.

[Draft][CI/Build] Optimize models tests

I believe we should download the model each time. @robertgshaw2-neuralmagic mentioned that putting them on NFS is a bit tricky because it might hit the rate limit.