automatic-KG-creation-with-LLM
Bump vllm from 0.3.1 to 0.5.5
Bumps vllm from 0.3.1 to 0.5.5.
Release notes
Sourced from vllm's releases.
v0.5.5
Highlights
Performance Update
- We introduced a new mode that schedules multiple GPU steps in advance, reducing CPU overhead (#7000, #7387, #7452, #7703). Initial results show a 20% improvement in QPS for a single GPU running 8B and 30B models. You can set --num-scheduler-steps 8 as a parameter to the API server (via vllm serve) or AsyncLLMEngine. We are working on expanding the coverage to the LLM class and aiming to turn it on by default; see the sketch after this list.
- Various enhancements
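A minimal sketch of the new flag, assuming vLLM 0.5.5 (the model name below is a placeholder, not from the release notes):

```python
# CLI form, per the note above:
#   vllm serve meta-llama/Meta-Llama-3-8B-Instruct --num-scheduler-steps 8

# Python form, via AsyncLLMEngine; AsyncEngineArgs mirrors the CLI flags.
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    num_scheduler_steps=8,  # schedule 8 GPU steps ahead to reduce CPU overhead
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```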
Model Support
- Support Jamba 1.5 (#7415, #7601, #6739)
- Support for the first audio model, UltravoxModel (#7615, #7446)
- Improvements to vision models
- Support loading GGUF model (#5191) with tensor parallelism (#7520); see the loading sketch after this list
- Progress in encoder decoder models: support for serving encoder/decoder models (#7258), and architecture for cross-attention (#4942)
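A rough sketch of the GGUF loading path (the file path and tokenizer below are assumed placeholders):

```python
from vllm import LLM

# A local GGUF checkpoint can be passed directly as the model; pairing it
# with the matching Hugging Face tokenizer avoids GGUF tokenizer quirks.
llm = LLM(
    model="/models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    tokenizer="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder tokenizer
    tensor_parallel_size=2,  # GGUF now composes with tensor parallelism (#7520)
)
print(llm.generate("Hello!")[0].outputs[0].text)
```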
Hardware Support
- AMD: Add fp8 Linear Layer for rocm (#7210); see the fp8 sketch after this list
- Enhancements to TPU support: load time W8A16 quantization (#7005), optimized rope (#7635), and support multi-host inference (#7457).
- Intel: various refactoring for worker, executor, and model runner (#7686, #7712)
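One hedged illustration of the fp8 route, assuming a ROCm build and a placeholder model (quantization="fp8" is vLLM's generic dynamic-fp8 option; whether a given model hits the new ROCm kernels depends on the hardware):

```python
from vllm import LLM

# Dynamic fp8 quantization of the linear layers; on ROCm this release
# adds fp8 Linear kernels on this path, per the note above.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    quantization="fp8",
)
```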
Others
- Optimize prefix caching performance (#7193); a sketch of enabling prefix caching follows this list
- Speculative decoding
- Entrypoints
- Quantizations
- torch.compile: register custom ops for kernels (#7591, #7594, #7536)
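Prefix caching remains an opt-in engine flag (a minimal sketch, assuming vLLM 0.5.5 and a placeholder model):

```python
from vllm import LLM

# Reuses KV-cache blocks for shared prompt prefixes across requests,
# the code path optimized by #7193.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    enable_prefix_caching=True,
)
```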
What's Changed
- [ci][frontend] deduplicate tests by @youkaichao in vllm-project/vllm#7101
- [Doc] [SpecDecode] Update MLPSpeculator documentation by @tdoublep in vllm-project/vllm#7100
- [Bugfix] Specify device when loading LoRA and embedding tensors by @jischein in vllm-project/vllm#7129
- [MISC] Use non-blocking transfer in prepare_input by @comaniac in vllm-project/vllm#7172
- [Core] Support loading GGUF model by @Isotr0py in vllm-project/vllm#5191
- [Build] Add initial conditional testing spec by @simon-mo in vllm-project/vllm#6841
- [LoRA] Relax LoRA condition by @jeejeelee in vllm-project/vllm#7146
- [Model] Support SigLIP encoder and alternative decoders for LLaVA models by @DarkLight1337 in vllm-project/vllm#7153
- [BugFix] Fix DeepSeek remote code by @dsikka in vllm-project/vllm#7178
- [BugFix] Fix ZMQ when VLLM_PORT is set by @robertgshaw2-neuralmagic in vllm-project/vllm#7205
- [Bugfix] add gguf dependency by @kpapis in vllm-project/vllm#7198
... (truncated)
Commits
- 09c7792 Bump version to v0.5.5 (#7823)
- f1df5db [Misc] Update marlin to use vLLMParameters (#7803)
- 35ee2ad [github][misc] promote asking llm first (#7809)
- e25fee5 [BugFix] Fix server crash on empty prompt (#7746)
- faeddb5 [misc] Add Torch profiler support for CPU-only devices (#7806)
- fc5ebbd [Hardware][Intel GPU] refactor xpu_model_runner for tp (#7712)
- c01a6cb [Ray backend] Better error when pg topology is bad. (#7584)
- b903e1b [Frontend] error suppression cleanup (#7786)
- a152246 [Misc] fix typo in triton import warning (#7794)
- 666ad0a [ci] Cleanup & refactor Dockerfile to pass different Python versions and scca...
- Additional commits viewable in compare view
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

You can disable automated security fix PRs for this repo from the Security Alerts page.