h2o-llmstudio
Bump the pip group across 1 directory with 2 updates
Bumps the pip group with 2 updates in the / directory: torch and transformers.
Updates torch from 2.6.0 to 2.7.1
Release notes
Sourced from torch's releases.
PyTorch 2.7.1 Release, bug fix release
This release is meant to fix the following issues (regressions / silent correctness):
Torch.compile
- Fix Excessive cudagraph re-recording for HF LLM models (#152287)
- Fix torch.compile on some HuggingFace models (#151154)
- Fix crash due to Exception raised inside torch.autocast (#152503)
- Improve Error logging in torch.compile (#149831)
- Mark mutable custom operators as cacheable in torch.compile (#151194)
- Implement workaround for a graph break with older version einops (#153925)
- Fix an issue with tensor.view(dtype).copy_(...) (#151598)
Flex Attention
- Fix assertion error due to inductor permuting inputs to flex attention (#151959)
- Fix performance regression on nanogpt speedrun (#152641)
Distributed
- Fix extra CUDA context created by barrier (#149144)
- Fix an issue related to Distributed Fused Adam in ROCm/APEX when using nccl_ub feature (#150010)
- Add a workaround for a random hang in non-blocking API mode in NCCL 2.26 (#154055)
MacOS
- Fix MacOS compilation error with Clang 17 (#151316)
- Fix binary kernels producing incorrect results when one of the tensor arguments is from a wrapped scalar on MPS devices (#152997)
Other
- Improve PyTorch wheel size due to the addition of 128-bit vectorization (#148320) (#152396)
- Fix fmsub function definition (#152075)
- Fix Floating point exception in torch.mkldnn_max_pool2d (#151848)
- Fix abnormal inference output with XPU:1 device (#153067)
- Fix Illegal Instruction caused by grid_sample on Windows (#152613)
- Fix ONNX decomposition not preserving custom CompositeImplicitAutograd ops (#151826)
- Fix error with dynamic linking of libgomp library (#150084)
- Fix segfault in profiler with Python 3.13 (#153848)
PyTorch 2.7.0 Release Notes
- Highlights
- Tracked Regressions
- Backwards Incompatible Changes
- Deprecations
- New Features
- Improvements
- Bug fixes
- Performance
- Documentation
- Developers
Highlights
... (truncated)
Commits
- e2d141d set thread_work_size to 4 for unrolled kernel (#154541)
- 1214198 [c10d] Fix extra CUDA context created by barrier (#152834)
- 790cc2f [c10d] Add more tests to prevent extra context (#154179)
- 62ea99a [CI] Remove the xpu env source for linux binary validate (#154409)
- 941732c [ROCm] Added unit test to test the cuda_pluggable allocator (#154135)
- 769d5da [binary builds] Linux aarch64 CUDA builds. Make sure tag is set correctly (#1...
- 306ba12 Fix uint view copy (#151598) (#154121)
- 1ae9953 [ROCm] Update CUDAPluggableAllocator.h (#1984) (#153974)
- 4a815ed ci: Set minimum cmake version for halide build (#154122)
- 4c7314e [Dynamo] Fix einops regression (#154053)
- Additional commits viewable in compare view
Updates transformers from 4.50.3 to 4.52.1
Release notes
Sourced from transformers's releases.
Patch release v4.51.3
A mix of bugs were fixed in this patch; very exceptionally, we diverge from semantic versioning to merge GLM-4 in this patch release.
Patch Release 4.51.2
This is another round of bug fixes, but they are a lot more minor and outputs were not really affected!
- Fix Llama4 offset (#37414) by @Cyrilvallez
- Attention Quantization with FBGemm & TP (#37384) by @MekkCyber
- use rms_norm_eps for the L2Norm for Llama4 (#37418) by @danielhanchen
- mark llama4 as not supported with fa2 (#37416) by @winglian
Patch release v4.51.1
Since the release of Llama 4, we have fixed a few issues that we are now releasing in patch v4.51.1
- Fixing flex attention for torch=2.6.0 (#37285)
- more fixes for post-training llama4 (#37329)
- Remove HQQ from caching allocator warmup (#37347)
- fix derived berts _init_weights (#37341)
- Fix init empty weights without accelerate (#37337)
- Fix deepspeed with quantization (#37324)
- fix llama4 training (#37319)
- fix flex attn when optional args aren't passed (#37327)
- Multiple llama4 fixes (#37353)
Thanks all for your patience
v4.51.0: Llama 4, Phi4-Multimodal, DeepSeek-v3, Qwen3
New Model Additions
Llama 4
Llama 4, developed by Meta, introduces a new auto-regressive Mixture-of-Experts (MoE) architecture. This generation includes two models:
- The highly capable Llama 4 Maverick with 17B active parameters out of ~400B total, with 128 experts.
- The efficient Llama 4 Scout also has 17B active parameters out of ~109B total, using just 16 experts.
Both models leverage early fusion for native multimodality, enabling them to process text and image inputs. Maverick and Scout are both trained on up to 40 trillion tokens on data encompassing 200 languages (with specific fine-tuning support for 12 languages including Arabic, Spanish, German, and Hindi).
For deployment, Llama 4 Scout is designed for accessibility, fitting on a single server-grade GPU via on-the-fly 4-bit or 8-bit quantization, while Maverick is available in BF16 and FP8 formats. These models are released under the custom Llama 4 Community License Agreement, available on the model repositories.
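As a rough, weights-only illustration of why quantization matters for these parameter counts, the sketch below estimates storage from the totals quoted above (~109B for Scout, ~400B for Maverick). The function name `weight_gigabytes` is hypothetical, and the estimate ignores activations, KV cache, and any quantization overhead, so it says nothing definitive about what fits on a particular GPU.

```python
def weight_gigabytes(total_params: int, bits_per_param: int) -> float:
    """Weights-only storage estimate in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width.
    """
    return total_params * bits_per_param / 8 / 1e9


# Approximate totals taken from the release notes above.
SCOUT_PARAMS = 109_000_000_000      # "~109B total"
MAVERICK_PARAMS = 400_000_000_000   # "~400B total"

for name, params in [("Scout", SCOUT_PARAMS), ("Maverick", MAVERICK_PARAMS)]:
    for bits in (4, 8, 16):
        print(f"{name} @ {bits}-bit: ~{weight_gigabytes(params, bits):.1f} GB")
```

Under these assumptions, halving the bit width halves the weight footprint, which is the effect the 4-bit and 8-bit options rely on.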
Getting started with Llama 4 using transformers is straightforward. Make sure you have transformers v4.51.0 or later installed:
pip install -U transformers[hf_xet]
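To confirm that an environment satisfies the "v4.51.0 or later" requirement, a minimal sketch such as the following could be used. The helpers `parse_release`, `meets_minimum`, and `installed_version` are hypothetical names, and the parser is deliberately simplified: it compares only the leading numeric release segment and ignores pre-release or local suffixes, so it is not a full PEP 440 implementation.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, comparing numeric parts only."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.51' and '4.51.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    else:
        print(f"transformers {found}: meets 4.51.0? {meets_minimum(found, '4.51.0')}")
```

The same check applies to the torch bump in this PR by passing `"torch"` and `"2.7.1"` instead.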
... (truncated)
Commits
- 9457279 Release: v4.52.1
- eaa3016 Revert parallelism temporarily (#38240)
- b5f4946 Protect ParallelInterface
- 113424b Release: v4.52.0
- f834d36 [gemma3] fix bidirectional attention mask (#38080)
- 2edb0e4 [mllama] fix loading and inference (#38223)
- 390f153 Add padding-free to bamba (#35861)
- 2a79471 Fixing Bitnet after use_rms_norm introduction (#38229)
- 9661896 Enable Quantize KV Cache for Mistral Model (#35042)
- 1c2f36b parallelism goes brrr (#37877)
- Additional commits viewable in compare view
You can trigger a rebase of this PR by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore <dependency name> major version will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself)
- @dependabot ignore <dependency name> minor version will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself)
- @dependabot ignore <dependency name> will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself)
- @dependabot unignore <dependency name> will remove all of the ignore conditions of the specified dependency
- @dependabot unignore <dependency name> <ignore condition> will remove the ignore condition of the specified dependency and ignore conditions
You can disable automated security fix PRs for this repo from the Security Alerts page.
Note: Automatic rebases have been disabled on this pull request as it has been open for over 30 days.