LocalAI
chore(deps): bump transformers from 4.48.3 to 4.57.2 in /backend/python/coqui
Bumps transformers from 4.48.3 to 4.57.2.
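The range covered by this bump can be sanity-checked with a minimal, stdlib-only version comparison (a sketch only; real resolvers such as pip use `packaging.version`, which also handles pre-release and post-release tags):

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Split a plain X.Y.Z version string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def satisfies_bump(installed: str, required: str = "4.57.2") -> bool:
    """True if the installed version is at or beyond the bumped pin."""
    return parse_version(installed) >= parse_version(required)

print(satisfies_bump("4.48.3"))  # False: the old pin predates the fixes below
print(satisfies_bump("4.57.2"))  # True: the new pin
```

Comparing tuples of ints rather than raw strings matters here: lexicographic string comparison would wrongly rank "4.57.10" below "4.57.2".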
Release notes
Sourced from transformers's releases.
Patch Release v4.57.2
This patch most notably fixes an issue on some Mistral tokenizers. It contains the following commits:
- Add AutoTokenizer mapping for mistral3 and ministral (#42198)
- Auto convert tekken.json (#42299)
- fix tekken pattern matching (#42363)
- Check model inputs - hidden states (#40994)
- Remove invalid @staticmethod from module-level get_device_and_memory_breakdown (#41747)
Patch release v4.57.1
This patch most notably fixes an issue with an optional dependency (optax), which resulted in parsing errors with poetry. It contains the following fixes:
- fix optax dep issue
- remove offload_state_dict from kwargs
- Fix bnb fsdp loading for pre-quantized checkpoint (#41415)
- Fix tests fsdp (#41422)
- Fix trainer for py3.9 (#41359)
v4.57.0: Qwen3-Next, Vault Gemma, Qwen3 VL, LongCat Flash, Flex OLMO, LFM2 VL, BLT, Qwen3 OMNI MoE, Parakeet, EdgeTAM, OLMO3
New model additions
Qwen3 Next
The Qwen3-Next series represents the Qwen team's next-generation foundation models, optimized for extreme context length and large-scale parameter efficiency. The series introduces a suite of architectural innovations designed to maximize performance while minimizing computational cost:
- Hybrid Attention: Replaces standard attention with the combination of Gated DeltaNet and Gated Attention, enabling efficient context modeling.
- High-Sparsity MoE: Achieves an extremely low activation ratio of 1:50 in MoE layers, drastically reducing FLOPs per token while preserving model capacity.
- Multi-Token Prediction (MTP): Boosts pretraining performance and accelerates inference.
- Other Optimizations: Includes techniques such as zero-centered and weight-decayed layernorm, Gated Attention, and other stabilizing enhancements for robust training.
Built on this architecture, the Qwen team trained and open-sourced Qwen3-Next-80B-A3B (80B total parameters, only 3B active), achieving extreme sparsity and efficiency.
Despite its ultra-efficiency, it outperforms Qwen3-32B on downstream tasks — while requiring less than 1/10 of the training cost. Moreover, it delivers over 10x higher inference throughput than Qwen3-32B when handling contexts longer than 32K tokens.
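The sparsity figures quoted above can be made concrete with a little arithmetic (a sketch; note the 1:50 ratio refers to activation inside the MoE layers, while the 3B/80B split is the model-level active-parameter count):

```python
total_params_b = 80.0   # Qwen3-Next-80B-A3B: total parameters, in billions
active_params_b = 3.0   # parameters active per token, in billions

# Model-level active fraction: only ~3.75% of weights participate per token.
model_level_fraction = active_params_b / total_params_b
print(f"model-level active fraction: {model_level_fraction:.4f}")

# Per-layer MoE activation ratio as quoted in the release notes (1:50 = 2%).
moe_activation_ratio = 1 / 50
print(f"MoE-layer activation ratio: {moe_activation_ratio:.2f}")
```

The gap between the two numbers comes from the non-MoE components (attention, embeddings, shared layers), which are always active.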
For more details, please visit their blog Qwen3-Next (blog post).
- Adding Support for Qwen3-Next by @bozheng-hit in #40771
Vault Gemma
VaultGemma is a text-only decoder model derived from Gemma 2. Notably, it drops the norms after the Attention and MLP blocks and uses full attention for all layers instead of alternating between full attention and local sliding attention. VaultGemma is available as a pretrained model with 1B parameters and a 1024-token sequence length.
VaultGemma was trained from scratch with sequence-level differential privacy (DP). Its training data includes the same mixture as the Gemma 2 models, consisting of a number of documents of varying lengths. Additionally, it is trained using DP stochastic gradient descent (DP-SGD) and provides a (ε ≤ 2.0, δ ≤ 1.1e-10)-sequence-level DP guarantee, where a sequence consists of 1024 consecutive tokens extracted from heterogeneous data sources. Specifically, the privacy unit of the guarantee is for the sequences after sampling and packing of the mixture.
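DP-SGD, mentioned above, differs from plain SGD in two steps: clip each per-example gradient to a fixed norm, then add Gaussian noise to the clipped sum before averaging. A minimal stdlib-only sketch of one such step (illustrative only: the parameter names clip_norm and noise_multiplier are our own, and real implementations operate on tensors and track the (ε, δ) budget with a privacy accountant):

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """One DP-SGD aggregation step: clip each gradient, sum, add noise, average."""
    rng = random.Random(seed)
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    summed = [sum(col) for col in zip(*clipped)]
    n = len(per_example_grads)
    # Gaussian noise calibrated to the clipping bound is what yields the DP guarantee.
    return [(s + rng.gauss(0.0, noise_multiplier * clip_norm)) / n for s in summed]

# With noise disabled, the step reduces to the mean of the clipped gradients.
print(dp_sgd_step([[3.0, 4.0]], clip_norm=1.0, noise_multiplier=0.0))
```

Clipping bounds the influence any single training sequence can have on the update, which is why the guarantee can be stated at the sequence level, as in VaultGemma's 1024-token privacy unit.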
- add: differential privacy research model by @RyanMullins in #40851
... (truncated)
Commits
- 2915fb3 Release v4.57.2
- 2a59904 fix tekken pattern matching (#42363)
- 7e66db7 Auto convert tekken.json (#42299)
- 311807f Remove invalid @staticmethod from module-level get_device_and_memory_breakd...
- 804038f Add AutoTokenizer mapping for mistral3 and ministral (#42198)
- ede92a8 Check model inputs - hidden states (#40994)
- 8cb5963 Release: v4.57.1
- c6ae19e Fix trainer for py3.9 (#41359)
- e0c6038 Fix tests fsdp (#41422)
- 2fbd25c Fix bnb fsdp loading for pre-quantized checkpoint (#41415)
- Additional commits viewable in compare view
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Deploy Preview for localai ready!
| Name | Link |
|---|---|
| Latest commit | 3d32197650ad7a2721a018eb40dafd80e474968a |
| Latest deploy log | https://app.netlify.com/projects/localai/deploys/6924aa19f413cf0008da726b |
| Deploy Preview | https://deploy-preview-7349--localai.netlify.app |
To edit notification comments on pull requests, go to your Netlify project configuration.
OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.
If you change your mind, just re-open this PR and I'll resolve any conflicts on it.