python-docs-samples
chore(deps): update dependency transformers to v4.36.0 [security]
This PR contains the following updates:
| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| transformers | ==4.30.2 -> ==4.36.0 | | | | |
GitHub Vulnerability Alerts
CVE-2023-7018
Deserialization of Untrusted Data in GitHub repository huggingface/transformers prior to 4.36.
CVE-2023-6730
Deserialization of Untrusted Data in GitHub repository huggingface/transformers prior to 4.36.0.
Release Notes
huggingface/transformers (transformers)
v4.36.0: v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa wide-spread support
New model additions
Mixtral
Mixtral is the new open-source model from Mistral AI, announced in the blog post Mixtral of Experts. According to the benchmark results shared in the release blog post, its capabilities are comparable to those of ChatGPT.
The architecture is a sparse Mixture of Experts with a Top-2 routing strategy, similar to the NllbMoe architecture in transformers. You can use it through the AutoModelForCausalLM interface:
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> # device_map="auto" already places the weights, so no explicit model.to(...) is needed
>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B", torch_dtype=torch.float16, device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B")
>>> prompt = "My favourite condiment is"
>>> # Move the inputs to the device that holds the model's first parameters
>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
The model is compatible with existing optimisation tools such as Flash Attention 2, bitsandbytes and the PEFT library. The checkpoints are released under the mistralai organisation on the Hugging Face Hub.
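To make the Top-2 routing idea concrete, here is a minimal pure-Python sketch for a single token. It is an illustration under stated assumptions, not Mixtral's actual implementation: the helper names (`top2_route`, `moe_layer`), the toy experts, and the choice to renormalise the two selected weights so they sum to 1 are all hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top2_route(router_logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts."""
    probs = softmax(router_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top2)  # renormalise over the selected pair (assumption)
    return [(i, probs[i] / norm) for i in top2]

def moe_layer(token, router_logits, experts):
    """Weighted sum of the outputs of the two selected experts for one token."""
    out = [0.0] * len(token)
    for idx, weight in top2_route(router_logits):
        expert_out = experts[idx](token)  # only the selected experts are evaluated
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out

# Toy experts: each one scales the token by a different constant.
experts = [lambda t, s=s: [s * x for x in t] for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]
routes = top2_route(logits)
output = moe_layer([1.0, 1.0], logits, experts)
```

Only two of the four toy experts run for this token, which is what makes the layer sparse: the parameter count grows with the number of experts while the per-token compute stays close to that of two experts.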
Llava / BakLlava
Llava is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. In other words, it is a multi-modal version of LLMs fine-tuned for chat / instructions.
The Llava model was proposed in Improved Baselines with Visual Instruction Tuning by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.
- [Llava] Add Llava to transformers by @younesbelkada in #27662
- [LLaVa] Some improvements by @NielsRogge in #27895
The integration also includes BakLlava which is a Llava model trained with Mistral backbone.
The model is compatible with the "image-to-text" pipeline:
from transformers import pipeline
from PIL import Image
import requests
model_id = "llava-hf/llava-1.5-7b-hf"
pipe = pipeline("image-to-text", model=model_id)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud\nASSISTANT:"
outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
You can find all Llava weights under the llava-hf organisation on the Hub.
SeamlessM4T v2
SeamlessM4T-v2 is a collection of models designed to provide high quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text. It is an improvement on the previous version and was proposed in Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team from Meta AI.
For more details on the differences between v1 and v2, refer to section Difference with SeamlessM4T-v1.
SeamlessM4T enables multiple tasks without relying on separate models:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)
- Add SeamlessM4T v2 by @ylacombe in #27779
PatchTST
The PatchTST model was proposed in A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong and Jayant Kalagnanam.
At a high level, the model vectorizes time series into patches of a given size and encodes the resulting sequence of vectors via a Transformer that then outputs the prediction length forecast via an appropriate head. The model is illustrated in the following figure:
- [Time series] Add PatchTST by @psinthong in #25927
- [Time series] Add PatchTST by @kashif in #27581
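The patching step described above, where a series is cut into fixed-size patches before being fed to the Transformer, can be sketched in a few lines of plain Python. This is a simplified illustration, not the PatchTST code: the function name `make_patches` and the `patch_length` / `stride` parameter names are assumptions chosen for the example.

```python
def make_patches(series, patch_length, stride):
    """Split a 1-D series into (possibly overlapping) fixed-size patches."""
    if patch_length <= 0 or stride <= 0:
        raise ValueError("patch_length and stride must be positive")
    last_start = len(series) - patch_length
    # Patches that would run past the end of the series are dropped.
    return [series[s:s + patch_length] for s in range(0, last_start + 1, stride)]

series = list(range(10))
patches = make_patches(series, patch_length=4, stride=2)
# Each patch becomes one input vector ("token") for the Transformer encoder.
```

With a stride smaller than the patch length the patches overlap; with a stride equal to the patch length they tile the series without overlap.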
PatchTSMixer
The PatchTSMixer model was proposed in TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong and Jayant Kalagnanam.
PatchTSMixer is a lightweight time-series modeling approach based on the MLP-Mixer architecture. In this HuggingFace implementation, we provide PatchTSMixer’s capabilities to effortlessly facilitate lightweight mixing across patches, channels, and hidden features for effective multivariate time-series modeling. It also supports various attention mechanisms starting from simple gated attention to more complex self-attention blocks that can be customized accordingly. The model can be pretrained and subsequently used for various downstream tasks such as forecasting, classification and regression.
CLVP
The CLVP (Contrastive Language-Voice Pretrained Transformer) model was proposed in Better speech synthesis through scaling by James Betker.
Phi-1/1.5
The Phi-1 model was proposed in Textbooks Are All You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li.
The Phi-1.5 model was proposed in Textbooks Are All You Need II: phi-1.5 technical report by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
TVP
The text-visual prompting (TVP) framework was proposed in the paper Text-Visual Prompting for Efficient 2D Temporal Video Grounding by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding.
This research addresses temporal video grounding (TVG), which is the process of pinpointing the start and end times of specific events in a long video, as described by a text sentence. Text-visual prompting (TVP), is proposed to enhance TVG. TVP involves integrating specially designed patterns, known as ‘prompts’, into both the visual (image-based) and textual (word-based) input components of a TVG model. These prompts provide additional spatial-temporal context, improving the model’s ability to accurately determine event timings in the video. The approach employs 2D visual inputs in place of 3D ones. Although 3D inputs offer more spatial-temporal detail, they are also more time-consuming to process. The use of 2D inputs with the prompting method aims to provide similar levels of context and accuracy more efficiently.
- TVP model by @jiqing-feng in #25856
DINOv2 depth estimation
Depth estimation is added to the DINO v2 implementation.
- Add DINOv2 depth estimation by @NielsRogge in #26092
ROCm support for AMD GPUs
AMD's ROCm GPU architecture is now supported across the board and fully tested in our CI with MI210/MI250 GPUs. We further enable specific hardware acceleration for ROCm in Transformers, such as Flash Attention 2, GPTQ quantization and DeepSpeed.
- Add RoCm scheduled CI & upgrade RoCm CI to PyTorch 2.1 by @fxmarty in #26940
- Flash Attention 2 support for RoCm by @fxmarty in #27611
- Reflect RoCm support in the documentation by @fxmarty in #27636
- restructure AMD scheduled CI by @ydshieh in #27743
PyTorch scaled_dot_product_attention native support
PyTorch's torch.nn.functional.scaled_dot_product_attention operator is now supported in the most-used Transformers models and is used by default with torch>=2.1.1. This allows dispatching to memory-efficient attention and Flash Attention backend implementations with no package other than torch required. This should significantly speed up attention computation on hardware that supports these fast paths.
While Transformers automatically handles the dispatch to use SDPA when available, it is possible to force the usage of a given attention implementation ("eager" being the manual implementation, where each operation is implemented step by step):
from transformers import AutoModelForSpeechSeq2Seq
# or attn_implementation="sdpa", or attn_implementation="flash_attention_2"
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", attn_implementation="eager")
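For reference, the computation that the "eager" path performs step by step is softmax(QK^T / sqrt(d)) V. The following pure-Python sketch shows it for a single head; it is an illustration only, not the Transformers or PyTorch implementation, and the function name `eager_attention` and the toy inputs are assumptions.

```python
import math

def eager_attention(query, key, value):
    """Compute softmax(Q K^T / sqrt(d)) V for one head, using lists of row vectors."""
    d = len(query[0])
    output = []
    for q in query:
        # 1. Scaled dot product of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in key]
        # 2. Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # 3. Weighted sum of the value vectors.
        output.append([
            sum(w * v[j] for w, v in zip(weights, value))
            for j in range(len(value[0]))
        ])
    return output

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [1.0, 0.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = eager_attention(q, k, v)  # both keys match equally, so the values are averaged
```

Fused kernels such as SDPA compute the same result without materialising the full score matrix in memory, which is where the savings in the benchmarks come from.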
Training benchmark, run on A100-SXM4-80GB.
| Model | Batch size | Sequence length | Time per batch ("eager", s) | Time per batch ("sdpa", s) | Speedup | Peak memory ("eager", MB) | Peak memory ("sdpa", MB) | Memory savings |
|---|---|---|---|---|---|---|---|---|
| llama2 7b | 4 | 1024 | 1.065 | 0.90 | 19.4% | 73878.28 | 45977.81 | 60.7% |
| llama2 7b | 4 | 2048 | OOM | 1.87 | / | OOM | 78394.58 | SDPA does not OOM |
| llama2 7b | 1 | 2048 | 0.64 | 0.48 | 32.0% | 55557.01 | 29795.63 | 86.4% |
| llama2 7b | 1 | 3072 | OOM | 0.75 | / | OOM | 37916.08 | SDPA does not OOM |
| llama2 7b | 1 | 4096 | OOM | 1.03 | / | OOM | 46028.14 | SDPA does not OOM |
| llama2 7b | 2 | 4096 | OOM | 2.05 | / | OOM | 78428.14 | SDPA does not OOM |
Inference benchmark, run on A100-SXM4-80GB.
| Model | Batch size | Prompt length | Num new tokens | Per token latency "eager" (ms) | Per token latency "sdpa" (ms) | Speedup |
|---|---|---|---|---|---|---|
| llama2 13b | 1 | 1024 | 1 (prefill) | 178.66 | 159.36 | 12.11% |
| llama2 13b | 1 | 100 | 100 | 40.35 | 37.62 | 7.28% |
| llama2 13b | 8 | 100 | 100 | 40.55 | 38.06 | 6.53% |
| Whisper v3 large | 1 | / | 62 | 20.05 | 18.90 | 6.10% |
| Whisper v3 large | 8 | / | 77 | 25.42 | 24.77 | 2.59% |
| Whisper v3 large | 16 | / | 77 | 28.51 | 26.32 | 8.34% |
New Cache abstraction & Attention Sinks support
We are rolling out a new abstraction for the past_key_values cache, which enables the use of different types of caches. For now, only llama and llama-inspired architectures (mistral, persimmon, phi) support it, with other architectures scheduled to have support in the next release. By default, a growing cache (DynamicCache) is used, which preserves the existing behavior.
This release also includes a new SinkCache cache, which implements the Attention Sinks paper. With SinkCache, the model is able to continue generating high-quality text well beyond its training sequence length! Note that it does not expand the context window, so it can’t digest very long inputs — it is suited for streaming applications such as multi-round dialogues. Check this colab for an example.
- Generate: New `Cache` abstraction and Attention Sinks support by @tomaarsen in #26681
- Generate: SinkCache can handle iterative prompts by @gante in #27907
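The eviction policy behind an attention-sink cache can be sketched as follows: the first few entries (the "sinks") are always kept, and the remaining slots hold the most recent entries. This toy class is a hypothetical illustration, not the library's SinkCache; the class name `ToySinkCache` is made up, the parameter names are assumptions, and it stores plain items rather than key/value tensors and ignores positional-embedding handling.

```python
class ToySinkCache:
    """Keeps the first `num_sink_tokens` entries plus the most recent ones."""

    def __init__(self, window_length, num_sink_tokens):
        if num_sink_tokens >= window_length:
            raise ValueError("window_length must exceed num_sink_tokens")
        self.window_length = window_length
        self.num_sink_tokens = num_sink_tokens
        self.entries = []

    def update(self, new_entries):
        """Append new entries, then evict the oldest non-sink entries if over budget."""
        self.entries.extend(new_entries)
        overflow = len(self.entries) - self.window_length
        if overflow > 0:
            sinks = self.entries[:self.num_sink_tokens]
            recent = self.entries[self.num_sink_tokens + overflow:]
            self.entries = sinks + recent
        return self.entries

cache = ToySinkCache(window_length=6, num_sink_tokens=2)
for step in range(10):
    kept = cache.update([step])
# kept now holds the two sink entries (0, 1) and the four most recent steps.
```

The cache size stays bounded no matter how long generation runs, which matches the note above: the context window is not extended, but generation can continue indefinitely.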
Safetensors as a default
We continue toggling features enabling safetensors as a default across the board, in PyTorch, Flax, and TensorFlow.
When using a PyTorch model and forcing the load of safetensors files with use_safetensors=True, if the repository does not contain safetensors files, they will now be converted on the fly, server-side.
- Default to msgpack for safetensors by @LysandreJik in #27460
- Fix `from_pt` flag when loading with `safetensors` by @LysandreJik in #27394
- Make using safetensors files automated. by @Narsil in #27571
Breaking changes
pickle files
We now disallow the use of pickle.load internally for security purposes. To override this, set the TRUST_REMOTE_CODE=True environment variable to indicate that you would still like to load such files.
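Assuming TRUST_REMOTE_CODE is read from the process environment, opting in from Python could look like the sketch below. The helper `pickle_loading_allowed` is hypothetical and only illustrates the gating pattern; the exact set of values the library accepts as "true" is an assumption.

```python
import os

def pickle_loading_allowed(env=os.environ):
    """Hypothetical gate: True only when TRUST_REMOTE_CODE is set to a truthy value."""
    value = env.get("TRUST_REMOTE_CODE", "False")
    return value.strip().lower() in {"1", "true", "yes", "y", "on", "t"}

# Opt in before calling any loading code, and only for files you have verified.
os.environ["TRUST_REMOTE_CODE"] = "True"
allowed = pickle_loading_allowed()
```

Because unpickling can execute arbitrary code, the safer default is to leave the variable unset and prefer safetensors files where they are available.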
Beam score calculation for decoder-only models
In the previous implementation of beam search, when length_penalty is active, the beam score for decoder-only models was penalized by the total length of both the prompt and the generated sequence. However, the prompt length should not be included in the penalization step. This release fixes it.
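A small worked example shows the effect of the change, assuming the usual length-normalised score of the form sum_logprobs / length ** length_penalty. The function `beam_score` and all numbers below are hypothetical and chosen only for illustration.

```python
def beam_score(sum_logprobs, length, length_penalty):
    """Length-normalised beam score: sum of log-probs divided by length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)

prompt_len, generated_len = 50, 10
sum_logprobs = -12.0   # hypothetical cumulative log-probability of the generated tokens
length_penalty = 1.0

# Previous behaviour: normalise by prompt + generated length.
old_score = beam_score(sum_logprobs, prompt_len + generated_len, length_penalty)
# Fixed behaviour: normalise by the generated length only.
new_score = beam_score(sum_logprobs, generated_len, length_penalty)
```

With a long prompt, the old normalisation was dominated by the prompt length, so hypotheses of different generated lengths were penalised almost identically; normalising by the generated length alone restores the intended effect of length_penalty.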
Slight API changes/corrections
- ⚠️ [VitDet] Fix test by @NielsRogge in #27832
- [⚠️ removed a default argument] Make `AttentionMaskConverter` compatible with `torch.compile(..., fullgraph=True)` by @fxmarty in #27868
Bugfixes and improvements
- Enrich TTS pipeline parameters naming by @ylacombe in #26473
- translate peft.md to chinese by @jiaqiw09 in #27215
- Removed the redundant SiLUActivation class. by @hi-sushanta in #27136
- Fixed base model class name extraction from PeftModels by @kkteru in #27162
- Fuyu protection by @LysandreJik in #27248
- Refactor: Use Llama RoPE implementation for Falcon by @tomaarsen in #26933
- [PEFT/Tests] Fix peft integration failing tests by @younesbelkada in #27258
- Avoid many failing tests in doctesting by @ydshieh in #27262
- [docs] Custom model doc update by @MKhalusova in #27213
- Update the ConversationalPipeline docstring for chat templates by @Rocketknight1 in #27250
- Fix switch transformer mixed precision issue by @timlee0212 in #27220
- [Docs/SAM] Reflect correct changes to run inference without OOM by @younesbelkada in #27268
- [Docs] Model_doc structure/clarity improvements by @MKhalusova in #26876
- [FA2] Add flash attention for `DistilBert` by @susnato in #26489
- translate autoclass_tutorial to chinese by @jiaqiw09 in #27269
- translate run_scripts.md to chinese by @jiaqiw09 in #27246
- Fix tokenizer export for LLamaTokenizerFast by @mayank31398 in #27222
- Fix daily CI image build by @ydshieh in #27307
- Update doctest workflow file by @ydshieh in #27306
- Remove an unexpected argument for FlaxResNetBasicLayerCollection by @pingzhili in #27272
- enable memory tracker metrics for npu by @statelesshz in #27280
- [PretrainedTokenizer] add some of the most important functions to the doc by @ArthurZucker in #27313
- Update sequence_classification.md by @akshayvkt in #27281
- Fix VideoMAEforPretrained dtype error by @ikergarcia1996 in #27296
- Fix `Kosmos2Processor` batch mode by @ydshieh in #27323
- [docs] fixed links with 404 by @MKhalusova in #27327
- [Whisper] Block language/task args for English-only by @sanchit-gandhi in #27322
- Fix autoawq docker image by @younesbelkada in #27339
- Generate: skip tests on unsupported models instead of passing by @gante in #27265
- Fix Whisper Conversion Script: Correct decoder_attention_heads and _download function by @zuazo in #26834
- [FA2] Add flash attention for `GPT-Neo` by @susnato in #26486
- [Whisper] Add conversion script for the tokenizer by @ArthurZucker in #27338
- Remove a redundant variable. by @hi-sushanta in #27288
- Resolve AttributeError by utilizing device calculation at the start of the forward function by @folbaeni in #27347
- Remove padding_masks from `gpt_bigcode`. by @susnato in #27348
- [Whisper] Nit converting the tokenizer by @ArthurZucker in #27349
- Fix Bark batching feature by @ylacombe in #27271
- Allow scheduler parameters by @Plemeur in #26480
- translate the en tokenizer_summary.md to Chinese by @ZouJiu1 in #27291
- translate model_sharing.md and llm_tutorial.md to chinese by @jiaqiw09 in #27283
- Add numpy alternative to FE using torchaudio by @ylacombe in #26339
- moving example of benchmarking to legacy dir by @statelesshz in #27337
- Fix example tests from failing by @muellerzr in #27353
- Fix `Kosmos-2` device issue by @ydshieh in #27346
- MusicGen Update by @sanchit-gandhi in #27084
- Translate index.md to Turkish by @mertyyanik in #27093
- Remove unused param from example script tests by @muellerzr in #27354
- [Flax Whisper] large-v3 compatibility by @sanchit-gandhi in #27360
- Fix tiny model script: not using `from_pt=True` by @ydshieh in #27372
- translate big_models.md and performance.md to chinese by @jiaqiw09 in #27334
- Add Flash Attention 2 support to Bark by @ylacombe in #27364
- Update deprecated `torch.range` in `test_modeling_ibert.py` by @kit1980 in #27355
- translate debugging.md to chinese by @jiaqiw09 in #27374
- Smangrul/fix failing ds ci tests by @pacman100 in #27358
- [CodeLlamaTokenizer] Nit, update init to make sure the AddedTokens are not normalized because they are special by @ArthurZucker in #27359
- Change thresh in test by @muellerzr in #27378
- Put doctest options back to `pyproject.toml` by @ydshieh in #27366
- Skip failing cache call tests by @amyeroberts in #27393
- device-agnostic deepspeed testing by @statelesshz in #27342
- Adds dvclive callback by @dberenbaum in #27352
- use `pytest.mark` directly by @ydshieh in #27390
- Fix fuyu checkpoint repo in `FuyuConfig` by @ydshieh in #27399
- Use editable install for git deps by @muellerzr in #27404
- Final fix of the accelerate installation issue by @ydshieh in #27408
- Fix RequestCounter to make it more future-proof by @Wauplin in #27406
- remove failing tests and clean FE files by @ylacombe in #27414
- Fix `Owlv2` checkpoint name and a default value in `Owlv2VisionConfig` by @ydshieh in #27402
- Run all tests if `circleci/create_circleci_config.py` is modified by @ydshieh in #27413
- add attention_mask and position_ids in assisted model by @jiqing-feng in #26892
- [Quantization] Add str to enum conversion for AWQ by @younesbelkada in #27320
- update Bark FA2 docs by @ylacombe in #27400
- [AttentionMaskConverter] Fix-mask-inf by @ArthurZucker in #27114
- At most 2 GPUs for CI by @ydshieh in #27435
- Normalize floating point cast by @amyeroberts in #27249
- Make `examples_torch_job` faster by @ydshieh in #27437
- Fix line ending in `utils/not_doctested.txt` by @ydshieh in #27459
- Fix some Wav2Vec2 related models' doctest by @ydshieh in #27462
- Fixed typo in error message by @cmcmaster1 in #27461
- Remove-auth-token by @ArthurZucker in #27060
- [Llama + Mistral] Add attention dropout by @ArthurZucker in #27315
- OWLv2: bug fix in post_process_object_detection() when using cuda device by @assafbot in #27468
- Fix docstring for `gradient_checkpointing_kwargs` by @tomaszcichy98 in #27470
- Install `python-Levenshtein` for `nougat` in CI image by @ydshieh in #27465
- Add version check for Jinja by @Rocketknight1 in #27403
- Fix Falcon tokenizer loading in pipeline by @Rocketknight1 in #27316
- [AWQ] Addresses TODO for awq tests by @younesbelkada in #27467
- Perf torch compile by @jiaqiw09 in #27422
- Fixed typo in pipelines.md documentation by @adismort14 in #27455
- Fix FA2 import + deprecation cycle by @SunMarc in #27330
- [Peft] `modules_to_save` support for peft integration by @younesbelkada in #27466
- [CI-test_torch] skip `test_tf_from_pt_safetensors` for 4 models by @ArthurZucker in #27481
- Fix M4T weights tying by @ylacombe in #27395
- Add speecht5 batch generation and fix wrong attention mask when padding by @Spycsh in #25943
- Clap processor: remove wasteful np.stack operations by @m-bain in #27454
- [Whisper] Fix pipeline test by @sanchit-gandhi in #27442
- Revert "[time series] Add PatchTST by @amyeroberts in #25927)"
- translate hpo_train.md and perf_hardware.md to chinese by @jiaqiw09 in #27431
- Generate: fix `ExponentialDecayLengthPenalty` doctest by @gante in #27485
- Update and reorder docs for chat templates by @Rocketknight1 in #27443
- Generate: `GenerationConfig.from_pretrained` can return unused kwargs by @gante in #27488
- Minor type annotation fix by @vwxyzjn in #27276
- Have seq2seq just use gather by @muellerzr in #27025
- Update processor mapping for hub snippets by @amyeroberts in #27477
- Track the number of tokens seen to metrics by @muellerzr in #27274
- [CI-test_torch] skip test_tf_from_pt_safetensors and `test_assisted_decoding_sample` by @ArthurZucker in #27508
- [Fuyu] Add tests by @NielsRogge in #27001
- [Table Transformer] Add Transformers-native checkpoints by @NielsRogge in #26928
- Update spelling mistake by @LimJing7 in #27506
- [CircleCI] skip test_assisted_decoding_sample for everyone by @ArthurZucker in #27511
- Make some jobs run on the GitHub Actions runners by @ydshieh in #27512
- [tokenizers] update `tokenizers` version pin by @ArthurZucker in #27494
- [PretrainedConfig] Improve messaging by @ArthurZucker in #27438
- Fix wav2vec2 params by @muellerzr in #27515
- Translating `en/model_doc` docs to Japanese. by @Yuki-Imajuku in #27401
- Fixing the failure of models without max_position_embeddings attribute. by @AdamLouly in #27499
- Incorrect setting for num_beams in translation and summarization examples by @Rocketknight1 in #27519
- Fix bug for T5x to PyTorch convert script with varying encoder and decoder layers by @JamesJiang97 in #27448
- Fix offload disk for loading derivated model checkpoint into base model by @SunMarc in #27253
- translate model.md to chinese by @statelesshz in #27518
- Support ONNX export for causal LM sequence classifiers by @dwyatte in #27450
- [pytest] Avoid flash attn test marker warning by @ArthurZucker in #27509
- docs: add docs for map, and add num procs to load_dataset by @pphuc25 in #27520
- Update the TF pin for 2.15 by @Rocketknight1 in #27375
- Revert "add attention_mask and position_ids in assisted model" by @patrickvonplaten in #27523
- Set `usedforsecurity=False` in hashlib methods (FIPS compliance) by @Wauplin in #27483
- Raise error when quantizing a quantized model by @SunMarc in #27500
- Disable docker image build job `latest-pytorch-amd` for now by @ydshieh in #27541
- [Styling] stylify using ruff by @ArthurZucker in #27144
- Generate: improve assisted generation tests by @gante in #27540
- Updated albert.md doc for ALBERT model by @ENate in #27223
- translate Trainer.md to chinese by @jiaqiw09 in #27527
- Skip some fuyu tests by @ydshieh in #27553
- Fix AMD CI not showing GPU by @ydshieh in #27555
- Generate: fix flaky tests by @gante in #27543
- Generate: update compute transition scores doctest by @gante in #27558
- fixed broken link by @VpkPrasanna in #27560
- Broken links fixed related to datasets docs by @VpkPrasanna in #27569
- translate deepspeed.md to chinese by @jiaqiw09 in #27495
- Fix broken distilbert url by @osanseviero in #27579
- Adding leaky relu in dict ACT2CLS by @rafaelpadilla in #27574
- Fix idx2sym not loaded from pretrained vocab file in Transformer XL by @jtang98 in #27589
- Add `convert_hf_to_openai.py` script to Whisper documentation resources by @zuazo in #27590
- docs: fix 404 link by @panpan0000 in #27529
- [ examples] fix loading jsonl with load dataset in run translation example by @mathiasesn in #26924
- [FA-2] Add fa2 support for `from_config` by @younesbelkada in #26914
- timm to pytorch conversion for vit model fix by @staghado in #26908
- [Whisper] Add `large-v3` version support by @flyingleafe in #27336
- Update Korean tutorial for using LLMs, and refactor the nested conditional statements in hr_argparser.py by @YeonwooSung in #27489
- Fix torch.fx import issue for torch 1.12 by @amyeroberts in #27570
- dvclive callback: warn instead of fail when logging non-scalars by @dberenbaum in #27608
- [core/gradient_checkpointing] add support for old GC method by @younesbelkada in #27610
- [ConvNext] Improve backbone by @NielsRogge in #27621
- Generate: Update docs regarding reusing `past_key_values` in `generate` by @gante in #27612
- Idefics: Fix information leak with cross attention gate in modeling by @leot13 in #26839
- Fix flash attention bugs with Mistral and Falcon by @fxmarty in #27625
- Fix tracing dinov2 by @amyeroberts in #27561
- remove the deprecated method `init_git_repo` by @statelesshz in #27617
- Explicitly specify `use_cache=True` in Flash Attention tests by @fxmarty in #27635
- Harmonize HF environment variables + other cleaning by @Wauplin in #27564
- Fix `resize_token_embeddings` by @czy-orange in #26861
- [dependency] update pillow pins by @ArthurZucker in #27409
- Simplify the implementation of jitter noise in moe models by @jiangwangyi in #27643
- Fix `max_steps` documentation regarding the end-of-training condition by @qgallouedec in #27624
- [Whisper] Add sequential longform decoding by @patrickvonplaten in #27492
- Add UnivNet Vocoder Model for Tortoise TTS Diffusers Integration by @dg845 in #24799
- update Openai API call method by @Strive-for-excellence in #27628
- update d_kv'annotation in mt5'configuration by @callanwu in #27585
- [FA2] Add flash attention for opt by @susnato in #26414
- Extended semantic segmentation to image segmentation by @merveenoyan in #27039
- Update TVP arxiv link by @amyeroberts in #27672
- [DPT, Dinov2] Add resources by @NielsRogge in #27655
- Update tiny model summary file by @ydshieh in #27388
- Refactoring Trainer, adds `save_only_model` arg and simplifying FSDP integration by @pacman100 in #27652
- Skip pipeline tests for 2 models for now by @ydshieh in #27687
- Deprecate `TransfoXL` by @ydshieh in #27607
- Fix typo in warning message by @liuxueyang in #27055
- Docs/Add conversion code to the musicgen docs by @yoinked-h in #27665
- Fix semantic error in evaluation section by @anihm136 in #27675
- [DocString] Support a revision in the docstring `add_code_sample_docstrings` to facilitate integrations by @ArthurZucker in #27645
- Successfully Resolved The ZeroDivisionError Exception. by @hi-sushanta in #27524
- Fix `TVPModelTest` by @ydshieh in #27695
- Fix sliding_window hasattr in Mistral by @IlyaGusev in #27041
- Fix Past CI by @ydshieh in #27696
- fix warning by @ArthurZucker in #27689
- Reorder the code on the Hub to explicit that sharing on the Hub isn't a requirement by @LysandreJik in #27691
- Fix mistral generate for long prompt / response by @lorabit110 in #27548
- Fix oneformer instance segmentation RuntimeError by @yhshin11 in #27725
- fix assisted decoding assistant model inputs by @jiqing-feng in #27503
- Update forward signature test for vision models by @NielsRogge in #27681
- Modify group_sub_entities in TokenClassification Pipeline to support label with "-" by @eshoyuan in #27325
- Fix owlv2 code snippet by @NielsRogge in #27698
- docs: replace torch.distributed.run by torchrun by @panpan0000 in #27528
- Update chat template warnings/guides by @Rocketknight1 in #27634
- translation main-class files to chinese by @jiaqiw09 in #27588
- Translate `en/model_doc` to JP by @rajveer43 in #27264
- Fixed passing scheduler-specific kwargs via TrainingArguments lr_scheduler_kwargs by @CharbelAD in #27595
- Fix AMD Push CI not triggered by @ydshieh in #27732
- Add BeitBackbone by @NielsRogge in #25952
- Update tiny model creation script by @ydshieh in #27674
- Log a warning in `TransfoXLTokenizer.__init__` by @ydshieh in #27721
- Add madlad-400 MT models by @jbochi in #27471
- Enforce pin memory disabling when using cpu only by @qgallouedec in #27745
- Trigger corresponding pipeline tests if `tests/utils/tiny_model_summary.json` is modified by @ydshieh in #27693
- CLVP Fixes by @susnato in #27547
- Docs: Fix broken cross-references, i.e. `~transformer.` -> `~transformers.` by @tomaarsen in #27740
- [docs] Quantization by @stevhliu in #27641
- Fix precision errors from casting rotary parameters to FP16 with AMP by @kevinhu in #27700
- Remove `check_runner_status.yml` by @ydshieh in #27767
- uses dvclive_test mode in examples/pytorch/test_accelerate_examples.py by @dberenbaum in #27763
- Generate: `GenerationConfig` throws an exception when `generate` args are passed by [@
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Never, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
- [ ] If you want to rebase/retry this PR, check this box
This PR has been generated by Mend Renovate. View repository job log here.
@dependabot recreate