
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Results: 2036 transformers issues

# What does this PR do? Translated the `model_doc/barthez.md` file of the documentation to Korean. Thank you in advance for your review. Part of https://github.com/huggingface/transformers/issues/20179 ## Before reviewing - [x]...

A number of issues have gone stale, and @SunMarc + @muellerzr are short on bandwidth! We would therefore love community support to solve the following:...

Good First Issue
trainer
Good Second Issue
DeepSpeed
Good Difficult Issue
PyTorch FSDP
HACKTOBERFEST-ACCEPTED
Accelerate

I'm facing issues while running inference with the Falcon LLM. The latency is around 20-30 minutes for a specific use case. I want to reduce the time and found that we...

Usage
Flash Attention
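
The usual first levers for latency like this are half-precision weights and Flash Attention 2. A minimal sketch, assuming a CUDA GPU with the `flash-attn` package installed; the model id and generation settings are illustrative, not taken from the issue:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative Falcon checkpoint; any FA2-capable model works the same way.
model_id = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # half precision cuts memory and time
    attn_implementation="flash_attention_2",  # requires flash-attn + supported GPU
    device_map="auto",
)

inputs = tokenizer("Summarise the following report:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```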

### Observed issue Found when running multi-GPU slow tests in https://github.com/huggingface/transformers/pull/33560. Line 479 of the mamba2 modeling file, https://github.com/huggingface/transformers/blob/8efc06ee1863bd6e34e8adb7b10901da87c66818/src/transformers/models/mamba2/modeling_mamba2.py#L472-L480, raises the following for the test `tests/models/mamba2/test_modeling_mamba2.py::Mamba2ModelTest::test_model_parallel_beam_search`...

Distributed Training / Models
Generation

### Feature request In the Hubert model's convolutional positional encoding, add support for batch norm instead of weight norm, as added in Fairseq at https://github.com/facebookresearch/fairseq/commit/4db264940f281a6f47558d17387b1455d4abd8d9 -...

Feature request
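
For context, the difference between the two schemes can be sketched in plain PyTorch. This is an illustrative stand-in rather than the transformers implementation; the class name, defaults, and the batch-norm placement are assumptions based on the Fairseq change:

```python
import torch.nn as nn

class ConvPositionalEmbedding(nn.Module):
    """Sketch of a Hubert-style convolutional positional encoding."""

    def __init__(self, hidden_size=768, kernel_size=128, groups=16, use_batch_norm=False):
        super().__init__()
        conv = nn.Conv1d(hidden_size, hidden_size, kernel_size,
                         padding=kernel_size // 2, groups=groups)
        if use_batch_norm:
            # Requested Fairseq-style variant: plain conv followed by BatchNorm1d.
            self.conv, self.norm = conv, nn.BatchNorm1d(hidden_size)
        else:
            # Current behaviour: weight-normalised conv, no extra norm layer.
            self.conv = nn.utils.weight_norm(conv, name="weight", dim=2)
            self.norm = nn.Identity()
        self.trim = kernel_size % 2 == 0  # even kernels produce one extra frame

    def forward(self, hidden_states):                 # (batch, seq, hidden)
        x = self.norm(self.conv(hidden_states.transpose(1, 2)))
        if self.trim:
            x = x[:, :, :-1]
        return x.transpose(1, 2)
```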

### System Info Windows 10 x64, PyTorch 2.4.0+cu124, Python 3.11.8, transformers 4.46.0.dev0 ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My...

bug

# What does this PR do? Translated the `model_doc/bartpho.md` file of the documentation to Korean. Thank you in advance for your review. Part of https://github.com/huggingface/transformers/issues/20179 ## Before reviewing - [x]...

# What does this PR do? Addresses https://github.com/huggingface/transformers/issues/31626. Adds a new option called `"best"` for `TrainingArguments.save_strategy` which saves the model checkpoint each time a new best performance is achieved. ###...

trainer
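
A hedged sketch of how the proposed option would be used once merged; `save_strategy="best"` is the new value this PR adds, and the other arguments are the existing `TrainingArguments` knobs that define what "best" means:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                   # placeholder
    eval_strategy="epoch",              # "best" needs periodic evaluation
    save_strategy="best",               # proposed: checkpoint on every new best metric
    metric_for_best_model="eval_loss",  # defines which metric is tracked
    greater_is_better=False,            # lower eval_loss counts as better
)
```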

### Feature request https://github.com/huggingface/transformers/blob/816f4424964c1a1631e303b663fc3d68f731e923/src/transformers/models/mixtral/modeling_mixtral.py#L284 `head_dim` in the `mixtral` model is forced to have the value of `hidden_size // num_heads`. However, this is not the case in the [`llama` model](https://github.com/huggingface/transformers/blob/e95ea479eebb6e01679907db910b5dc5eb64b3c7/src/transformers/models/llama/modeling_llama.py#L290) or even in...

Feature request
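
A minimal sketch of the requested behaviour, mirroring the Llama-style fallback; the config class here is a stand-in for `MixtralConfig`, and its `head_dim` field is the hypothetical addition:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StubConfig:                   # stand-in for MixtralConfig
    hidden_size: int = 4096
    num_attention_heads: int = 32
    head_dim: Optional[int] = None  # hypothetical override, as Llama supports

config = StubConfig(head_dim=64)

# Current Mixtral behaviour: head_dim is always derived.
derived = config.hidden_size // config.num_attention_heads  # 128

# Requested behaviour: honour an explicit value, fall back to the derived one.
head_dim = config.head_dim if config.head_dim is not None else derived
print(head_dim)  # 64
```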

# What does this PR do? This PR removes the function call `prepare_fa2_from_position_ids` in `flash_attention_forward`, as it causes a graph break when the `torch_compile` flag is turned on in the Training [arguments](https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments.torch_compile) to...
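
For reference, a minimal sketch of the training setup this PR targets; the flags are existing `TrainingArguments` options and the output directory is a placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",     # placeholder
    torch_compile=True,   # compiles the model; graph breaks negate the speedup
    bf16=True,
)
```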