
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Results: 2036 transformers issues

# What does this PR do? Translated the `model_doc/barthez.md` file of the documentation to Korean. Thank you in advance for your review. Part of https://github.com/huggingface/transformers/issues/20179 ## Before reviewing - [x]...

A number of issues have gone stale, and @SunMarc + @muellerzr are short on bandwidth! We would therefore love community support to solve the following:...

Good First Issue
trainer
Good Second Issue
DeepSpeed
Good Difficult Issue
PyTorch FSDP
HACKTOBERFEST-ACCEPTED
Accelerate

I'm facing issues while running inference with the Falcon LLM. The latency is around 20-30 minutes for a specific use case. I want to reduce the time and found that we...

Usage
Flash Attention
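
The usual first levers for latency like this are half-precision weights and Flash Attention 2. A minimal sketch, assuming a CUDA GPU with the `flash-attn` package installed; the model id and generation settings are illustrative, not taken from the issue:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative Falcon checkpoint; any FA2-capable model works the same way.
model_id = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # half precision cuts memory and time
    attn_implementation="flash_attention_2",  # requires flash-attn + supported GPU
    device_map="auto",
)

inputs = tokenizer("Summarise the following report:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```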

### Observed issue Found when running multi-GPU slow tests in https://github.com/huggingface/transformers/pull/33560. Line 479 of the mamba2 modeling file, https://github.com/huggingface/transformers/blob/8efc06ee1863bd6e34e8adb7b10901da87c66818/src/transformers/models/mamba2/modeling_mamba2.py#L472-L480, raises the following for the test `tests/models/mamba2/test_modeling_mamba2.py::Mamba2ModelTest::test_model_parallel_beam_search`...

Distributed Training / Models
Generation

### Feature request In the Hubert model's convolutional positional encoding, add support for batch norm instead of weight norm, as added in Fairseq at https://github.com/facebookresearch/fairseq/commit/4db264940f281a6f47558d17387b1455d4abd8d9 -...

Feature request
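
For context, the difference between the two schemes can be sketched in plain PyTorch. This is an illustrative stand-in rather than the transformers implementation; the class name, defaults, and the batch-norm placement are assumptions based on the Fairseq change:

```python
import torch.nn as nn

class ConvPositionalEmbedding(nn.Module):
    """Sketch of a Hubert-style convolutional positional encoding."""

    def __init__(self, hidden_size=768, kernel_size=128, groups=16, use_batch_norm=False):
        super().__init__()
        conv = nn.Conv1d(hidden_size, hidden_size, kernel_size,
                         padding=kernel_size // 2, groups=groups)
        if use_batch_norm:
            # Requested Fairseq-style variant: plain conv followed by BatchNorm1d.
            self.conv, self.norm = conv, nn.BatchNorm1d(hidden_size)
        else:
            # Current behaviour: weight-normalised conv, no extra norm layer.
            self.conv = nn.utils.weight_norm(conv, name="weight", dim=2)
            self.norm = nn.Identity()
        self.trim = kernel_size % 2 == 0  # even kernels produce one extra frame

    def forward(self, hidden_states):                 # (batch, seq, hidden)
        x = self.norm(self.conv(hidden_states.transpose(1, 2)))
        if self.trim:
            x = x[:, :, :-1]
        return x.transpose(1, 2)
```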

### System Info Windows 10 x64, PyTorch 2.4.0+cu124, Python 3.11.8, transformers 4.46.0.dev0 ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My...

bug

# What does this PR do? Translated the `model_doc/bartpho.md` file of the documentation to Korean. Thank you in advance for your review. Part of https://github.com/huggingface/transformers/issues/20179 ## Before reviewing - [x]...

# What does this PR do? Addresses https://github.com/huggingface/transformers/issues/31626. Adds a new option called `"best"` for `TrainingArguments.save_strategy` which saves the model checkpoint each time a new best performance is achieved. ###...

trainer
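
A hedged sketch of how the proposed option would be used once merged; `save_strategy="best"` is the new value this PR adds, and the other arguments are the existing `TrainingArguments` knobs that define what "best" means:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                   # placeholder
    eval_strategy="epoch",              # "best" needs periodic evaluation
    save_strategy="best",               # proposed: checkpoint on every new best metric
    metric_for_best_model="eval_loss",  # defines which metric is tracked
    greater_is_better=False,            # lower eval_loss counts as better
)
```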

### Feature request https://github.com/huggingface/transformers/blob/816f4424964c1a1631e303b663fc3d68f731e923/src/transformers/models/mixtral/modeling_mixtral.py#L284 `head_dim` in the `mixtral` model is forced to have the value of `hidden_size // num_heads`. However, this is not the case in the [`llama` model](https://github.com/huggingface/transformers/blob/e95ea479eebb6e01679907db910b5dc5eb64b3c7/src/transformers/models/llama/modeling_llama.py#L290) or even in...

Feature request
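
A minimal sketch of the requested behaviour, mirroring the Llama-style fallback; the config class here is a stand-in for `MixtralConfig`, and its `head_dim` field is the hypothetical addition:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StubConfig:                   # stand-in for MixtralConfig
    hidden_size: int = 4096
    num_attention_heads: int = 32
    head_dim: Optional[int] = None  # hypothetical override, as Llama supports

config = StubConfig(head_dim=64)

# Current Mixtral behaviour: head_dim is always derived.
derived = config.hidden_size // config.num_attention_heads  # 128

# Requested behaviour: honour an explicit value, fall back to the derived one.
head_dim = config.head_dim if config.head_dim is not None else derived
print(head_dim)  # 64
```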

# What does this PR do? This PR removes the function call `prepare_fa2_from_position_ids` in `flash_attention_forward`, as it causes a graph break when the `torch_compile` flag is turned on in the Training [arguments](https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments.torch_compile) to...
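
For reference, a minimal sketch of the training setup this PR targets; the flags are existing `TrainingArguments` options and the output directory is a placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",     # placeholder
    torch_compile=True,   # compiles the model; graph breaks negate the speedup
    bf16=True,
)
```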