regisss issues

Results: 8 issues of regisss

Add the possibility to quantize MatMul per-tensor when per_channel=True

**Description**: When quantizing a model with `per_channel=True`, it should still be possible to quantize linear layers `per_tensor`, since quantizing them per-feature does not make sense....

feature:quantization
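To make the per-tensor versus per-channel distinction in the request above concrete, here is a minimal pure-Python sketch. It assumes symmetric int8 quantization and a weight stored as a list of rows with one row per output channel. The function names are hypothetical and are not part of any library's API.

```python
def symmetric_scale(values, num_bits=8):
    """Scale that maps the largest magnitude in `values` to the edge of the integer range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def per_tensor_scale(weight, num_bits=8):
    """One scale shared by every element of the weight matrix."""
    flat = [v for row in weight for v in row]
    return symmetric_scale(flat, num_bits)


def per_channel_scales(weight, num_bits=8):
    """One scale per row (assumed here to be one output channel per row)."""
    return [symmetric_scale(row, num_bits) for row in weight]


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clamp to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


# Illustrative weights only: the second row has much smaller magnitudes.
weight = [[0.5, -1.27], [0.02, 0.0127]]

shared = per_tensor_scale(weight)        # a single float for the whole matrix
per_row = per_channel_scales(weight)     # a list with one float per row

first_row_q = quantize(weight[0], shared)
```

With `per_channel=True`, a quantizer would use something like `per_channel_scales` everywhere; the request is for an option to fall back to a single shared scale for MatMul nodes.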

Replace `-m torch.distributed.run` by `torchrun`

# What does this PR do? This PR replaces occurrences of `-m torch.distributed.launch` (deprecated) and `-m torch.distributed.run` (equivalent) with `torchrun`. More information [here](https://pytorch.org/docs/stable/elastic/run.html). ## Before submitting - [x] This PR...

Save the tokenizer and image preprocessor after training a model with the contrastive image-text example

# What does this PR do? When training a model with the contrastive image-text example, only the model is saved (see [here](https://github.com/huggingface/transformers/blob/88399476c3892435395618ed37993176dbb0de73/examples/pytorch/contrastive-image-text/run_clip.py#L512)). As a consequence, when using the trained model...

Add Gaudi CI for Sentence Transformers

# What does this PR do? As per title. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...

Deprecate `hpu_graphs` arg in `generate`

# What does this PR do? The `hpu_graphs` arg in `generate` has not been used there for a while, so this PR deprecates it. ## Before submitting - [ ] This PR...

Error in tests when test_trainer is run before test_trainer_distributed

Unit and integration tests currently need to be run with `pytest tests/test_gaudi_configuration.py tests/test_trainer_distributed.py tests/test_trainer.py`. Otherwise, for instance with `pytest tests/`, *test_trainer* will be executed before *test_trainer_distributed* and the...

bug

Add FX transformation to replace GELU modules by fused GELU modules

# What does this PR do? This PR makes it possible to replace all GELU modules with `GELUActivation()` if necessary. This relies on Torch FX and on Optimum's FX interface. It will...

Add a utility method to get the memory consumptions for various batch sizes

### Feature request The `GaudiTrainer` class should provide a method that takes a list of batch sizes as argument and returns the memory consumption on HPU for each batch size....

enhancement
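The shape of the utility requested above could look roughly like the sketch below. Everything here is an assumption for illustration: `memory_per_batch_size` is a hypothetical standalone helper rather than an existing `GaudiTrainer` method, and the measurement itself is passed in as a callable because the excerpt does not say how memory would be read on HPU.

```python
from typing import Callable, Dict, Iterable


def memory_per_batch_size(
    batch_sizes: Iterable[int],
    measure_peak_memory: Callable[[int], float],
) -> Dict[int, float]:
    """Hypothetical helper: map each batch size to the value reported by `measure_peak_memory`."""
    return {batch_size: measure_peak_memory(batch_size) for batch_size in batch_sizes}


def fake_measure(batch_size: int) -> float:
    """Stand-in for a real measurement: a made-up linear model, not real HPU data."""
    return 2.0 + 0.5 * batch_size


usage = memory_per_batch_size([1, 2, 4], fake_measure)
```

Injecting the measurement function keeps the sweep logic independent of the device, so a real implementation would only need to supply a callable that runs one step at the given batch size and reports the memory used.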