Andrei Panferov
# What does this PR do? Fixes the default value of `modules_to_not_convert` in `utils.bitsandbytes.replace_8bit_linear`. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you...
`bfloat16` is not supported on the T4 or on GPUs with the same or lower compute capability, meaning the kernels will fail to compile. This PR isolates the code behind CC...
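A minimal sketch of such a capability gate. The helper name `bf16_supported` is hypothetical; the assumption is that `bfloat16` kernel support requires Ampere-class hardware (compute capability 8.0+), while a T4 reports 7.5:

```python
def bf16_supported(capability):
    """Return True if a CUDA compute capability (major, minor) tuple
    supports bfloat16 kernels.

    Assumption: bfloat16 hardware support arrived with Ampere (CC 8.0);
    a T4 reports (7, 5) and would be gated out.
    """
    major, _minor = capability
    return major >= 8
```

On a live system the tuple could come from `torch.cuda.get_device_capability()`, e.g. `bf16_supported(torch.cuda.get_device_capability())`.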
We could use more predefined configs for better user experience. Models to consider: - [x] bert - [x] gpt2 - [ ] LLaMa - [ ] electra - [ ]...
Some models already rely on device interactions. I propose we don't wrap them and instead throw an error. Possible examples: * Wrapping a model that is already wrapped * Wrapping an 'accelerate'...
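The already-wrapped case could be guarded roughly like this. The `Wrapped` class and error message are illustrative placeholders, not the actual API:

```python
class Wrapped:
    """Hypothetical wrapper that refuses to wrap a model twice."""

    def __init__(self, model):
        # Throw instead of silently double-wrapping, per the proposal above.
        if isinstance(model, Wrapped):
            raise ValueError("model is already wrapped; refusing to wrap again")
        self.model = model
```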
Hi! I'm trying to integrate some of the quantized MatMul C++ kernels into Executorch and I'm having a bad time: the documentation is very vague about what exactly I need to...
# This PR adds support for the Quartet QAT method. The goal of this PR is to integrate inference and training support for the [Quartet QAT method](https://arxiv.org/abs/2505.14669). That would allow...
## Summary When training `pretrain_gpt.py` with sequence packing enabled (`--reset-position-ids` and `--reset-attention-mask`) and using the `--transformer-impl transformer_engine` backend, the custom block-diagonal attention mask generated by `GPTDataset` is effectively ignored. The...
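For reference, the mask that `--reset-attention-mask` is meant to produce can be sketched as follows: each packed sub-sequence attends only within itself, causally. This is an illustrative NumPy reconstruction (function name and `True` = "attention allowed" convention are assumptions, not the actual `GPTDataset` code):

```python
import numpy as np

def block_diag_attention_mask(seq_lengths):
    """Build a causal block-diagonal attention mask for packed sequences.

    seq_lengths: lengths of the sub-sequences packed into one sample.
    Returns a (total, total) boolean matrix where True means position i
    may attend to position j; attention never crosses sequence boundaries.
    """
    total = sum(seq_lengths)
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for n in seq_lengths:
        # Lower-triangular block: causal attention within this sub-sequence only.
        mask[start:start + n, start:start + n] = np.tril(np.ones((n, n), dtype=bool))
        start += n
    return mask
```

If the Transformer Engine backend ignores this mask, tokens from one packed document can attend to earlier documents in the same sample, which is exactly the contamination the flags are supposed to prevent.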
## 🐞Describing the bug I'm roughly following [this guide](https://machinelearning.apple.com/research/core-ml-on-device-llama) on LLM exporting. I adjusted the input names to be able to use it with this [HF demo](https://github.com/huggingface/swift-chat). I also added...