David Corvoysier
When using MarlinInt4WeightQBitsTensor and its associated optimized gemm kernel, the weight/scales/zero-point readback becomes incorrect as soon as parallelization increases. The consequence is that output features higher than 128...
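The report above is truncated, but a minimal repro in this spirit might sweep the output feature size and compare a qint4-quantized linear against its float16 reference. This is an illustrative sketch, not the exact test from the issue, and it assumes quanto dispatches to the optimized Marlin gemm for fp16/int4 weights on a CUDA device:

```python
import torch
from optimum.quanto import freeze, qint4, quantize

torch.manual_seed(0)
for out_features in (64, 128, 256, 512):
    # Wrap the Linear so quantize() can replace it in-place
    model = torch.nn.Sequential(
        torch.nn.Linear(256, out_features)
    ).to(dtype=torch.float16, device="cuda")
    inputs = torch.randn(8, 256, dtype=torch.float16, device="cuda")
    expected = model(inputs)
    # Quantize weights to int4 and freeze to materialize packed tensors
    quantize(model, weights=qint4)
    freeze(model)
    error = (model(inputs) - expected).abs().max().item()
    print(f"out_features={out_features}: max abs error {error:.4f}")
```

If the readback bug described above is present, the error would be expected to jump once `out_features` crosses the reported threshold.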
When performing quantized matrix multiplication with `int8` weights on an AMD CPU, the results differ from those obtained when running the same operation on CUDA or on an Intel...
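A hedged sketch of such a cross-device comparison, using quanto's public `quantize`/`freeze` API and assuming a host with a CUDA device; running the same script on an AMD and an Intel machine would expose the reported CPU divergence:

```python
import copy

import torch
from optimum.quanto import freeze, qint8, quantize

torch.manual_seed(0)
reference = torch.nn.Sequential(torch.nn.Linear(512, 512))
inputs = torch.randn(4, 512)

outputs = {}
for device in ("cpu", "cuda"):
    # Quantize an identical copy of the model on each device
    model = copy.deepcopy(reference).to(device)
    quantize(model, weights=qint8)
    freeze(model)
    outputs[device] = model(inputs.to(device)).cpu()

# With identical quantized weights, the two paths should agree closely
print((outputs["cpu"] - outputs["cuda"]).abs().max().item())
```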
@kechan reported compilation failures when using quanto in Google Colab, both on CPU and GPU.
Since IST-DASLab introduced the mixed-precision fp16-int4 [MARLIN](https://github.com/IST-DASLab/marlin) (Mixed Auto-Regressive Linear) kernels, MARLIN variants have appeared for other data types. In particular, mixed-precision fp16/bf16-int4/int8 kernels have...
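To make the arithmetic behind such mixed-precision kernels concrete, here is a plain-PyTorch reference of what an fp16-int4 gemm computes: int4 weights are dequantized with per-group scales and zero-points, then multiplied against fp16 activations. Real MARLIN kernels fuse these steps on the GPU; this sketch (all names and shapes are illustrative) only shows the math, not the implementation:

```python
import torch

def w4a16_reference(x, q, scales, zeros, group_size=128):
    # x: (M, K) fp16 activations
    # q: (K, N) uint8 tensor holding int4 values in [0, 15]
    # scales/zeros: (K // group_size, N) per-group quantization parameters
    w = (q.to(torch.float16) - zeros.repeat_interleave(group_size, dim=0)) \
        * scales.repeat_interleave(group_size, dim=0)
    # Accumulate in fp32, as the fused GPU kernels do
    return (x.float() @ w.float()).to(x.dtype)

# Tiny smoke test with random data
M, K, N, G = 4, 256, 128, 128
x = torch.randn(M, K, dtype=torch.float16)
q = torch.randint(0, 16, (K, N), dtype=torch.uint8)
scales = torch.rand(K // G, N, dtype=torch.float16) * 0.1
zeros = torch.full((K // G, N), 8.0, dtype=torch.float16)
print(w4a16_reference(x, q, scales, zeros, G).shape)  # torch.Size([4, 128])
```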
[HuggingFace][Neuronx] Training - Optimum Neuron 0.0.25 - Neuron sdk 2.20.0 - Transformers 4.43.2
Issue #4307

### Description

This PR creates Hugging Face's PyTorch DLC for training on neuron-v2 devices (Trainium). By submitting this pull request, I confirm that my contribution is made under the...
## Overview of DLCs to update

_Inference - Neuronx_

Dependencies versions:
- transformers: 4.43.2
- torch: 2.1.2
- aws-neuron-sdk: 2.20.0
- optimum-neuron: 0.0.25

_Training - Neuronx_

Dependencies versions:
- transformers: 4.43.2
- torch: 2.1.2
- aws-neuron-sdk: 2.20.0...
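As a sanity check when building these images, a small script can verify the pinned versions inside the container. The package names below are assumed to match their PyPI distributions; the Neuron SDK release (2.20.0) spans several packages (e.g. torch-neuronx, neuronx-cc) with their own version schemes and is left out of this sketch:

```python
from importlib.metadata import version

# Pinned versions taken from the list above
pinned = {
    "transformers": "4.43.2",
    "torch": "2.1.2",
    "optimum-neuron": "0.0.25",
}
for name, expected in pinned.items():
    installed = version(name)
    status = "OK" if installed == expected else f"MISMATCH (expected {expected})"
    print(f"{name}=={installed}: {status}")
```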