David Corvoysier

Results 16 issues of David Corvoysier

When using MarlinInt4WeightQBitsTensor and its associated optimized gemm kernel, there are issues with the weight/scales/zero-point readback as soon as parallelization increases. The consequence is that output features higher than 128...

bug
help wanted

When performing quantized matrix multiplication with `int8` weights on an AMD CPU, the results differ from those obtained when running the same operation on CUDA or on an Intel...

@kechan reported compilation failures when using quanto in Google Colab, both on CPU and GPU.

Since the introduction of mixed-precision fp16-int4 [MARLIN](https://github.com/IST-DASLab/marlin) (Mixed Auto-Regressive Linear) kernels by IST-DASLab, new mixed-precision MARLIN kernels have been introduced for other data types. In particular, mixed-precision fp16/bf16-int4/int8 kernels have...

enhancement
help wanted

Issue #4307. Description: This PR creates Hugging Face's PyTorch DLC for training on neuron-v2 devices (Trainium). By submitting this pull request, I confirm that my contribution is made under the...

build
huggingface
Size:S

Overview of DLCs to update. Inference (Neuronx) dependency versions: transformers 4.43.2, torch 2.1.2, aws-neuron-sdk 2.20.0, optimum-neuron 0.0.25. Training (Neuronx) dependency versions: transformers 4.43.2, torch 2.1.2, aws-neuron-sdk 2.20.0...