Thien Tran
Quant-LLM code: https://github.com/pytorch/ao/tree/main/torchao/csrc/cuda/fp6_llm Currently the Quant-LLM kernel (backing FPx in torchao) only works with FP16. This creates a small divergence from other quantization methods, which all work with BF16. Since all...
In `optim.load_state_dict(state_dict)`, if the optimizer dtype != the state_dict dtype, `aten._to_copy.default` is called. This PR simply implements this op and adds appropriate tests. **Update**: In PyTorch pre-2.4, calling `.to(device, dtype)` will not...
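To illustrate the mechanism (not the torchao implementation): a wrapper tensor subclass sees `aten._to_copy.default` in `__torch_dispatch__` whenever a dtype-changing `.to(...)` is called on it, which is what happens during `load_state_dict` with mismatched dtypes. `WrappedTensor` below is a hypothetical minimal sketch.

```python
import torch

aten = torch.ops.aten

class WrappedTensor(torch.Tensor):
    """Hypothetical minimal wrapper subclass; only handles aten._to_copy.default."""

    @staticmethod
    def __new__(cls, data):
        return torch.Tensor._make_wrapper_subclass(cls, data.shape, dtype=data.dtype)

    def __init__(self, data):
        self._data = data

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is aten._to_copy.default:
            # Convert the inner tensor and re-wrap; this is the op that
            # load_state_dict triggers when optim dtype != state_dict dtype.
            return WrappedTensor(args[0]._data.to(dtype=kwargs.get("dtype")))
        raise NotImplementedError(f"{func} is not handled by this sketch")

x = WrappedTensor(torch.zeros(4))          # FP32 inside
y = x.to(torch.bfloat16)                   # routed through aten._to_copy.default
```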
https://github.com/pytorch/ao/tree/main/torchao/prototype/quantized_training Currently the INT8 training recipes only support **row-wise scaling** for the weight. This should be strictly better than (or at least the same as) **tensor-wise scaling** for the weight in terms of...
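A minimal sketch of why row-wise scaling should dominate tensor-wise scaling for accuracy (NumPy for illustration, not the torchao kernels): with one scale per row, an outlier row does not inflate the quantization error of every other row.

```python
import numpy as np

def quantize_int8_rowwise(w):
    # One scale per row: scale[i] = max(|w[i, :]|) / 127
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def quantize_int8_tensorwise(w):
    # A single scale shared by the whole tensor
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
w[0] *= 100.0  # one outlier row blows up the tensor-wise scale

q_row, s_row = quantize_int8_rowwise(w)
q_tsr, s_tsr = quantize_int8_tensorwise(w)
err_row = np.abs(q_row * s_row - w).mean()
err_tsr = np.abs(q_tsr * s_tsr - w).mean()
```

Here `err_row` stays small while `err_tsr` is dominated by the non-outlier rows collapsing toward zero, matching the claim that row-wise scaling is at worst equal to tensor-wise scaling.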
#### Context

What is the purpose of this PR? Is it to

- [x] add a new feature
- [ ] fix a bug
- [ ] update tests and/or...
**Steps/Code to reproduce bug**

```python
import torch
import cutlass.epilogue

def epilogue(accum, bias):
    D = accum + bias
    return D

examples_tensors = dict(
    accum=torch.randn(1024, 1024),
    bias=torch.randn(1024, 1).bfloat16(),
    D=torch.randn(1024, 1024).bfloat16(),
)

cutlass.epilogue.trace(epilogue, ...
```
Fixes #1824. I was thinking of adding a test case for this, but currently the dtype is hard-coded to FP16: https://github.com/NVIDIA/cutlass/blob/44dae8b90ef232ea663727470dfbbe9daff6972d/test/python/cutlass/evt/utils/evt_testbed.py#L206 It would take some refactoring to test multiple dtypes at...
### Feature request

Add BetterTransformer support for SEW. SEW has an almost identical architecture to Wav2Vec2. In particular, the attention modules are the same.

### Motivation

NA

### Your contribution

I'm...
## Pull Request Description

When `stream=true`, the OpenAI API does not require `stream_options` to be specified. This will work:

```
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY"...
```
## Describe Your Changes

- Note: the remote engine refactor should be merged before this PR

TODO:
- [ ] Use Ollama API (`/api/chat`) / Ollama client to set context length...
## Describe Your Changes

Replace cortex's `/v1/hardware` with Rust.

- [x] Basic hardware info: CPU, OS, RAM usage (Power and Storage are removed since they are not implemented in Cortex,...