Thien Tran

Results: 55 issues by Thien Tran

Quant-LLM code: https://github.com/pytorch/ao/tree/main/torchao/csrc/cuda/fp6_llm Currently the Quant-LLM kernel (backing FPx in torchao) only works with FP16. This creates a small divergence from the other quantization methods, which all work with BF16. Since all...

enhancement
good first issue
inference
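As background for the FP16/BF16 divergence above: BF16 keeps FP32's 8 exponent bits (it is the top 16 bits of an FP32), while FP16 has only 5 exponent bits and tops out at 65504, so casting BF16 activations into an FP16-only kernel can overflow. A minimal bit-level sketch in pure Python (illustration only, not torchao code; the helper names are hypothetical):

```python
import struct

def f32_bits(x: float) -> int:
    """IEEE-754 float32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_bf16(x: float) -> float:
    """BF16 by truncation: keep the top 16 bits of the float32 pattern."""
    bits = f32_bits(x) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def to_fp16(x: float) -> float:
    """Round-trip through IEEE-754 half precision.

    struct raises OverflowError for values outside FP16 range (|x| > 65504),
    which is exactly the hazard when feeding BF16-scale values to an
    FP16-only kernel.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

# 1e5 is representable (with rounding) in BF16 but not in FP16:
print(to_bf16(1e5))      # 99840.0
print(to_fp16(65504.0))  # 65504.0, the FP16 max
```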

In `optim.load_state_dict(state_dict)`, if the optim dtype != the state_dict dtype, `aten._to_copy.default` is called. This PR simply implements this op and adds appropriate tests. **Update**: In PyTorch pre-2.4, calling `.to(device, dtype)` will not...

CLA Signed
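To illustrate the shape of that load path without pulling in torch: the types and names below are a hypothetical stand-in (not the torchao subclass), showing a `_to_copy`-style op that a `load_state_dict` path calls when the live optimizer dtype differs from the saved dtype.

```python
# Hypothetical stand-in for a tensor-like value carrying a dtype.
class Value:
    def __init__(self, data, dtype):
        self.data = data
        self.dtype = dtype

    def to_copy(self, dtype):
        # Analogue of aten._to_copy.default: fresh storage, values cast.
        return Value(dtype(self.data), dtype)

def load_state_dict(live, saved):
    """Cast each saved entry to the dtype the live optimizer state uses."""
    return {
        k: (v if v.dtype is live[k].dtype else v.to_copy(live[k].dtype))
        for k, v in saved.items()
    }

live = {"exp_avg": Value(0.0, float)}
saved = {"exp_avg": Value(3, int)}   # checkpoint stored in a different dtype
loaded = load_state_dict(live, saved)
print(loaded["exp_avg"].dtype)  # <class 'float'>
```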

https://github.com/pytorch/ao/tree/main/torchao/prototype/quantized_training Currently the INT8 training recipes only support **row-wise scaling** for weights. This should be strictly better than (or at least the same as) **tensor-wise scaling** for weights in terms of...

enhancement
good first issue
triaged
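The intuition for why row-wise scaling should match or beat tensor-wise scaling: each row gets its own absmax scale, so rows with small magnitudes are not crushed by one large global absmax. A pure-Python sketch of symmetric INT8 quantization (illustration only, no torch):

```python
def quantize_dequant(row, scale):
    """Symmetric int8: q = clamp(round(x / scale), -127, 127); x' = q * scale."""
    return [max(-127, min(127, round(x / scale))) * scale for x in row]

def tensorwise(mat):
    """One scale for the whole matrix, from the global absmax."""
    scale = max(abs(x) for row in mat for x in row) / 127
    return [quantize_dequant(row, scale) for row in mat]

def rowwise(mat):
    """One scale per row, from each row's own absmax."""
    return [quantize_dequant(row, max(abs(x) for x in row) / 127) for row in mat]

def mean_abs_err(mat, deq):
    errs = [abs(a - b) for ra, rb in zip(mat, deq) for a, b in zip(ra, rb)]
    return sum(errs) / len(errs)

# A matrix whose rows differ wildly in magnitude: the global scale wipes
# out the small row entirely (they all round to 0), row-wise scaling doesn't.
W = [[0.01, 0.02, 0.03], [10.0, 20.0, 30.0]]
print(mean_abs_err(W, tensorwise(W)) > mean_abs_err(W, rowwise(W)))  # True
```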

#### Context
What is the purpose of this PR? Is it to
- [x] add a new feature
- [ ] fix a bug
- [ ] update tests and/or...

CLA Signed

**Steps/Code to reproduce bug**
```python
import torch
import cutlass.epilogue

def epilogue(accum, bias):
    D = accum + bias
    return D

examples_tensors = dict(
    accum=torch.randn(1024, 1024),
    bias=torch.randn(1024, 1).bfloat16(),
    D=torch.randn(1024, 1024).bfloat16(),
)
cutlass.epilogue.trace(epilogue,...
```

bug
? - Needs Triage
inactive-30d
inactive-90d

Fixes #1824 I was thinking of adding a test case for this, but currently the dtype is hard-coded to FP16 https://github.com/NVIDIA/cutlass/blob/44dae8b90ef232ea663727470dfbbe9daff6972d/test/python/cutlass/evt/utils/evt_testbed.py#L206 Would take some refactoring to test multiple dtypes at...

inactive-30d
inactive-90d

### Feature request
Add BetterTransformer support for SEW. SEW has an almost identical architecture to Wav2Vec2. In particular, the attention modules are the same.

### Motivation
NA

### Your contribution
I'm...

feature-request
bettertransformer

## Pull Request Description
When `stream=true`, the OpenAI API does not require `stream_options` to be specified. This will work:
```
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY"...
```
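For context on what the client consumes when `stream=true`: OpenAI streams chat completions as server-sent events, one `data: {...}` JSON chunk per line with the text under `choices[0].delta.content`, terminated by `data: [DONE]`. A minimal hedged parser over a captured transcript (a sketch; a real client reads these lines incrementally off the HTTP response):

```python
import json

def collect_stream_text(payload: str) -> str:
    """Assemble assistant text from an OpenAI-style SSE transcript."""
    parts = []
    for line in payload.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(data)["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

sample = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n'
    'data: [DONE]\n\n'
)
print(collect_stream_text(sample))  # Hello
```

Note that `stream_options` only adds extras (e.g. `include_usage`); the chunk format above arrives with or without it.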

## Describe Your Changes
- Note: the remote engine refactor should be merged before this PR

TODO:
- [ ] Use the Ollama API (`/api/chat`) / Ollama client to set context length...

type: feature request
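For the context-length TODO above, a hedged sketch of the request body: per the public Ollama API docs (an assumption about this PR's eventual code, not taken from it), `/api/chat` accepts a context-length override as `num_ctx` under `options`.

```python
import json

def build_ollama_chat_request(model, messages, context_length=None):
    """Build an Ollama /api/chat request body (sketch, stdlib only).

    `num_ctx` under `options` is the documented Ollama knob for context
    length; this helper and its name are illustrative, not from the PR.
    """
    body = {"model": model, "messages": messages, "stream": False}
    if context_length is not None:
        body["options"] = {"num_ctx": context_length}
    return json.dumps(body)

req = build_ollama_chat_request(
    "llama3", [{"role": "user", "content": "hi"}], context_length=4096
)
print(req)
```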

## Describe Your Changes
Replace cortex's `/v1/hardware` implementation with Rust.
- [x] Basic hardware info: CPU, OS, RAM usage (Power and Storage are removed since they are not implemented in Cortex,...

type: feature request
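For reference, the kind of basic info the endpoint above reports can be gathered with stdlib calls alone; the Python sketch below is only an analogy for the Rust implementation the PR describes (RAM total is read from `/proc/meminfo`, so it is Linux-only and `None` elsewhere).

```python
import os
import platform

def basic_hardware_info():
    """CPU / OS / RAM snapshot, stdlib only (illustrative, not cortex code)."""
    info = {
        "cpu_count": os.cpu_count(),
        "os": f"{platform.system()} {platform.release()}",
        "ram_total_kb": None,  # filled in on Linux below
    }
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    info["ram_total_kb"] = int(line.split()[1])
                    break
    except OSError:
        pass  # not Linux, or /proc unavailable
    return info

print(basic_hardware_info())
```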