torchchat issues

User report that CUDA setup is not using SDPA

1

Supposedly we're not calling into SDPA when running on CUDA. Verify that SDPA is used, and fix if a problem does in fact exist. @malfet and @larryliu0820 have been talking...

mikekgfb

Add dtype tests to runner-et

1

Ensure that we can pass dtype tests for fp16, bf16 (?) and fp32 for ExecuTorch with runner-et. bf16 may not yet be a thing, but @malfet's tests suggest...

mikekgfb

Eval fails for stories15M.pte: E 00:00:05.929516 executorch:method.cpp:820] Error setting input 0: 0x10 (PR #838)

https://github.com/pytorch/torchchat/actions/runs/9163587067/job/25193029015?pr=838
```
The methods are: {'forward'}
+ python3 torchchat.py eval stories15M --pte-path stories15M.pte
Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
NumExpr defaulting...
```

mikekgfb

Improve the scope of Model Evaluation to AOTI and ET

### 🚀 The feature, motivation and pitch
Currently, model evaluation is a WIP and mostly focused on pure PyTorch and compile. This is planned work to improve PT support and...

Jack-Khuu

Known Gaps

Update packaging in AOTI path

2

Added an `aoti_package` path, dependent on https://github.com/pytorch/pytorch/pull/129895. A follow-up will delete `--output-dso-path`. To export, use `--output-aoti-package-path` to specify a file with a `.pt2` extension. This will...

angelayi

CLA Signed

Add Android test for #491

2

Please add a test for #491 that builds the model, and use the ability to launch Android tests from OSS to confirm that they work.

mikekgfb

requirements setup fails to install/configure triton properly, yielding broken install

3

Running ./install_requirements.sh completes but prints this warning: ~~~ WARNING: Skipping triton as it is not installed. ~~~ This then causes a failure when it attempts to locate Triton: ~~~ Successfully...

lessw2020

[FEATURE REQUEST] Asymmetric 4b quantization variant for a8w4dq (ET export, maybe also desktop?)

2

Asymmetric int4 weight variant for a8w4 w/ dynamic quantization? May be a good test run for 80/20 approach?

mikekgfb

Comparing tokens/second with llama-cpp on mac m1

7

I was trying to compare TorchChat's generate tokens/second with llama-cpp's generate on my mac. I am using GGML FP16 with llama-cpp and I believe the default in TorchChat is FP16?...

agunapal

`chat` and `generate` mode 3x+ speed difference

4

`chat` and `generate` for the same model should yield the same number of tokens, shouldn't they? But right now there is a more than 3x difference, at least as observed on...

malfet
