torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
Today users have to set PYTORCH_ENABLE_MPS_FALLBACK=1 before calling torchchat if they want to use _weight_int4pack_mm. Can we set it automatically, from inside the program? Requiring the environment variable is a crude workaround,...
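A minimal sketch of what setting the flag programmatically could look like. It assumes the variable is read by PyTorch at import time, so torchchat's entry point would need to run this before `import torch` executes anywhere:

```python
import os

# Hedged sketch: set the MPS fallback flag from inside the program instead
# of asking the user to export it. `setdefault` keeps any value the user
# already exported, so an explicit environment setting still wins.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

# `import torch` must only happen after this point for the flag to apply.
```

The main design question is ordering: if any module imports torch before this line runs, the flag has no effect, which is why this would belong at the very top of the CLI entry point.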
```
(py311) mikekg@mikekg-mbp torchchat % python export.py --checkpoint-path ${MODEL_PATH} --temperature 0 --quantize '{"linear:int4": {"groupsize": 128}}' --output-pte mode.pte
[...]
Traceback (most recent call last):
  File "/Users/mikekg/qops/torchchat/export.py", line 111, in main(args)
  File ...
```
see https://github.com/pytorch/executorch/issues/3515 documents excessive non-actionable output
### 🚀 The feature, motivation and pitch The OpenAI API support in torchchat is under active development and will be one of the main entry points for interacting with torchchat...
### 🚀 The feature, motivation and pitch Currently, the CLI arg parsing in torchchat is too general and overzealous; subcommands indirectly access CLI args that they don't actually use (e.g....
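One way to scope args per subcommand is with argparse subparsers, where each subparser declares only the flags it consumes. This is a hypothetical sketch, not torchchat's actual parser; the subcommand names mirror the CLI, but the specific flags shown are illustrative:

```python
import argparse

# Sketch of per-subcommand argument scoping: `export` never even sees
# generation-only options such as --temperature, so it cannot indirectly
# depend on them.
parser = argparse.ArgumentParser(prog="torchchat")
sub = parser.add_subparsers(dest="command", required=True)

generate = sub.add_parser("generate", help="run text generation")
generate.add_argument("model")
generate.add_argument("--temperature", type=float, default=0.8)

export = sub.add_parser("export", help="export a model")
export.add_argument("model")
export.add_argument("--output-pte", help="path for the exported .pte file")

args = parser.parse_args(["export", "stories15M", "--output-pte", "model.pte"])
```

With this layout, `args` parsed for `export` simply has no `temperature` attribute, so any accidental cross-subcommand access fails loudly instead of silently picking up a default.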
### 🐛 Describe the bug Running `python torchchat.py export stories15M` neither errors out nor generates any export files, though it should do one or the other.
```shell
% python torchchat.py export stories15M; echo...
```
### 🐛 Describe the bug I was trying to find the `model.py` definition under https://github.com/pytorch/torchchat/tree/main/build, but it wasn't showing up. generate.py, which is not in the builder, works fine. Can we...
Add support for: 1 - asymmetric a8w4dq, which basically requires subtracting the zero point from each value before multiplying, so it should add a single multiply. This will help accelerate and better handle...
Using `torch==2.4.0.dev20240502` on an Apple M2 Pro I get the following numbers for stories110M with float16 dtype:

| application | speed (eager) | speed (compile) |
| ---- | ---- | ----...