torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
### 🐛 Describe the bug

I am using ET (ExecuTorch) and generating the quantized version of the model as shown in the README.

```
python torchchat.py export llama3.1 --quantize config/data/mobile.json --output-pte-path...
```
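For context, a complete invocation of this export step would look roughly like the following; the report is truncated, so the output file name here is an assumption:

```
python3 torchchat.py export llama3.1 --quantize config/data/mobile.json --output-pte-path llama3_1.pte
```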
Summary: This improves best tokens/sec from 73 to 85.
### 🐛 Describe the bug

Running `python3 torchchat.py generate stories110M` on a system with a bad network connection hangs for 90+ seconds before it starts generating anything.

### Versions...
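One way to bound that wait is to probe the network with a short timeout and fall back to local weights. A minimal sketch, assuming the hang comes from an unbounded availability check before cached weights are used; the cache path and probe URL are hypothetical:

```python
# Hypothetical sketch: bounded network probe with a cache fallback.
# Not torchchat's actual code.
import os
import urllib.request

CACHE_DIR = os.path.expanduser("~/.torchchat/model-cache")  # hypothetical path

def weights_are_cached(model: str) -> bool:
    return os.path.isdir(os.path.join(CACHE_DIR, model))

def network_reachable(url: str = "https://huggingface.co", timeout: float = 2.0) -> bool:
    try:
        urllib.request.urlopen(url, timeout=timeout)  # wait at most 2 s, not 90+
        return True
    except OSError:
        return False

def resolve_model(model: str) -> str:
    path = os.path.join(CACHE_DIR, model)
    if weights_are_cached(model):
        return path  # never touch the network when a local copy exists
    if not network_reachable():
        raise RuntimeError(f"{model} is not cached and the network is unreachable")
    # ... download to `path` here (omitted in this sketch) ...
    return path
```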
### 🚀 The feature, motivation and pitch

I believe this is one of Ollama's huge advantages. This can also encourage devs to go test LLMs which they can run on...
As titled, a simple change: moves the model-parameter JSON files from `build/known_model_params` to `torchchat/model_params`.
**Issue**

Inputs aren't set up correctly for .pte files. The input tensors must be static and cannot be reshaped. Currently, running eval will result in this error:

```
python3 torchchat.py...
```
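A minimal sketch of the static-input constraint, assuming a .pte exported with a fixed sequence length; the length and padding id are assumptions:

```python
# Hypothetical sketch of preparing static-shaped inputs for a .pte model.
# A graph exported with a fixed sequence length rejects reshaped tensors, so
# prompts must be padded/truncated to that exact shape before every call.
import torch

MAX_SEQ_LEN = 128  # assumption: the sequence length the model was exported with
PAD_ID = 0         # assumption: the tokenizer's pad token id

def to_static_input(token_ids: list[int]) -> torch.Tensor:
    ids = token_ids[:MAX_SEQ_LEN]                    # truncate long prompts
    ids = ids + [PAD_ID] * (MAX_SEQ_LEN - len(ids))  # right-pad short ones
    return torch.tensor([ids], dtype=torch.long)     # always shape (1, MAX_SEQ_LEN)

tokens = to_static_input([42, 7, 99])
assert tokens.shape == (1, MAX_SEQ_LEN)  # the exported graph requires this exact shape
```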
**Goal:** Users should be able to select the model from the chat interface and receive a response from that model.

**Currently:** we just send the request and take the...
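A minimal sketch of wiring the selection through, assuming an OpenAI-style chat completions endpoint; the base URL, port, and route are assumptions, not torchchat's confirmed API:

```python
# Hypothetical sketch: forward the model chosen in the UI instead of a hard-coded one.
import json
import urllib.request

def send_chat(selected_model: str, prompt: str, base_url: str = "http://localhost:5000"):
    payload = {
        "model": selected_model,  # the value picked in the chat interface
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",  # assumption: OpenAI-style route
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```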
Moves the top-level distributed folder into a separate distributed folder within the torchchat umbrella. There are intentionally no code changes outside of the README and script path updates.
Added files:
- model_dist.py: a mirror of model.py with Tensor Parallelism baked in.
- dist_run.py: a toy example of how to run the model in a distributed way.

Test:

```
torchrun --nproc-per-node...
```
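For readers unfamiliar with the technique, here is a minimal Tensor Parallelism sketch in the spirit of model_dist.py, assuming torch >= 2.3; the layer names and sizes are invented for illustration:

```python
# Hypothetical sketch of Tensor Parallelism with torch.distributed;
# not the PR's actual model_dist.py. Launch with:
#   torchrun --nproc-per-node 2 tp_sketch.py
import os

import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

class FeedForward(nn.Module):
    def __init__(self, dim: int = 256, hidden: int = 1024):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden)
        self.w2 = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(torch.relu(self.w1(x)))

torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
# One mesh dimension covering all ranks started by torchrun.
mesh = init_device_mesh("cuda", (int(os.environ["WORLD_SIZE"]),))

model = FeedForward().to("cuda")
# Shard w1 column-wise and w2 row-wise so each forward pass needs one all-reduce.
parallelize_module(model, mesh, {"w1": ColwiseParallel(), "w2": RowwiseParallel()})

out = model(torch.randn(8, 256, device="cuda"))  # output is replicated on every rank
```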