torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
### 🐛 Describe the bug ExecuTorch currently has a bug, so we need to default `max_seq_length` to 128. Once this has been fixed, remove the default here and during...
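For context, a minimal sketch of what such a defaulting workaround could look like; the helper name and plumbing are illustrative assumptions, not the repo's exact code:

```python
# ExecuTorch workaround described above: fall back to 128 when no explicit
# value is given. Remove this default once the upstream bug is fixed.
DEFAULT_MAX_SEQ_LENGTH = 128

def resolve_max_seq_length(max_seq_length=None):
    # Hypothetical helper; torchchat's real plumbing may differ.
    return max_seq_length if max_seq_length is not None else DEFAULT_MAX_SEQ_LENGTH
```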
This PR enables llava1.5 on torchchat, its first multimodal model. How to play? You can use `--prompt` as the flag for text input, and `--image-prompt` as...
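A hypothetical invocation combining both flags; the flag names come from the PR description, while the model alias and image path are illustrative assumptions:

```python
# Sketch of a text+image generation call; "llava-1.5" and the paths are
# assumptions, not confirmed aliases from the repo.
import subprocess

subprocess.run([
    "python3", "torchchat.py", "generate", "llava-1.5",
    "--prompt", "What is shown in this image?",  # text input
    "--image-prompt", "assets/example.jpg",      # image input
], check=True)
```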
### 🐛 Describe the bug
```
(pt) sunshine@raspberrypi:~/torchchat $ ./install/install_requirements.sh
+ pip3 install -r install/requirements.txt --extra-index-url https://download.pytorch.org/whl/nightly/cu121
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple, https://download.pytorch.org/whl/nightly/cu121
Ignoring tomli: markers 'python_version < "3.11"' don't...
```
### 🐛 Describe the bug [`convert_hf_checkpoint`](https://github.com/pytorch/torchchat/blob/main/torchchat/cli/convert_hf_checkpoint.py#L37) transforms an HF checkpoint into the torchchat format. As part of this process, `ModelArgs` is created for the newly downloaded model. Currently it constructs...
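For orientation, a simplified sketch of the conversion flow described here; the `ModelArgs` fields, the name-keyed config table, and the function body are assumptions about the general shape, not the file's exact code:

```python
# Simplified sketch: after download, a ModelArgs is built for the checkpoint
# and the HF state dict is rewritten into torchchat's expected layout.
from dataclasses import dataclass

@dataclass
class ModelArgs:
    dim: int
    n_layers: int
    n_heads: int

# Hypothetical name-keyed config table (values here are Llama-3-8B-like).
KNOWN_CONFIGS = {"Meta-Llama-3-8B": ModelArgs(dim=4096, n_layers=32, n_heads=32)}

def convert_hf_checkpoint(model_name: str, hf_state_dict: dict) -> dict:
    args = KNOWN_CONFIGS[model_name]  # ModelArgs created for the new model
    # ...key remapping from HF names to torchchat names would happen here...
    return {"model_args": args, "state_dict": hf_state_dict}
```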
This PR introduces a hook for model checkpoint remapping that removes the `model.model` prefix during model loading, for better clarity.
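A minimal sketch of such a remapping hook, assuming HF-style keys carry a redundant leading `model.` (e.g. `model.model.layers.0...`); the function name is illustrative:

```python
# Hypothetical remapping hook: strip the leading "model." so that keys like
# "model.model.layers.0.attn.weight" load as "model.layers.0.attn.weight".
def remap_state_dict_keys(state_dict: dict) -> dict:
    prefix = "model."
    return {
        (key[len(prefix):] if key.startswith(prefix + "model.") else key): value
        for key, value in state_dict.items()
    }
```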
### 🚀 The feature, motivation and pitch This is for aligning distributed's load behavior with the single-device case. Today, distributed relies on an index file containing a `param->bin` mapping to limit...
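To illustrate the index-file mechanism being referenced, a sketch of reading an HF-style `*.bin.index.json`; the filename and the set of needed parameters are assumptions:

```python
import json

# Sketch: the index maps each parameter name to the shard (.bin) that holds it,
# so a rank can open only the shard files it actually needs.
with open("pytorch_model.bin.index.json") as f:
    weight_map = json.load(f)["weight_map"]  # param name -> shard filename

needed_params = ["model.layers.0.self_attn.q_proj.weight"]  # illustrative
shards_to_load = {weight_map[p] for p in needed_params}
print(shards_to_load)
```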
### 🐛 Describe the bug I followed all the instructions in the repo and got to the point of launching the Xcode project, when I hit the "Play" button, I...
### 🐛 Describe the bug
```
torchrun --nproc-per-node 8 dist_run.py
```
```
known configs: ['13B', '30B', '34B', '70B', '7B', 'CodeLlama-7b-Python-hf', 'Mistral-7B', 'stories110M', 'stories15M', 'stories42M', 'Meta-Llama-3-70B', 'Meta-Llama-3-8B', 'Meta-Llama-3.1-70B-Tune', 'Meta-Llama-3.1-70B', 'Meta-Llama-3.1-8B-Tune', 'Meta-Llama-3.1-8B']...
```
### 🐛 Describe the bug per @kwen2501 - in the decoding step:
```
next_token = torch.tensor([decode_results[0][0]], device=device)
```
"nit: I am not sure if the use of torch.tensor...
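The quoted nit is cut off above, but for reference, one hedged alternative to building a fresh tensor from a Python scalar, assuming `decode_results[0][0]` is already a scalar tensor, would be:

```python
import torch

device = "cpu"  # placeholder for the sketch
decode_results = [(torch.tensor(42), None)]  # placeholder shape for the sketch

# Reuse the existing tensor instead of round-tripping through a Python list;
# assumes decode_results[0][0] is already a 0-dim tensor.
next_token = decode_results[0][0].reshape(1).to(device)
```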
This allows you to run the server and generate chat versions seamlessly.
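As a usage illustration, a hypothetical client call against the server, assuming it exposes an OpenAI-style `/v1/chat/completions` endpoint on localhost (the endpoint path, port, and model alias are all assumptions):

```python
import json
import urllib.request

# Hypothetical request; endpoint, port, and model alias are assumptions.
req = urllib.request.Request(
    "http://127.0.0.1:5000/v1/chat/completions",
    data=json.dumps({
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Hello!"}],
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```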