petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://github.com/bigscience-workshop/petals/issues/389 This works by retrying `should_choose_other_blocks` twice more over 2-3 minutes if it returns `True`.
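The retry idea above can be sketched as a small helper. This is a hypothetical illustration, not the PR's actual code: `should_choose_other_blocks` stands in for the server's real check as a zero-argument callable, and the attempt count and delay are illustrative.

```python
import time

def confirm_rebalancing(should_choose_other_blocks, attempts=3, delay=60.0):
    """Re-check the rebalancing condition a few times before acting.

    Hypothetical sketch: instead of rebalancing on the first `True`,
    re-run the check `attempts` times with `delay` seconds between
    checks, and only rebalance if it stays `True` throughout.
    """
    for attempt in range(attempts):
        if not should_choose_other_blocks():
            return False  # condition cleared itself; keep current blocks
        if attempt < attempts - 1:
            time.sleep(delay)  # give the swarm time to stabilize
    return True  # still true after repeated checks: go ahead and rebalance
```

The point of the delay is to avoid thrashing: a transient imbalance (e.g. a peer briefly dropping out) should not trigger an expensive block reassignment.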
NB: this pull request makes several drastic changes to the backend, block_functions, and pools. It might be better if I walk you through them before the review. On a related note,...
Update install instructions to CUDA 11.8 - seems safe enough.
This PR relies on https://github.com/TimDettmers/bitsandbytes/pull/159 and makes it possible to call `convert_model` with the int8 data type and later on download the 8-bit checkpoint instead of 16-bit if serving the...
This PR is meant to implement direct server-to-server communication via push messages, similar to the ones in rpc_inference. Note to self: minimal testing scenario. Run a server: `python -m petals.cli.run_server...`
I saw that on classification tasks, the `labels` are the target values. When using a CausalLM model to tune it on a QA dataset, should the format be 1) input_ids: Q; labels: A, or...
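For causal LMs, the common convention is neither of those exactly: the input is the concatenated question+answer, and the question tokens in `labels` are masked with -100 so the loss is computed only on the answer. A minimal sketch, with made-up token IDs (a real setup would use the model's tokenizer):

```python
IGNORE_INDEX = -100  # ignored by PyTorch's cross-entropy loss

def build_example(question_ids, answer_ids):
    """Build one causal-LM training example from tokenized Q and A.

    input_ids covers the full Q+A sequence; labels masks the question
    part so only answer tokens contribute to the loss.
    """
    input_ids = question_ids + answer_ids
    labels = [IGNORE_INDEX] * len(question_ids) + answer_ids
    return {"input_ids": input_ids, "labels": labels}

example = build_example([11, 12, 13], [21, 22])
# example["input_ids"] -> [11, 12, 13, 21, 22]
# example["labels"]    -> [-100, -100, -100, 21, 22]
```

This way the model still attends to the question as context but is never penalized for failing to reproduce it.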
Hi, I am trying to launch a private swarm using a model I have downloaded to a local directory, which I think should work fine in theory. However, it presents...
Tuple index out of range. Something is broken.
Hello! First of all, I'm very happy that this project exists. I was able to try Beluga2 thanks to the community, who share small parts of their GPUs as I do. That's very impressive!...
```
2023-09-16 10:15:31.031074+00:00 Sep 16 10:15:31.030 [INFO] Running Petals 2.2.0
2023-09-16 10:15:31.349212+00:00 /opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1006: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
2023-09-16 10:15:31.349257+00:00 warnings.warn(
2023-09-16 10:15:33.018599+00:00 Downloading (…)lve/main/config.json:...
```