
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Results: 92 petals issues (sorted by recently updated)

I'm imagining a situation where people can run inference for others and make a small profit. In practice, it is similar to, but more useful than, cryptocurrency mining. You would get...

The [Grok](https://github.com/xai-org/grok-1) architecture and weights were just released. Does Petals support Grok and MoE models, or is support planned? Having a first-in-class 314B-parameter model running...
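
For context on why MoE support is a distinct feature request: a mixture-of-experts layer routes each token through only a few expert feed-forward networks chosen by a gating network, yet every expert's weights must stay resident in memory, which changes how blocks would be sharded across servers. Below is a minimal, illustrative sketch of top-k routing; the sizes and `top_k` value are arbitrary, and this is not Grok-1's actual layer.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Illustrative mixture-of-experts feed-forward layer (top-k routing)."""
    def __init__(self, hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # All experts must be held in memory even though only top_k run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                          nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(hidden, num_experts)  # gating network
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden)
        scores = self.router(x).softmax(dim=-1)          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick top_k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out
```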

Fixes issue #536; edits calls to mask methods to be consistent with https://github.com/huggingface/transformers/pull/27086

An install script to help with project deployment and ease of use. At the moment it only covers the Windows WSL NVIDIA configuration, but feel free to put it in a folder with...

#331 introduced a bug during inference retries that caused this:

```
[INFO] Route found: 0:18 via …1EBzGt
[WARN] [petals.client.inference_session.step:327] Caught exception when running inference via RemoteSpanInfo(peer_id=, start=0, end=18, server_info=ServerInfo(state=, throughput=1040.823002928876,...
```
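
For readers unfamiliar with the code path under discussion, client-side retries broadly follow this shape: when a remote span fails mid-inference, the client rebuilds the route and retries. This is a hypothetical sketch under assumed names, not Petals' actual implementation; `find_route` and `run_span` are invented placeholders.

```python
import logging
import time

logger = logging.getLogger(__name__)

def run_with_retries(session, inputs, max_attempts=3):
    """Hypothetical sketch of span-level retries (placeholder APIs)."""
    for attempt in range(max_attempts):
        try:
            route = session.find_route()             # placeholder, not a real Petals API
            return session.run_span(route, inputs)   # placeholder, not a real Petals API
        except Exception as exc:
            logger.warning("Caught exception when running inference: %r", exc)
            time.sleep(2 ** attempt)  # back off before rebuilding the route
    raise RuntimeError(f"inference failed after {max_attempts} attempts")
```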

Revert #251, since it's not needed after #311. This may improve fine-tuning efficiency for medium-sized batches.

**TODO:**

- [ ] Test it with increasingly larger batches. Watch that we switch...

This PR achieves a ~7% improvement in inference throughput (measured on a single A100-80GB) by merging the query/key/value projections into a single large matrix multiplication. This reduces the overhead...
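
A minimal sketch of the fused-QKV idea described above (illustrative only, not the PR's actual code): three separate hidden-to-hidden projections are replaced with one projection to three times the width, so the GPU launches a single large matmul kernel instead of three smaller ones, and the result is split back into query, key, and value.

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """Illustrative fused query/key/value projection."""
    def __init__(self, hidden_size: int):
        super().__init__()
        # One (3 * hidden, hidden) weight matrix replaces three
        # (hidden, hidden) matrices, amortizing kernel-launch overhead.
        self.qkv_proj = nn.Linear(hidden_size, 3 * hidden_size, bias=False)

    def forward(self, hidden_states: torch.Tensor):
        qkv = self.qkv_proj(hidden_states)        # (batch, seq, 3 * hidden)
        query, key, value = qkv.chunk(3, dim=-1)  # three (batch, seq, hidden) tensors
        return query, key, value
```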