petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Hi there, does Petals currently support batch processing / parallel processing? For example, to increase resource usage or system throughput, we would like to see servers processing multiple prompts in parallel at the...
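For reference, client-side batching is already expressible with the Petals client; a minimal sketch follows (the model name and prompts are placeholders, and whether servers actually interleave the sequences server-side is exactly what this question asks):

```python
# A minimal sketch, assuming the public Petals client API
# (AutoDistributedModelForCausalLM); the model name is a placeholder.
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

MODEL = "bigscience/bloom-560m"  # placeholder; pick a model the swarm hosts
tokenizer = AutoTokenizer.from_pretrained(MODEL, padding_side="left")
model = AutoDistributedModelForCausalLM.from_pretrained(MODEL)

# Tokenize several prompts into one left-padded batch and generate.
prompts = ["A cat sat on", "The capital of France is"]
batch = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model.generate(batch["input_ids"], max_new_tokens=16)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```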
Is there a way to donate system memory (RAM) instead of GPU VRAM? This may be more economical.
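(A hedged note: serving from CPU appears expressible in the CLI, assuming `run_server`'s `--device` option accepts `cpu`, e.g. `python -m petals.cli.run_server bigscience/bloom-560m --device cpu --num_blocks 4`, with the model name as a placeholder; CPU throughput would be far lower than a GPU's, which is the economics trade-off at issue.)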
Good afternoon. I noticed that the model behaves as if it has a pre-configured system prompt that cannot be changed in any way. Please tell me whether there is a way...
Hi there, I've been following this work for a few months and find it a really amazing idea to run LLMs over the Internet. I'm also trying to improve...
I am able to run `python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct --num_blocks 2 --max_disk_space=50G` for a bit, but it always eventually exits with an `AssertionError: Span served by this server is...
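(A hedged aside: since the assertion message points at the server's chosen span, one thing worth trying is pinning a fixed span with `--block_indices 0:2` in place of `--num_blocks 2`, assuming the installed version's `run_server` exposes that option; if the error is triggered by the span being re-chosen at runtime, pinning may avoid it.)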
This update enhances the logging mechanism in the RemoteGenerationMixin class, allowing for better traceability and debugging. The added logging provides insights into key steps of the token generation process without...
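A hedged, illustrative sketch of what such step-level logging could look like (everything below except the class name is an assumption, not the actual PR diff):

```python
# Illustrative stub only: real Petals code runs remote forward passes here.
import logging
import torch

logger = logging.getLogger(__name__)

class RemoteGenerationMixin:
    def generate(self, input_ids: torch.LongTensor, max_new_tokens: int = 20):
        logger.debug("generation started: batch=%d prompt_len=%d",
                     input_ids.shape[0], input_ids.shape[1])
        for step in range(max_new_tokens):
            # ... remote forward pass and token sampling would happen here ...
            logger.debug("generated token %d/%d", step + 1, max_new_tokens)
        logger.debug("generation finished")
        return input_ids
```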
This PR improves the speculative generation code by adding more explicit type hints, especially for the `streamer` parameter. Additionally, the code has been refactored for better readability and maintainability. These...
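For illustration, this is roughly what an explicit hint on `streamer` looks like (a hedged sketch; the function name and body are illustrative, not Petals' actual code):

```python
# Hypothetical signature showing a typed `streamer` parameter.
from typing import Optional
import torch
from transformers.generation.streamers import BaseStreamer

def speculative_generate(
    input_ids: torch.LongTensor,
    max_new_tokens: int = 20,
    streamer: Optional[BaseStreamer] = None,
) -> torch.LongTensor:
    if streamer is not None:
        streamer.put(input_ids)  # BaseStreamer defines put() and end()
    # ... speculative drafting and verification would happen here ...
    if streamer is not None:
        streamer.end()
    return input_ids
```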
Consider a case where a pre-trained model is hosted on only three servers: the first hosts blocks 1-4, the second blocks 2-64, and the third blocks 32-128....
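To make the case concrete, here is a hypothetical sketch (not Petals' actual routing code) of a greedy interval cover: at each position, pick the server whose hosted span reaches furthest:

```python
# Hypothetical helper, not Petals' router: cover blocks [first, last]
# with a chain of servers, each given as an inclusive (start, end) span.
def plan_route(spans, first_block=1, last_block=128):
    route, position = [], first_block
    while position <= last_block:
        # Among servers that can serve `position`, take the longest reach.
        candidates = [(s, e) for s, e in spans if s <= position <= e]
        if not candidates:
            raise RuntimeError(f"no server hosts block {position}")
        best = max(candidates, key=lambda span: span[1])
        route.append(best)
        position = best[1] + 1
    return route

# The three servers from the example: the route uses all of them.
print(plan_route([(1, 4), (2, 64), (32, 128)]))
# -> [(1, 4), (2, 64), (32, 128)]: blocks 1-4, then 5-64, then 65-128
```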
I tried to load the local model and ran into this issue:
> raise ValueError(
> ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`,...
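For reference, this validation usually comes from an older installed transformers that expects exactly two fields in `rope_scaling`; upgrading transformers is the usual fix. Below is a hedged sketch of a local workaround (the config path is hypothetical, and forcing `type`/`factor` is an assumption that can change model behavior):

```python
# Rewrite `rope_scaling` in a local checkpoint's config.json so that it
# has exactly the two fields the older validator expects. Hypothetical
# path; upgrading transformers is the safer fix.
import json
import pathlib

config_path = pathlib.Path("path/to/local-model/config.json")  # hypothetical
config = json.loads(config_path.read_text())
old = config.get("rope_scaling") or {}
config["rope_scaling"] = {"type": "linear", "factor": float(old.get("factor", 1.0))}
config_path.write_text(json.dumps(config, indent=2))
```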
I can't run the falcon-7b-instruct model; it gives me this error.