# Roadmap (tentative)
### Current tasks
- [ ] prototype bloom points system @borzunov (#6 )
- [x] local tensor parallelism ( #143 , using BlackSamorez/tensor_parallel by @blacksamorez and @IaroslavLisniak ; usage sketched after this list)
- [x] increase default max sequence length (from #146 )
- [x] allow running a server without open ports @Vahe1994
- [ ] option to download pre-quantized blocks (@mryab )
- [x] improved routing (@justheuristic )
- [x] upgrade to the latest libp2p - @Vahe1994
- [ ] touch up fine-tuning examples, make sure they work in reasonable time ( @justheuristic )
- [ ] a way to temporarily shut down a Petals server (see the watchdog sketch after this list)
  - suggested by @craffel : when running a Petals server on a machine that is often in use, people should be able to shut it off while running their own experiments
  - suggested behavior: shut down ASAP, restart once the GPUs have been idle for T minutes
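
For the local tensor parallelism item above: BlackSamorez/tensor_parallel wraps a regular PyTorch model and shards its weights across local GPUs. A minimal usage sketch, assuming two GPUs and a small BLOOM checkpoint for illustration (check the library's README, since its API may have changed):

```python
# Minimal sketch: shard one model across two local GPUs with the
# `tensor_parallel` package (pip install tensor_parallel).
# The checkpoint is illustrative; any transformers model should work.
import tensor_parallel as tp
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])  # weights split across devices
# From here, the model is used as usual: forward/backward run on both GPUs.
```
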
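There is no built-in support for the temporary-shutdown item yet; below is a rough watchdog sketch of the suggested behavior, written against plain `nvidia-smi` and `subprocess`. SERVER_CMD, the polling interval, and T are placeholders, and the GPU-ownership check is deliberately simplistic:

```python
# Rough sketch of the suggested behavior: stop the Petals server as soon as
# another process needs the GPU; restart it once the GPU has been idle for
# T minutes. Assumes `nvidia-smi` is on PATH and SERVER_CMD launches the server.
import subprocess
import time

T_MINUTES = 15
POLL_SECONDS = 60
SERVER_CMD = ["python", "-m", "petals.cli.run_server", "..."]  # placeholder

def gpu_busy_by_others(server_pid: int) -> bool:
    """True if any compute process other than our server is on the GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"]
    ).decode()
    pids = {int(p) for p in out.split() if p.isdigit()}
    return bool(pids - {server_pid})

server = subprocess.Popen(SERVER_CMD)
idle_since = time.monotonic()
while True:
    time.sleep(POLL_SECONDS)
    if gpu_busy_by_others(server.pid):
        idle_since = None
        if server.poll() is None:      # shut down ASAP when the GPU is needed
            server.terminate()
            server.wait()
    else:
        idle_since = idle_since or time.monotonic()
        if server.poll() is not None and time.monotonic() - idle_since > 60 * T_MINUTES:
            server = subprocess.Popen(SERVER_CMD)  # restart after T idle minutes
```
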
### Wanna contribute?
- go to our Discord server and ask around!
- always in demand:
- contribute examples (recommended but not required: create an issue / draft first, before you code them)
- OS support / hardware support ( e.g. see #147 )
- more models: OPT-175B, Switch-XXL, whatever comes into fashion
- host a server! (see README)
### End of December: cover more use cases
- [x] tutorial: generation notebook (client usage sketched after this list)
- [x] tutorial: prompt-tuning notebook
- [x] PreLayer prompt-tuning, mentioned as one of the baselines in https://arxiv.org/abs/2106.09685
- [x] inference with prompt-tuning ( https://github.com/learning-at-home/bloom-demo/issues/13 by @artek0chumak)
- [x] advanced inference: beam search, constraints/fusion, LoRA/AdaMix ( @artek0chumak , https://github.com/learning-at-home/bloom-demo/issues/13 )
- [x] some kind of hub for tutorials, e.g. a minimalistic website
- [x] alpha test: let more people play with the 176B model (where? no-brainer: BigScience, Stability, Discord)
- [x] rich inference interface for designing custom generation algorithms (by: @artek0chumak )
- [x] let servers run requests with different priorities ( https://github.com/learning-at-home/bloom-demo/issues/8 by: @greenfatguy )
- By this point, we must answer the main questions: (1) will people use it? (2a) what for? (2b) why not?
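
To make these items concrete: the generation tutorial boils down to a client API that looks like ordinary transformers code. A minimal sketch using the class and checkpoint names from the README of that period (both may have changed since):

```python
# Minimal client-side generation sketch; names follow the README of the time,
# so double-check the current docs before copying.
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"  # 176B model, served collaboratively

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer('A cat in French is "', return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)  # blocks run on remote servers
print(tokenizer.decode(outputs[0]))
```
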
### End of ~~July~~ August: make it reliable, test with early adopters
- [x] make it so that servers cannot be killed by a bad client request ( https://github.com/learning-at-home/hivemind/issues/3 by: @justheuristic)
- [x] find the best way to reduce the size of 176B model ( https://github.com/learning-at-home/bloom-demo/issues/4 by: @TimDettmers )
- [x] let servers automatically find and serve the most in-demand layers ( @borzunov )
- [x] implement popular non-beam inference types ( https://github.com/learning-at-home/bloom-demo/issues/13 by @artek0chumak )
- [x] compress the activations sent between client and server nodes ( by: @mryab ; idea illustrated after this list)
- [x] find enough hardware to run the 176B model ( https://github.com/learning-at-home/bloom-demo/issues/14 by: @justheuristic )
- [x] pre-alpha test: once it is stable enough, let some (trusted) folks play with it and get their feedback
- [x] submit an EMNLP system demonstration proposal ( https://2022.emnlp.org/calls/System_Demonstrations/ )
- begin investigating: tutorials, documentation, examples
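
The activation-compression item uses real codecs shipped with hivemind; the toy example below is not that codec, only an illustration of the underlying idea: quantize hidden states to 8 bits before sending and dequantize on arrival, cutting traffic roughly 4x versus fp32.

```python
# Toy illustration of activation compression (NOT the actual hivemind codec):
# per-tensor affine quantization of hidden states to uint8 for the wire.
import torch

def quantize(x: torch.Tensor):
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp_min(1e-8) / 255
    q = ((x - lo) / scale).round().to(torch.uint8)
    return q, scale, lo  # ship 1 byte/element plus two scalars

def dequantize(q: torch.Tensor, scale: torch.Tensor, lo: torch.Tensor):
    return q.to(torch.float32) * scale + lo

acts = torch.randn(1, 128, 14336)          # e.g. one BLOOM hidden-state tensor
q, scale, lo = quantize(acts)
restored = dequantize(q, scale, lo)
print((restored - acts).abs().max())       # small error, ~4x less traffic than fp32
```
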
### End of June: build a proof-of-concept
- [x] agree on the user interface (see https://github.com/learning-at-home/bloom-demo/issues/5#issuecomment-1159776494 )
- [x] run simple (but correct!) inference with a smaller model (for generation)
- [x] do simple (but correct!) forward/backward with frozen layers (for prompt tuning; sketched after this list)
- [x] client can dynamically choose which remote servers to use for inference ( by: @justheuristic )
- [x] create basic correctness tests for later
- [x] check if 8-bit compression is remotely feasible ( by: @TimDettmers )
- it's okay if the code is not super reliable for now
- it's okay if servers have to be set up manually for now
- begin investigating: quantized weights, quantized communication, automatic server allocation, "bloom points"
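
A minimal local sketch of the "forward/backward with frozen layers" item, i.e. the mechanics behind prompt tuning: only a small prompt tensor receives gradients while the model itself stays frozen. bloom-560m is a stand-in so the example runs on one machine; in Petals, the frozen blocks would run on remote servers.

```python
# Prompt tuning mechanics: freeze the model, prepend trainable prompt
# embeddings, and let gradients flow only into the prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
for p in model.parameters():
    p.requires_grad_(False)                      # freeze every layer

num_prompt_tokens = 16
prompts = torch.nn.Parameter(
    torch.randn(1, num_prompt_tokens, model.config.hidden_size) * 0.02
)

tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
ids = tok("hello world", return_tensors="pt")["input_ids"]
embeds = model.get_input_embeddings()(ids)       # frozen token embeddings
inputs = torch.cat([prompts, embeds], dim=1)     # prepend trainable prompts

loss = model(inputs_embeds=inputs).logits.float().pow(2).mean()  # dummy objective
loss.backward()                                  # gradients land only in `prompts`
assert prompts.grad is not None
```
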
### Important, but not urgent
- multiplicative adapters from https://wandb.ai/learning-at-home/LM_OWT/reports/Parameter-sharing-revisited--VmlldzoxOTAxNjcx ?
- non-critical performance improvements ( #11 )
- better fine-tuning methods: LoRA, AdaMix, PreLayer, or whatever is SoTA at the time of building (minimal LoRA sketch after this list)
- fully decentralized point system
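
Since LoRA appears twice in this list, a minimal sketch of the idea from https://arxiv.org/abs/2106.09685 : keep the pretrained weight frozen and learn a low-rank additive update. The rank and alpha values below are arbitrary.

```python
# Minimal LoRA layer: y = base(x) + (x @ A^T @ B^T) * (alpha / rank),
# with the base weight frozen and B initialized to zero (no-op at start).
import math
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, base: torch.nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)              # frozen pretrained weight
        self.A = torch.nn.Parameter(torch.randn(rank, base.in_features) / math.sqrt(rank))
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(torch.nn.Linear(1024, 1024))
y = layer(torch.randn(2, 1024))                  # identical to the base layer at init
```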