
Roadmap (tentative)


Current tasks:

  • [ ] prototype bloom points system @borzunov (#6)
  • [x] local tensor parallelism (#143, using BlackSamorez/tensor_parallel by @blacksamorez and @IaroslavLisniak)
  • [x] increase default max sequence length (from #146)
  • [x] allow running a server without open ports @Vahe1994
  • [ ] option to download pre-quantized blocks (@mryab)
  • [x] improved routing (@justheuristic)
  • [x] upgrade to the latest libp2p - @Vahe1994
  • [ ] touch up fine-tuning examples, make sure they work in reasonable time (@justheuristic)
  • [ ] a way to temporarily shut down a Petals server (a rough watchdog sketch follows this list)
    • suggested by @craffel: when running a Petals server on a machine that is often in use, people should be able to shut off the Petals server while running their own experiments
    • suggested behavior: shut down ASAP, restart once the GPUs have not been in use for T minutes
  • Wanna contribute?
    • go to our Discord server and ask around!
    • always in demand:
      • contribute examples (recommended but not required: create an issue / draft first, before you code them)
      • OS support / hardware support (e.g., see #147)
      • more models: OPT-175B, switch-XXL, whatever comes into fashion
      • host a server! (see README)
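
For the temporary-shutdown item above, the suggested behavior could live in a small watchdog outside of Petals itself. Below is a minimal sketch, assuming an nvidia-smi-based check and the `petals.cli.run_server` entry point; the command, idle threshold, and polling interval are placeholders rather than decisions made in this roadmap:

```python
# Hypothetical watchdog: stop the Petals server as soon as someone else grabs the
# GPUs, restart it once they have been idle for T minutes. Sketch only.
import subprocess
import time

IDLE_MINUTES = 15  # T: how long GPUs must stay free before restarting the server
SERVER_CMD = ["python", "-m", "petals.cli.run_server", "bigscience/bloom-petals"]


def other_gpu_users(own_pid=None) -> bool:
    """Return True if any GPU compute process other than our own server is running."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"], text=True
    )
    pids = {int(p) for p in out.split() if p.strip()}
    # A real implementation should exclude the server's whole process tree, not just its PID.
    return bool(pids - {own_pid})


server, idle_since = None, None
while True:
    if other_gpu_users(server.pid if server else None):
        idle_since = None
        if server is not None:  # shut down ASAP when someone else needs the GPUs
            server.terminate()
            server.wait()
            server = None
    else:
        idle_since = idle_since or time.monotonic()
        if server is None and time.monotonic() - idle_since >= IDLE_MINUTES * 60:
            server = subprocess.Popen(SERVER_CMD)  # restart after T idle minutes
    time.sleep(30)
```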

End of December: cover more use cases

  • [x] tutorial: generation notebook
  • [x] tutorial: prompt-tuning notebook
  • [x] PreLayer prompt-tuning - mentioned as one of the baselines in https://arxiv.org/abs/2106.09685 - DONE
  • [x] inference with prompt-tuning (https://github.com/learning-at-home/bloom-demo/issues/13 by @artek0chumak; see the client-side sketch after this list)
  • [x] advanced inference: beam search, constraints/fusion, LoRA/AdaMix (@artek0chumak, https://github.com/learning-at-home/bloom-demo/issues/13)
  • [x] some kind of hub for tutorials, e.g. a minimalistic website
  • [x] alpha test: let more people play with the 176B model (where? no-brainer: bigscience, stability, discord)
  • [x] rich inference interface for designing custom generation algorithms (by: @artek0chumak)
  • [x] let servers run requests with different priorities (https://github.com/learning-at-home/bloom-demo/issues/8 by: @greenfatguy)
  • By this point, we must answer the main questions: (1) will people use it? (2a) what for? (2b) why not?
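
The client-side workflow these items converge on looks roughly like the snippet below. It is a minimal sketch assuming the `petals` package name and the `bigscience/bloom-petals` checkpoint, so exact class and argument names may differ from what ships:

```python
# Minimal sketch of the intended client-side interface (names are assumptions).
import torch
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"
tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

# plain generation: embeddings run locally, transformer blocks run on remote servers
inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))

# prompt tuning: only the local trainable prompts get gradients,
# the remote transformer blocks stay frozen on the servers
model = DistributedBloomForCausalLM.from_pretrained(
    MODEL_NAME, tuning_mode="ptune", pre_seq_len=16
)
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```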

End of ~~July~~ August: make it reliable, test with early adopters

  • [x] make it so that servers cannot be killed by a bad client request (https://github.com/learning-at-home/hivemind/issues/3 by: @justheuristic)
  • [x] find the best way to reduce the size of the 176B model (https://github.com/learning-at-home/bloom-demo/issues/4 by: @TimDettmers)
  • [x] let servers automatically find and serve the most in-demand layers (@borzunov)
  • [x] implement popular non-beam inference types (https://github.com/learning-at-home/bloom-demo/issues/13 by @artek0chumak)
  • [x] compress the activations sent between client and server nodes (by: @mryab; see the quantization sketch after this list)
  • [x] find enough hardware to run the 176B model (https://github.com/learning-at-home/bloom-demo/issues/14 by: @justheuristic)
  • [x] pre-alpha test: once it is stable enough, let some (trusted) folks play with it and get their feedback
  • [x] submit an EMNLP system demonstration proposal (https://2022.emnlp.org/calls/System_Demonstrations/)
  • begin investigating: tutorials, documentation, examples
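
The activation-compression item above amounts to quantizing hidden states before they cross the network and restoring them on the other side. A generic per-tensor 8-bit absmax sketch, shown only to illustrate the idea (the production codepath may use a different, e.g. blockwise, scheme):

```python
# Generic illustration of 8-bit activation compression for network transfer.
import torch

def compress_to_int8(x: torch.Tensor):
    """Quantize a float tensor to int8 plus a single absmax scale."""
    scale = x.abs().amax().clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).to(torch.int8)
    return q, scale  # send q (1 byte per value) and the scale over the wire

def decompress_from_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

hidden_states = torch.randn(1, 16, 14336)  # BLOOM-176B hidden size is 14336
q, scale = compress_to_int8(hidden_states)
restored = decompress_from_int8(q, scale)
# int8 payload is 4x smaller than float32 (2x smaller than float16)
print("max abs error:", (restored - hidden_states).abs().max().item())
```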

End of June: build a proof-of-concept

  • [x] agree on the user interface (see https://github.com/learning-at-home/bloom-demo/issues/5#issuecomment-1159776494)
  • [x] run simple (but correct!) inference with a smaller model (for generation)
  • [x] do simple (but correct!) forward/backward with frozen layers (for prompt tuning)
  • [x] client can dynamically choose which remote servers to use for inference (by: @justheuristic; see the routing sketch after this list)
  • [x] create basic correctness tests for later
  • [x] check if 8-bit compression is remotely feasible (by: @TimDettmers)
  • it's okay if the code is not super reliable for now
  • it's okay if servers have to be set up manually for now
  • begin investigating: quantized weights, quantized communication, automatic server allocation, "bloom points"
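
The "dynamically choose remote servers" item is, at its core, a routing problem: each server announces a contiguous range of transformer blocks, and the client must assemble a chain that covers the whole model. A toy greedy sketch (server names and the selection rule are invented; a real client can also weigh throughput and load):

```python
# Toy routing sketch: cover blocks [0, num_blocks) with a chain of servers,
# greedily picking whichever reachable server extends the chain the furthest.

def build_chain(servers: dict[str, range], num_blocks: int) -> list[str]:
    chain, position = [], 0
    while position < num_blocks:
        candidates = [(name, r) for name, r in servers.items()
                      if r.start <= position < r.stop]
        if not candidates:
            raise RuntimeError(f"no server serves block {position}")
        name, r = max(candidates, key=lambda item: item[1].stop)
        chain.append(name)
        position = r.stop
    return chain

# three servers, each holding a slice of a 70-block model
servers = {"A": range(0, 30), "B": range(20, 50), "C": range(45, 70)}
print(build_chain(servers, num_blocks=70))  # ['A', 'B', 'C']
```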

Important, but not urgent:

  • multiplicative adapters from https://wandb.ai/learning-at-home/LM_OWT/reports/Parameter-sharing-revisited--VmlldzoxOTAxNjcx ?
  • non-critical performance improvements (#11)
  • better finetuning methods: LoRA, AdaMix, PreLayer (see LoRA/AdaMix; a minimal LoRA sketch follows this list), whatever is SoTA at the time of building
  • fully decentralized point system
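
For reference on the adapter-style methods listed above, a minimal LoRA layer (following https://arxiv.org/abs/2106.09685) looks like the sketch below: the pretrained weight stays frozen and only a low-rank update is trained, which matches the Petals setting where clients cannot modify the server-side blocks. Rank and scaling values are illustrative:

```python
# Minimal LoRA sketch: y = W x + scaling * B A x, with W frozen and A, B trainable.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the pretrained weight stays frozen
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
out = layer(torch.randn(2, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # torch.Size([2, 1024]) 16384
```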

justheuristic · Jun 20 '22 14:06