
Roadmap (tentative)


Current tasks:

  • [ ] prototype bloom points system @borzunov (#6)
  • [x] local tensor parallelism (#143, using BlackSamorez/tensor_parallel by @blacksamorez and @IaroslavLisniak)
  • [x] increase default max sequence length (from #146)
  • [x] allow running a server without open ports @Vahe1994
  • [ ] option to download pre-quantized blocks (@mryab)
  • [x] improved routing (@justheuristic)
  • [x] upgrade to the latest libp2p - @Vahe1994
  • [ ] touch up fine-tuning examples, make sure they work in reasonable time (@justheuristic)
  • [ ] a way to temporarily shut down a Petals server (a rough watchdog sketch follows this list)
    • suggested by @craffel: when running a Petals server on a machine that is often in use, people should be able to shut off the Petals server while running their own experiments
    • suggested behavior: shut down ASAP, restart once the GPUs have not been in use for T minutes
  • Wanna contribute?
    • go to our Discord server and ask around!
    • always in demand:
      • contribute examples (recommended but not required: create an issue / draft first, before you code them)
      • OS support / hardware support (e.g., see #147)
      • more models: OPT-175B, switch-XXL, whatever comes into fashion
      • host a server! (see README)
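
For the temporary-shutdown item above, the suggested behavior could live in a small watchdog outside of Petals itself. Below is a minimal sketch, assuming an nvidia-smi-based check and the `petals.cli.run_server` entry point; the command, idle threshold, and polling interval are placeholders rather than decisions made in this roadmap:

```python
# Hypothetical watchdog: stop the Petals server as soon as someone else grabs the
# GPUs, restart it once they have been idle for T minutes. Sketch only.
import subprocess
import time

IDLE_MINUTES = 15  # T: how long GPUs must stay free before restarting the server
SERVER_CMD = ["python", "-m", "petals.cli.run_server", "bigscience/bloom-petals"]


def other_gpu_users(own_pid=None) -> bool:
    """Return True if any GPU compute process other than our own server is running."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"], text=True
    )
    pids = {int(p) for p in out.split() if p.strip()}
    # A real implementation should exclude the server's whole process tree, not just its PID.
    return bool(pids - {own_pid})


server, idle_since = None, None
while True:
    if other_gpu_users(server.pid if server else None):
        idle_since = None
        if server is not None:  # shut down ASAP when someone else needs the GPUs
            server.terminate()
            server.wait()
            server = None
    else:
        idle_since = idle_since or time.monotonic()
        if server is None and time.monotonic() - idle_since >= IDLE_MINUTES * 60:
            server = subprocess.Popen(SERVER_CMD)  # restart after T idle minutes
    time.sleep(30)
```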

End of December: cover more use cases

  • [x] tutorial: generation notebook
  • [x] tutorial: prompt-tuning notebook
  • [x] PreLayer prompt-tuning - mentioned as one of the baselines in https://arxiv.org/abs/2106.09685 - DONE
  • [x] inference with prompt-tuning (https://github.com/learning-at-home/bloom-demo/issues/13 by @artek0chumak; see the client-side sketch after this list)
  • [x] advanced inference: beam search, constraints/fusion, LoRA/AdaMix (@artek0chumak, https://github.com/learning-at-home/bloom-demo/issues/13)
  • [x] some kind of hub for tutorials, e.g. a minimalistic website
  • [x] alpha test: let more people play with the 176B model (where? no-brainer: bigscience, stability, discord)
  • [x] rich inference interface for designing custom generation algorithms (by: @artek0chumak)
  • [x] let servers run requests with different priorities (https://github.com/learning-at-home/bloom-demo/issues/8 by: @greenfatguy)
  • By this point, we must answer the main questions: (1) will people use it? (2a) what for? (2b) why not?
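
The client-side workflow these items converge on looks roughly like the snippet below. It is a minimal sketch assuming the `petals` package name and the `bigscience/bloom-petals` checkpoint, so exact class and argument names may differ from what ships:

```python
# Minimal sketch of the intended client-side interface (names are assumptions).
import torch
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"
tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

# plain generation: embeddings run locally, transformer blocks run on remote servers
inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))

# prompt tuning: only the local trainable prompts get gradients,
# the remote transformer blocks stay frozen on the servers
model = DistributedBloomForCausalLM.from_pretrained(
    MODEL_NAME, tuning_mode="ptune", pre_seq_len=16
)
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```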

End of ~~July~~ August: make it reliable, test with early adopters

  • [x] make it so that servers cannot be killed by a bad client request (https://github.com/learning-at-home/hivemind/issues/3 by: @justheuristic)
  • [x] find the best way to reduce the size of the 176B model (https://github.com/learning-at-home/bloom-demo/issues/4 by: @TimDettmers)
  • [x] let servers automatically find and serve the most in-demand layers (@borzunov)
  • [x] implement popular non-beam inference types (https://github.com/learning-at-home/bloom-demo/issues/13 by @artek0chumak)
  • [x] compress the activations sent between client and server nodes (by: @mryab; see the quantization sketch after this list)
  • [x] find enough hardware to run the 176B model (https://github.com/learning-at-home/bloom-demo/issues/14 by: @justheuristic)
  • [x] pre-alpha test: once it is stable enough, let some (trusted) folks play with it and get their feedback
  • [x] submit an EMNLP system demonstration proposal (https://2022.emnlp.org/calls/System_Demonstrations/)
  • begin investigating: tutorials, documentation, examples
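
The activation-compression item above amounts to quantizing hidden states before they cross the network and restoring them on the other side. A generic per-tensor 8-bit absmax sketch, shown only to illustrate the idea (the production codepath may use a different, e.g. blockwise, scheme):

```python
# Generic illustration of 8-bit activation compression for network transfer.
import torch

def compress_to_int8(x: torch.Tensor):
    """Quantize a float tensor to int8 plus a single absmax scale."""
    scale = x.abs().amax().clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).to(torch.int8)
    return q, scale  # send q (1 byte per value) and the scale over the wire

def decompress_from_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

hidden_states = torch.randn(1, 16, 14336)  # BLOOM-176B hidden size is 14336
q, scale = compress_to_int8(hidden_states)
restored = decompress_from_int8(q, scale)
# int8 payload is 4x smaller than float32 (2x smaller than float16)
print("max abs error:", (restored - hidden_states).abs().max().item())
```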

End of June: build a proof-of-concept

  • [x] agree on the user interface (see https://github.com/learning-at-home/bloom-demo/issues/5#issuecomment-1159776494)
  • [x] run simple (but correct!) inference with a smaller model (for generation)
  • [x] do simple (but correct!) forward/backward with frozen layers (for prompt tuning)
  • [x] client can dynamically choose which remote servers to use for inference (by: @justheuristic; see the routing sketch after this list)
  • [x] create basic correctness tests for later
  • [x] check if 8-bit compression is remotely feasible (by: @TimDettmers)
  • it's okay if the code is not super reliable for now
  • it's okay if servers have to be set up manually for now
  • begin investigating: quantized weights, quantized communication, automatic server allocation, "bloom points"
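
The "dynamically choose remote servers" item is, at its core, a routing problem: each server announces a contiguous range of transformer blocks, and the client must assemble a chain that covers the whole model. A toy greedy sketch (server names and the selection rule are invented; a real client can also weigh throughput and load):

```python
# Toy routing sketch: cover blocks [0, num_blocks) with a chain of servers,
# greedily picking whichever reachable server extends the chain the furthest.

def build_chain(servers: dict[str, range], num_blocks: int) -> list[str]:
    chain, position = [], 0
    while position < num_blocks:
        candidates = [(name, r) for name, r in servers.items()
                      if r.start <= position < r.stop]
        if not candidates:
            raise RuntimeError(f"no server serves block {position}")
        name, r = max(candidates, key=lambda item: item[1].stop)
        chain.append(name)
        position = r.stop
    return chain

# three servers, each holding a slice of a 70-block model
servers = {"A": range(0, 30), "B": range(20, 50), "C": range(45, 70)}
print(build_chain(servers, num_blocks=70))  # ['A', 'B', 'C']
```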

Important, but not urgent:

  • multiplicative adapters from https://wandb.ai/learning-at-home/LM_OWT/reports/Parameter-sharing-revisited--VmlldzoxOTAxNjcx ?
  • non-critical performance improvements (#11)
  • better finetuning methods: LoRA, AdaMix, PreLayer (see LoRA/AdaMix; a minimal LoRA sketch follows this list), whatever is SoTA at the time of building
  • fully decentralized point system
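
For reference on the adapter-style methods listed above, a minimal LoRA layer (following https://arxiv.org/abs/2106.09685) looks like the sketch below: the pretrained weight stays frozen and only a low-rank update is trained, which matches the Petals setting where clients cannot modify the server-side blocks. Rank and scaling values are illustrative:

```python
# Minimal LoRA sketch: y = W x + scaling * B A x, with W frozen and A, B trainable.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the pretrained weight stays frozen
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
out = layer(torch.randn(2, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # torch.Size([2, 1024]) 16384
```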

justheuristic · Jun 20 '22 14:06