
[Meta] Road to 1.0 checklist

Open Narsil opened this issue 1 year ago • 11 comments

CI

  • [x] Add custom multi GPU runner to CI
  • [x] test docker image on MR
  • [x] load test on daily cron (low prio)

server

  • [x] tests for every model class
    • [ ] forward + batching / filter
    • [ ] with/without quantization

router

  • [ ] unit-tests for infer and validation modules

launcher

  • [x] refactor a tad as the code is nasty (https://github.com/huggingface/text-generation-inference/pull/242)

test-suite

misc

  • [x] add issue templates + env capture script https://github.com/huggingface/text-generation-inference/pull/264
  • [x] end debate on license

Narsil avatar Apr 25 '23 08:04 Narsil

@Narsil Given my personal experience, I would strongly suggest test suites that exercise this library's usage in cloud-native Kubernetes offerings, i.e. AKS, EKS, and GKE. I have run into issues standing this service up across multiple clouds, and I think all the gotchas could be captured in E2E tests running on Kubernetes in cloud environments, since for those who want to run this in real production situations, Kubernetes will likely be the tool of choice for container orchestration.
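Such an E2E test could be little more than a smoke test against a deployed instance. A minimal sketch, assuming a Deployment and Service both named `text-generation-inference` in a `tgi` namespace (all names here are illustrative, not from any existing manifest):

```shell
# Hypothetical E2E smoke test for a TGI deployment on a managed
# Kubernetes cluster (AKS/EKS/GKE). Names and namespace are illustrative.
kubectl -n tgi rollout status deployment/text-generation-inference --timeout=600s

# Forward the service locally and issue one generation request
# against the REST API's /generate endpoint.
kubectl -n tgi port-forward svc/text-generation-inference 8080:80 &
sleep 5
curl -fsS http://127.0.0.1:8080/generate \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 4}}'
```

Running this per cloud provider on a cron would catch provider-specific driver and networking gotchas before users do.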

sam-h-bean avatar Apr 25 '23 20:04 sam-h-bean

Can you elaborate on the issues you faced while running text-generation-inference?

We have been running it since October in EKS, AzureML (which uses k8s as a backend), and K3s, and it worked out of the box.

OlivierDehaene avatar Apr 25 '23 22:04 OlivierDehaene

@OlivierDehaene For EKS single-GPU nodes I did need some customization via the Bottlerocket OS. Maybe the G5 12XL comes with the right configs out of the box, but the G5 4XL certainly does not. This is something that E2E testing could clarify. https://github.com/huggingface/text-generation-inference/issues/206

Also, for AKS clusters we got the basic setup with quantization working, but we still do not seem to be able to run in sharded mode because of the NCCL issues from other tickets. Again, something where tests (even ones with expected failures) could help illuminate the realities of running this software in prod. https://github.com/huggingface/text-generation-inference/issues/230
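For anyone hitting similar NCCL failures, a sketch of a sharded launch with NCCL debugging enabled may help narrow things down. The launcher flags exist in the CLI; the model and the choice of NCCL workarounds are assumptions, since which ones apply depends on the node's GPU topology:

```shell
# Turn on NCCL's own logging to see where multi-GPU init fails.
export NCCL_DEBUG=INFO
# Common workarounds on cloud nodes (use only if the logs point at them):
export NCCL_P2P_DISABLE=1   # when peer-to-peer GPU access is unavailable
export NCCL_SHM_DISABLE=1   # when the pod's /dev/shm is too small

text-generation-launcher \
  --model-id bigscience/bloom-560m \
  --num-shard 2 \
  --quantize bitsandbytes
```

In Kubernetes specifically, the default 64 MB `/dev/shm` in a pod is a frequent NCCL culprit; mounting an `emptyDir` with `medium: Memory` at `/dev/shm` is the usual fix.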

sam-h-bean avatar Apr 28 '23 16:04 sam-h-bean

It would also be great to get some documentation on how to tune dynamic batching. I have been reading a bit of the code here, but it is fairly complicated (probably because it is doing hard things). It wasn't clear to me from the documentation that dynamic batches are actually selected token-wise, or how I might tune my dynamic-batching launch arguments depending on the maximum token size.
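As a rough illustration of what "token-wise" means for tuning: the batch is bounded by a total-token budget rather than a request count, so the per-request token ceiling and the batch budget together determine worst-case concurrency. The flag names below are from the launcher; the concrete numbers are made up for the example:

```shell
# Back-of-envelope sizing for token-wise dynamic batching.
MAX_TOTAL_TOKENS=2048          # per-request ceiling (prompt + generated tokens)
MAX_BATCH_TOTAL_TOKENS=32768   # whole-batch token budget (bounded by KV-cache memory)

# Worst case: every request in the batch uses its full token allowance.
echo "min concurrent sequences: $(( MAX_BATCH_TOTAL_TOKENS / MAX_TOTAL_TOKENS ))"

# The same numbers would then be passed to the launcher, e.g.:
#   text-generation-launcher --model-id <model> \
#     --max-total-tokens "$MAX_TOTAL_TOKENS" \
#     --max-batch-total-tokens "$MAX_BATCH_TOTAL_TOKENS"
```

With these assumed numbers the router can always hold at least 16 max-length sequences; shorter requests let it pack in more.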

sam-h-bean avatar Apr 28 '23 16:04 sam-h-bean

@sam-h-bean, we can put you in contact with our experts in our Expert Acceleration Program if you need any help setting up text-generation-inference in your production environment.

OlivierDehaene avatar Apr 28 '23 17:04 OlivierDehaene

Hello,

I would like to know if contrastive search is on the priority list.

gsaivinay avatar Jun 06 '23 21:06 gsaivinay

It's not part of 1.0.

Narsil avatar Jun 07 '23 06:06 Narsil

@Narsil Has anyone picked up the router test tasks? If not, I would be interested in contributing.

chandrasekharpatra avatar Aug 11 '23 06:08 chandrasekharpatra

@chandrasekharpatra Please feel free to take a stab at it.

We didn't do it for 1.0 because it didn't seem that important for finding bugs/regressions. Writing the test code was relatively messy, but if we find a clean implementation, that would be awesome.

Narsil avatar Aug 11 '23 13:08 Narsil

Okay, will have a look at missing server tests as well.

chandrasekharpatra avatar Aug 11 '23 15:08 chandrasekharpatra

Don't hesitate to open a small PR first in order to get feedback. Smaller PRs are easier to review and therefore more easily approved.

Narsil avatar Aug 11 '23 15:08 Narsil