text-generation-inference
[Meta] Road to 1.0 checklist
CI
- [x] Add custom multi GPU runner to CI
- [x] test docker image on MR
- [x] load test on daily cron (low prio)
server
- [x] tests for every model class
- [ ] forward + batching / filter
- [ ] with/without quantization
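The batching/filter item above boils down to behaviour like: between decoding steps, a batch can be filtered down to the requests that are still generating. A toy Python sketch of that contract (`ToyBatch`, `filter`, and `keep_ids` are hypothetical illustration names, not TGI's actual server classes):

```python
from dataclasses import dataclass, field


@dataclass
class ToyBatch:
    """Minimal stand-in for a server-side generation batch."""
    request_ids: list
    finished: set = field(default_factory=set)

    def filter(self, keep_ids):
        """Return a new batch containing only the requests in `keep_ids`,
        preserving order (finished requests are dropped)."""
        return ToyBatch([r for r in self.request_ids if r in keep_ids])


# Request 2 finished generating, so it is filtered out of the batch.
batch = ToyBatch(request_ids=[1, 2, 3])
smaller = batch.filter(keep_ids={1, 3})
```

A server test for this item would assert exactly that invariant: filtering never reorders surviving requests and never resurrects finished ones, with and without quantization enabled.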
router
- [ ] unit-tests for infer and validation modules
launcher
- [x] refactor a tad as the code is nasty (https://github.com/huggingface/text-generation-inference/pull/242)
test-suite
- [x] write integration test suite based on Nick’s comment
misc
- [x] add issue templates + env capture script https://github.com/huggingface/text-generation-inference/pull/264
- [x] end debate on license
@Narsil Given my personal experience, I would strongly suggest test suites that exercise this library's usage in cloud-native Kubernetes offerings, i.e. AKS, EKS, and GKE. I have run into issues standing this service up across multiple clouds, and I think all the gotchas could be captured in E2E tests running on Kubernetes in cloud environments. For those who want to run this in real production situations, Kubernetes will likely be the tool of choice for container orchestration.
Can you elaborate on the issues you faced while running text-generation-inference?
We have been running it since October in EKS, AzureML (which uses k8s as a backend), and K3S, and it worked out of the box.
@OlivierDehaene For EKS single-GPU nodes I did need some customization via the Bottlerocket OS. Maybe the G5 12XL comes with the right configs out of the box, but the G5 4XL certainly does not. This is something that E2E testing could clarify. https://github.com/huggingface/text-generation-inference/issues/206
Also, for AKS clusters we got the basic setup with quantization working, but we still do not seem to be able to run in sharded mode because of the NCCL issues from other tickets. Again, something where tests (even with expected failures) could help illuminate the realities of running this software in prod. https://github.com/huggingface/text-generation-inference/issues/230
It would also be great to get some documentation on how to tune dynamic batching. I have been reading a bit of the code here, but it is fairly complicated (probably because it is doing hard things). It wasn't clear to me from the documentation that dynamic batches are actually selected token-wise, or how I should tune my dynamic batching launch arguments depending on the max token size.
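The token-wise selection being asked about can be illustrated with a toy sketch: requests are admitted into a batch by their worst-case token footprint (prompt plus requested generation budget), not by request count. The names `Request` and `token_budget_batch` here are purely illustrative, not text-generation-inference's actual code (the real router is written in Rust and is more involved):

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int    # tokens already in the prompt
    max_new_tokens: int   # generation budget requested by the client


def token_budget_batch(queue, max_batch_total_tokens):
    """Admit queued requests into a batch until the worst-case token
    count (prompt + maximum generated tokens) would exceed the budget."""
    batch, used = [], 0
    for req in queue:
        worst_case = req.prompt_tokens + req.max_new_tokens
        if used + worst_case > max_batch_total_tokens:
            break
        batch.append(req)
        used += worst_case
    return batch


# With a 3000-token budget, only the first two requests fit:
# 1536 + 1024 = 2560 <= 3000, but adding 2048 more would overflow.
queue = [Request(512, 1024), Request(256, 768), Request(1024, 1024)]
batch = token_budget_batch(queue, max_batch_total_tokens=3000)
```

In text-generation-inference itself, the closest knob is the `--max-batch-total-tokens` launcher argument, with `--waiting-served-ratio` and `--max-waiting-tokens` governing when waiting requests get merged into a running batch; the exact admission logic in the router differs from this sketch.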
@sam-h-bean, we can put you in contact with our experts in our Expert Acceleration Program if you need any help setting up text-generation-inference in your production environment.
Hello,
Would like to know if contrastive search is in the priority list.
It's not part of 1.0.
@Narsil Has anyone picked up the router test tasks? If not, I would be interested in contributing.
@chandrasekharpatra Please feel free to take a stab at it.
We didn't do it for 1.0 because it didn't seem that important for finding bugs/regressions. Writing the test code was relatively messy, but if we find a clean implementation that would be awesome.
Okay, will have a look at missing server tests as well.
Don't hesitate to open a small PR first in order to get feedback. Smaller PRs are more easily reviewed, and therefore more easily approved.