text-generation-inference
[Meta] Road to 1.0 checklist
CI
- [x] Add custom multi GPU runner to CI
- [x] test docker image on MR
- [x] load test on daily cron (low prio)
server
- [x] tests for every model class
- [ ] forward + batching / filter
- [ ] with/without quantization
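The batching/filter item above boils down to behaviour like: between decoding steps, a batch can be filtered down to the requests that are still generating. A toy Python sketch of that contract (`ToyBatch`, `filter`, and `keep_ids` are hypothetical illustration names, not TGI's actual server classes):

```python
from dataclasses import dataclass, field


@dataclass
class ToyBatch:
    """Minimal stand-in for a server-side generation batch."""
    request_ids: list
    finished: set = field(default_factory=set)

    def filter(self, keep_ids):
        """Return a new batch containing only the requests in `keep_ids`,
        preserving order (finished requests are dropped)."""
        return ToyBatch([r for r in self.request_ids if r in keep_ids])


# Request 2 finished generating, so it is filtered out of the batch.
batch = ToyBatch(request_ids=[1, 2, 3])
smaller = batch.filter(keep_ids={1, 3})
```

A server test for this item would assert exactly that invariant: filtering never reorders surviving requests and never resurrects finished ones, with and without quantization enabled.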
router
- [ ] unit-tests for infer and validation modules
launcher
- [x] refactor a tad as the code is nasty (https://github.com/huggingface/text-generation-inference/pull/242)
test-suite
- [x] write integration test suite based on Nick’s comment
misc
- [x] add issue templates + env capture script https://github.com/huggingface/text-generation-inference/pull/264
- [x] end debate on license
@Narsil Given my personal experience, I would strongly suggest test suites that exercise this library's usage in cloud-native Kubernetes offerings, i.e. AKS, EKS, and GKE. I have run into issues standing this service up across multiple clouds, and I think all the gotchas could be captured in E2E tests running on Kubernetes in cloud environments. For those who want to run this in real production situations, Kubernetes will likely be the tool of choice for container orchestration.
Can you elaborate on the issues you faced while running text-generation-inference?
We have been running it since October in EKS, AzureML (which uses k8s as a backend), and K3S, and it worked out of the box.
@OlivierDehaene For EKS single-GPU nodes I did need some customization via the Bottlerocket OS. Maybe the G5 12XL comes with the right configs out of the box, but the G5 4XL certainly does not. This is something that E2E testing could clarify. https://github.com/huggingface/text-generation-inference/issues/206
Also, for AKS clusters we got the basic setup with quantization working, but we still do not seem to be able to run in sharded mode because of the NCCL issues from other tickets. Again, something where tests (even with expected failures) could help illuminate the realities of running this software in prod. https://github.com/huggingface/text-generation-inference/issues/230
It would also be great to get some documentation on how to tune dynamic batching. I have been reading a bit of the code here, but it is fairly complicated (probably because it is doing hard things). It wasn't clear to me from the documentation that dynamic batches are actually selected token-wise, or how I should tune my dynamic batching launch arguments depending on the max token size.
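The token-wise selection being asked about can be illustrated with a toy sketch: requests are admitted into a batch by their worst-case token footprint (prompt plus requested generation budget), not by request count. The names `Request` and `token_budget_batch` here are purely illustrative, not text-generation-inference's actual code (the real router is written in Rust and is more involved):

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int    # tokens already in the prompt
    max_new_tokens: int   # generation budget requested by the client


def token_budget_batch(queue, max_batch_total_tokens):
    """Admit queued requests into a batch until the worst-case token
    count (prompt + maximum generated tokens) would exceed the budget."""
    batch, used = [], 0
    for req in queue:
        worst_case = req.prompt_tokens + req.max_new_tokens
        if used + worst_case > max_batch_total_tokens:
            break
        batch.append(req)
        used += worst_case
    return batch


# With a 3000-token budget, only the first two requests fit:
# 1536 + 1024 = 2560 <= 3000, but adding 2048 more would overflow.
queue = [Request(512, 1024), Request(256, 768), Request(1024, 1024)]
batch = token_budget_batch(queue, max_batch_total_tokens=3000)
```

In text-generation-inference itself, the closest knob is the `--max-batch-total-tokens` launcher argument, with `--waiting-served-ratio` and `--max-waiting-tokens` governing when waiting requests get merged into a running batch; the exact admission logic in the router differs from this sketch.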
@sam-h-bean, we can put you in contact with our experts in our Expert Acceleration Program if you need any help setting up text-generation-inference in your production environment.
Hello,
Would like to know if contrastive search is in the priority list.
It's not part of 1.0.
@Narsil Has anyone picked up the router test tasks? If not, I would be interested in contributing.
@chandrasekharpatra Please feel free to take a stab at it.
We didn't do it for 1.0 because it didn't seem that important for finding bugs/regressions. Writing the test code was relatively messy, but if we find a clean implementation that would be awesome.
Okay, will have a look at missing server tests as well.
Don't hesitate to open a small PR first in order to get feedback. Smaller PRs are more easily reviewed, and therefore more easily approved.