Switch split task to token-based splitting
Description
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [ ] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
@drobison00 Thanks for the reviews! I have a couple of questions. How do you think we should go about preloading the vocab files when the user doesn't want to allow downloads? And when they do allow downloads, how should we pass along the Hugging Face token to access gated models? My thought was to pull it from an environment variable on the client side, like we do with unstructured and adobe, and pass it along as another parameter in the schema.
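To make the question concrete, here is a rough sketch of the environment-variable idea, not the actual implementation. The `SplitParams` class and its field names are hypothetical and do not reflect the real schema. `HF_TOKEN` is assumed as the variable name, and the `token` keyword assumes a recent `transformers` release (older releases used `use_auth_token`).

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class SplitParams:
    """Hypothetical task parameters; names are illustrative only."""
    tokenizer: str
    allow_downloads: bool = False
    hf_access_token: Optional[str] = None


def params_from_env(tokenizer: str, allow_downloads: bool) -> SplitParams:
    # Client side: read the token from the environment instead of hard-coding it.
    # The token is only forwarded when downloads are allowed.
    token = os.environ.get("HF_TOKEN") if allow_downloads else None
    return SplitParams(tokenizer, allow_downloads, token)


def tokenizer_kwargs(params: SplitParams) -> dict:
    # local_files_only=True restricts loading to files already on disk,
    # which covers the "vocab files were preloaded" case.
    kwargs = {"local_files_only": not params.allow_downloads}
    if params.allow_downloads and params.hf_access_token:
        kwargs["token"] = params.hf_access_token
    return kwargs


def load_tokenizer(params: SplitParams):
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(params.tokenizer, **tokenizer_kwargs(params))
```

With downloads disabled, the vocab files would need to be present in the local cache ahead of time (for example, baked into the container image), since `load_tokenizer` would fail otherwise.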
Also, I can't seem to reproduce the test failure locally.
We have a flaky test. I meant to look into fixing it, but for now you can go into Actions and rerun the test.