nv-ingest icon indicating copy to clipboard operation
nv-ingest copied to clipboard

Switch split task to token based splitting

Open ChrisJar opened this issue 1 year ago • 4 comments

Description

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [ ] New or existing tests cover these changes.
  • [ ] The documentation is up to date with these changes.

ChrisJar avatar Dec 13 '24 06:12 ChrisJar

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Dec 13 '24 06:12 copy-pr-bot[bot]

@drobison00 Thanks for the reviews! I have a couple questions: How do you think we should go about preloading the vocab files in the case that the user doesn't want to allow downloads and in the case they do, how should we go about passing along the huggingface token to access gated models? My thought was to pull it from an environment variable on the client side like we do with unstructured and adobe and pass it along as another parameter in the schema

ChrisJar avatar Jan 13 '25 23:01 ChrisJar

Also I can't seem to reproduce the test failure locally

ChrisJar avatar Jan 14 '25 18:01 ChrisJar

Also I can't seem to reproduce the test failure locally

We have a flaky test. I meant to look into fixing it, but for now you can go into Actions and rerun the test.

edknv avatar Jan 14 '25 18:01 edknv