
Text modification config

Open • rodneykinney opened this issue 1 year ago • 1 comment

Add mixer configuration to trim trailing/leading whitespace from document text, and enforce a minimum document text length. Place these into a new text_modification config object, and move the span_replacements config into it.

@soldni any objections to this backward-incompatible change to config structure?
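For illustration only, here is a minimal Python sketch of the behavior described above. It is not the code from this change: the class name `TextModification`, the field names `trim_whitespace` and `min_text_length`, and the helper `apply_text_modification` are all hypothetical, and the order of operations (trim first, then check length) is an assumption. Only `text_modification` and `span_replacements` are names taken from the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TextModification:
    """Hypothetical stand-in for the proposed text_modification config object."""

    # Hypothetical field names; the actual config keys may differ.
    trim_whitespace: bool = False
    min_text_length: int = 0
    # span_replacements moves under this object per the description;
    # its entries are left untyped here because their schema is not shown.
    span_replacements: list = field(default_factory=list)


def apply_text_modification(text: str, config: TextModification) -> Optional[str]:
    """Return the modified text, or None if the document should be dropped."""
    if config.trim_whitespace:
        # Remove leading and trailing whitespace from the document text.
        text = text.strip()
    # Assumption: the minimum length is checked after trimming.
    if len(text) < config.min_text_length:
        return None
    return text


config = TextModification(trim_whitespace=True, min_text_length=5)
kept = apply_text_modification("  hello world \n", config)
dropped = apply_text_modification("   hi   ", config)
```

With these assumed defaults (no trimming, minimum length 0), a config that omits the new options leaves document text unchanged, so only the relocation of `span_replacements` would be backward-incompatible.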

rodneykinney • Oct 19 '23 16:10

Not sure what's happening with automated tests. Maybe timing out?

make test passes locally, except for the test_download_file Rust test, which also fails on the main branch.

rodneykinney • Oct 19 '23 20:10