NeMo-Curator
Improved Semantic Deduplication Docs
As I am revisiting the semantic deduplication documentation, there are a few things we should add:
- Documentation of the CLI
- If the user uses `add_id` like we recommend, the `id_col_type` in the config should be a string.
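
To illustrate the second point, here is a minimal sketch of why the ID type matters. It does not call NeMo-Curator: `make_id` and `infer_id_col_type` are hypothetical helpers, the prefixed zero-padded ID format is an assumption about what an ID-adding step might produce, and the `"str"` / `"int"` spellings of the config value are also assumed.

```python
# Hypothetical stand-in for an ID-adding step. The prefix and zero-padded
# format are assumptions for illustration, not NeMo-Curator's exact output.
def make_id(prefix: str, index: int) -> str:
    return f"{prefix}-{index:010d}"


def infer_id_col_type(sample_id) -> str:
    """Return "int" if the ID parses as an integer, otherwise "str"."""
    try:
        int(sample_id)
    except (TypeError, ValueError):
        return "str"
    return "int"


# Generate a few prefixed IDs and derive the matching config entry.
ids = [make_id("doc_id", i) for i in range(3)]
config = {"id_col_type": infer_id_col_type(ids[0])}  # assumed config key spelling
print(ids[0], config)
```

Because a prefixed ID cannot be parsed as an integer, a config that declares an integer ID type would not match the generated column, which is why the docs should call this out.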