databroker
Review tokenization for each Document type.
In #489, each document is tokenized into a tuple containing the document type (e.g. 'descriptor') and its UID. We should revisit each type and consider whether this is the right balance between full content-based tokenization (very slow but guaranteed not to produce collisions) and...something else.
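
To make the trade-off concrete, here is a minimal sketch of the two ends of the spectrum. It is not the code from #489: the function names `tokenize_by_uid` and `tokenize_by_content` are hypothetical, the example documents are made up, and hashing a canonical JSON dump is only one assumed way to do content-based tokenization.

```python
import hashlib
import json


def tokenize_by_uid(name, doc):
    """Cheap token: the document type plus its UID (hypothetical helper)."""
    return (name, doc["uid"])


def tokenize_by_content(name, doc):
    """Content-based token: the type plus a SHA-256 digest of the whole
    document, serialized as JSON with sorted keys (hypothetical helper).
    The cost grows with the size of the document."""
    payload = json.dumps(doc, sort_keys=True, default=str).encode("utf-8")
    return (name, hashlib.sha256(payload).hexdigest())


# Two made-up documents that share a UID but differ in content.
descriptor = {"uid": "abc123", "data_keys": {"det": {"dtype": "number"}}}
edited = {**descriptor, "data_keys": {"det": {"dtype": "array"}}}

# The UID-based token cannot tell them apart; the content-based one can.
same_uid_token = (
    tokenize_by_uid("descriptor", descriptor)
    == tokenize_by_uid("descriptor", edited)
)
same_content_token = (
    tokenize_by_content("descriptor", descriptor)
    == tokenize_by_content("descriptor", edited)
)
```

The UID-based token is only as trustworthy as the assumption that a UID always refers to the same content, so the per-type review mostly comes down to asking whether that assumption holds for each document type.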