Review tokenization for each Document type.

Open danielballan opened this issue 6 years ago • 0 comments

In #489, each document is tokenized into a tuple containing the document type (e.g. 'descriptor') and its UID. We should revisit each document type and consider whether this strikes the right balance between full content-based tokenization (very slow, but guaranteed not to produce collisions) and... something else.
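To make the trade-off concrete, here is a minimal sketch of the two extremes. It is not the code from #489: the helper names `tokenize_by_uid` and `tokenize_by_content` are hypothetical, the example documents are made up, and it assumes each document is a plain dict with a `uid` key, which may not hold for every document type.

```python
import hashlib
import json


def tokenize_by_uid(name, doc):
    """Cheap token: the document type plus its UID (hypothetical helper)."""
    return (name, doc["uid"])


def tokenize_by_content(name, doc):
    """Expensive token: a hash over the full, canonically serialized content.

    Assumes the document is JSON-serializable; anything that is not
    falls back to str() via ``default=str``.
    """
    payload = json.dumps(doc, sort_keys=True, default=str).encode("utf-8")
    return (name, hashlib.sha256(payload).hexdigest())


# Two made-up descriptors that share a UID but differ in content.
descriptor = {"uid": "abc123", "data_keys": {"det": {"dtype": "number"}}}
edited = {"uid": "abc123", "data_keys": {"det": {"dtype": "array"}}}

# The UID-based token cannot tell the two documents apart...
same_uid_token = (
    tokenize_by_uid("descriptor", descriptor)
    == tokenize_by_uid("descriptor", edited)
)

# ...while the content-based token can, at the cost of serializing
# and hashing the whole document.
same_content_token = (
    tokenize_by_content("descriptor", descriptor)
    == tokenize_by_content("descriptor", edited)
)
```

The UID-based token is only as trustworthy as the assumption that a UID is never reused for different content. The content-based token drops that assumption, but its cost grows with document size, which matters most for the large document types.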

Jan 12 '20 20:01 danielballan