Inconsistent use of document_id attribute in fuzzyMatchRefsOperator
This is nitpicky, but it cost me some time today. The file hash / document ID is stored in two places in the dict output by ExtractRefs: document_id and metadata['file_hash']. The information is the same, but both keys are used in fuzzyMatchRefsOperator (albeit one only in logging).
line 64:
    logger.info(
        'ElasticsearchFuzzyMatcher.match: '
        'orig-length=%d doc-id=%s truncated-title=%r',
        title_len, reference.get('document_id', "Unknown ID"), title
    )
line 113:
    'policies': [{
        'doc_id': ref_metadata.get("file_hash", None),
        ...
    }]
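One possible way to make the two call sites agree is to read the ID through a single helper. This is only a sketch, not the operator's actual code: the helper name get_document_id is hypothetical, and it assumes the reference dict has the shape described above, with a top-level document_id key and a nested metadata dict holding file_hash.

```python
def get_document_id(reference, default=None):
    """Hypothetical helper: resolve a reference's document ID from one place.

    Assumes `reference` looks like
    {'document_id': ..., 'metadata': {'file_hash': ...}}.
    Prefers 'document_id' and falls back to metadata['file_hash'].
    """
    doc_id = reference.get('document_id')
    if doc_id is None:
        # 'metadata' may be missing or None, so guard before the lookup.
        metadata = reference.get('metadata') or {}
        doc_id = metadata.get('file_hash')
    return doc_id if doc_id is not None else default


# Both the log call and the 'policies' entry could then use the same lookup.
reference = {'document_id': 'abc123', 'metadata': {'file_hash': 'abc123'}}
log_id = get_document_id(reference, default='Unknown ID')
policy = {'doc_id': get_document_id(reference)}
```

With something like this, the logging call and the 'policies' entry would report the same value even if one of the two keys were later renamed or dropped.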
@ivyleavedtoadflax Is this closed and merged in?
No, I've not fixed it. I wasn't 100% sure whether it was intended or not.
No, it's not intended. I most likely just missed it in the key name refactor.
@jdu to review and see if this is still an issue