
Inconsistent use of document_id attribute in fuzzyMatchRefsOperator

Open ivyleavedtoadflax opened this issue 5 years ago • 4 comments

This is nitpicky, but it cost me some time today. The file hash / document ID is stored in two places in the dict output by ExtractRefs: document_id and metadata['file_hash']. The information is the same, but fuzzyMatchRefsOperator reads both keys: document_id in a log message (line 64) and file_hash when setting doc_id (line 113).

line 64:

            logger.info(
                'ElasticsearchFuzzyMatcher.match: '
                'orig-length=%d doc-id=%s truncated-title=%r',
                title_len, reference.get('document_id', "Unknown ID"), title
            )

line 113:

                'policies': [{
                    'doc_id': ref_metadata.get("file_hash", None),
                    ...
                }]
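One way to remove the inconsistency would be to read the ID through a single helper. The following is a minimal sketch, not the project's actual code: the helper name get_document_id is hypothetical, and it assumes the reference dict carries the hash under document_id and, as a legacy fallback, under metadata['file_hash'].

```python
def get_document_id(reference, default="Unknown ID"):
    """Return the document ID from one canonical key.

    Hypothetical helper: prefers reference['document_id'] and falls back
    to reference['metadata']['file_hash'] (assumed legacy location).
    """
    doc_id = reference.get("document_id")
    if doc_id is None:
        # Fall back to the older key; tolerate a missing or None metadata dict.
        metadata = reference.get("metadata") or {}
        doc_id = metadata.get("file_hash")
    return doc_id if doc_id is not None else default


# Illustrative inputs only; real ExtractRefs output has more fields.
reference = {"document_id": "abc123", "metadata": {"file_hash": "abc123"}}
legacy_only = {"metadata": {"file_hash": "def456"}}

print(get_document_id(reference))
print(get_document_id(legacy_only))
```

Both the log call and the 'doc_id' field could then go through the same lookup, so a later key rename only has to be made in one place.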

ivyleavedtoadflax, Jan 24 '20 00:01

@ivyleavedtoadflax Is this closed and merged in?

jdu, Jan 29 '20 09:01

No, I've not fixed it. I wasn't 100% sure whether it was intended or not.

ivyleavedtoadflax, Jan 29 '20 13:01

No, it's not intended. I most likely just missed it in the key name refactor.

jdu, Jan 29 '20 15:01

@jdu to review and see if this is still an issue

kristinenielsen, Feb 26 '20 11:02