
Duplication of identifiers in pipe-delimited slot value lists

Open RichardBruskiewich opened this issue 2 years ago • 1 comments

Some multivalued fields in KGX (e.g. the provided_by slot) sometimes appear to accumulate duplicate (CURIE) identifiers. Shouldn't such lists instead be managed internally as proper sets (without duplicate members)?

In particular, we need to check the kgx merge operation for this anomaly, and perhaps other contexts as well.
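
To illustrate the behaviour being asked for, here is a minimal sketch of merging pipe-delimited slot values without duplicating members. The function names `dedupe_preserving_order` and `merge_pipe_delimited` are hypothetical and are not part of the KGX API. The sketch assumes that values are plain `|`-separated strings and that first-seen order should be preserved (a plain `set` would lose ordering).

```python
from typing import Iterable, List


def dedupe_preserving_order(values: Iterable[str]) -> List[str]:
    """Drop repeated members while keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(values))


def merge_pipe_delimited(*fields: str, sep: str = "|") -> str:
    """Merge several delimited slot values into one, without duplicates.

    Hypothetical helper for illustration only; not the KGX implementation.
    """
    members: List[str] = []
    for field in fields:
        # Skip empty members produced by stray or trailing delimiters
        members.extend(part.strip() for part in field.split(sep) if part.strip())
    return sep.join(dedupe_preserving_order(members))


# Two records for the same node, each carrying a provided_by value
merged = merge_pipe_delimited("infores:a|infores:b", "infores:b|infores:c")
print(merged)
```

With this approach, an identifier that appears in both inputs is kept only once in the merged value.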

RichardBruskiewich, Feb 22 '22 19:02

I think this is fixed in https://github.com/biolink/kgx/pull/408 - making a note to check.

sierra-moxon, Aug 16 '22 21:08

Do we have a unit test to check this?

@sierra, is the relevant code in https://github.com/biolink/kgx/blob/master/kgx/utils/kgx_utils.py#L831? I'm not sure if this snippet of code avoids duplication in pipe-delimited lists...
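
As a rough idea of what such a unit test might look like, here is a pytest-style sketch. It tests a stand-in helper, `dedupe_pipe_delimited`, which is hypothetical and defined here only so that the example is self-contained. A real test would call the actual KGX merge code rather than this stand-in.

```python
def dedupe_pipe_delimited(value: str, sep: str = "|") -> str:
    """Stand-in for the code under test (hypothetical, not the KGX function)."""
    parts = [p for p in value.split(sep) if p]
    # dict.fromkeys removes repeats while keeping first-seen order
    return sep.join(dict.fromkeys(parts))


def test_duplicates_are_removed():
    result = dedupe_pipe_delimited("infores:a|infores:b|infores:a")
    members = result.split("|")
    # Every member appears exactly once
    assert len(members) == len(set(members))
    assert members == ["infores:a", "infores:b"]


def test_value_without_duplicates_is_unchanged():
    assert dedupe_pipe_delimited("infores:a|infores:b") == "infores:a|infores:b"
```

Saved in a test module, these functions would be collected by pytest in the usual way.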

RichardBruskiewich, Nov 02 '22 19:11

I applied a fix to the above snippet of code in the list-related PR #415.

RichardBruskiewich, Nov 02 '22 21:11