kgx
Duplication of identifiers in pipe-delimited slot value lists
It appears that some fields in KGX, e.g. the provided_by slot, sometimes accumulate duplicate (CURIE) identifiers. Shouldn't such lists instead be managed internally as proper sets (without duplicate members)?
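To make the expected behaviour concrete, here is a minimal sketch of set-like handling for a pipe-delimited slot value. The helper names `dedupe_preserving_order` and `merge_pipe_delimited` are hypothetical and are not part of the KGX API; the `infores:` CURIEs are made-up sample values.

```python
from typing import Iterable, List


def dedupe_preserving_order(values: Iterable[str]) -> List[str]:
    """Drop repeated members while keeping first-seen order.

    Hypothetical helper, not taken from the KGX code base.
    dict.fromkeys keeps insertion order (Python 3.7+).
    """
    return list(dict.fromkeys(values))


def merge_pipe_delimited(existing: str, incoming: str, delimiter: str = "|") -> str:
    """Combine two delimited slot values into one without duplicate members."""
    parts = [p.strip() for p in (existing.split(delimiter) + incoming.split(delimiter))]
    # Skip empty fragments, e.g. from an empty input string or a trailing delimiter.
    return delimiter.join(dedupe_preserving_order(p for p in parts if p))


if __name__ == "__main__":
    print(merge_pipe_delimited("infores:a|infores:b", "infores:b|infores:c"))
```

An order-preserving approach is used here rather than a plain `set` so that serialized output stays stable between runs; whether KGX needs that guarantee is an assumption.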
In particular, we need to check the kgx merge operation for this anomaly, and perhaps other contexts as well.
I think this is fixed in https://github.com/biolink/kgx/pull/408. Making a note to check.
Do we have a unit test to check this?
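If no such test exists yet, something along these lines could serve as a starting point. This is only a sketch: `merge_records` is a hypothetical stand-in for whatever the merge operation does with list-valued slots, and the record contents are invented sample data. A real test would call the actual KGX merge code instead.

```python
from typing import Any, Dict, Sequence


def merge_records(
    a: Dict[str, Any],
    b: Dict[str, Any],
    list_slots: Sequence[str] = ("provided_by",),
) -> Dict[str, Any]:
    """Hypothetical stand-in for a merge step; NOT the KGX implementation.

    List-valued slots are unioned without duplicates (first-seen order);
    for every other slot the value from `b` wins.
    """
    merged = dict(a)
    for key, value in b.items():
        if key in list_slots:
            combined = list(merged.get(key, [])) + list(value)
            merged[key] = list(dict.fromkeys(combined))
        else:
            merged[key] = value
    return merged


def test_merge_does_not_duplicate_provided_by() -> None:
    n1 = {"id": "HGNC:1", "provided_by": ["infores:a", "infores:b"]}
    n2 = {"id": "HGNC:1", "provided_by": ["infores:b", "infores:c"]}
    merged = merge_records(n1, n2)
    assert merged["provided_by"] == ["infores:a", "infores:b", "infores:c"]
    # No member may appear more than once.
    assert len(merged["provided_by"]) == len(set(merged["provided_by"]))


def test_merging_a_record_with_itself_changes_nothing() -> None:
    n1 = {"id": "HGNC:1", "provided_by": ["infores:a", "infores:b"]}
    assert merge_records(n1, n1) == n1


if __name__ == "__main__":
    test_merge_does_not_duplicate_provided_by()
    test_merging_a_record_with_itself_changes_nothing()
```

The second test covers the repeated-merge case, which seems the most likely way for duplicates to accumulate over time.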
@sierra, is the relevant code in https://github.com/biolink/kgx/blob/master/kgx/utils/kgx_utils.py#L831? I'm not sure if this snippet of code avoids duplication in pipe-delimited lists...
I applied a fix to the above snippet of code in the list-related PR #415.