vrs-python
vrs-python copied to clipboard
`ga4gh_identify` adds `digest` properties in-place regardless of `in_place` argument
I have a basic allele. If I manually add the VA ID, the compacted JSON dump is 323 chars:
>>> allele = Allele(location={"end": 44908822, "start": 44908821, "sequenceReference": {"id": "NC_0000019.10", "type": "SequenceReference", "refgetAccession": "SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl"}, "type": "SequenceLocation"}, state={"sequence": "T", "type": "LiteralSequenceExpression"}, type="Allele")
>>> allele.id = "ga4gh:VA.0AePZIWZUNsUlQTamyLrjm2HWUw2opLt"
>>> len(allele.model_dump_json(exclude_none=True))
323
If I use ga4gh_identify, though, even if I set in_place
to "never"
, digest
fields are added that makes the final JSON over 400 characters:
>>> allele = Allele(location={"end": 44908822, "start": 44908821, "sequenceReference": {"id": "NC_0000019.10", "type": "SequenceReference", "refgetAccession": "SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl"}, "type": "SequenceLocation"}, state={"sequence": "T", "type": "LiteralSequenceExpression"}, type="Allele")
>>> allele.id = ga4gh_identify(allele, "never")
>>> len(allele.model_dump_json(exclude_none=True))
411
>>> print(allele.model_dump_json(exclude_none=True, indent=2))
{
"id": "ga4gh:VA.0AePZIWZUNsUlQTamyLrjm2HWUw2opLt",
"type": "Allele",
"digest": "0AePZIWZUNsUlQTamyLrjm2HWUw2opLt",
"location": {
"type": "SequenceLocation",
"digest": "wIlaGykfwHIpPY2Fcxtbx4TINbbODFVz",
"sequenceReference": {
"id": "NC_0000019.10",
"type": "SequenceReference",
"refgetAccession": "SQ.IIB53T8CNeJJdUqzn9V_JnRtQadwWCbl"
},
"start": 44908821,
"end": 44908822
},
"state": {
"type": "LiteralSequenceExpression",
"sequence": "T"
}
}
This isn't the end of the world, but an increase in size of ~25% ends up being pretty hefty in extreme cases. is it intentional/necessary for ga4gh_identify
to add these digest
fields in-place, or can that be taken out? Or alternatively, maybe a ga4gh_compact
method could strip unset/redundant fields?