Make a better term for Genotype.count

Open ahwagner opened this issue 2 years ago • 8 comments

@larrybabb and @rrfreimuth agree that Genotype.count is not an ideal term because it is easily confused with GenotypeMember.count.

Other terms considered include ploidy and somy, but no clear term exists for this measure. This is a blocking issue for #394.

ahwagner avatar Aug 01 '22 21:08 ahwagner

How about total or total_count?

larrybabb avatar Aug 02 '22 02:08 larrybabb

I think the name needs to include a noun if it does not refer to the class that contains it. For example, member_count. I don’t know that I have a good alternative at this time.

rrfreimuth avatar Aug 02 '22 05:08 rrfreimuth

It's really a count of the occurrences of the molecules in the genome (or system) that contain the genotypeMembers. It could be seen as a "member count", but I think that takes the focus off the primary definition, which is closer to a systemic count or a maximum potential member count.

larrybabb avatar Aug 02 '22 13:08 larrybabb

That's a good clarification, and it exemplifies why I think it is important to put more meaning into the name (which is something I try to minimize). In this case, the class is Genotype but the count refers to molecules in a system, so Genotype.count isn't quite right because it sounds like a count of genotypes. I would offer an idea if I had one, but to be honest this is a very difficult thing to convey within this structure, and I'm having a little trouble wrapping my head around it. I'm gravitating towards something related to "locus" and "ploidy", but I'm not sure either of those words is quite right.

rrfreimuth avatar Aug 02 '22 14:08 rrfreimuth

homologous_members_at_locus? Wordy, but descriptive.

ahwagner avatar Aug 02 '22 15:08 ahwagner

I don't think it should have a reference to "members". A single occurrence of a GenotypeMember exists on one and only one copy of the molecule that its parent Genotype is defined by. The total number of possible copies of the genotype's molecule is the genotype count. Each copy of the molecule may or may not have an explicit GenotypeMember defined in the Genotype definition. These total copies of the molecule are essential to understanding the level of specificity a given Genotype conveys.
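
A minimal sketch of the relationship described above, assuming plain integer counts (the actual schema may allow ranges) and using simplified, hypothetical class shapes rather than the real VRS definitions. It shows that the genotype-level count is the total number of copies of the molecule, while the member counts cover only the copies that are explicitly described:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GenotypeMember:
    # Placeholder for an Allele or Haplotype; a string is an assumption
    # made here to keep the sketch self-contained.
    variation: str
    # Number of copies of the molecule that carry this variation.
    count: int


@dataclass
class Genotype:
    # Total copies of the molecule at the locus (the field under discussion).
    count: int
    members: List[GenotypeMember] = field(default_factory=list)

    def unspecified_copies(self) -> int:
        """Copies of the molecule with no explicit GenotypeMember."""
        specified = sum(member.count for member in self.members)
        if specified > self.count:
            raise ValueError("member counts exceed the genotype count")
        return self.count - specified


# A diploid locus where only one of the two copies is described.
partial = Genotype(count=2, members=[GenotypeMember("alleleA", 1)])
remaining = partial.unspecified_copies()
```

Here `remaining` is the number of copies left undescribed, which is the "level of specificity" the total count makes visible.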

larrybabb avatar Aug 02 '22 18:08 larrybabb

"The total possible copies of the genotype's molecule is the genotype count."

I don't quite follow the idea of a "genotype's molecule" (since genotype is a slice across molecules), but I know what you're trying to convey. I don't want to overcomplicate this, but I think part of the issue is that there are multiple concepts represented by the same structure. Without getting too caught up in the names, the following concepts are part of this object:

  • A SequenceLocation (contextualized interval) or conceptual locus
  • A system (e.g., genome)
  • The notion of physical copies of the location/locus within the system (members)
  • The state of the members (Sequence; note that Allele or Haplotype replicates location information)
  • A shorthand notation condensing replicates of a member with a given state to a count/copy number
  • The total number of physical copies known or asserted to be in the system

Note: The locus isn't specifically defined at the level of Genotype, but must be derived from its members. There may be a need for business rules or implementation guidance to ensure all members have consistent Locations.
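
One way such a business rule might look, as a rough sketch only: the class shapes and the `derive_locus` helper are hypothetical and are not part of VRS. The rule assumed here is the strictest one, that every member must carry an identical location:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SequenceLocation:
    # Simplified stand-in for a contextualized interval.
    sequence_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Member:
    location: SequenceLocation
    state: str  # the sequence observed at the location
    count: int  # shorthand for replicates of this state


def derive_locus(members: Iterable[Member]) -> Optional[SequenceLocation]:
    """Return the shared location, or None if members are empty or disagree."""
    locations = {member.location for member in members}
    if len(locations) == 1:
        return next(iter(locations))
    return None


locus = SequenceLocation("chr1-example", 100, 101)
consistent = [Member(locus, "A", 1), Member(locus, "T", 1)]
shared = derive_locus(consistent)
```

A real rule would likely need to be looser than strict equality, for example to accept members whose locations overlap the same conceptual locus.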

Some concepts are attributes of the system, some of a reference molecule, some of observed molecules with homology to the reference. This is where I tend to get tangled up a bit.

rrfreimuth avatar Aug 02 '22 18:08 rrfreimuth

This is a complex object, though I do not think that is the challenge in finding an appropriate term. Genotype is a well-understood concept, and many of the referenced attributes of this object have been previously characterized in VRS. For example, the Haplotype and Allele members capture:

  • A SequenceLocation of a genotype member
  • The state of a genotype member

And the Genotype inherits from Systemic Variation, which describes:

  • A system (in this case the system is not a genome, but a derived genomic locus)

So the only new pieces to consider are:

  • A count of the [in-trans] copies of a Molecular Variation at the genomic locus
  • The total [in-trans] homologous molecules at the genomic locus

In my view, the point of having a description for all classes and slots in our schema and documentation is to capture these definitions and keep the variables themselves simple. The challenge for "the total in-trans homologous molecules at the genomic locus" concept is that I don't think there is a way to describe this in one or two words. I still think count is best.

When considering variables such as Allele.state, what it means is only clear in the context of the parent object, by its definition. I am unsure why we are holding the Genotype.count slot to a higher standard than Allele.state, though I am beginning to think this issue might just be a matter of clearer definitions rather than descriptive slot names.

ahwagner avatar Aug 03 '22 15:08 ahwagner

In discussion with @rrfreimuth, no better variable name was found, and it was agreed that the description of the Genotype.count attribute sufficiently conveys the intent of this field.

ahwagner avatar Sep 13 '22 02:09 ahwagner