Add counts and observations as needed
Currently, we have the following at the dataset response level. Review if this is enough and add information as needed:
// Frequency of this allele in the dataset. Between 0 and 1, inclusive.
double frequency = 4;
// Number of variants matching the allele request in the dataset.
int64 variant_count = 5;
// Number of calls matching the allele request in the dataset.
int64 call_count = 6;
// Number of samples matching the allele request in the dataset.
int64 sample_count = 7;
@mfiume WDYT?
@mcupak I would prefer to use "biosample_count", which would go along with the GA4GH (and general) "biosample" concept. This corresponds to the most relevant question: does this biological sample (tumor tissue, germline DNA, environmental sample) contain "DNA sequence nnn"?
"sample" is less well defined; e.g. it could refer to a technical replicate. That case is covered by "call_count" (though this may better, or additionally, be "callset_count").
An extended representation would be:
- biosample_count: number of biological material preparations showing a variant
- callset_count: number of experiments with a variant
- call_count: number of alleles with an allele
- variant_count: number of variants with one or more calls matching the allele request
Is this, conceptually, correct? Not sure if we should cover all, but this should be declared & documented.
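To make the four proposed counts concrete, here is a minimal Python sketch under one possible reading of the definitions above. The `Call` record layout, its field names, and the `match_counts` function are hypothetical and not part of the Beacon schema; the toy data only illustrates how the numbers can diverge when one biosample has technical replicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical flattened call record; not a Beacon schema type."""
    biosample_id: str  # biological material the callset was derived from
    callset_id: str    # one experiment / analysis run on a biosample
    variant_id: str    # variant record in the dataset
    allele: str        # observed allele, encoded here as a plain string


def match_counts(calls, requested_allele):
    """Count matches at each level for a single allele request."""
    matches = [c for c in calls if c.allele == requested_allele]
    return {
        "biosample_count": len({c.biosample_id for c in matches}),
        "callset_count": len({c.callset_id for c in matches}),
        "call_count": len(matches),
        "variant_count": len({c.variant_id for c in matches}),
    }


# Toy dataset: biosample bs1 has two technical replicates (cs1, cs2).
CALLS = [
    Call("bs1", "cs1", "v1", "chr1:100:A>T"),
    Call("bs1", "cs2", "v1", "chr1:100:A>T"),
    Call("bs2", "cs3", "v1", "chr1:100:A>T"),
    Call("bs3", "cs4", "v2", "chr1:200:G>C"),
]

print(match_counts(CALLS, "chr1:100:A>T"))
# {'biosample_count': 2, 'callset_count': 3, 'call_count': 3, 'variant_count': 1}
```

In this toy case the replicate inflates callset_count and call_count but not biosample_count, which is the distinction the "biosample" wording is meant to capture.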
I don't understand what you mean by observations. Can you @mbaudis or @mcupak clarify?
@juhtornr So my use of "_count" would be incorrect, when using the counts <-> observations concept, in which:
- count => all records (biosamples, variants, callsets ...)
- observations => matches
However: In the schema, "count" is used for both types :-(
BeaconDataset.callCount
integer($int64)
minimum: 0
Total number of calls in the dataset.
BeaconDatasetAlleleResponse.callCount
integer($int64)
minimum: 0
Number of calls matching the allele request in the dataset.
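The two meanings of callCount can be kept apart by computing them side by side. The following is only a sketch: the function names `total_call_count` and `matching_call_count` are hypothetical, and each call is modelled as a plain allele string rather than a real schema object.

```python
def total_call_count(calls):
    """All call records in the dataset (the BeaconDataset.callCount sense)."""
    return len(calls)


def matching_call_count(calls, requested_allele):
    """Only calls matching the request (the BeaconDatasetAlleleResponse.callCount sense)."""
    return sum(1 for allele in calls if allele == requested_allele)


# Toy dataset: each entry is the allele observed in one call.
DATASET_CALLS = ["A>T", "A>T", "G>C", "A>T", "C>G"]

print(total_call_count(DATASET_CALLS))            # 5
print(matching_call_count(DATASET_CALLS, "A>T"))  # 3
```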
@antbro I guess the original was yours, could you bring your views here, please?