Add counts and observations as needed

Open mcupak opened this issue 8 years ago • 5 comments

Currently, we have the following at the dataset response level. Review if this is enough and add information as needed:

  // Frequency of this allele in the dataset. Between 0 and 1, inclusive.
  double frequency = 4;

  // Number of variants matching the allele request in the dataset.
  int64 variant_count = 5;

  // Number of calls matching the allele request in the dataset.
  int64 call_count = 6;

  // Number of samples matching the allele request in the dataset.
  int64 sample_count = 7;

mcupak avatar Jul 04 '17 03:07 mcupak

@mfiume WDYT?

mcupak avatar Jul 04 '17 03:07 mcupak

@mcupak I would prefer to use "biosample_count", which would go along with the GA4GH (and general) "biosample" concept. This corresponds to the most relevant question: does this biological sample (tumor tissue, germline DNA, environmental sample) contain "DNA sequence nnn"?

"sample" is less well defined; e.g. it could refer to a technical replicate. That case is covered by "call_count" (though, actually, this may better or additionally be "callset_count").

An extended representation would be:

  • biosample_count: number of biological material preparations showing a variant
  • callset_count: number of experiments with a variant
  • call_count: number of calls matching the allele request
  • variant_count: number of variants with one or more calls matching the allele request

Is this, conceptually, correct? Not sure if we should cover all, but this should be declared & documented.
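
To make the four definitions concrete, here is a minimal Python sketch of how they could be derived from a flat list of call records. The `Call` record, its field names, and the `matching_counts` helper are hypothetical and not part of the Beacon schema; the sketch assumes each call knows which biosample, callset, and variant it belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical call record; field names are assumptions, not schema."""
    biosample_id: str  # biological material the call derives from
    callset_id: str    # experiment / analysis run that produced the call
    variant_id: str    # variant record the call belongs to
    matches: bool      # whether the call matches the allele request


def matching_counts(calls):
    """Count matching records at each level of the proposed hierarchy."""
    hits = [c for c in calls if c.matches]
    return {
        "biosample_count": len({c.biosample_id for c in hits}),
        "callset_count": len({c.callset_id for c in hits}),
        "call_count": len(hits),
        "variant_count": len({c.variant_id for c in hits}),
    }


calls = [
    Call("bs1", "cs1", "v1", True),
    Call("bs1", "cs2", "v1", True),   # second experiment on the same biosample
    Call("bs2", "cs3", "v2", True),
    Call("bs3", "cs4", "v3", False),  # does not match the request
]
counts = matching_counts(calls)
```

In this toy data the two experiments on "bs1" raise callset_count and call_count without raising biosample_count, which is the distinction between "sample" and "biosample" raised above.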

mbaudis avatar Jul 04 '17 10:07 mbaudis

I don't understand what you mean by observations. Can you @mbaudis or @mcupak clarify?

juhtornr avatar Mar 15 '18 11:03 juhtornr

@juhtornr So my use of "_count" would be incorrect, when using the counts <-> observations concept, in which:

  • count => all records (biosamples, variants, callsets ...)
  • observations => matches

However: In the schema, "count" is used for both types :-(

BeaconDataset.callCount
  integer($int64)
  minimum: 0
  Total number of calls in the dataset.
BeaconDatasetAlleleResponse.callCount
  integer($int64)
  minimum: 0
  Number of calls matching the allele request in the dataset.
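
As a rough illustration of the count <-> observations split, the following Python sketch returns both numbers for the same set of records. The function name `count_and_observations` and the toy allele list are assumptions made for the example only; the first value plays the role of BeaconDataset.callCount and the second that of BeaconDatasetAlleleResponse.callCount.

```python
def count_and_observations(records, is_match):
    """Return (count, observations): all records vs. records that match."""
    records = list(records)
    count = len(records)
    observations = sum(1 for r in records if is_match(r))
    return count, observations


# Toy data: one allele string per call (hypothetical).
call_alleles = ["A", "T", "A", "G", "A"]
count, observations = count_and_observations(call_alleles, lambda a: a == "A")
```

Keeping the two values under different names would avoid the ambiguity where "count" means the total in one object and the matches in another.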

mbaudis avatar Mar 15 '18 11:03 mbaudis

@antbro I guess the original was yours, could you bring your views here, please?

jrambla avatar Aug 05 '18 15:08 jrambla