
Is there a difference between variables `dosage` and `call_dosage`?

Open timothymillar opened this issue 1 year ago • 4 comments

Are these duplicate variables? call_dosage seems to be more explicit with ndim=2 and the associated call_dosage_mask but isn't currently being used. Variable dosage doesn't specify dimensions and is used for LD pruning.
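To make the distinction concrete, here is a minimal sketch of how a declared `ndim` changes what validation can catch. `VarSpec`, `nested_ndim`, and `validate` are hypothetical stand-ins written for illustration, not sgkit's actual spec classes, and nested lists stand in for arrays.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VarSpec:
    """Hypothetical variable spec: a name plus an optional dimensionality."""
    name: str
    ndim: Optional[int] = None  # None means no dimensionality is declared


def nested_ndim(value):
    """Count nesting depth of a list-of-lists, as a stand-in for array ndim."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0] if value else None
    return depth


def validate(spec, value):
    """Reject values whose dimensionality contradicts the spec, if it has one."""
    if spec.ndim is not None and nested_ndim(value) != spec.ndim:
        raise ValueError(
            f"{spec.name}: expected ndim={spec.ndim}, got {nested_ndim(value)}"
        )
    return value


# Assumed shapes of the two specs, based only on the description above.
dosage_spec = VarSpec("dosage")
call_dosage_spec = VarSpec("call_dosage", ndim=2)

flat = [0.0, 1.0, 2.0]            # 1-D: not a (variants, samples) grid
grid = [[0.0, 1.0], [2.0, 1.0]]   # 2-D: variants x samples

validate(dosage_spec, flat)       # accepted: nothing to check against
validate(call_dosage_spec, grid)  # accepted: matches ndim=2

try:
    validate(call_dosage_spec, flat)
    rejected_flat = False
except ValueError:
    rejected_flat = True
```

Under these assumptions, the spec without `ndim` accepts a wrongly shaped input silently, while the spec that declares `ndim=2` rejects it.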

timothymillar avatar Aug 02 '22 04:08 timothymillar

@eric-czech you're probably best positioned to answer this one, would you mind taking a look?

jeromekelleher avatar Aug 02 '22 10:08 jeromekelleher

Hm, I can't remember or think of a good reason why the dosage variable needs to exist separately from call_dosage. +1 to switching references of dosage to call_dosage, like in regenie or the bgen reader.

eric-czech avatar Aug 02 '22 11:08 eric-czech

Thanks @eric-czech, I missed the uses in regenie and the bgen reader. I can have a look at merging these variables when I get a chance. A couple of related questions:

  • Would it make sense to replace the constants defined here with their equivalent variables from variables.py?
  • Would it be worth having the dosage argument of regenie default to variables.call_dosage, or is this best left as an explicit choice?

timothymillar avatar Aug 02 '22 21:08 timothymillar

> Would it make sense to replace the constants defined here with their equivalent variables from variables.py?

Yep, I don't see why not. I believe I added that well before variables.py existed so it should use that convention instead.

> Would it be worth having the dosage argument of regenie default to variables.call_dosage, or is this best left as an explicit choice?

I think I'm very mildly opposed to that since it wouldn't be uncommon to provide an array with imputed dosages or some other transformation that might get ignored if a user forgets to pass the new variable name, but I can see it either way.
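A rough sketch of that trade-off, using plain Python stand-ins: `fit_explicit`, `fit_with_default`, the `call_dosage_imputed` variable name, and the dict-based dataset are all hypothetical and only illustrate the argument-handling behaviour, not regenie itself.

```python
# Hypothetical default variable name, mirroring the idea of variables.call_dosage.
CALL_DOSAGE = "call_dosage"


def fit_explicit(ds, *, dosage):
    """The caller must name the dosage variable; omitting it is an error."""
    return ds[dosage]


def fit_with_default(ds, *, dosage=CALL_DOSAGE):
    """Falls back to the raw dosage variable when none is named."""
    return ds[dosage]


# A dict stands in for a dataset holding both raw and imputed dosages.
ds = {
    "call_dosage": [[0.0, 1.0], [2.0, 1.0]],
    "call_dosage_imputed": [[0.1, 0.9], [1.8, 1.2]],
}

# The user meant to analyse the imputed dosages but forgot the argument:
used_by_default = fit_with_default(ds)  # silently uses the raw dosages

# With a required argument the same mistake fails loudly instead.
try:
    fit_explicit(ds)
    forgot_raises = False
except TypeError:
    forgot_raises = True

used_explicit = fit_explicit(ds, dosage="call_dosage_imputed")
```

With the default, the forgotten argument produces a result computed from the wrong variable; with the required keyword, it produces a `TypeError` at the call site.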

eric-czech avatar Aug 03 '22 18:08 eric-czech