Jerome Kelleher

Results 753 comments of Jerome Kelleher

Hmm, I wonder if it's the genotype probability field that's taking up the space. Can you do a ``du -sh`` on the zarr directory please to see what the size...

There we go - I bet we're storing ``call_DS`` as a 32 or 64 bit float which isn't compressing very well.

> I'm still puzzled why gzipped VCF is smaller though I assume these are two digit base-10 values, which compress poorly when converted to float32?

This is a common issue...

> Do you think the numcodecs filters like Quantize or Bitround would be suitable, or did you have something else in mind?

Yes, these are the types of things that...
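To make the float32 point concrete, here is a rough sketch using only the Python standard library. It does not use the numcodecs filters mentioned above. Instead it swaps in a hand-rolled "scale by 100 and store as one byte" step, which is roughly the idea behind quantising before compression. The dosage values are synthetic (uniform random two-decimal numbers in [0, 2]), so the sizes it prints are only illustrative and real ``call_DS`` data will behave differently.

```python
import random
import struct
import zlib

random.seed(42)
n = 100_000

# Synthetic dosages in [0, 2] with two decimal places.
# Hypothetical stand-in data, not taken from the dataset discussed here.
dosages = [random.randint(0, 200) / 100 for _ in range(n)]

# Option 1: raw little-endian float32 bytes (4 bytes per value).
as_float32 = struct.pack(f"<{n}f", *dosages)

# Option 2: scale by 100 and store each value as one unsigned byte.
as_uint8 = bytes(round(d * 100) for d in dosages)

# Compress both encodings with the same general-purpose compressor.
float32_size = len(zlib.compress(as_float32))
uint8_size = len(zlib.compress(as_uint8))

# Decoding the scaled bytes recovers the two-decimal values.
decoded = [b / 100 for b in as_uint8]
max_error = max(abs(a - b) for a, b in zip(dosages, decoded))

print(f"float32: {float32_size} bytes compressed")
print(f"uint8:   {uint8_size} bytes compressed")
print(f"max round-trip error: {max_error}")
```

The scaling trick only works because the values are assumed to have a fixed, small number of decimal places. For general floats a filter that rounds the mantissa is probably the better fit.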

This looks very cool @not-a-feature, thanks! I'll need a bit of time to digest, and I think it would probably help to have a call about it. @fbaumdicker - I...

I was thinking more about providing this information as part of the [initial state](https://tskit.dev/msprime/docs/latest/ancestry.html#specifying-the-initial-state), rather than as something extrinsic. So, we provide the "backbone" of the simulation pre-done as some...

I think it would also be interesting to count the number of parent=ultimate ancestor edges (and total span) for samples. I think it's clear that we don't expect many samples...

Hmm, that is interesting. We need to really investigate what's happening here when we order sites by time. Maybe a slightly different heuristic is called for.

Brilliant - if we can characterise these bad haplotypes we can fix them! I'd rather not use mismatch in ancestor matching unless we have to (various reasons).

Can you show a tsqc plot of the ancestor haplotype lengths and edges in the inferred ts here @hyanwong? Should help intuition. Does this use truncate ancestors?