Jerome Kelleher

753 comments by Jerome Kelleher

Let's not go overboard with documenting this yet; the VCF code based on Dask for sgkit has been very problematic and I'm looking into replacing it.

What percentage of nodes/edges would we remove from an inferred ts from real data?

I feel like this is an sgkit bug, as in we should have predictable usage of ``str`` types rather than bytes?

We want to use several different metrics for imputation accuracy here, to avoid overfitting. @szhan can you outline what the different metrics people use are, please?

No rush - I'm just pushing here to be able to refer to it and for discussion when we're interested.

I'm not sure the ancestor generator supports multi-allelic anyway, so I guess it's just missing data we need to consider for now.

I'm hesitant to use a struct schema to be honest; wouldn't JSON be a lot simpler? Not sure why we'd set defaults as well as marking them as required?...

Yes, that's true. 27M is quite a few, so the difference would be a few hundred megabytes. All right, I think that's a good justification for going with struct, let's...
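For a rough feel of the struct-versus-JSON size argument, here is a minimal sketch. The record layout and field names (`ancestor_id`, `time`) are hypothetical and not taken from the thread; only the 27M count comes from the comment above, and what those 27M items are is assumed here to be one metadata record each.

```python
import json
import struct

# Hypothetical per-record metadata; the field names are illustrative only.
record = {"ancestor_id": 123456, "time": 0.25}

# Fixed-width binary layout: little-endian int32 followed by float64
# (no padding with "<", so 4 + 8 bytes per record).
packed = struct.pack("<id", record["ancestor_id"], record["time"])

# The same record as UTF-8 encoded JSON text, which repeats the keys every time.
encoded = json.dumps(record).encode("utf-8")

# Assumption: one such record for each of the 27M items mentioned above.
num_records = 27_000_000
saving_bytes = (len(encoded) - len(packed)) * num_records

print(f"struct: {len(packed)} bytes, JSON: {len(encoded)} bytes per record")
print(f"approximate saving: {saving_bytes / 1e6:.0f} MB")
```

The actual saving depends on the real fields and their types, so treat the printed number as an order-of-magnitude check rather than a measurement.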

This is useful all right, we use something like this in a bunch of places.

Ahhh. Wonder what's going on with numcodecs? They're usually pretty good for this.