Jerome Kelleher comments

Results: 753 comments by Jerome Kelleher

SGkit equivalent for tutorial read_vcf

Let's not go overboard with documenting this yet. The Dask-based VCF code in sgkit has been very problematic and I'm looking into replacing it.

Removing entirely unary nodes

What percentage of nodes/edges would we remove from an inferred ts from real data?

Default SGkit alleles are byte strings, which confuses tsinfer:

I feel like this is an sgkit bug, as in we should have predictable usage of ``str`` types rather than bytes?
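A minimal sketch of the mismatch, assuming the alleles arrive as a plain sequence of byte strings. `normalise_alleles` is a hypothetical helper for illustration only, not part of sgkit or tsinfer:

```python
def normalise_alleles(alleles):
    """Return alleles as a tuple of str, decoding any bytes entries as ASCII."""
    return tuple(
        a.decode("ascii") if isinstance(a, bytes) else a
        for a in alleles
    )


# In Python 3, bytes and str never compare equal, so code that
# looks alleles up by str value will not find b"A".
byte_alleles = (b"A", b"T")
assert b"A" != "A"

str_alleles = normalise_alleles(byte_alleles)
```

Decoding at the boundary like this is only a workaround; it does not settle whether the default type should change upstream.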

Use imputation accuracy to determine the correct mismatch ratios

We want to use several different metrics for imputation accuracy here, to avoid overfitting. @szhan can you outline the different metrics people use, please?

Refactor low-level code

No rush - I'm just pushing here to be able to refer to it and for discussion when we're interested.

Add two-bit encoding for generate ancestors

I'm not sure the ancestor generator supports multiallelic sites anyway, so I guess it's just missing data we need to consider for now.
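For discussion, a rough sketch of what a two-bit packing could look like when only biallelic states and missing data need representing. The code values, the `MISSING` sentinel and both function names are assumptions made for illustration, not the ancestor generator's actual layout:

```python
MISSING = -1  # assumed sentinel for missing data

# Two bits give four codes; only three are used in this sketch.
_ENCODE = {0: 0b00, 1: 0b01, MISSING: 0b10}
_DECODE = {code: value for value, code in _ENCODE.items()}


def pack_two_bit(genotypes):
    """Pack genotypes (0, 1 or MISSING) four to a byte, lowest bits first."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        packed[i // 4] |= _ENCODE[g] << (2 * (i % 4))
    return bytes(packed)


def unpack_two_bit(packed, num_genotypes):
    """Inverse of pack_two_bit for the first num_genotypes values."""
    return [
        _DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(num_genotypes)
    ]


genotypes = [0, 1, MISSING, 1, 0]
packed = pack_two_bit(genotypes)
```

The fourth code is left unused here, so a multiallelic site would need a different scheme.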

Add tsinfer specific schema for node metadata.

I'm hesitant to use a struct schema to be honest, wouldn't JSON be a lot simpler? Not sure why we'd set defaults as well as marking them as required?...

Add tsinfer specific schema for node metadata.

Yes, that's true. 27M is quite a few, so the difference would be a few hundred megabytes. All right, I think that's a good justification for going with struct, let's...
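A back-of-the-envelope sketch of the size argument, assuming one integer field per node. The field name and value are illustrative, and Python's `struct` module stands in here for a fixed-width binary codec rather than reproducing the actual metadata schema:

```python
import json
import struct

NUM_NODES = 27_000_000  # node count quoted in the comment above

# Hypothetical single-field metadata record.
record = {"ancestor_data_id": 12345678}

# JSON repeats the key name in every node's metadata.
json_bytes = len(json.dumps(record).encode("utf-8"))

# A fixed-width encoding stores only the 4-byte value.
struct_bytes = len(struct.pack("<i", record["ancestor_data_id"]))

saving_mb = (json_bytes - struct_bytes) * NUM_NODES / 1e6
print(
    f"JSON: {json_bytes} B/node, binary: {struct_bytes} B/node, "
    f"saving: {saving_mb:.0f} MB"
)
```

Most of the JSON cost is the repeated key, so the gap grows with longer field names and more fields.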

Variant method to return actual states

This is useful all right, we use something like this in a bunch of places.

Python 3.11 support

Ahhh. Wonder what's going on with numcodecs? They're usually pretty good for this.