Christopher Chang

51 comments by Christopher Chang

It would be great if the seekable format plays well with the tabix index format (https://samtools.github.io/hts-specs/tabix.pdf ) widely used in bioinformatics; this would speed up a bunch of workflows by...

Okay, after taking a look at the current seekable API, it appears to be adequate for a basic single-threaded tabix implementation, but it's missing parallel-compression and decompression-stream support, which would...

plink2 (https://github.com/chrchang/plink-ng ; will post precompiled binaries to https://www.cog-genomics.org/plink/2.0/ after I test for a few more hours) now supports this for diploid data. Sample usage: plink2 --vcf [vcf filename, could...

plink's way of controlling this is the [--output-chr flag](https://www.cog-genomics.org/plink/2.0/data#irreg_output).

That creates more interoperability problems than it solves. It is reasonable for programs taking plink-formatted input to currently assume no 'chr' prefix in .bim/.pvar files. I will look into adding...

The C++ pgenlib code does not contain any multithreading of its own (unless you count the isolated .pvar loader); it only includes some low-level constructs which are practical for plink2...

I may implement this at some point, but this is relatively low-priority since (i) bcftools already handles this and (ii) if bcftools is really too slow, splitting this type of...

This is probably fixed by 1253c08.

Actually, never mind, this was linear rather than logistic regression.

As with the plink .bed format, haploid vs. diploid is not directly encoded in the .pgen. Instead, plink and plink2 divide the encoded values by two when the .bim/.pvar (and...