Rachel Colquhoun issues




wrong ref allele given in vcf of calls

When running the indep pipeline to create a massive VCF of variants in a group of samples, I get errors from the bcf merge step. Here are some examples of pairs of...
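A wrong REF allele can be caught before merging by comparing each record's REF against the reference sequence at POS. The sketch below assumes the records are already parsed into `(chrom, pos, ref)` tuples with 1-based positions and that the reference is loaded into a dict of sequences; both representations and the function name `check_ref_alleles` are hypothetical, not part of the pipeline.

```python
def check_ref_alleles(records, reference):
    """Return records whose REF allele disagrees with the reference sequence.

    records: iterable of (chrom, pos, ref) tuples, pos 1-based as in VCF
             (hypothetical pre-parsed form).
    reference: dict mapping chrom name -> sequence string (hypothetical).
    Each mismatch is reported as (chrom, pos, ref, expected); expected is
    None when the chrom is absent from the reference.
    """
    mismatches = []
    for chrom, pos, ref in records:
        seq = reference.get(chrom)
        if seq is None:
            mismatches.append((chrom, pos, ref, None))
            continue
        # Slice out the reference bases the REF allele claims to cover.
        expected = seq[pos - 1 : pos - 1 + len(ref)]
        if expected.upper() != ref.upper():
            mismatches.append((chrom, pos, ref, expected))
    return mismatches
```

Running this per sample before the merge step would show which input VCFs carry the inconsistent REF alleles.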

run_indep_wkflow_with_gnu_par hangs after parallel build step before combine

So I think the problem is that the list_all_raw_vcfs file ends up empty. Because it exists, no error is thrown by combine_vcfs.pl and instead it hangs. I believe that the...
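One way to turn the hang into an explicit failure is to validate the list file before the combine step runs. The sketch below is in Python rather than Perl and the function name `read_vcf_list` is hypothetical; it only illustrates the idea of treating an existing but empty list as an error.

```python
import os


def read_vcf_list(list_path):
    """Read a list of VCF paths, failing loudly if the list is missing or empty."""
    if not os.path.exists(list_path):
        raise FileNotFoundError(f"VCF list not found: {list_path}")
    with open(list_path) as handle:
        # Ignore blank lines so a file of only newlines still counts as empty.
        paths = [line.strip() for line in handle if line.strip()]
    if not paths:
        raise ValueError(f"VCF list is empty: {list_path}")
    return paths
```

The same check (file exists *and* has at least one non-blank line) could equally be added at the top of combine_vcfs.pl.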

Cigar format using ssw

I want to do a SW alignment and get both a score, and a cigar as output. I have tried the following two methods using the latest pip installed version...
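To cross-check what a library returns, a small pure-Python Smith-Waterman with traceback can produce both a score and a CIGAR. This does not use or reproduce the ssw package's API; the scoring values (match 2, mismatch -1, linear gap -2) are assumptions, and soft clips are not emitted.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (score, cigar, q_start, t_start) with 0-based start offsets.
    CIGAR is relative to the query: M = aligned column (match or mismatch),
    I = base present only in the query, D = base present only in the target.
    """
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached.
    ops = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
        if H[i][j] == diag:
            ops.append("M")
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            ops.append("I")
            i -= 1
        else:
            ops.append("D")
            j -= 1
    ops.reverse()

    # Run-length encode the operations into a CIGAR string.
    cigar = ""
    k = 0
    while k < len(ops):
        run = 1
        while k + run < len(ops) and ops[k + run] == ops[k]:
            run += 1
        cigar += f"{run}{ops[k]}"
        k += run
    return best, cigar, i, j
```

For example, `smith_waterman("TTACGT", "ACGT")` aligns the last four query bases, giving score 8, CIGAR `4M` and a query start of 2.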

Viral alignments often have long runs of NNNs in...

Handle by disallowing alleles in non-match intervals which are entirely N. As a consequence, we need to allow for fewer sequences to be considered after running kmeans_clustering. Also add...
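The filtering rule described above can be sketched as a one-line predicate over the alleles of a non-match interval. The function name and the choice to treat N case-insensitively are assumptions; empty alleles (deletions) are kept because they contain no N at all.

```python
def drop_all_n_alleles(alleles):
    """Remove alleles made up entirely of N; partially-N alleles are kept.

    Hypothetical helper: alleles are plain strings for one non-match interval.
    """
    return [a for a in alleles if not (a and set(a.upper()) == {"N"})]
```

Because this can leave fewer alleles than before, any later step that expects a fixed number of sequences (such as choosing a cluster count) has to cope with the shorter list.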

How does sourmash gather time scale with reads (and can this be reduced with multithreading)?

I'd really like to use sourmash for metagenomic classification as in Portik et al. I have been trying it out on small datasets and I've noticed that the gather step...

VCF header needs more information

All VCFs need the following in their header:
```
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
```
as well as `##contig=` for each $id in the CHROM...

enhancement
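The `##contig=` part of this request can be generated mechanically from the CHROM column. The sketch below writes only the required `ID` key in the VCF `##contig=<ID=...>` form (the spec also allows a `length` key); the `##FORMAT=` definitions are not reproduced because their contents are elided in the issue text. The function name is hypothetical.

```python
def contig_header_lines(vcf_lines):
    """Build one ##contig line per distinct CHROM value, in first-seen order."""
    chroms = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the #CHROM header line
        chroms.setdefault(line.split("\t", 1)[0], None)
    return [f"##contig=<ID={chrom}>" for chrom in chroms]
```

The resulting lines would need to be inserted before the `#CHROM` header line when the VCF is written.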

Compare command outputs single VCF instead of multiple directories

Currently it outputs a pangenome matrix file with a dodgy name (missing /?). It also outputs a directory for every gene found in any sample; make this a single output for everything.

enhancement

Re-implement forcing hits to be collinear

In cluster finding/filtering, use collinearity to avoid some spurious hits.

enhancement
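A common generic way to enforce collinearity is to keep the longest chain of hits that increases in both read and reference coordinates, found with a longest-increasing-subsequence pass. This is a swapped-in textbook technique, not pandora's previous implementation, and the `(read_pos, ref_pos)` hit representation is an assumption; it also handles only forward-strand hits.

```python
from bisect import bisect_left


def longest_collinear_chain(hits):
    """Return the largest subset of hits strictly increasing in both coordinates.

    hits: list of (read_pos, ref_pos) pairs (hypothetical representation).
    Uses patience-sorting LIS on ref_pos, O(n log n).
    """
    # Sort by read position; on ties sort ref position descending so two hits
    # at the same read position can never both end up in the chain.
    ordered = sorted(hits, key=lambda h: (h[0], -h[1]))
    tail_refs = []  # smallest final ref_pos of any chain of length i + 1
    tail_idx = []   # index in `ordered` of that chain's final hit
    prev = [-1] * len(ordered)
    for k, (_, ref) in enumerate(ordered):
        pos = bisect_left(tail_refs, ref)  # bisect_left => strict increase
        if pos > 0:
            prev[k] = tail_idx[pos - 1]
        if pos == len(tail_refs):
            tail_refs.append(ref)
            tail_idx.append(k)
        else:
            tail_refs[pos] = ref
            tail_idx[pos] = k
    # Walk back from the end of the longest chain.
    chain = []
    k = tail_idx[-1] if tail_idx else -1
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]
```

For example, among hits `[(0, 10), (1, 11), (2, 50), (3, 12), (4, 13)]` the out-of-order hit at reference position 50 is dropped as likely spurious.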

Is `find_mean_covg` still appropriate in estimate parameters?

Currently pandora allows two models for the k-mer coverage distribution: negative binomial and binomial (approximately Poisson). The default is negative binomial. In https://github.com/rmcolq/pandora/blob/7adf63ce60d28a000f7b6c850f4a5ebbbf2dd031/src/estimate_parameters.cpp#L235, if I find that the mean and variance...
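The relationship between the mean, the variance and the two models can be made concrete with a method-of-moments fit. In the `(r, p)` parameterisation the negative binomial has mean r(1-p)/p and variance r(1-p)/p², so p = mean/variance and r = mean²/(variance - mean), which is only defined when variance > mean. The sketch below, including the fallback rule in `choose_coverage_model`, is an illustrative assumption rather than what estimate_parameters.cpp currently does.

```python
def fit_negative_binomial(mean, variance):
    """Method-of-moments (r, p) for a negative binomial, or None if undefined.

    Requires overdispersion (variance > mean); otherwise r would be
    negative or infinite.
    """
    if mean <= 0 or variance <= mean:
        return None
    p = mean / variance
    r = mean * mean / (variance - mean)
    return r, p


def choose_coverage_model(mean, variance):
    """Hypothetical rule: use the negative binomial only when it can be fitted."""
    return "negative_binomial" if fit_negative_binomial(mean, variance) else "binomial"
```

With mean 10 and variance 20 this gives r = 10 and p = 0.5; with variance at or below the mean the negative binomial fit is undefined and the binomial/Poisson model is the natural fallback.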

Filtering max path based on coverage

Pandora currently uses crude thresholds based on the estimated global coverage as compared to the mode or mean coverage along the path. This should probably be based on some intuition...