VariantSpark

machine learning for genomic variants

Results 76 VariantSpark issues

Hello, I am trying to install VariantSpark on a CentOS 7 box with JDK 1.8, Scala 2.3.1, and Spark 2.1.1. When I run mvn clean install, the following test fails...

This procedure provides a Gini-based variable importance method that corrects the bias caused by differing numbers of categories (minor-allele-frequency bias in GWAS) and also shows some promising results regarding correlation issues. The...

The "Biallelic" option in the current version allows for two different representations of variants in the output file.
- CHR_POS
- CHR_POS_REF_ALT
I was wondering if this option is extended...

I recommend the following improvements to the VariantSpark Random Forest importance analysis.
1. Compute and write the importance scores to a file after every 1000 trees built.
2. Automatically identify when enough...
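A minimal sketch of the first suggestion (periodic importance snapshots), not VariantSpark's implementation. It assumes each tree contributes a mapping from variable name to an importance score and that the running importance is the mean over the trees built so far; both the aggregation rule and the names `importance_checkpoints` and `to_csv` are hypothetical.

```python
import csv
import io


def importance_checkpoints(per_tree_importances, every=1000):
    """Yield (n_trees, mean importance per variable) after every `every` trees.

    `per_tree_importances` is an iterable of dicts (variable -> score),
    one dict per tree. Averaging over trees is an assumption made for
    this sketch, not a statement about VariantSpark's aggregation.
    """
    totals = {}
    for n_trees, per_tree in enumerate(per_tree_importances, start=1):
        for variable, score in per_tree.items():
            totals[variable] = totals.get(variable, 0.0) + score
        if n_trees % every == 0:
            yield n_trees, {v: s / n_trees for v, s in totals.items()}


def to_csv(snapshot):
    """Render one snapshot as CSV text, with variables sorted by name."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["variable", "mean_importance"])
    for variable in sorted(snapshot):
        writer.writerow([variable, snapshot[variable]])
    return buf.getvalue()
```

Each yielded snapshot could be written to its own file, so that successive snapshots can be compared while the forest is still being built.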

The dependency org.json4s:json4s-ext_${scala.binary.version}:3.2.11 is declared twice, which affects the Maven build.
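One way to spot this kind of duplicate is to scan the POM for repeated groupId/artifactId pairs. The sketch below uses only the Python standard library; the function name `duplicate_dependencies` and the sample POM (including the `com.example:other-lib` entry) are illustrative assumptions, not taken from the VariantSpark repository.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def duplicate_dependencies(pom_xml):
    """Return sorted (groupId, artifactId) pairs declared more than once
    in the top-level <dependencies> section of a POM given as a string."""
    root = ET.fromstring(pom_xml)
    keys = []
    for dep in root.findall("./m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS)
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        keys.append((group.strip(), artifact.strip()))
    return sorted(key for key, count in Counter(keys).items() if count > 1)


# Illustrative POM fragment: the json4s-ext entry appears twice.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.json4s</groupId>
      <artifactId>json4s-ext_${scala.binary.version}</artifactId>
      <version>3.2.11</version>
    </dependency>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>other-lib</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>org.json4s</groupId>
      <artifactId>json4s-ext_${scala.binary.version}</artifactId>
      <version>3.2.11</version>
    </dependency>
  </dependencies>
</project>"""

print(duplicate_dependencies(SAMPLE_POM))
```

The sketch only inspects the top-level dependencies section; entries under dependencyManagement or in profiles would need additional paths.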

The procedure for selecting split variables in the case of equal reduction in impurity is slightly biased towards variables with larger indexes. In the previous non-reproducible approach it was caused by...
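To illustrate what unbiased yet reproducible tie-breaking can look like, here is a generic sketch (not VariantSpark's code): among candidates with equal impurity reduction, one is chosen uniformly at random by reservoir sampling, driven by a seeded random generator. The function name `pick_best_split` and the tolerance `tol` are assumptions introduced for this example.

```python
import random


def pick_best_split(impurity_reductions, rng, tol=1e-12):
    """Return the index of the largest impurity reduction.

    Candidates within `tol` of the current best count as ties, and one
    of them is chosen uniformly at random (reservoir sampling), so no
    index is favoured. Passing a seeded `random.Random` keeps the
    choice reproducible.
    """
    best_idx = -1
    best_val = float("-inf")
    ties = 0
    for idx, val in enumerate(impurity_reductions):
        if val > best_val + tol:
            # Strictly better candidate: restart the tie count.
            best_idx, best_val, ties = idx, val, 1
        elif abs(val - best_val) <= tol:
            # Tie: keep the new candidate with probability 1 / ties.
            ties += 1
            if rng.randrange(ties) == 0:
                best_idx = idx
    return best_idx


reductions = [0.1, 0.5, 0.5, 0.5, 0.2]
chosen = pick_best_split(reductions, random.Random(42))
```

Using the same seed gives the same choice on every run, while across different seeds each of the tied indexes (1, 2 and 3 here) is selected with equal probability.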

Some ideas to consider for improved performance: * splits coming from a single variable are likely to be very sparse -> as such it may not make sense to return...

enhancement

This is noticeable by comparing runtime on sparse vs dense synthetic regression datasets. The sparse ones run much slower although intuitively they should run faster.

Make it somehow possible to group tests based on the Spark context they need. Currently only one context is possible for all tests, while three different contexts are needed -...

techdebt

When using the VariantSpark interface for Hail, a large batch size can crash the process. For example, for the following setup a batch size of 250 result...