piotrszul
@BauerLab as far as I can tell, they are already included in the example notebook; see: https://bitbucket.csiro.au/users/hos076/repos/variantspark-aws/browse/data/monitor-ami/notebook/VariantSpark_example.ipynb (or: https://variantspark-marketplace-resources.s3.amazonaws.com/static/public/example_notebook.html) ``` covariates = [mt.pheno.isFemale, mt.pcs[0], mt.pcs[1]] result = hl.logistic_regression_rows(test='wald',...
Hi @ArashBayatDev, I have added a new attribute `classCounts` to the JSON tree nodes, which is an array with the count of samples from each of the classes. Also I...
Hi @vishaln79, the problem is caused by the version of libstdc++ packaged with CentOS 7. Apparently it ships with libstdc++-4.8.5, which supports CXXABI_1.3.7, while VariantSpark (and in particular the Hail library)...
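One way to confirm which CXXABI versions a given libstdc++ advertises is to scan the library file for its `CXXABI_` version tags. The following is only a rough sketch of that check: the helper names `cxxabi_versions` and `supports` are made up for illustration, and the library path mentioned afterwards is an assumption about a typical CentOS 7 layout, not something taken from the issue.

```python
import re


def cxxabi_versions(data: bytes):
    """Return the sorted numeric CXXABI version tuples found in raw library bytes."""
    # Only numeric tags such as CXXABI_1.3.7 are matched; tags like CXXABI_TM_1 are skipped.
    found = set(re.findall(rb"CXXABI_(\d+(?:\.\d+)*)", data))
    return sorted(tuple(int(part) for part in v.decode().split(".")) for v in found)


def supports(data: bytes, required: str) -> bool:
    """True if the library bytes contain exactly the required CXXABI version tag."""
    wanted = tuple(int(part) for part in required.split("."))
    return wanted in cxxabi_versions(data)
```

To use it, read the library in binary mode (on CentOS 7 this is probably `/usr/lib64/libstdc++.so.6`, but check your system) and pass the bytes to `cxxabi_versions`; the highest tuple returned is the newest ABI version the installed library provides.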
I have implemented a basic version of AIR (with the `-ic` option). In addition, the `-icsr` option can be used to set the random seed for label permutation so that importance...
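To illustrate why a settable seed matters for label permutation, here is a minimal Python sketch. It is not VariantSpark's implementation; `permute_labels` is a hypothetical helper, and the labels and seed value are made up for the example.

```python
import random


def permute_labels(labels, seed):
    """Return a shuffled copy of labels; the same seed always gives the same order."""
    rng = random.Random(seed)   # private generator, so global random state is untouched
    shuffled = list(labels)     # copy, so the caller's labels are not modified
    rng.shuffle(shuffled)
    return shuffled


labels = [0, 1, 1, 0, 1, 0, 0, 1]
first = permute_labels(labels, seed=13)
second = permute_labels(labels, seed=13)
```

Here `first` and `second` are identical because both runs use the same seed, so anything computed from the permuted labels can be reproduced from run to run.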
Hi Amnon, good to hear from you and thanks for your comments :) - the main feature of AIR is that all the variables are permuted in the same way...
@amnonbleich I am very curious if you have any thoughts on how well AIR deals with correlation. As I understand it, it removes bias due to different types of variables or...
I think the flag (bi-allelic variants) was the result of my evolving (mis)understanding of how variants are represented in VCF files and, more precisely, what constitutes a unique key, that...
Here is some interesting info on the randomness of various hashing algorithms: https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
This can be observed, for example, on the sparse synthetic datasets, e.g. `src/test/data/synth/synth_2000_500_fact_10_0.995-wide.csv`. The reason seems to be that very sparse data result in very deep and unbalanced trees...
Here is a good resource on how to do it with Maven: https://www.baeldung.com/maven-integration-test