selene
Add H5 GenomicFeatures support for more flexible target datatypes
This PR adds support for a new target type. Previously, Selene supported sampling of binary targets only. GenomicFeaturesH5 is a class that queries genomic coordinate rows in a tabix-indexed BED file and retrieves the corresponding labels from an HDF5 matrix for training. Changes in other files support this new functionality. For example, compression/decompression now applies to the sequence one-hot encoding only, whereas the default implementation compressed both sequences and targets, and PerformanceMetrics now supports computing Spearman's and Pearson's correlations, among other changes.
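To illustrate why compression is restricted to the sequence encoding, here is a minimal sketch, not the code in this PR. It assumes a strictly 0/1 one-hot array and uses NumPy bit-packing as a stand-in for whatever compression Selene applies. The helper names `pack_sequences` and `unpack_sequences` are hypothetical. Bit-packing is lossless for binary data but would destroy real-valued targets, so the targets are left untouched.

```python
import numpy as np


def pack_sequences(onehot):
    """Bit-pack a binary one-hot array into bytes (hypothetical helper).

    Assumes every entry is exactly 0 or 1; fractional encodings
    (e.g. 0.25 for unknown bases) would be lost by the uint8 cast.
    """
    onehot = np.asarray(onehot)
    packed = np.packbits(onehot.astype(np.uint8), axis=None)
    return packed, onehot.shape


def unpack_sequences(packed, shape):
    """Invert pack_sequences, restoring the original shape as float32."""
    n_bits = int(np.prod(shape))
    # packbits pads the final byte with zeros, so trim to the true length.
    bits = np.unpackbits(packed, axis=None)[:n_bits]
    return bits.reshape(shape).astype(np.float32)


# Toy batch: 2 sequences of length 10 over a 4-letter alphabet.
rng = np.random.default_rng(0)
base_indices = rng.integers(0, 4, size=(2, 10))
onehot = np.eye(4, dtype=np.uint8)[base_indices]  # shape (2, 10, 4)

# Continuous targets (illustrative values) are kept as-is, not bit-packed.
targets = rng.normal(size=(2, 3)).astype(np.float32)

packed, shape = pack_sequences(onehot)
restored = unpack_sequences(packed, shape)
```

In this sketch the 80 binary entries of the sequence batch fit into 10 bytes, while the float32 targets keep their full precision.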