Add H5 GenomicFeatures support for more flexible target datatypes

Open kathyxchen opened this issue 1 year ago • 0 comments

This PR adds support for a new target type. Previously, Selene supported sampling only binary targets. GenomicFeaturesH5 is a class that queries genomic coordinate rows in a tabix-indexed BED file and retrieves the corresponding labels from an HDF5 matrix for training. Changes in other files support this new functionality: for example, compression/decompression now applies only to the one-hot sequence encoding rather than to both sequences and targets (the previous default), and PerformanceMetrics now supports computing Spearman's and Pearson's correlations, among other changes.
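To illustrate why correlation metrics matter for non-binary targets, here is a minimal, dependency-free sketch of per-feature Pearson and Spearman correlation. It is not the code in this PR and does not use Selene's PerformanceMetrics API; the function names (`pearson`, `average_ranks`, `spearman`, `per_feature`) and the row-major list layout are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def per_feature(metric, targets, predictions):
    """Apply `metric` to each feature column of two (samples x features) lists."""
    n_features = len(targets[0])
    return [
        metric([row[j] for row in targets], [row[j] for row in predictions])
        for j in range(n_features)
    ]


# Hypothetical continuous targets and predictions: 3 samples, 2 features.
targets = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
predictions = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
scores = per_feature(pearson, targets, predictions)
```

Computing the metric per feature column and reporting each score separately mirrors how multi-label performance is usually summarized; in practice, a vectorized library implementation would likely replace these loops.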

kathyxchen, Jul 15 '24 15:07