Entropy on strand
Hey @ArtRand,
I'd like to request a feature for entropy: make it "strand aware" by allowing us to specify the strand for each region (+, -, or .).
It would also be convenient if the documentation specified the headers for the output files when --regions is specified.
Thanks!
Hello @Ge0rges,
I agree that the BED file should direct the strand to use. I'll be sure to add it along with the multi-base work.
Also I've noticed sometimes the output regions.bed is empty with no error printed?
The final log will report 0 regions processed successfully in this case. There is always a bit of a balance to strike with respect to informing the user why something was ineligible to calculate a result and making the logs very verbose and hard to follow. Perhaps a better solution is to tabulate how many regions failed and their reasons?
Hey @ArtRand, could you let me know what the output schema is when --regions is specified?
Hello @Ge0rges,
The schema is:
| col | Name | Description | type |
|---|---|---|---|
| 1 | chrom | chromosome of the region | str |
| 2 | start | 0-based start position of the region | int |
| 3 | end | 0-based end position of the region | int |
| 4 | region_name | name of the region from the input BED file | str |
| 5 | mean_entropy | average entropy of the passing windows included in the region | float |
| 6 | strand | strand of the region {+, -, .} | str |
| 7 | median_entropy | median entropy of the passing windows included in the region | float |
| 8 | min_entropy | minimum passing window entropy | float |
| 9 | max_entropy | maximum passing window entropy | float |
| 10 | mean_num_reads | average number of reads used in the passing windows' entropy calculation | float |
| 11 | min_num_reads | minimum number of reads used in the passing windows' entropy calculation | int |
| 12 | max_num_reads | maximum number of reads used in the passing windows' entropy calculation | int |
| 13 | successful_window_count | number of passing windows in the region | int |
| 14 | failed_window_count | number of failed windows in the region | int |
You can also pass the --header flag to get a header line in the output.
I'll add this to the documentation.
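
In the meantime, here is a minimal sketch of loading the regions output with the column names and types from the table above. It is not part of the tool. It assumes the file is tab-delimited like a regular BED file, and that a header line (if --header was passed) starts with `chrom` or `#`; both are guesses, so adjust as needed. The `parse_regions` helper and the example row values are hypothetical.

```python
import csv
import io

# Column names and types, in order, copied from the schema table above.
REGION_COLUMNS = [
    ("chrom", str),
    ("start", int),
    ("end", int),
    ("region_name", str),
    ("mean_entropy", float),
    ("strand", str),
    ("median_entropy", float),
    ("min_entropy", float),
    ("max_entropy", float),
    ("mean_num_reads", float),
    ("min_num_reads", int),
    ("max_num_reads", int),
    ("successful_window_count", int),
    ("failed_window_count", int),
]


def parse_regions(handle):
    """Yield one dict per region row, with values cast to the schema types.

    Assumes tab-delimited input. Lines whose first field is "chrom" or
    starts with "#" are treated as a header and skipped (an assumption
    about what --header emits).
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#") or row[0] == "chrom":
            continue
        if len(row) != len(REGION_COLUMNS):
            raise ValueError(
                f"expected {len(REGION_COLUMNS)} columns, got {len(row)}"
            )
        yield {name: cast(value) for (name, cast), value in zip(REGION_COLUMNS, row)}


# Made-up row, for illustration only.
example = "chr1\t100\t200\tregionA\t0.5\t+\t0.4\t0.1\t0.9\t12.5\t10\t15\t4\t1\n"
rows = list(parse_regions(io.StringIO(example)))
print(rows[0]["region_name"], rows[0]["strand"], rows[0]["mean_entropy"])
```

For a real file, replace the `io.StringIO` example with `open("regions.bed", newline="")`. An empty result here corresponds to the "0 regions processed successfully" case mentioned earlier.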