Entropy on strand

Open Ge0rges opened this issue 1 year ago • 5 comments

Hey @ArtRand,

I'd like to request a feature for entropy: make it "strand aware" by letting us specify the strand for each region (-, +, or .).

It would also be convenient if the documentation specified the headers for the output files when --regions is specified.

Thanks!

Ge0rges avatar Nov 18 '24 20:11 Ge0rges

Hello @Ge0rges,

I agree that the BED file should direct the strand to use. I'll be sure to add it along with the multi-base work.

ArtRand avatar Nov 20 '24 01:11 ArtRand

Also, I've noticed that sometimes the output regions.bed is empty with no error printed?

Ge0rges avatar Nov 20 '24 07:11 Ge0rges

The final log will report 0 regions processed successfully in this case. There is always a balance to strike between telling the user why a region was ineligible for a result and keeping the logs from becoming verbose and hard to follow. Perhaps a better solution is to tabulate how many regions failed and their reasons?
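To make the tabulation idea concrete, here is a minimal Python sketch of what such a summary could look like. It is not modkit code: the `(region_name, failure_reason)` input shape, the function names, and the reason strings are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def summarize(results):
    """Count successes and tally failures by reason.

    `results` is an iterable of (region_name, failure_reason) pairs, where
    failure_reason is None for a region that produced a result.
    This input shape is hypothetical, not modkit's internal representation.
    """
    succeeded = 0
    failures = Counter()
    for _region, reason in results:
        if reason is None:
            succeeded += 1
        else:
            failures[reason] += 1
    return succeeded, failures


def format_summary(succeeded, failures):
    """Render one line for successes plus one line per failure reason."""
    lines = [f"{succeeded} regions processed successfully"]
    for reason, count in failures.most_common():
        lines.append(f"{count} regions failed: {reason}")
    return "\n".join(lines)


# Hypothetical results; the reason strings are made up for this example.
results = [
    ("region_a", None),
    ("region_b", "too few reads"),
    ("region_c", "too few reads"),
    ("region_d", "no valid positions"),
]
succeeded, failures = summarize(results)
print(format_summary(succeeded, failures))
```

A summary like this keeps the log to one line per distinct reason, so it stays short even when many regions fail.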

ArtRand avatar Nov 20 '24 08:11 ArtRand

Hey @ArtRand, could you let me know what the schema for the output is when --regions is specified?

Ge0rges avatar Nov 26 '24 19:11 Ge0rges

Hello @Ge0rges,

The schema is:

| col | name | description | type |
|-----|------|-------------|------|
| 1 | chrom | chromosome of the region | str |
| 2 | start | 0-based start position of the region | int |
| 3 | end | 0-based end position of the region | int |
| 4 | region_name | name of the region from the input BED file | str |
| 5 | mean_entropy | average entropy of the passing windows included in the region | float |
| 6 | strand | strand of the region {+, -, .} | str |
| 7 | median_entropy | median entropy of the passing windows included in the region | float |
| 8 | min_entropy | minimum passing window entropy | float |
| 9 | max_entropy | maximum passing window entropy | float |
| 10 | mean_num_reads | average number of reads used in the passing windows' entropy calculation | float |
| 11 | min_num_reads | minimum number of reads used in the passing windows' entropy calculation | int |
| 12 | max_num_reads | maximum number of reads used in the passing windows' entropy calculation | int |
| 13 | successful_window_count | number of passing windows in the region | int |
| 14 | failed_window_count | number of failed windows in the region | int |

You can also pass the --header flag to get a header line in the output.
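For anyone parsing this output downstream, here is a minimal Python sketch that reads the regions file into typed records using the column names and types from the schema above. It rests on several assumptions not confirmed in this thread: that the file is tab-delimited, that integer columns are written without a decimal point, and that a header line (if present) can be recognized by a non-numeric `start` field or a leading `#`. The function name `parse_regions` and the example row values are made up.

```python
import csv
import io

# Column names and types, in order, as listed in the schema in this thread.
REGION_COLUMNS = [
    ("chrom", str),
    ("start", int),
    ("end", int),
    ("region_name", str),
    ("mean_entropy", float),
    ("strand", str),
    ("median_entropy", float),
    ("min_entropy", float),
    ("max_entropy", float),
    ("mean_num_reads", float),
    ("min_num_reads", int),
    ("max_num_reads", int),
    ("successful_window_count", int),
    ("failed_window_count", int),
]


def parse_regions(handle):
    """Yield one dict per region row from a tab-delimited regions file.

    Assumption: a header line, if present, either starts with '#' or has a
    non-integer value in the second ("start") column, and is skipped.
    """
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < len(REGION_COLUMNS) or fields[0].startswith("#"):
            continue
        if not fields[1].isdigit():
            continue  # looks like a header row, not data
        yield {
            name: cast(value)
            for (name, cast), value in zip(REGION_COLUMNS, fields)
        }


# Made-up example: a header line followed by one data row.
header = "\t".join(name for name, _ in REGION_COLUMNS)
row = "chr1\t100\t200\tpromoter_a\t0.42\t+\t0.40\t0.10\t0.90\t12.5\t8\t20\t4\t1"
example = io.StringIO(header + "\n" + row + "\n")

rows = list(parse_regions(example))
if not rows:
    # An empty regions file means no region produced a result.
    print("no regions in output")
for record in rows:
    print(record["region_name"], record["strand"], record["mean_entropy"])
```

Because the column names are assigned from the schema, not read from the file, the same parser works whether or not --header was passed.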

I'll add this to the documentation.

ArtRand avatar Nov 27 '24 16:11 ArtRand