DamageProfiler
Add an option to subsample the input
Currently, there is no way to randomly subsample an input BAM file within DamageProfiler. When the input BAM files are very large, runtime increases considerably, even though the damage rate estimates hardly change compared to using a subset of the reads.
It would be very useful to have an option where a user could specify a number of reads to use for damage calculation, similar to how this functionality is implemented in mapDamage.
Proposed functionality
A user can specify either a number of reads (e.g. 10 000 000) or a fraction of reads (e.g. 0.5). If an integer is given, use up to that number of randomly subsampled reads for damage calculation. If the BAM file contains fewer reads than requested, simply use all available reads. If a float is given, randomly subsample that fraction of reads for the calculation.
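To make the proposal concrete, here is a minimal sketch of the selection logic in Python. It is not DamageProfiler's or mapDamage's actual implementation; the function name `subsample_reads`, its parameters, and the use of a plain iterable in place of a BAM reader are all assumptions made for illustration. For an integer it uses reservoir sampling, so the total number of reads does not need to be known in advance; for a float it keeps each read independently with that probability, so the resulting count is only approximately the requested fraction.

```python
import random
from typing import Iterable, List, Optional, TypeVar, Union

T = TypeVar("T")


def subsample_reads(
    reads: Iterable[T],
    amount: Union[int, float],
    seed: Optional[int] = None,
) -> List[T]:
    """Hypothetical helper: subsample reads by count (int) or fraction (float)."""
    rng = random.Random(seed)  # a fixed seed makes the subsample reproducible

    if isinstance(amount, bool):
        raise TypeError("amount must be an int or a float, not a bool")

    if isinstance(amount, int):
        if amount < 0:
            raise ValueError("read count must be non-negative")
        # Reservoir sampling: one pass, memory bounded by `amount`.
        reservoir: List[T] = []
        for i, read in enumerate(reads):
            if i < amount:
                reservoir.append(read)
            else:
                j = rng.randint(0, i)  # inclusive on both ends
                if j < amount:
                    reservoir[j] = read
        # If the input held fewer reads than requested, all of them are returned.
        return reservoir

    if not 0.0 <= amount <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    # Keep each read independently with probability `amount`.
    return [read for read in reads if rng.random() < amount]
```

With this sketch, `subsample_reads(reads, 10_000_000)` would keep at most ten million reads and `subsample_reads(reads, 0.5)` roughly half of them. Note that the reservoir variant does not preserve the original read order, which should not matter for position-wise damage counts.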
Thanks for this input. I'll have a look at it and will consider this option for a future version!