TideHunter

Minimum Copy Number Issue

Open kschimke opened this issue 7 months ago • 3 comments

I have data in which the insert is long enough that it isn't guaranteed to repeat twice within a read, although reads may contain partial repeats. A previous version of TideHunter let me set the minimum copy number (-c) to 1 without crashing, even though the default is 2, and it produced consensus data for the long insert. The current version errors when I set -c to 1 ([abpoa_gen_cons] No enough sequences to perform msa.), and with the default of 2 it fails to produce consensus reads for my longer insert. Is there a workaround for generating consensus reads for this data with the current version?

kschimke avatar Nov 16 '23 20:11 kschimke

It sounds weird to me. Can you paste/upload your full command and part of your data here so that I can look into it?

yangao07 avatar Nov 16 '23 20:11 yangao07

The plasmid in question has a length of around 15 kb, and most of the consensus reads when using -c 2 are around 5 kb.

TideHunter-v1.5.4/bin/TideHunter -l -m 4000 -c 1 -t 16 AddgenePlasmid.fastq > TideHunter_Consensus.fasta

https://drive.google.com/file/d/1gY4vq-YdFNXuOG5Dkef1UBqF04xaDYI2/view?usp=sharing
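One quick way to see how many reads could hold two copies of a ~15 kb insert at all is to look at the read lengths. The sketch below is only an illustration, not part of TideHunter: it assumes a plain four-line-per-record FASTQ and assumes that two full copies need roughly twice the insert length; the helper names `read_lengths` and `fraction_spanning` are hypothetical.

```python
import io


def read_lengths(fastq_handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each record is the sequence
            yield len(line.strip())


def fraction_spanning(lengths, insert_len, copies=2):
    """Return the fraction of reads at least `copies` x `insert_len` bases long.

    Assumption: a read needs about insert_len * copies bases to contain
    that many full copies of the insert.
    """
    lengths = list(lengths)
    if not lengths:
        return 0.0
    threshold = insert_len * copies
    return sum(1 for n in lengths if n >= threshold) / len(lengths)


# Tiny in-memory demo with two reads of length 4 and 10.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGTAC\n+\nIIIIIIIIII\n")
demo_fraction = fraction_spanning(read_lengths(demo), insert_len=5, copies=2)
```

For the real data this would be called with an open handle on AddgenePlasmid.fastq and `insert_len=15000`; a low fraction would suggest that few reads can contain two full copies.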

kschimke avatar Nov 16 '23 20:11 kschimke

Hi, you mentioned that "TideHunter allowed me to set the minimum copy number (-c) to 1 without crashing". However, by default TideHunter was never meant to output a consensus sequence built from fewer than 2 copies, since a single copy cannot be called a tandem repeat. A one-copy consensus sequence is only allowed when 5'/3' adapter sequences are provided for specific sequencing libraries. In your case, did you actually get any one-copy consensus sequences with the previous version of TideHunter?

I am going to set a hard threshold for the minimum copy number: >= 2.
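Until that threshold is enforced in the tool itself, the same check can be made on the caller's side. This is a minimal sketch of a hypothetical wrapper, not TideHunter's own code: `build_command` and `MIN_COPY_NUMBER` are illustrative names, and only the `-c` flag is interpreted; any other flags are passed through unchanged.

```python
MIN_COPY_NUMBER = 2  # a tandem repeat needs at least two copies


def build_command(fastq_path, min_copy=2, extra_args=(), binary="TideHunter"):
    """Build a TideHunter argument list, rejecting -c values below 2.

    Hypothetical helper: it only validates the minimum copy number and
    does not know about any other TideHunter option.
    """
    if min_copy < MIN_COPY_NUMBER:
        raise ValueError(
            f"minimum copy number must be >= {MIN_COPY_NUMBER}, got {min_copy}"
        )
    return [binary, *extra_args, "-c", str(min_copy), fastq_path]


# Flags other than -c are copied verbatim from the command in this thread.
cmd = build_command(
    "AddgenePlasmid.fastq", min_copy=2, extra_args=("-l", "-m", "4000", "-t", "16")
)
```

The resulting list can be handed to `subprocess.run(cmd, stdout=...)`; calling `build_command(..., min_copy=1)` raises `ValueError` before TideHunter is launched.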

yangao07 avatar Nov 17 '23 16:11 yangao07