HTStream
Remaining adapter sequence
Describe the bug
hts_AdapterTrimmer and hts_PolyATTrim did not remove the adapter from some reads whose 3'-end adapter sequence is incomplete, degenerate, or followed by extra sequence.
To Reproduce
Example reads that reproduce the bug (header and sequence lines only):

@A01488:145:HHFCKDSX5:4:1101:19895:1000_TATA_NTGGCT 1:N:0:TAACCAGCACTT+NATGTCGTTGGA
GAGTTTGTGATTTAAACATTTTGTTGTTAATAATATTGATATTGTATTTTCTTGAATGTGGAACTTTCTTTTTTATGCTTACGTACCAAAAAAAAAAAAAAAAAAAACGGAATAGCAAACGTCTTAAAACCAGTCAAAAA

@A01488:145:HHFCKDSX5:4:1101:25247:1000_TATA_NTGGGG 1:N:0:TAACCAGCACTT+NATGTCGTTGGA
TTGAGATGGGTGTTCCAAGAGTCGAATAGCTTGGGAATGCTGTTCTAAATGGGTGGTAAATTTCATCTAAAGCTAAATATCGACGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAAAAAAAAAAAAAAAAAGATCG

@A01488:145:HHFCKDSX5:4:1101:7139:1016_TATA_TGAGTT 1:N:0:TAACCAGCACTT+AATGTCGTTGGA
TTGCTTTCATCATCCCTTTTACAGGGTGAAATTAATTGTTACTTTCAACAGATGCTTCTGATTAAAAAAAAAAAAAAAAAAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAACCAGCACTTATCTCGTTTGCCGGG
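To see what a trimmer has to recognize in each read, the sketch below prints the part of a sequence that follows its poly(A) run. It is a hypothetical diagnostic, not an HTStream tool; the function name `polya_tail` and the threshold of 15 consecutive A's are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic, not part of HTStream.
# Reads one DNA sequence per line on stdin and prints whatever follows the
# last run of >= 15 A's, i.e. the 3' fragment left after the poly(A) tail.
# Lines without such a run produce no output (sed -n with the p flag).
# The {15,} interval is supported by the -E mode of GNU and BSD sed.
polya_tail() {
  sed -nE 's/^.*A{15,}//p'
}
```

Applied to the three sequences above (one per line, e.g. `polya_tail < seqs.txt`, where `seqs.txt` is a hypothetical file), this prints `CGGAATAGCAAACGTCTTAAAACCAGTCAAAAA` (adapter-like but degenerate, and missing the adapter's leading `AGAT`), `GATCG` (only the first adapter bases, with the adapter's leading A absorbed into the poly(A) run), and `GATCGGAAGAGCACACGTCTGAACTCCAGTCAC` followed by `TAACCAGCACTT` (the i7 index from the read header) and further bases.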
Commands to reproduce the behavior:

module load miniconda
source activate process_reads

for x in "${umi_folder[@]}"; do
    y=${x##*/}    # keep only the file name, without the directory
    echo "Working on $y"
    hts_Stats -F -L "${log_folder}${y}.log" -U "${x}" |      # read stats
    hts_SeqScreener -AL "${log_folder}${y}.log" |             # remove contaminants
    hts_AdapterTrimmer -AL "${log_folder}${y}.log" |          # trim adapters
    hts_PolyATTrim -AL "${log_folder}${y}.log" |              # trim poly(A)/poly(T) tails
    hts_QWindowTrim -AL "${log_folder}${y}.log" |             # remove low-quality bases
    hts_NTrimmer -AL "${log_folder}${y}.log" |                # remove Ns
    hts_Stats -AL "${log_folder}${y}.log" -f "${clean_folder}${y}"   # read stats and save cleaned reads
done

conda deactivate
module unload miniconda
Expected behavior
The 3' end of Lexogen reads consists of 5'-A{18}AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3' (the adapter is the same as the default TruSeq adapter), so we expect both the poly(A) tail and the adapter sequence to be trimmed.
Desktop:
- OS: macOS Catalina 10.15.7; the commands were run in a terminal session connected to an HPC cluster
