Execution of Teloclip and input output streams
Dear Adam,
I hope you are doing well. I just got to know about this tool to find telomers in assembled genomes. I Am trying to run this on one of my assembled genomes but the documentation in readme file is confusing me. I also checked a previously opened issue #20 by @cyycyj but stat also increased some confusions so I will try to put all confusions in one go so yu can help. As from the Documentation I see 3 major steps to follow
Step1: First index the reference assembly
samtools faidx ref.fa
Step2: Streaming SAM records from aligner
minimap2 -ax map-ont ref.fa nanopore.fq | samtools view -h -F 0x100 | teloclip --ref ref.fa.fai | samtools sort > teloclip_overhangs.bam
Step by step as mentioned by @cyycyj in #20
1. minimap2 -ax map-ont ref.fa pacbio.fq.gz > mapped.sam
2. samtools view -h -F 0x100 mapped.sam > FILTERED.SAM
3. teloclip --refIdx ref.fa.fai filtered.sam > teloclip_filtered.sam
5. samtools sort teloclip_filtered.sam -o TELOCLIP_FILTERED.BAM
Step 3: Report clipped alignments containing target motifs
samtools view -h in.bam | teloclip --ref ref.fa.fai --motifs TTAGGG | samtools sort > teloclip_TTAGGG_overhangs.bam
Step by step as mentioned by @cyycyj in #20
1. samtools view -h in.bam > header.sam #convert bam to sam including header information
2. teloclip --ref ref.fa.fai --motifs TTAGGG header.sam > teloclip_TTAGGG_overhangs.sam
3. samtools sort teloclip_TTAGGG_overhangs.sam > teloclip_TTAGGG_overhangs.bam
QUESTION: This in.bam is causig confusion. as @cyycyj had asked already in #20
- This
in.bamwill be the output from step 2 being theFILTERED.SAM (after sam2bam conversion)file ? - This
in.bamwill be the final output from step 2 being theTELOCLIP_FILTERED.BAMfile ?
QUESTION regarding Matching noisy target motifs*
when using the --fuzzy option with teloclip, i get error teloclip: error: unrecognized arguments: --fuzzy
Step 4: Extract clipped reads
samtools view -h in.bam | teloclip --ref ref.fa.fai --motifs TTAGGG | teloclip-extract --refIdx ref.fa.fai --extractReads --extractDir SplitOverhangs
QUESTION: here the in.bam will be which of the following ?
- This
in.bamwill be the output from step 2 being theFILTERED.SAM (after sam2bam conversion)file ? - This
in.bamwill be the final output from step 2 being theTELOCLIP_FILTERED.BAMfile ? - This
in.bamwill be the final output from step 3 being theTELOCLIP_TTAGGG_OVERHANGS.BAMfile ?
Major Confusions 1 in #20 you answered as
" the bam or sam that gets passed to teloclip should always be either raw aligments from an alignment tool like minimap2 OR alignments that have been filtered with samtools view to remove low quality alignments."
This is point of confusion, So in both step 2,3,4 the in.bam file should be the either the raw alignment file mapped.sam or filteres alignment file FILTERED.SAM which are generated in step2. can you please shed some light on this.
Major Confusions 2
all these steps should be executed to run teloclip ??
Step1: samtools faidx ref.fa
Step2: minimap2 -ax map-ont ref.fa nanopore.fq | samtools view -h -F 0x100 | teloclip --ref ref.fa.fai | samtools sort > teloclip_overhangs.bam
Step3: samtools view -h in.bam | teloclip --ref ref.fa.fai --motifs TTAGGG | samtools sort > teloclip_TTAGGG_overhangs.bam
Step4: samtools view -h in.bam | teloclip --ref ref.fa.fai --motifs TTAGGG | teloclip-extract --refIdx ref.fa.fai --extractReads --extractDir SplitOverhangs
OR step 2 3 4 are just different ways to run teloclip and can be combined into one step as below ?
Step1: samtools faidx ref.fa
Step2: minimap2 -ax map-ont ref.fa nanopore.fq | samtools view -h -F 0x100 | teloclip --ref ref.fa.fai --motifs TTAGGG | teloclip-extract --refIdx ref.fa.fai --extractReads --extractDir SplitOverhangs
If you are able to help with the mwntioned questions it will be great help. Best regards
In response to your last question, yes these are all just different ways to chain the steps together.
In your example:
# Index the draft genome assembly
samtools faidx ref.fa
# Map ONT reads to reference with minimap2
# Filter out secondary alignments with samtools
# Find reads that overhang contig ends AND contain TTAGGG motif with teloclip
# Write the overhang reads to fasta files with teloclip-extract
minimap2 -ax map-ont ref.fa nanopore.fq | samtools view -h -F 0x100 | teloclip --ref ref.fa.fai --motifs TTAGGG | teloclip-extract --refIdx ref.fa.fai --extractReads --extractDir SplitOverhangs
This will create a dir called SplitOverhangs that has one fasta file of overhang reads for each of your contigs/chromosomes. The overhanging section of each read will be in lower case. You can manually select the best alignment (long high identity anchor on the contig end + longest run of telomeric repeats in the overhang) for each contig and append the overhang.
Chaining all the steps together like this means that you do not write any of the intermediate files to disk.
However, I generally recommend that you run the initial teloclip step with and without the --motifs filter and manually inspect the overhanging alignments in IGV.
You will see that there are different kinds of overhangs depending on how complete your assembly is.
- Some overhangs may contain unassembled rDNA repeats that are near the end of a chromosome but before the telomeres.
- Fragmented assemblies will have lots of non-telomeric overhangs where the contig has been broken at a repeat.
- Mitochondrial contigs will always have LOTS of overhangs because they are circular and high copy. You should identify these and NOT extend them with overhanging reads.
- Sometimes you will get chance single matches to the telomeric repeat, you should manually inspect the overhangs and decide if it looks real.
- Sometimes it is not possible to identify telomeric repeats in very noisy ONT reads, in which case you should run teloclip without the
--motifsoption and manually select the overhangs that contain telomeres.
Re the --fuzzy option this is still in development and will replace the --noPoly option. Ignore it for now.
Please use the stable release that is on PyPi via pip install teloclip
Dear Adam,
Thank you for your detailed responce. It clerifies alot of things.