Separating .vcf's by individual on pooled call?

Open lokeyCEU opened this issue 7 years ago • 3 comments

I have merged .bam files from the 1kGP (with samtools merge -r) and performed RetroSeq discovery phase on the merged .bam.

But now when I run the call phase on the merged .bam, I get only one .vcf output. How do I create a .vcf for each individual in the merged .bam?

This is similar to what Wildschutte did in a 2015 study. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666360/

Thank you.

EDIT (May 2017): I was mistaken. The merged (pooled) .bam is used during the calling phase, NOT discovery.

lokeyCEU avatar Apr 20 '17 21:04 lokeyCEU

Can you provide the command lines you have run?

tk2 avatar Apr 21 '17 08:04 tk2

Absolutely.

I ran samtools merge -r on all individuals from the CEU to produce a single pooled .bam.

Then I ran the discovery phase:

perl /path/to/software/Retroseq/RetroSeq-master/bin/retroseq.pl -discover -bam TestCEU-r.bam -output CEU-r.HERVK.tab -eref HERVKfa.tab -refTEs HERVKbed.tab -align

Then the call phase:

perl /path/to/software/Retroseq/RetroSeq-master/bin/retroseq.pl -call -bam TestCEU-r.bam -input CEU-r.HERVK.tab -ref hg19.refFIX.fa -output HERVK.TEST-r.vcf -reads 2 -depth 10000

But the .vcf that comes out covers all the pooled individuals together, and I want the calls separated by individual.

Thanks!

lokeyCEU avatar Apr 21 '17 17:04 lokeyCEU

UPDATE:

The Wildschutte 2016 paper took these steps (simplified):

  1. -discover phase on individual .bam's from 1kGP, to produce candidates
  2. merge .bam's by population, with samtools merge
  3. -call phase on merged .bam to produce .vcf

The problem is that the output .vcf gives the insertion presence of all individuals in ONE column. If each individual's insertion presence were in a separate column, one could simply use bcftools to separate them. Is there something I am missing that will produce a .vcf for each individual, or at least columns by individual, from the merged .bam?
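For reference, the three steps can be sketched as a dry-run script that only prints the commands it would run. This is a rough outline, not a confirmed recipe: the sample IDs (sampleA, sampleB), the pooled.bam and pooled.vcf names, and the bare retroseq.pl path are hypothetical placeholders, and passing the prefix HERVK_ to -input is an assumption based on how I have been using it. The RetroSeq flags are the same ones used elsewhere in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for the three steps, runs nothing.
# RETROSEQ, SAMPLES, pooled.bam and pooled.vcf are hypothetical placeholders.
RETROSEQ="retroseq.pl"
SAMPLES="sampleA sampleB"

# Step 1: -discover on each individual's .bam, one candidate file per sample.
discover_cmds() {
    for s in $SAMPLES; do
        echo "perl $RETROSEQ -discover -bam $s.bam -output HERVK_$s.tab -eref HERVKfa.tab -refTEs HERVKbed.tab -align"
    done
}

# Step 2: merge the per-individual .bam files.
# samtools merge -r attaches an RG tag to each read, inferred from the input file name.
merge_cmd() {
    bams=""
    for s in $SAMPLES; do
        bams="$bams $s.bam"
    done
    echo "samtools merge -r pooled.bam$bams"
}

# Step 3: -call on the merged .bam.
# Assumption: -input is given the shared prefix of the step 1 outputs.
call_cmd() {
    echo "perl $RETROSEQ -call -bam pooled.bam -input HERVK_ -ref hg19.refFIX.fa -output pooled.vcf -reads 2 -depth 10000"
}

discover_cmds
merge_cmd
call_cmd
```

Once the printed commands look right for your own file names, they can be run directly or piped to sh.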

Here is the command I used:

nohup perl retroseq.pl -call -bam TestCEU-r.bam -input HERVK_*.tab -ref hg19.refFIX.fa -output TestPooledCall.CEU-r.vcf -reads 2 -depth 10000 &

NOTE: the -input is a prefix for a series of files, all named HERVK_(individual's name).tab. Is this where things have gone awry?

Thanks!

lokeyCEU avatar Jun 14 '17 21:06 lokeyCEU