velocyto.py

Bam file does not contain cell and umi barcodes appropriatelly formatted (Run on modified Smart-seq2 protocol)

Open Suger0917 opened this issue 5 years ago • 9 comments

Dear Velocyto Team,

Our lab used a modified Smart-seq2 protocol to allow for multiplexed single-cell RNA-seq. The RT primers were designed with cell-specific barcodes and unique molecular identifiers (UMIs). After alignment to the hg19 reference genome using TopHat, BAM files were generated and organized by cell in a folder structure similar to the following:

    plateX/cell01/cell01.bam
    plateX/cell02/cell02.bam
    plateX/cell03/cell03.bam
    ...

The BAM file of each cell includes UMI information similar to the following:

[image]

The first 8 bp of column 1 is the UMI of each read. I don't know where and when to add a tag named UB (UMI barcode) or XM. If I just run velocyto like this:

    velocyto run -o $output_path -m $repeat_msk_gtf $sorted_genome_bam $gtf

it returns an error:

    OSError: The bam file does not contain cell and umi barcodes appropriatelly formatted. If you are runnin UMI-less data you should use the -U flag.

Could you please give me some advice on how to modify our pipeline to produce BAM files in the correct format, or share some examples, including BAM files, for running velocyto?

Thank you!

Suger0917 avatar Sep 06 '18 13:09 Suger0917

Hi, I ran into the same problem, and it was solved after I modified my BAM file with the "simplesam" package. A template script is as follows:

    import simplesam

    barcode_tag = 'CB'
    umi_tag = 'UB'

    # NOTE: the separator was lost when this comment was rendered (it showed as split("")).
    # "_" is an assumption; set it to whatever delimits the fields in your read names.
    sep = "_"

    with simplesam.Reader(open("in.bam")) as in_bam:
        with simplesam.Writer(open("out.sam", 'w'), in_bam.header) as out_sam:
            for read in in_bam:
                read[umi_tag] = read.qname.split(sep)[2]      # add the umi tag
                read[barcode_tag] = read.qname.split(sep)[1]  # add the barcode tag
                out_sam.write(read)

Then convert this SAM file into a BAM file with samtools and use that BAM file as input for velocyto. Hope it works for you too.
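For the layout described in the original question (one BAM per cell, with the UMI in the first 8 characters of the read name), the same tagging idea can be sketched with the standard library only, working on SAM text one line at a time so that memory use stays flat. This is a minimal sketch under assumptions, not a tested pipeline: the function names `tag_sam_line` and `tag_sam_stream`, the `umi_len=8` default, and the choice of the cell folder name (e.g. `cell01`) as the barcode value are all illustrative.

```python
def tag_sam_line(line, cell_id, umi_len=8):
    """Append CB/UB tags to one SAM record; header lines pass through unchanged.

    Assumption: the UMI is the first `umi_len` characters of the read name (column 1),
    and `cell_id` is a label chosen by the user, e.g. the per-cell folder name.
    """
    if line.startswith("@"):
        return line
    record = line.rstrip("\n")
    qname = record.split("\t", 1)[0]
    umi = qname[:umi_len]
    # CB and UB are written as string-typed (Z) optional fields.
    return f"{record}\tCB:Z:{cell_id}\tUB:Z:{umi}\n"


def tag_sam_stream(lines, cell_id, umi_len=8):
    """Lazily tag an iterable of SAM lines (e.g. sys.stdin) without loading the file."""
    for line in lines:
        yield tag_sam_line(line, cell_id, umi_len)
```

One way to use it is to pipe the output of `samtools view -h cell01.bam` into a small script that writes `tag_sam_stream(sys.stdin, "cell01")` to standard output, and then convert the result back to BAM with samtools.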

snower2010 avatar Sep 09 '18 09:09 snower2010

Thank you! I tried "simplesam" and it works well, but it used too much memory.

Suger0917 avatar Sep 12 '18 08:09 Suger0917

Hi,

I had the same issue and tried simplesam, but it gives me an IndexError. Do you have any idea why this might happen?

This is how I ran it:

    import simplesam

    barcode_tag = 'CB'
    umi_tag = 'UB'

    with simplesam.Reader(open("A2S_Day6_sorted.bam")) as in_bam:
        with simplesam.Writer(open("out.sam", 'w'), in_bam.header) as out_sam:
            for read in in_bam:
                read[umi_tag] = read.qname.split()[2]
                read[barcode_tag] = read.qname.split()[1]
                out_sam.write(read)

The error:

    File "<stdin>", line 4, in <module>
    IndexError: list index out of range

I'd appreciate your help.
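A likely cause of the IndexError above: `str.split()` called with no argument splits on whitespace, so a read name without spaces comes back as a one-element list, and indexes `[1]` and `[2]` do not exist. The sketch below checks the number of fields before indexing and reports the offending read name. The helper name `parse_qname`, the `"_"` separator, and the field positions are assumptions for illustration; they have to match how the barcode and UMI are actually encoded in your read names.

```python
def parse_qname(qname, sep="_", barcode_idx=1, umi_idx=2):
    """Return (barcode, umi) extracted from a read name.

    Assumption: the read name looks like <id><sep><barcode><sep><umi>;
    change `sep` and the indexes to fit your own data.
    """
    parts = qname.split(sep)
    needed = max(barcode_idx, umi_idx) + 1
    if len(parts) < needed:
        raise ValueError(
            f"read name {qname!r} has {len(parts)} field(s) when split on {sep!r}, "
            f"expected at least {needed}"
        )
    return parts[barcode_idx], parts[umi_idx]
```

Printing a few read names first (for example with `samtools view file.bam | head`) shows which separator and positions apply before the whole file is processed.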

eynullazada avatar Sep 21 '20 03:09 eynullazada

Hello, I have encountered the same issue using a BAM file output from BD-Rhapsody. I would appreciate your help

MelissaSaichi avatar Nov 19 '20 09:11 MelissaSaichi

> Hello, I have encountered the same issue using a BAM file output from BD-Rhapsody. I would appreciate your help

Did you fix the issue? I get the same error. Many thanks!

yaxing0zhao avatar Aug 11 '21 09:08 yaxing0zhao

@Suger0917 did you have to modify the script provided by @snower2010 for your own data, or did you use it as is? And if you had to modify it, how did you go about it? Thank you so much!

denvercal1234GitHub avatar Nov 17 '21 12:11 denvercal1234GitHub

I am also getting the same error as @MelissaSaichi and @yaxing0zhao with a BD-Rhapsody file. Was this issue ever resolved, and if so, how?

Akriebs avatar Mar 30 '22 16:03 Akriebs

Hi all

I was having the same issue and running the following command helped:

samtools view my_data_sorted.bam -h |awk '{gsub(/XU:/,"XM:"); print $0}' |awk '{gsub(/XB:/,"XC:"); print $0}' > my_data_sorted_replacetagcode.sam

Hope it helps
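The awk command above replaces `XU:` and `XB:` anywhere on a line, which could also touch a read name or header text that happens to contain those characters. As a hedged alternative, the sketch below renames the tags only in the optional fields (column 12 onward) of each SAM record. The function name `rename_tags` and the default mapping are illustrative assumptions mirroring the command above, not part of velocyto or samtools.

```python
def rename_tags(line, mapping=None):
    """Rename optional-field tags in one SAM record; header lines pass through unchanged."""
    if mapping is None:
        # Same renaming as the awk command: XU -> XM (UMI), XB -> XC (cell barcode).
        mapping = {"XU": "XM", "XB": "XC"}
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory; tags start at column 12.
    for i in range(11, len(fields)):
        tag = fields[i][:2]
        if tag in mapping and fields[i][2:3] == ":":
            fields[i] = mapping[tag] + fields[i][2:]
    return "\t".join(fields) + "\n"
```

Applied line by line to the output of `samtools view -h`, this yields a SAM stream that still has to be converted back to BAM (for example with `samtools view -b`) before it is passed to velocyto.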

Khagani



eynullazada avatar Mar 30 '22 17:03 eynullazada

@snower2010 Excuse me, could you share the script as a .py file? I cannot understand the code indentation in your comment above.

MoonlightFansty avatar Oct 27 '23 10:10 MoonlightFansty