FLAMES
I ran match_cell_barcode and got no error, but also no result. The command was: match_cell_barcode /data_RAGE_seq/data1 cell_barcode_stat.txt split_barcode.fastq flame_3M-february-2018.txt 2. The output split_barcode.fastq has size zero, and no other file is generated.
It is hard to tell with such limited information. Is there any output in the terminal? Usually you would see some stats printed after you run the program. Here is an example.
the first few lines of output:
set UMI length to 10.
First 5 cell barcode:
AAACCTGCAATCCAAC
AAACGGGCATACGCCG
AAACGGGCATTAGGCT
AAACGGGGTATAGTAG
AAAGATGCAACACCCG
/stornext/Genomics/data/CLL_venetoclax/FLTseq/HD11/fastq/HD11_pass.fq.gz
forward flanking end: 66 2819
forward flanking end: 67 2486
the last lines:
24 1117
32 487
###total read: 56654147
###barcode hm match: 33287709
###barcode match: 3337587
###barcode not match: 20009062
###too short: 19789
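For anyone comparing their own run against the example above, here is a minimal Python sketch (not part of FLAMES) that reads the `###label: count` summary lines and prints each category as a share of the total. The function name `parse_barcode_stats` is made up for illustration, and the sketch does not interpret what each label means.

```python
import re

def parse_barcode_stats(log_text):
    """Collect '###label: count' summary lines into a dict (hypothetical helper)."""
    stats = {}
    for line in log_text.splitlines():
        m = re.match(r"^###\s*(.+?):\s*(\d+)\s*$", line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

# Summary lines copied from the example output above.
example_log = """###total read: 56654147
###barcode hm match: 33287709
###barcode match: 3337587
###barcode not match: 20009062
###too short: 19789"""

stats = parse_barcode_stats(example_log)
total = stats["total read"]
for label, count in stats.items():
    if label != "total read":
        # Share of all reads that fell into this category.
        print(f"{label}: {count} ({count / total:.1%})")
```

In the example output the four categories add up to the total read count, so a run that prints no such summary, or only zeros, most likely never read any input.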
Hi Luyi, could you please give an example of how to use match_cell_barcode and explain the input files in more detail? Does the fastq folder contain Illumina sequencing data or third-generation sequencing data?
here is my error message:
Thanks! Yuchen
I have the same question
Hi @icanccwhite and @yuchen345, you should use long-read fastq data as input. The cell barcode file comes from the short-read data output. From your screenshot, the program has printed the first 5 cell barcodes, so it is running correctly. Can you check your data path again? I think you need to use an absolute path.
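A quick way to rule out path problems before running match_cell_barcode is a small pre-flight check. This is only a sketch in Python, not something shipped with FLAMES: the function name `check_inputs` and the list of FASTQ suffixes are assumptions, and the two paths at the bottom are taken from the command earlier in this thread.

```python
import os

# Assumed FASTQ suffixes; adjust to match how your files are named.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")

def check_inputs(fastq_dir, barcode_file):
    """Return a list of human-readable problems with the two input paths."""
    problems = []
    for label, path in (("fastq folder", fastq_dir), ("barcode file", barcode_file)):
        if not os.path.isabs(path):
            problems.append(f"{label} is not an absolute path: {path}")
        if not os.path.exists(path):
            problems.append(f"{label} does not exist: {path}")
    if os.path.isdir(fastq_dir):
        names = os.listdir(fastq_dir)
        if not any(name.endswith(FASTQ_SUFFIXES) for name in names):
            problems.append(f"no FASTQ files found in: {fastq_dir}")
    if os.path.isfile(barcode_file) and os.path.getsize(barcode_file) == 0:
        problems.append(f"barcode file is empty: {barcode_file}")
    return problems

# Paths as they appear in the original command.
for problem in check_inputs("/data_RAGE_seq/data1", "flame_3M-february-2018.txt"):
    print(problem)
```

An empty list means the paths look plausible. It does not prove that the barcodes themselves match the reads.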
Thanks for your reply! @LuyiTian
Here is another error, this time from sc_long_pipeline.py:
### read gene annotation 2022-04-20 20:57:58
remove similar transcripts in gene annotation: Counter({'duplicated_transcripts': 370})
### find isoforms 2022-04-20 20:59:27
GL000219.1 KI270713.1 KI270733.1 GL000194.1 GL000195.1 KI270731.1 20
Traceback (most recent call last):
  File "./sc_long_pipeline.py", line 213, in
    sc_long_pipeline(args)
  File "./sc_long_pipeline.py", line 179, in sc_long_pipeline
    raw_gff3=raw_splice_isoform if config_dict["global_parameters"]["generate_raw_isoform"] else None)
  File "/home/chenz/biosoft/FLAMES/python/sc_longread.py", line 975, in group_bam2isoform
    it_region = bamfile.fetch(ch, bl.s, bl.e)
  File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig 20
Waiting for your reply!
It seems chromosome 20 is not in the pysam dictionary. I would suggest double-checking your genome annotation and making sure you downloaded the FASTA and GFF/GTF files from the same source. Did you do anything to the genome annotation? Usually chromosome 20 won't be at the end of the chromosome list, but from your output it seems to be at the end.
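One way to check for a FASTA/annotation mismatch is to compare the contig names in the two files directly. The sketch below is plain Python and independent of FLAMES and pysam. The helper names (`fasta_contigs`, `annotation_contigs`, `missing_contigs`) are made up for illustration, and it assumes a standard tab-separated GFF/GTF with the contig name in the first column.

```python
import gzip

def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")

def fasta_contigs(path):
    """Contig names from FASTA headers (first word after '>')."""
    names = set()
    with _open_text(path) as fh:
        for line in fh:
            if line.startswith(">") and len(line.strip()) > 1:
                names.add(line[1:].split()[0])
    return names

def annotation_contigs(path):
    """Contig names from column 1 of a GFF/GTF file."""
    names = set()
    with _open_text(path) as fh:
        for line in fh:
            if line.startswith("##FASTA"):
                break  # embedded sequence section in GFF3, not annotation
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.rstrip("\n").split("\t", 1)[0])
    return names

def missing_contigs(fasta_path, annotation_path):
    """Contigs used by the annotation but absent from the FASTA."""
    return sorted(annotation_contigs(annotation_path) - fasta_contigs(fasta_path))
```

Calling `missing_contigs("genome.fa", "genes.gtf")` (placeholder file names) returns the contigs that the annotation uses but the reference lacks. If `20` shows up there while the FASTA has `chr20`, the two files use different naming schemes.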
Thank you very much! @LuyiTian
I have a few more questions:
- As you said, FLAMES searches in both directions and trims the adapter sequence plus cell barcode/UMI in both directions. How does FLAMES handle UMI assignment when a read is tagged with a UMI that may contain a sequencing error?
- I noticed that there is a find_polyT function in match_cell_barcode. Is the polyT sequence omitted from the output fastq.gz file? How do you deal with the polyA sequence on the reverse strand?
- Can FLAMES be used with 10X 5' libraries, given that there is a TSO sequence rather than polyT after the cell barcode/UMI?
Looking forward to your reply.
Thanks