deepvariant
output variants from tool
I ran the following command on RNA-seq data. The output VCF is very small and important variants are missing.
BIN_VERSION="1.5.0"
docker run \
-v "$(pwd):$(pwd)" \
-w $(pwd) \
google/deepvariant:"${BIN_VERSION}" \
run_deepvariant \
--model_type=WES \
--customized_model=model/model.ckpt \
--ref=reference/GRCh38_no_alt_analysis_set.fasta \
--reads=test_data/Aligned.sortedByCoord.out.bam \
--output_vcf=output/output.vcf.gz \
--num_shards=30 \
--make_examples_extra_args="split_skip_reads=true,channels=''" \
--logging_dir=output/logs \
--intermediate_results_dir output/intermediate_results_dir
Please let me know if there is any error in the command I ran.
Hi @NIBIL401
Could I request a bit more information? When you say there are fewer variants, what are you comparing this to? I do note that you have BIN_VERSION=1.5.0, but our case study for RNA-seq uses BIN_VERSION=1.4.0. You may get better results using BIN_VERSION=1.4.0.
Hi @AndrewCarroll, when I ran variant calling on the same BAM with another tool, I got more variants than with DeepVariant. Also, some important variants are missing from the final DeepVariant output. I tried 1.4.0 and got the same output. Please let me know whether the parameters can be optimized or whether the command I am using is correct.
Hi @NIBIL401
I don't see any other specific issues in your command. Without knowing more about the specific types of differences, it's difficult to give advice on what might be missing. One observation that we do have is that DeepVariant has learned not to call RNA editing events as variants. These are post-transcriptional changes to the RNA sequence. Those edits appear as A->G and T->C in sequencing data. To give more advice beyond this, I think I would need to know more about the sequencing (ideally, a BAM file or snippet containing a variant that is not being called, so that we can diagnose why).
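One way to check whether RNA editing could explain part of the difference is to tally the substitution classes in a call set and see how many are A->G or T->C. The following is a minimal sketch, not part of DeepVariant: the function names `substitution_counts` and `editing_like_fraction` and the example records are made up for illustration, and it only looks at biallelic SNVs in plain VCF text lines.

```python
from collections import Counter


def substitution_counts(vcf_lines):
    """Count REF>ALT classes for biallelic SNVs in VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3].upper(), fields[4].upper()
        # Keep only single-base REF and single-base ALT (no indels, no multiallelics).
        if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
            counts[f"{ref}>{alt}"] += 1
    return counts


def editing_like_fraction(counts):
    """Fraction of counted SNVs that are A>G or T>C (the RNA-editing signature)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["A>G"] + counts["T>C"]) / total


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tT\tC\t50\tPASS\t.",
    "chr1\t300\t.\tC\tA\t50\tPASS\t.",
    "chr1\t400\t.\tAT\tA\t50\tPASS\t.",  # deletion, ignored
]
counts = substitution_counts(example)
print(dict(counts), editing_like_fraction(counts))
```

For a compressed VCF, the lines could be read with `gzip.open(path, "rt")` and passed to `substitution_counts` directly. A high fraction among the calls that only the other tool makes would be consistent with RNA editing, but it does not prove it.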
Thank you, Andrew
Hi @AndrewCarroll, I used GATK to call variants from the RNA-seq BAM and got around 13397 variants, including splice variants. With DeepVariant I got only 215 variants, and important splice variants are missing. I would also like to know which type of BAM is best for DeepVariant, i.e. one produced with or without the chimeric read option. Sorry that I could not give you more information.
Hi @NIBIL401
I'm sorry, but without taking a look at the BAM file and the variants called or not called, it's quite difficult to say the reason why a variant would be missing. If you are able to share a snippet of it with an example, we can take a look.
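To pick concrete examples to share, it can help to list the records that one call set contains and the other does not. The sketch below is an illustration under stated assumptions: `variant_keys` is a hypothetical helper, the example records are made up, variants are matched by exact CHROM, POS, REF and ALT with no normalization (so differently represented indels will not match), and the `pass_only` option assumes that non-passing records, such as candidates DeepVariant labels RefCall, are marked in the FILTER column.

```python
def variant_keys(vcf_lines, pass_only=False):
    """Return a set of (chrom, pos, ref, alt) keys from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        ref, alts, flt = fields[3], fields[4], fields[6]
        # Optionally drop records whose FILTER is neither PASS nor unset.
        if pass_only and flt not in ("PASS", "."):
            continue
        # Split multiallelic records into one key per ALT allele.
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


# Hypothetical records, for illustration only.
other_tool_calls = [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tT\tC\t50\tPASS\t.",
    "chr2\t50\t.\tG\tA,T\t50\tPASS\t.",
]
deepvariant_calls = [
    "chr1\t100\t.\tA\tG\t40\tPASS\t.",
    "chr1\t200\t.\tT\tC\t0\tRefCall\t.",
]

# Variants reported by the other tool but not passing in the DeepVariant output.
missing = sorted(
    variant_keys(other_tool_calls) - variant_keys(deepvariant_calls, pass_only=True)
)
for key in missing:
    print(key)
```

Running the comparison once with `pass_only=True` and once without shows whether a "missing" variant was never a candidate at all or was considered and then rejected, which are different situations to debug.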
For chimeric reads, do you mean secondary/supplementary read alignments?
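For reference, secondary and supplementary alignments are marked in the SAM FLAG field by the bits 0x100 and 0x800 respectively. The following sketch counts each kind from SAM text lines, for example the text output of `samtools view`. The helper names and the example records are hypothetical, and whether a given aligner's chimeric option produces such records depends on its settings.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def alignment_kind(flag):
    """Classify a SAM FLAG value as primary, secondary, or supplementary."""
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


def kind_counts(sam_lines):
    """Count alignment kinds in SAM text lines, skipping '@' header lines."""
    counts = {"primary": 0, "secondary": 0, "supplementary": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts[alignment_kind(flag)] += 1
    return counts


# Hypothetical, truncated SAM records (only the first columns matter here).
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t255\t50M",
    "read2\t16\tchr1\t150\t255\t50M",
    "read3\t256\tchr1\t200\t3\t50M",
    "read4\t2048\tchr2\t300\t255\t20M30S",
    "read5\t2064\tchr2\t400\t255\t30S20M",
]
print(kind_counts(example_sam))
```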
Hi @NIBIL401, as @AndrewCarroll mentioned, it's hard for us to determine the reason without a reproducible setup. If you can provide a similar reproducible setup with public data, that would be great!
Meanwhile, please read https://github.com/google/deepvariant/blob/r1.6/docs/FAQ.md#why-does-deepvariant-not-call-a-specific-variant-in-my-data to see if any of the topics there might apply.
For now, I'll close this issue, but please do feel free to reopen this bug with more information to help us debug!