
Weird coverage profile for spike in fastqs

Open · jayaramanp opened this issue 5 years ago · 2 comments

Hello @yunfeiguo, I was able to successfully run varsim_multi.py and generate FASTQs with variants "spiked in" from dbSNP SNPs.

However, the coverage of the samples looks very odd. Notice how the coverage right over the exon resembles two prongs rather than a single uniform block. Could we add a parameter of some sort to fix this?

[Screenshots, 2019-04-29: read coverage over the targeted exon]

In this case, the top track is the original sequence and the bottom track is the spiked-in sequence.

[Screenshot, 2019-04-29: original (top) vs. spiked-in (bottom) coverage]

jayaramanp, Apr 29 '19 18:04

Hi @jayaramanp, a few comments and questions:

  • Could you load the BED file used for simulation as a track at the bottom of the IGV screenshot?
  • One strategy is to flank the regions (by, say, 500 bp) and then use the flanked regions for simulation, to achieve better uniformity around the original regions.
  • Another solution is to increase the read length (the -l option passed via --simulator_options), because the green reads (I assume they are real data) appear slightly longer than the 100 bp you used in #196.
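The edge effect behind both suggestions can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not varsim's actual read sampler: fragments of one fixed length are placed at every start position where they fit entirely inside the simulated region, and each fragment contributes one read from each end. When the region is barely longer than the fragment and the fragment is much longer than two reads, read 1 piles up on the left, read 2 on the right, and the middle gets little or no coverage, which looks like two prongs. `toy_coverage` and `uniformity` are hypothetical helpers, and the lengths used are illustrative, not values from this thread.

```python
def toy_coverage(region_len, fragment_len, read_len):
    """Per-base read coverage in a toy paired-end model (hypothetical).

    Assumptions, NOT varsim's real sampler:
    - one fragment starts at every position where it fits entirely
      inside the region [0, region_len);
    - each fragment yields read 1 from its left end and read 2 from its
      right end, each read_len bases long.
    """
    if fragment_len > region_len:
        raise ValueError("fragment must fit inside the region")
    if read_len > fragment_len:
        raise ValueError("read cannot be longer than its fragment")
    cov = [0] * region_len
    for start in range(region_len - fragment_len + 1):
        end = start + fragment_len
        for pos in range(start, start + read_len):  # read 1 (left end)
            cov[pos] += 1
        for pos in range(end - read_len, end):  # read 2 (right end)
            cov[pos] += 1
    return cov


def uniformity(cov, lo, hi):
    """min/max coverage ratio over [lo, hi); 1.0 means perfectly flat."""
    window = cov[lo:hi]
    peak = max(window)
    return min(window) / peak if peak else 0.0


if __name__ == "__main__":
    short = toy_coverage(350, 300, 100)  # region barely longer than fragment
    longer = toy_coverage(350, 300, 150)  # same region, longer reads
    flanked = toy_coverage(350 + 2 * 500, 300, 100)  # region padded by 500 bp
    print("100 bp reads:", uniformity(short, 0, 350))
    print("150 bp reads:", uniformity(longer, 0, 350))
    print("flanked, original span:", uniformity(flanked, 500, 850))
```

In this model, the unpadded 350 bp region with 100 bp reads leaves an uncovered gap between the two prongs (uniformity 0.0). Raising the read length to 150 bp closes the gap, and padding the region by 500 bp on each side pushes the edge ramps outside the original span, so coverage over it becomes flat.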

yunfeiguo, Apr 30 '19 17:04

@yunfeiguo

  1. could you load the BED file used for simulation at the bottom of the IGV screenshot?

I essentially used the CDS regions of a gene as targets:

chrom | chromStart | chromEnd | name
--- | --- | --- | ---
1 | 100316583 | 100316695 | NM_000028.2_cds_1
1 | 100316583 | 100316695 | NM_000642.2_cds_1
1 | 100316583 | 100316695 | NM_000643.2_cds_1
1 | 100316583 | 100316695 | NM_000644.2_cds_1
1 | 100318210 | 100318274 | NM_000646.2_cds_1
1 | 100327043 | 100327284 | NM_000028.2_cds_2
1 | 100327043 | 100327284 | NM_000642.2_cds_2
1 | 100327043 | 100327284 | NM_000643.2_cds_2
1 | 100327043 | 100327284 | NM_000644.2_cds_2
1 | 100327043 | 100327284 | NM_000646.2_cds_2
1 | 100327797 | 100327994 | NM_000028.2_cds_3
1 | 100327797 | 100327994 | NM_000642.2_cds_3
1 | 100327797 | 100327994 | NM_000643.2_cds_3
1 | 100327797 | 100327994 | NM_000644.2_cds_3
1 | 100327797 | 100327994 | NM_000646.2_cds_3
1 | 100329926 | 100330160 | NM_000028.2_cds_4
1 | 100329926 | 100330160 | NM_000642.2_cds_4
1 | 100329926 | 100330160 | NM_000643.2_cds_4
1 | 100329926 | 100330160 | NM_000644.2_cds_4
1 | 100329926 | 100330160 | NM_000646.2_cds_4
1 | 100335940 | 100336152 | NM_000028.2_cds_5
1 | 100335940 | 100336152 | NM_000642.2_cds_5
1 | 100335940 | 100336152 | NM_000643.2_cds_5
1 | 100335940 | 100336152 | NM_000644.2_cds_5
1 | 100335940 | 100336152 | NM_000646.2_cds_5
1 | 100336298 | 100336440 | NM_000028.2_cds_6
1 | 100336298 | 100336440 | NM_000642.2_cds_6
1 | 100336298 | 100336440 | NM_000643.2_cds_6
1 | 100336298 | 100336440 | NM_000644.2_cds_6
1 | 100336298 | 100336440 | NM_000646.2_cds_6
1 | 100340227 | 100340381 | NM_000028.2_cds_7
1 | 100340227 | 100340381 | NM_000642.2_cds_7
1 | 100340227 | 100340381 | NM_000643.2_cds_7
1 | 100340227 | 100340381 | NM_000644.2_cds_7
1 | 100340227 | 100340381 | NM_000646.2_cds_7
1 | 100340694 | 100340827 | NM_000028.2_cds_8
1 | 100340694 | 100340827 | NM_000642.2_cds_8
1 | 100340694 | 100340827 | NM_000643.2_cds_8
1 | 100340694 | 100340827 | NM_000644.2_cds_8
1 | 100340694 | 100340827 | NM_000646.2_cds_8
1 | 100340898 | 100341026 | NM_000028.2_cds_9
1 | 100340898 | 100341026 | NM_000642.2_cds_9
1 | 100340898 | 100341026 | NM_000643.2_cds_9
1 | 100340898 | 100341026 | NM_000644.2_cds_9
1 | 100340898 | 100341026 | NM_000646.2_cds_9
1 | 100341998 | 100342168 | NM_000028.2_cds_10
1 | 100341998 | 100342168 | NM_000642.2_cds_10
1 | 100341998 | 100342168 | NM_000643.2_cds_10
1 | 100341998 | 100342168 | NM_000644.2_cds_10
1 | 100341998 | 100342168 | NM_000646.2_cds_10
1 | 100343181 | 100343399 | NM_000028.2_cds_11
1 | 100343181 | 100343399 | NM_000642.2_cds_11
1 | 100343181 | 100343399 | NM_000643.2_cds_11
1 | 100343181 | 100343399 | NM_000644.2_cds_11
1 | 100343181 | 100343399 | NM_000646.2_cds_11
1 | 100345463 | 100345617 | NM_000028.2_cds_12
1 | 100345463 | 100345617 | NM_000642.2_cds_12
1 | 100345463 | 100345617 | NM_000643.2_cds_12
1 | 100345463 | 100345617 | NM_000644.2_cds_12
1 | 100345463 | 100345617 | NM_000646.2_cds_12
1 | 100346172 | 100346366 | NM_000028.2_cds_13
1 | 100346172 | 100346366 | NM_000642.2_cds_13
1 | 100346172 | 100346366 | NM_000643.2_cds_13
1 | 100346172 | 100346366 | NM_000644.2_cds_13
1 | 100346172 | 100346366 | NM_000646.2_cds_13
1 | 100346616 | 100346748 | NM_000028.2_cds_14
1 | 100346616 | 100346748 | NM_000642.2_cds_14
1 | 100346616 | 100346748 | NM_000643.2_cds_14
1 | 100346616 | 100346748 | NM_000644.2_cds_14
1 | 100346616 | 100346748 | NM_000646.2_cds_14
1 | 100346832 | 100347018 | NM_000028.2_cds_15
1 | 100346832 | 100347018 | NM_000642.2_cds_15
1 | 100346832 | 100347018 | NM_000643.2_cds_15
1 | 100346832 | 100347018 | NM_000644.2_cds_15
1 | 100346832 | 100347018 | NM_000646.2_cds_15
1 | 100347081 | 100347262 | NM_000028.2_cds_16
1 | 100347081 | 100347262 | NM_000642.2_cds_16
1 | 100347081 | 100347262 | NM_000643.2_cds_16
1 | 100347081 | 100347262 | NM_000644.2_cds_16
1 | 100347081 | 100347262 | NM_000646.2_cds_16
1 | 100349660 | 100349815 | NM_000028.2_cds_17
1 | 100349660 | 100349815 | NM_000642.2_cds_17
1 | 100349660 | 100349815 | NM_000643.2_cds_17
1 | 100349660 | 100349815 | NM_000644.2_cds_17
1 | 100349660 | 100349815 | NM_000646.2_cds_17
1 | 100349879 | 100350022 | NM_000028.2_cds_18
1 | 100349879 | 100350022 | NM_000642.2_cds_18
1 | 100349879 | 100350022 | NM_000643.2_cds_18
1 | 100349879 | 100350022 | NM_000644.2_cds_18
1 | 100349879 | 100350022 | NM_000646.2_cds_18
1 | 100350109 | 100350274 | NM_000028.2_cds_19
1 | 100350109 | 100350274 | NM_000642.2_cds_19
1 | 100350109 | 100350274 | NM_000643.2_cds_19
1 | 100350109 | 100350274 | NM_000644.2_cds_19
1 | 100350109 | 100350274 | NM_000646.2_cds_19
1 | 100353518 | 100353679 | NM_000028.2_cds_20
1 | 100353518 | 100353679 | NM_000642.2_cds_20
1 | 100353518 | 100353679 | NM_000643.2_cds_20
1 | 100353518 | 100353679 | NM_000644.2_cds_20
1 | 100353518 | 100353679 | NM_000646.2_cds_20
1 | 100356760 | 100356927 | NM_000028.2_cds_21
1 | 100356760 | 100356927 | NM_000642.2_cds_21
1 | 100356760 | 100356927 | NM_000643.2_cds_21
1 | 100356760 | 100356927 | NM_000644.2_cds_21
1 | 100356760 | 100356927 | NM_000646.2_cds_21
1 | 100357146 | 100357310 | NM_000028.2_cds_22
1 | 100357146 | 100357310 | NM_000642.2_cds_22
1 | 100357146 | 100357310 | NM_000643.2_cds_22
1 | 100357146 | 100357310 | NM_000644.2_cds_22
1 | 100357146 | 100357310 | NM_000646.2_cds_22
1 | 100357972 | 100358178 | NM_000028.2_cds_23
1 | 100357972 | 100358178 | NM_000642.2_cds_23
1 | 100357972 | 100358178 | NM_000643.2_cds_23
1 | 100357972 | 100358178 | NM_000644.2_cds_23
1 | 100357972 | 100358178 | NM_000646.2_cds_23
1 | 100361826 | 100361959 | NM_000028.2_cds_24
1 | 100361826 | 100361959 | NM_000642.2_cds_24
1 | 100361826 | 100361959 | NM_000643.2_cds_24
1 | 100361826 | 100361959 | NM_000644.2_cds_24
1 | 100361826 | 100361959 | NM_000646.2_cds_24
1 | 100366176 | 100366432 | NM_000028.2_cds_25
1 | 100366176 | 100366432 | NM_000642.2_cds_25
1 | 100366176 | 100366432 | NM_000643.2_cds_25
1 | 100366176 | 100366432 | NM_000644.2_cds_25
1 | 100366176 | 100366432 | NM_000646.2_cds_25
1 | 100368223 | 100368365 | NM_000028.2_cds_26
1 | 100368223 | 100368365 | NM_000642.2_cds_26
1 | 100368223 | 100368365 | NM_000643.2_cds_26
1 | 100368223 | 100368365 | NM_000644.2_cds_26
1 | 100368223 | 100368365 | NM_000646.2_cds_26
1 | 100376252 | 100376418 | NM_000028.2_cds_27
1 | 100376252 | 100376418 | NM_000642.2_cds_27
1 | 100376252 | 100376418 | NM_000643.2_cds_27
1 | 100376252 | 100376418 | NM_000644.2_cds_27
1 | 100376252 | 100376418 | NM_000646.2_cds_27
1 | 100377945 | 100378088 | NM_000028.2_cds_28
1 | 100377945 | 100378088 | NM_000642.2_cds_28
1 | 100377945 | 100378088 | NM_000643.2_cds_28
1 | 100377945 | 100378088 | NM_000644.2_cds_28
1 | 100377945 | 100378088 | NM_000646.2_cds_28
1 | 100379067 | 100379309 | NM_000028.2_cds_29
1 | 100379067 | 100379309 | NM_000642.2_cds_29
1 | 100379067 | 100379309 | NM_000643.2_cds_29
1 | 100379067 | 100379309 | NM_000644.2_cds_29
1 | 100379067 | 100379309 | NM_000646.2_cds_29
1 | 100380929 | 100381057 | NM_000028.2_cds_30
1 | 100380929 | 100381057 | NM_000642.2_cds_30
1 | 100380929 | 100381057 | NM_000643.2_cds_30
1 | 100380929 | 100381057 | NM_000644.2_cds_30
1 | 100380929 | 100381057 | NM_000646.2_cds_30
1 | 100381950 | 100382068 | NM_000028.2_cds_31
1 | 100381950 | 100382068 | NM_000642.2_cds_31
1 | 100381950 | 100382068 | NM_000643.2_cds_31
1 | 100381950 | 100382068 | NM_000644.2_cds_31
1 | 100381950 | 100382068 | NM_000646.2_cds_31
1 | 100382138 | 100382302 | NM_000028.2_cds_32
1 | 100382138 | 100382302 | NM_000642.2_cds_32
1 | 100382138 | 100382302 | NM_000643.2_cds_32
1 | 100382138 | 100382302 | NM_000644.2_cds_32
1 | 100382138 | 100382302 | NM_000646.2_cds_32
1 | 100387074 | 100387222 | NM_000028.2_cds_33
1 | 100387074 | 100387222 | NM_000642.2_cds_33
1 | 100387074 | 100387222 | NM_000643.2_cds_33
1 | 100387074 | 100387222 | NM_000644.2_cds_33
1 | 100387074 | 100387222 | NM_000646.2_cds_33
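The pasted target list repeats most exons once per transcript (NM_000028.2, NM_000642.2, NM_000643.2, NM_000644.2, NM_000646.2), so the same interval often appears five times. If a compact target file is wanted, for example before flanking, a minimal sketch can turn the pipe-separated rows back into tab-separated BED lines with one line per unique interval. `table_to_bed` is a hypothetical helper; it assumes the four columns are chrom, start, end, name and keeps the first name seen for each interval, which is an arbitrary choice. Whether duplicate intervals affect varsim's coverage is not established in this thread.

```python
def table_to_bed(rows):
    """Convert 'chrom | start | end | name' rows into tab-separated BED lines.

    Several transcripts share identical CDS coordinates, so keep one line
    per unique (chrom, start, end). The name of the first row seen for an
    interval is kept (an arbitrary choice). Chromosomes sort as strings.
    """
    first_name = {}
    for row in rows:
        chrom, start, end, name = (field.strip() for field in row.split("|"))
        first_name.setdefault((chrom, int(start), int(end)), name)
    return [
        f"{chrom}\t{start}\t{end}\t{first_name[(chrom, start, end)]}"
        for chrom, start, end in sorted(first_name)
    ]


rows = [
    "1 | 100316583 | 100316695 | NM_000028.2_cds_1",
    "1 | 100316583 | 100316695 | NM_000642.2_cds_1",
    "1 | 100318210 | 100318274 | NM_000646.2_cds_1",
    "1 | 100327043 | 100327284 | NM_000028.2_cds_2",
    "1 | 100327043 | 100327284 | NM_000642.2_cds_2",
]
print("\n".join(table_to_bed(rows)))
```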

  2. One strategy is to flank the regions (by, say, 500 bp) and then use the flanked regions for simulation, to achieve better uniformity around the original regions.

I can do that.
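Flanking could look roughly like the sketch below: pad every target interval by 500 bp on each side, then merge intervals that now overlap or touch, since neighbouring exons such as cds_2 and cds_3 end up overlapping after padding. This approximates running `bedtools slop -b 500` followed by `bedtools merge`; `flank_and_merge` is a hypothetical helper, starts are clamped at 0, and chromosome ends are not clamped because no chromosome-sizes file is used here.

```python
def flank_and_merge(intervals, flank=500):
    """Pad (chrom, start, end) intervals by `flank` bp per side, then merge.

    Overlapping or book-ended intervals on the same chromosome are merged,
    and exact duplicates collapse into one. Start is clamped at 0; the
    chromosome end is NOT clamped (no chrom-sizes file is consulted).
    """
    padded = sorted(
        (chrom, max(0, start - flank), end + flank) for chrom, start, end in intervals
    )
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


targets = [
    ("1", 100316583, 100316695),  # cds_1
    ("1", 100318210, 100318274),  # cds_1 of NM_000646.2
    ("1", 100327043, 100327284),  # cds_2
    ("1", 100327797, 100327994),  # cds_3
]
for chrom, start, end in flank_and_merge(targets):
    print(f"{chrom}\t{start}\t{end}")
```

With 500 bp of padding, cds_2 and cds_3 merge into one simulation region, while cds_1 and the NM_000646.2 cds_1 stay separate.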

  3. Another solution is to increase the read length (the -l option passed via --simulator_options), because the green reads (I assume they are real data) appear slightly longer than the 100 bp you used in #196.

Sounds good, will do. And yes, the green reads are real data.

jayaramanp, Apr 30 '19 18:04