varsim
Weird coverage profile for spike in fastqs
Hello @yunfeiguo, I was able to successfully run varsim_multi.py and generate FASTQ files with variants "spiked in" from dbSNP SNPs.
However, the coverage of the samples looks very funky! See how the coverage right over the exon almost resembles two prongs instead of one uniform block of coverage? Can we implement a parameter of some sort to fix this issue?


In this case the top track is the original sequence and the bottom track is the spiked-in sequence.

Hi @jayaramanp, several comments and questions:
- Could you load the BED file used for simulation at the bottom of the IGV screenshot?
- One strategy is to flank the regions (by 500 bp, say) and then use the flanked regions for simulation, to achieve better uniformity around the original regions.
- Another solution is to increase the read length (-l option for --simulator_options), because the green reads (I assume they are real data) appear to be slightly longer than the 100 bp you used in #196.
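The flanking strategy suggested above can be sketched in a few lines of Python. This is only an illustration, not part of VarSim: the function name `flank_and_merge` and the sample intervals are assumptions made for the example, and the sketch does not clip intervals at chromosome ends. It pads each BED interval by a fixed number of bases and then merges intervals that overlap after padding, so duplicated or neighbouring exons collapse into a single target.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style coordinates


def flank_and_merge(intervals: Iterable[Interval], flank: int = 500) -> List[Interval]:
    """Pad each interval by `flank` bp on both sides, then merge overlaps.

    Hypothetical helper for illustration; starts are clamped at 0, but ends
    are NOT clipped to chromosome length (that would need a genome file).
    """
    padded = sorted(
        (chrom, max(0, start - flank), end + flank)
        for chrom, start, end in intervals
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# A few example CDS intervals on chromosome 1, including a duplicate.
regions = [
    ("1", 100316583, 100316695),
    ("1", 100316583, 100316695),
    ("1", 100318210, 100318274),
    ("1", 100327043, 100327284),
    ("1", 100327797, 100327994),
]

flanked = flank_and_merge(regions, flank=500)
for chrom, start, end in flanked:
    print(f"{chrom}\t{start}\t{end}")
```

The printed tab-separated lines could be saved as a new BED file and passed to the simulation in place of the original targets. The same result can also be obtained with `bedtools slop` followed by `bedtools merge`, if bedtools is available.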
@yunfeiguo
> - could you load the BED file used for simulation at the bottom of the IGV screenshot?
I essentially used the CDS regions of a gene as the target:
1 | 100316583 | 100316695 | NM_000028.2_cds_1
1 | 100316583 | 100316695 | NM_000642.2_cds_1
1 | 100316583 | 100316695 | NM_000643.2_cds_1
1 | 100316583 | 100316695 | NM_000644.2_cds_1
1 | 100318210 | 100318274 | NM_000646.2_cds_1
1 | 100327043 | 100327284 | NM_000028.2_cds_2
1 | 100327043 | 100327284 | NM_000642.2_cds_2
1 | 100327043 | 100327284 | NM_000643.2_cds_2
1 | 100327043 | 100327284 | NM_000644.2_cds_2
1 | 100327043 | 100327284 | NM_000646.2_cds_2
1 | 100327797 | 100327994 | NM_000028.2_cds_3
1 | 100327797 | 100327994 | NM_000642.2_cds_3
1 | 100327797 | 100327994 | NM_000643.2_cds_3
1 | 100327797 | 100327994 | NM_000644.2_cds_3
1 | 100327797 | 100327994 | NM_000646.2_cds_3
1 | 100329926 | 100330160 | NM_000028.2_cds_4
1 | 100329926 | 100330160 | NM_000642.2_cds_4
1 | 100329926 | 100330160 | NM_000643.2_cds_4
1 | 100329926 | 100330160 | NM_000644.2_cds_4
1 | 100329926 | 100330160 | NM_000646.2_cds_4
1 | 100335940 | 100336152 | NM_000028.2_cds_5
1 | 100335940 | 100336152 | NM_000642.2_cds_5
1 | 100335940 | 100336152 | NM_000643.2_cds_5
1 | 100335940 | 100336152 | NM_000644.2_cds_5
1 | 100335940 | 100336152 | NM_000646.2_cds_5
1 | 100336298 | 100336440 | NM_000028.2_cds_6
1 | 100336298 | 100336440 | NM_000642.2_cds_6
1 | 100336298 | 100336440 | NM_000643.2_cds_6
1 | 100336298 | 100336440 | NM_000644.2_cds_6
1 | 100336298 | 100336440 | NM_000646.2_cds_6
1 | 100340227 | 100340381 | NM_000028.2_cds_7
1 | 100340227 | 100340381 | NM_000642.2_cds_7
1 | 100340227 | 100340381 | NM_000643.2_cds_7
1 | 100340227 | 100340381 | NM_000644.2_cds_7
1 | 100340227 | 100340381 | NM_000646.2_cds_7
1 | 100340694 | 100340827 | NM_000028.2_cds_8
1 | 100340694 | 100340827 | NM_000642.2_cds_8
1 | 100340694 | 100340827 | NM_000643.2_cds_8
1 | 100340694 | 100340827 | NM_000644.2_cds_8
1 | 100340694 | 100340827 | NM_000646.2_cds_8
1 | 100340898 | 100341026 | NM_000028.2_cds_9
1 | 100340898 | 100341026 | NM_000642.2_cds_9
1 | 100340898 | 100341026 | NM_000643.2_cds_9
1 | 100340898 | 100341026 | NM_000644.2_cds_9
1 | 100340898 | 100341026 | NM_000646.2_cds_9
1 | 100341998 | 100342168 | NM_000028.2_cds_10
1 | 100341998 | 100342168 | NM_000642.2_cds_10
1 | 100341998 | 100342168 | NM_000643.2_cds_10
1 | 100341998 | 100342168 | NM_000644.2_cds_10
1 | 100341998 | 100342168 | NM_000646.2_cds_10
1 | 100343181 | 100343399 | NM_000028.2_cds_11
1 | 100343181 | 100343399 | NM_000642.2_cds_11
1 | 100343181 | 100343399 | NM_000643.2_cds_11
1 | 100343181 | 100343399 | NM_000644.2_cds_11
1 | 100343181 | 100343399 | NM_000646.2_cds_11
1 | 100345463 | 100345617 | NM_000028.2_cds_12
1 | 100345463 | 100345617 | NM_000642.2_cds_12
1 | 100345463 | 100345617 | NM_000643.2_cds_12
1 | 100345463 | 100345617 | NM_000644.2_cds_12
1 | 100345463 | 100345617 | NM_000646.2_cds_12
1 | 100346172 | 100346366 | NM_000028.2_cds_13
1 | 100346172 | 100346366 | NM_000642.2_cds_13
1 | 100346172 | 100346366 | NM_000643.2_cds_13
1 | 100346172 | 100346366 | NM_000644.2_cds_13
1 | 100346172 | 100346366 | NM_000646.2_cds_13
1 | 100346616 | 100346748 | NM_000028.2_cds_14
1 | 100346616 | 100346748 | NM_000642.2_cds_14
1 | 100346616 | 100346748 | NM_000643.2_cds_14
1 | 100346616 | 100346748 | NM_000644.2_cds_14
1 | 100346616 | 100346748 | NM_000646.2_cds_14
1 | 100346832 | 100347018 | NM_000028.2_cds_15
1 | 100346832 | 100347018 | NM_000642.2_cds_15
1 | 100346832 | 100347018 | NM_000643.2_cds_15
1 | 100346832 | 100347018 | NM_000644.2_cds_15
1 | 100346832 | 100347018 | NM_000646.2_cds_15
1 | 100347081 | 100347262 | NM_000028.2_cds_16
1 | 100347081 | 100347262 | NM_000642.2_cds_16
1 | 100347081 | 100347262 | NM_000643.2_cds_16
1 | 100347081 | 100347262 | NM_000644.2_cds_16
1 | 100347081 | 100347262 | NM_000646.2_cds_16
1 | 100349660 | 100349815 | NM_000028.2_cds_17
1 | 100349660 | 100349815 | NM_000642.2_cds_17
1 | 100349660 | 100349815 | NM_000643.2_cds_17
1 | 100349660 | 100349815 | NM_000644.2_cds_17
1 | 100349660 | 100349815 | NM_000646.2_cds_17
1 | 100349879 | 100350022 | NM_000028.2_cds_18
1 | 100349879 | 100350022 | NM_000642.2_cds_18
1 | 100349879 | 100350022 | NM_000643.2_cds_18
1 | 100349879 | 100350022 | NM_000644.2_cds_18
1 | 100349879 | 100350022 | NM_000646.2_cds_18
1 | 100350109 | 100350274 | NM_000028.2_cds_19
1 | 100350109 | 100350274 | NM_000642.2_cds_19
1 | 100350109 | 100350274 | NM_000643.2_cds_19
1 | 100350109 | 100350274 | NM_000644.2_cds_19
1 | 100350109 | 100350274 | NM_000646.2_cds_19
1 | 100353518 | 100353679 | NM_000028.2_cds_20
1 | 100353518 | 100353679 | NM_000642.2_cds_20
1 | 100353518 | 100353679 | NM_000643.2_cds_20
1 | 100353518 | 100353679 | NM_000644.2_cds_20
1 | 100353518 | 100353679 | NM_000646.2_cds_20
1 | 100356760 | 100356927 | NM_000028.2_cds_21
1 | 100356760 | 100356927 | NM_000642.2_cds_21
1 | 100356760 | 100356927 | NM_000643.2_cds_21
1 | 100356760 | 100356927 | NM_000644.2_cds_21
1 | 100356760 | 100356927 | NM_000646.2_cds_21
1 | 100357146 | 100357310 | NM_000028.2_cds_22
1 | 100357146 | 100357310 | NM_000642.2_cds_22
1 | 100357146 | 100357310 | NM_000643.2_cds_22
1 | 100357146 | 100357310 | NM_000644.2_cds_22
1 | 100357146 | 100357310 | NM_000646.2_cds_22
1 | 100357972 | 100358178 | NM_000028.2_cds_23
1 | 100357972 | 100358178 | NM_000642.2_cds_23
1 | 100357972 | 100358178 | NM_000643.2_cds_23
1 | 100357972 | 100358178 | NM_000644.2_cds_23
1 | 100357972 | 100358178 | NM_000646.2_cds_23
1 | 100361826 | 100361959 | NM_000028.2_cds_24
1 | 100361826 | 100361959 | NM_000642.2_cds_24
1 | 100361826 | 100361959 | NM_000643.2_cds_24
1 | 100361826 | 100361959 | NM_000644.2_cds_24
1 | 100361826 | 100361959 | NM_000646.2_cds_24
1 | 100366176 | 100366432 | NM_000028.2_cds_25
1 | 100366176 | 100366432 | NM_000642.2_cds_25
1 | 100366176 | 100366432 | NM_000643.2_cds_25
1 | 100366176 | 100366432 | NM_000644.2_cds_25
1 | 100366176 | 100366432 | NM_000646.2_cds_25
1 | 100368223 | 100368365 | NM_000028.2_cds_26
1 | 100368223 | 100368365 | NM_000642.2_cds_26
1 | 100368223 | 100368365 | NM_000643.2_cds_26
1 | 100368223 | 100368365 | NM_000644.2_cds_26
1 | 100368223 | 100368365 | NM_000646.2_cds_26
1 | 100376252 | 100376418 | NM_000028.2_cds_27
1 | 100376252 | 100376418 | NM_000642.2_cds_27
1 | 100376252 | 100376418 | NM_000643.2_cds_27
1 | 100376252 | 100376418 | NM_000644.2_cds_27
1 | 100376252 | 100376418 | NM_000646.2_cds_27
1 | 100377945 | 100378088 | NM_000028.2_cds_28
1 | 100377945 | 100378088 | NM_000642.2_cds_28
1 | 100377945 | 100378088 | NM_000643.2_cds_28
1 | 100377945 | 100378088 | NM_000644.2_cds_28
1 | 100377945 | 100378088 | NM_000646.2_cds_28
1 | 100379067 | 100379309 | NM_000028.2_cds_29
1 | 100379067 | 100379309 | NM_000642.2_cds_29
1 | 100379067 | 100379309 | NM_000643.2_cds_29
1 | 100379067 | 100379309 | NM_000644.2_cds_29
1 | 100379067 | 100379309 | NM_000646.2_cds_29
1 | 100380929 | 100381057 | NM_000028.2_cds_30
1 | 100380929 | 100381057 | NM_000642.2_cds_30
1 | 100380929 | 100381057 | NM_000643.2_cds_30
1 | 100380929 | 100381057 | NM_000644.2_cds_30
1 | 100380929 | 100381057 | NM_000646.2_cds_30
1 | 100381950 | 100382068 | NM_000028.2_cds_31
1 | 100381950 | 100382068 | NM_000642.2_cds_31
1 | 100381950 | 100382068 | NM_000643.2_cds_31
1 | 100381950 | 100382068 | NM_000644.2_cds_31
1 | 100381950 | 100382068 | NM_000646.2_cds_31
1 | 100382138 | 100382302 | NM_000028.2_cds_32
1 | 100382138 | 100382302 | NM_000642.2_cds_32
1 | 100382138 | 100382302 | NM_000643.2_cds_32
1 | 100382138 | 100382302 | NM_000644.2_cds_32
1 | 100382138 | 100382302 | NM_000646.2_cds_32
1 | 100387074 | 100387222 | NM_000028.2_cds_33
1 | 100387074 | 100387222 | NM_000642.2_cds_33
1 | 100387074 | 100387222 | NM_000643.2_cds_33
1 | 100387074 | 100387222 | NM_000644.2_cds_33
1 | 100387074 | 100387222 | NM_000646.2_cds_33
> - one strategy is to flank the regions (by 500 bp, say) and then use the flanked regions for simulation to achieve better uniformity around the original regions.
I can do that.
> - another solution is to increase the read length (-l option for --simulator_options), because the green reads (I assume they are real data) appear to be slightly longer than the 100 bp you used in #196
Sounds good, will do that. Yes, the green reads are real data.