SimSeq icon indicating copy to clipboard operation
SimSeq copied to clipboard

Limitations + Problems

Open sej917 opened this issue 12 years ago • 10 comments

Hi John,

I recently read the Assemblathon 1 paper published in Genome Research and may be interested in using SimSeq for a project I am working on. In the paper, a few limitations and problems of SimSeq are mentioned. Have these been addressed/fixed?

Any information will be greatly appreciated!

Thanks.

sej917 avatar May 24 '12 19:05 sej917

Hello, I haven't yet implemented the fix to get around the problem of the reads being sorted by chromosome on the output. Reads are random within chromosomes, but they are sorted between chromosomes. So that issue hasn't been fixed. One way around this would be to shuffle the order of fastq reads after you generate them. I also haven't implemented the improvement to the error profile where each position's error rate is dependent on the error of the previous position. Each position is still sampled independently of previous and subsequent positions, although there is a position specific model. The issue with a reverse error profile being used was fixed in that I provide a correct error model for download on the GitHub page; the incorrect ones are also available for download but are labeled as being specific to the Assemblathon1. I had also started working on adding in a simulation of adapter sequences showing up at the ends of reads when the fragments are too short, but I don't remember how far along I got with that, and I don't think the finished version of that ever made it to GitHub. The version I have for adding adapters to the simulation will do it on PE reads, but not MP reads. Let me know if that answers your questions. -John

On May 24, 2012, at 12:36 PM, sej917 wrote:

Hi John,

I recently read the Assemblathon 1 paper published in Genome Research and may be interested in using SimSeq for a project I am working on. In the paper, a few limitations and problems of SimSeq are mentioned. Have these been addressed/fixed?

Any information will be greatly appreciated!

Thanks.


Reply to this email directly or view it on GitHub: https://github.com/jstjohn/SimSeq/issues/4

jstjohn avatar May 24 '12 22:05 jstjohn

Hello John! I want to simulate 250bp read length from MiSeq, but it seems the read_len for PE reads (MP reads too, but I have not tried yet!) simulation can not be longer than 100bp. I changed the def_read_len to 300 in the "Main.java" source code, but did not work. I appreciate if you could let me know how to change the settings/code to implement my case. Thanks a lot!

Yifang

yifangt avatar Dec 21 '15 18:12 yifangt

Ah, you would have to make a new model that supports longer reads as well, which means you need sequencing data probably aligned to PhiX from the instrument you want to emulate. On Mon, Dec 21, 2015 at 10:19 AM Yifang Tan [email protected] wrote:

Hello John! I want to simulate 250bp read length from MiSeq, but it seems the read_len for PE reads (MP reads too, but I have not tried yet!) simulation can not be longer than 100bp. I changed the def_read_len to 300 in the "Main.java" source code, but did not work. I appreciate if you could let me know how to change the settings/code to implement my case. Thanks a lot!

Yifang

— Reply to this email directly or view it on GitHub https://github.com/jstjohn/SimSeq/issues/4#issuecomment-166381096.

jstjohn avatar Dec 21 '15 18:12 jstjohn

Thanks John! Then, do you have a model for that, or how to make read error files? I had thought it is simple issue for you to modify, isn't it? I also noticed a file for 250bp profiles there which looks similar to those two error files for 100bp reads. Is there anything to do with it to create error files for 250bp reads? Thanks!

yifangt avatar Dec 21 '15 19:12 yifangt

It has been about 4 years since I last touched this. I think there is a c program in one of the subdirectories you could use (after compiling with Make). The 250bp error profile there would be from a few years ago when quality was really bad at that read length. On Mon, Dec 21, 2015 at 11:21 AM Yifang Tan [email protected] wrote:

Thanks John! Then, do you have a model for that, or how to make read error files? I had thought it is simple issue for you to modify, isn't it? I also noticed a file for 250bp profiles there which looks similar to those two error files for 100bp reads. Is there anything to do with it to create error files for 250bp reads? Thanks!

— Reply to this email directly or view it on GitHub https://github.com/jstjohn/SimSeq/issues/4#issuecomment-166395711.

jstjohn avatar Dec 21 '15 19:12 jstjohn

Thanks! The program is located in cUtils subdfolder. In case somebody comes over in the future I put here two hickups I met, 1) was resolved:

  1. undefined reference for log10() / pow() functions at compiling; Tried putting the $(LDFLAGS) at the end of the line in makefile: faSize: $(LIBOBJECTS) $(FASIZEO) $(CC) $(COPTS) $(FASIZEO) $(LIBOBJECTS) -o $@ $(LDFLAGS) The bug went away.
  2. Segment fault when I run: $ samtools view -S ../sample_data_20151221/S01-aln-pe.sam | ./getErrorProfile -ref=../sample_data_20151221/S01_Scaffolds_gt1kb.fasta -readLen=251 -phred33 >error_profile_251bp.txt

[samopen] SAM header is present: 1 sequences. Segmentation fault

Any help would be appreciated! Yifang

yifangt avatar Dec 21 '15 20:12 yifangt

Is the seg fault in my prog or santools? On Mon, Dec 21, 2015 at 12:43 PM Yifang Tan [email protected] wrote:

Thanks! The program is located in cUtils subdfolder. In case somebody comes over in the future I put here two hickups I met, 1) was resolved:

  1. undefined reference for log10() / pow() functions at compiling; Tried putting the $(LDFLAGS) at the end of the line in makefile: faSize: $(LIBOBJECTS) $(FASIZEO) $(CC) $(COPTS) $(FASIZEO) $(LIBOBJECTS) -o $@ $(LDFLAGS) The bug went away.
  2. Segment fault when I run: $ samtools view -S ../sample_data_20151221/S01-aln-pe.sam | ./getErrorProfile -ref=../sample_data_20151221/S01_Scaffolds_gt1kb.fasta -readLen=251 -phred33 >error_profile_251bp.txt

[samopen] SAM header is present: 1 sequences. Segmentation fault

Any help would be appreciated! Yifang

— Reply to this email directly or view it on GitHub https://github.com/jstjohn/SimSeq/issues/4#issuecomment-166412368.

jstjohn avatar Dec 21 '15 21:12 jstjohn

I tried to view the sam file with samtools, which was fiine. So, I think it is with your program. I tried two ways to feed the sam file:

  1. by stdin: samtools view -S ./S01-aln-pe.sam | ./getErrorProfile -ref=./S01_Scaffolds_gt1kb.fasta -readLen=251 -phred64 >error_profile_251bp.txt

[samopen] SAM header is present: 1 sequences. Segmentation fault

  1. by explicit sam file name (headless sam file) ./getErrorProfile -ref=./S01_Scaffolds_gt1kb.fasta -readLen=251 -phred33 -sam=./S01-aln-no-header-pe.sam Segmentation fault

Did I miss anything?

yifangt avatar Dec 21 '15 21:12 yifangt

Nope. You'll have to do GDB to find out which line the program is crashing on. Ever used gdb before? I am away from a computer for a few weeks so unfortunately I can't do that for you. On Mon, Dec 21, 2015 at 1:41 PM Yifang Tan [email protected] wrote:

I tried to view the sam file with samtools, which was fiine. So, I think it is with your program. I tried two ways to feed the sam file:

  1. by stdiin: samtools view -S ./S01-aln-pe.sam | ./getErrorProfile -ref=./S01_Scaffolds_gt1kb.fasta -readLen=251 -phred64

error_profile_251bp.txt

[samopen] SAM header is present: 1 sequences. Segmentation fault

  1. by explicit sam file name (headless sam file) ./getErrorProfile -ref=./S01_Scaffolds_gt1kb.fasta -readLen=251 -phred33 -sam=./S01-aln-no-header-pe.sam Segmentation fault

Did I miss anything?

— Reply to this email directly or view it on GitHub https://github.com/jstjohn/SimSeq/issues/4#issuecomment-166432105.

jstjohn avatar Dec 21 '15 22:12 jstjohn

I'll try to figure that out, or come back to you later. Thanks for your program and have great holidays!

UPDATE: dbg indicated the bug is on line 229: Program received signal SIGSEGV, Segmentation fault. 0x00000000004038a3 in getErrorProfile () at getErrorProfile.c:229 if(mutation[i][pscore][0][0] < pseudo){ //add in pseudocounts for everything other than N, haven't done this already Maybe this will give you some clue to help me out when you are back. Thanks a lot!

yifangt avatar Dec 21 '15 22:12 yifangt