MITObim

iteration file size

Open ghost opened this issue 7 years ago • 3 comments

Dear Chris, thank you for developing and maintaining the repository. I have a question regarding the per-iteration fastq files. I am running MITObim on a 126G file. Does the readpool fastq file of each iteration need to reach 126G before the run moves on to the next iteration? I mean files like culatus-readpool-it0.fastq. If that is the case, I think I need to subsample the input file. Many thanks in advance

ghost, Nov 20 '18 16:11

Hi, the readpool file you are referring to contains only those reads that had a certain similarity to the reference/bait you provided. In the 'standard' situation, when you are targeting an organellar genome from a genomic DNA library, that is only a fraction of the total read pool, so it does not need to reach the size of the input file. These reads are used for the assembly in this iteration. If you want to keep your overall disk usage small, you can use the --clean option, which only ever retains the last 2 iteration directories. Overall, a 126G file sounds like a lot of data, so I would definitely downsample for a first test. Best wishes, Christoph
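A subset for such a first test could be prepared with a short script. The following is a minimal sketch and not part of MITObim: the function name `downsample_fastq`, its parameters, and the file names are hypothetical, and it assumes an uncompressed FASTQ with exactly four lines per record. It streams the input, so memory use stays small even for a very large file.

```python
import random
from itertools import islice


def downsample_fastq(in_path, out_path, fraction, records_per_unit=1, seed=42):
    """Write a random subset of FASTQ records from in_path to out_path.

    records_per_unit=1 samples every 4-line record independently;
    records_per_unit=2 keeps interleaved read pairs together.
    Returns (units_seen, units_kept).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    lines_per_unit = 4 * records_per_unit
    seen = kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            # Read one record (or one interleaved pair) at a time.
            unit = list(islice(src, lines_per_unit))
            if not unit:
                break
            if len(unit) < lines_per_unit:
                raise ValueError("truncated FASTQ record at end of file")
            seen += 1
            # Keep each unit with probability `fraction`.
            if rng.random() < fraction:
                dst.writelines(unit)
                kept += 1
    return seen, kept


if __name__ == "__main__":
    # Hypothetical file names; keeps roughly 5% of interleaved read pairs.
    print(downsample_fastq("reads.fastq", "reads.sub.fastq", 0.05, records_per_unit=2))
```

Sampling by probability rather than by exact count avoids a first pass to count the reads; the kept number will vary slightly around `fraction` times the total.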

chrishah, Nov 21 '18 08:11

Thanks, Chris, for your kind answer. I have started the actual run on our chpc. As you know, because of wall-time limits I had to resume the assembly from the iteration where the previous run ended. The log shows that the contig in the last iteration of the previous run was about 10k bp long, but in the new run the contig is building up from the seed size(?) again: it is back to 1096 bp. Is there anything I have missed? Many thanks

I have included the last few lines of the log file:

ITERATION 44

Nov 22 18:19:53

recover backbone by running miraconvert on maf file

fishing readpool using mirabait (k = 31)

fishing readpool using mirabait (k = 31)

running mapping assembly using MIRA

readpool contains 71524 reads

assembly contains 1 contig(s)

contig length: 1067

now removing directory iteration41
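To see whether a resumed run keeps extending the contig or has fallen back to a shorter one, the per-iteration contig lengths can be pulled out of the log. This is a rough sketch that assumes only the two log patterns visible in the excerpt above ("ITERATION <n>" and "contig length: <n>"); the function names and the sample text in the usage part are hypothetical.

```python
import re

ITERATION_RE = re.compile(r"ITERATION\s+(\d+)")
LENGTH_RE = re.compile(r"contig length:\s*(\d+)")


def contig_lengths_by_iteration(log_text):
    """Return (iteration, contig_length) tuples in the order they appear."""
    results = []
    current = None
    for line in log_text.splitlines():
        match = ITERATION_RE.search(line)
        if match:
            current = int(match.group(1))
        # A line may report several contigs; record each length.
        for length in LENGTH_RE.finditer(line):
            if current is not None:
                results.append((current, int(length.group(1))))
    return results


def find_drops(lengths):
    """Return (iteration, previous_length, new_length) where the contig shrank."""
    return [
        (iteration, prev_len, cur_len)
        for (_, prev_len), (iteration, cur_len) in zip(lengths, lengths[1:])
        if cur_len < prev_len
    ]


if __name__ == "__main__":
    # Made-up two-iteration log; only the second value is taken from the excerpt.
    sample = "ITERATION 43\ncontig length: 10012\nITERATION 44\ncontig length: 1067\n"
    print(find_drops(contig_lengths_by_iteration(sample)))
```

A drop reported right after a restart suggests that the new run did not pick up the previous assembly.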

ghost, Nov 22 '18 16:11

Sorry, Chris! I think I found my answer in the source code: "quick option selected! -maf option will be ignored (if given)". Cheers

ghost, Nov 22 '18 16:11