Is there an option to reduce the number of temporary VCF files?

Open xiekunwhy opened this issue 2 years ago • 3 comments

Hi,

It seems that Octopus opens one temporary VCF file per contig/scaffold. Many non-model species have reference genomes with a very large number of contigs/scaffolds; for example, the assembly at https://www.ncbi.nlm.nih.gov/assembly/GCA_000966675.2/ has 4,464,856 contigs/scaffolds. I think Octopus cannot be used for these species because too many files would need to be opened.

Are there any options to reduce the number of temporary VCF files, or could you please add one?

Best, Kun

xiekunwhy avatar Jul 15 '21 05:07 xiekunwhy

Hi, you're correct that Octopus creates a temporary VCF for each contig in the input regions; this enables parallel processing of each contig. However, these temporary VCFs are opened dynamically, so there should only be one temporary VCF open at any one time. If you're running into problems, can you post the error you're seeing?

dancooke avatar Jul 16 '21 11:07 dancooke

Hi,

Octopus reports no errors, but the number of files reached my system's limits, and I cannot write anything until I remove those temporary VCFs. I have 4000+ individuals to call. I think you could merge each contig's temporary VCF into the individual's VCF file as soon as that contig is finished and then remove the temporary file immediately.

Best, Kun

xiekunwhy avatar Jul 17 '21 01:07 xiekunwhy

Or can I join contigs/scaffolds with Ns to construct longer scaffolds, and so reduce the number of temporary VCF files?
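
A minimal sketch of the N-joining idea, for illustration only. It is not an Octopus feature, and the function names `join_contigs` and `to_original`, as well as the spacer length, are hypothetical choices. It assumes contigs are given as `(name, sequence)` pairs and records where each contig lands in the joined scaffold, so that positions called on the joined scaffold can be mapped back to the original contig.

```python
from typing import Dict, Iterable, Optional, Tuple

Offsets = Dict[str, Tuple[int, int]]


def join_contigs(contigs: Iterable[Tuple[str, str]],
                 spacer_len: int = 100) -> Tuple[str, Offsets]:
    """Join contigs into one scaffold, separated by runs of N.

    Returns the joined sequence and, for each contig name, its
    0-based half-open (start, end) interval in the joined sequence.
    spacer_len is an arbitrary illustrative value.
    """
    spacer = "N" * spacer_len
    parts = []
    offsets: Offsets = {}
    pos = 0
    for i, (name, seq) in enumerate(contigs):
        if i > 0:
            # Insert the N spacer between consecutive contigs only.
            parts.append(spacer)
            pos += spacer_len
        offsets[name] = (pos, pos + len(seq))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets


def to_original(offsets: Offsets, joined_pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based position on the joined scaffold back to (contig, position).

    Returns None when the position falls inside an N spacer.
    """
    for name, (start, end) in offsets.items():
        if start <= joined_pos < end:
            return name, joined_pos - start
    return None


if __name__ == "__main__":
    joined, offsets = join_contigs([("c1", "ACGT"), ("c2", "GG")], spacer_len=3)
    print(joined)
    print(offsets)
    print(to_original(offsets, 8))
```

With this approach the reads would have to be realigned to the joined reference, and the resulting VCF records would have to be converted back to the original contig coordinates using the recorded offsets.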

xiekunwhy avatar Aug 03 '21 10:08 xiekunwhy