
The input data format

Open bingyinglee opened this issue 1 year ago • 1 comments

Hi, I tried to use bindash to process my FASTA data. I first ran the following command:

./bindash sketch mydata.fas --outfname=genomeA.sketch

The mydata.fas file is about 50 MB and contains more than 20,000 nucleotide sequences, but the generated .sketch file is only 1 KB. Something seems wrong, but I don't know what to change. Are there any requirements for the input data format?

bingyinglee avatar Sep 10 '24 08:09 bingyinglee

Hi @bingyinglee,

The output file size depends only on the sketch size (the --sketchsize64 M and --bbits N options), not on the size of the input, if your purpose is to compute genomic distance among your files. A sketch is just the first N bits of each of M 64-bit integers, so it is not large. You can increase --sketchsize64 to 200, or even several thousand, if you want accurate estimates for genomes at 99% or 99.99% ANI and above (ANI is a widely used metric for genomic distance). This tool is only for genomic distance estimation, not for FASTQ/FASTA quality control or similar tasks.
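To illustrate why the sketch size does not grow with the input, here is a toy MinHash sketch in Python. It is not BinDash's actual algorithm or file format: the function name `toy_sketch`, the use of BLAKE2b as the hash, the k-mer length, and the choice to keep the lowest N bits of each value are all assumptions made for illustration. The only point is that the output always holds M values of N bits each, whether the input is short or long.

```python
import hashlib
import random


def toy_sketch(seq, k=8, m=16, n_bits=14):
    """Toy MinHash: keep the m smallest 64-bit k-mer hashes, n_bits of each.

    Hypothetical helper for illustration only; not BinDash's implementation.
    """
    # Hash every distinct k-mer to a 64-bit integer.
    hashes = {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }
    # Keep only the lowest n_bits of each of the m smallest hashes.
    mask = (1 << n_bits) - 1
    return [h & mask for h in sorted(hashes)[:m]]


rng = random.Random(0)
small = "".join(rng.choices("ACGT", k=200))      # short input
large = "".join(rng.choices("ACGT", k=20000))    # input 100x longer

small_sketch = toy_sketch(small)
large_sketch = toy_sketch(large)

# Both sketches hold the same number of values, so they take the same space.
print(len(small_sketch), len(large_sketch))
```

In this toy version the payload is M × N bits (16 × 14 = 224 bits with the defaults used here), independent of how many sequences or bases were read.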

Let me know if anything is unclear.

Best,

Jianshu

jianshu93 avatar Sep 10 '24 10:09 jianshu93

Closed since this question has been answered by jianshu93

zhaoxiaofei avatar Jun 27 '25 03:06 zhaoxiaofei