Jianshu_Zhao

106 comments by Jianshu_Zhao

Thanks, Daniel. This is very helpful. Jianshu

Hello Daniel,
time dashing2 sketch -k 11 -S 12000 --threads 24 --pminhash --topk 250 --cmpout phage_GPD_topK_250.txt -Q name.txt -F reference.txt
With and without the --topk 250 option, I have exactly...

Hi Daniel, many thanks for the message! I will try it and let you know what I get. Jianshu

Oh, thank you very much, Daniel. I am glad you have this functionality; it will be very useful for dereplication. Is this SetSketch weighted by k-mer abundance? I was using the ProbMinHash by...
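Whether a sketch is abundance-weighted decides which similarity it estimates. The minimal sketch below computes three exact quantities on k-mer counts so they can be compared directly: the unweighted Jaccard index (presence/absence only), the weighted Jaccard (sum of minimum counts over sum of maximum counts), and the probability Jaccard J_P, which is the similarity ProbMinHash is designed to estimate. The helper names and toy sequences are illustrative assumptions; no sketching is done here, and k-mers are not canonicalized.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (no reverse-complement handling)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Unweighted Jaccard: only the presence or absence of each k-mer matters."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Weighted Jaccard: sum of min counts divided by sum of max counts."""
    keys = set(a) | set(b)
    num = sum(min(a[x], b[x]) for x in keys)  # Counter returns 0 for missing keys
    den = sum(max(a[x], b[x]) for x in keys)
    return num / den


def probability_jaccard(a: Counter, b: Counter) -> float:
    """Probability Jaccard J_P:
    sum over shared i of 1 / sum_j max(a_j / a_i, b_j / b_i)."""
    keys = set(a) | set(b)
    total = 0.0
    for i in set(a) & set(b):
        total += 1.0 / sum(max(a[j] / a[i], b[j] / b[i]) for j in keys)
    return total


a = kmer_counts("ACGTACGTACGT", 4)
b = kmer_counts("ACGTACGTTTTT", 4)
print(jaccard(a, b), weighted_jaccard(a, b), probability_jaccard(a, b))
```

For these toy sequences the unweighted Jaccard is 4/7 while the weighted Jaccard drops to 5/13, because the shared k-mers occur at different copy numbers in the two sequences.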

Hello Daniel, I am comparing microbial genomes, so each genome is not very large, but there are a lot of them (millions of genome files). I think the greedy search...

Hello Daniel, my understanding of dereplication is that k-mer-based methods do not work very well at very high resolution, e.g., if two genome proteomes share 95% average amino acid identity, can it...
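One way to see why k-mer similarity loses resolution near high identity is the Poisson mutation model behind the Mash distance, D = -(1/k) ln(2J / (1 + J)), which can be inverted to the expected Jaccard J = 1 / (2 e^{kD} - 1). The sketch below assumes identity is approximately 1 - D and, as a simplification, applies the same model to amino-acid k-mers; it only illustrates how fast the Jaccard shrinks with k and is not how any particular tool computes identity.

```python
import math


def mash_distance(j: float, k: int) -> float:
    """Mash distance from a Jaccard value: D = -(1/k) * ln(2J / (1 + J))."""
    if not 0.0 < j <= 1.0:
        raise ValueError("Jaccard must be in (0, 1]")
    return -math.log(2.0 * j / (1.0 + j)) / k


def expected_jaccard(identity: float, k: int) -> float:
    """Invert the Mash model with D = 1 - identity: J = 1 / (2 * exp(k * D) - 1)."""
    d = 1.0 - identity
    return 1.0 / (2.0 * math.exp(k * d) - 1.0)


# Expected Jaccard at 95% and 99% identity for a few k-mer sizes.
for k in (6, 11, 21, 28):
    j95 = expected_jaccard(0.95, k)
    j99 = expected_jaccard(0.99, k)
    print(f"k={k:2d}  J(95%)={j95:.3f}  J(99%)={j99:.3f}")
```

Under this model the Jaccard falls roughly exponentially in k times the divergence, so at large k a 95% identity pair already shares only a small fraction of k-mers; choosing a threshold therefore depends strongly on k.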

Hello Daniel, I ran a test last night with --greedy 0.8: across all my proteomes, nearly every one forms a new cluster, 47609 clusters out of 47894 (bacterial proteomes). Here 0.8...

Hello Daniel. When using --greedy 0.1 with ProbMinHash, I get only 24415 clusters out of 47894 proteomes. It seems similarity is similarity, distance, we should use distance for similarity options. I expect...
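The cluster counts above follow the usual behaviour of a greedy threshold clustering. A minimal, generic sketch is shown below, assuming the threshold is a similarity (higher means stricter) and using exact Jaccard on toy k-mer sets in place of sketch estimates; it is not dashing2's implementation, and whether a given --greedy value is read as a similarity or a distance should be checked in the tool's own documentation. Each item is compared only against existing cluster representatives and joins the first one that meets the threshold; otherwise it starts a new cluster.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_cluster(sets: Dict[str, Set[str]], threshold: float) -> Dict[str, List[str]]:
    """Greedy clustering on a similarity threshold: each item joins the first
    representative with similarity >= threshold, else becomes a new representative.
    The result depends on input order."""
    clusters: Dict[str, List[str]] = {}
    for name, kmers in sets.items():
        for rep in clusters:
            if jaccard(kmers, sets[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical toy genomes represented as k-mer sets.
genomes = {
    "g1": {"a", "b", "c", "d"},
    "g2": {"a", "b", "c", "e"},  # Jaccard with g1 is 3/5
    "g3": {"x", "y", "z", "w"},
    "g4": {"a", "b", "x", "y"},  # Jaccard with g1 and g3 is 2/6
}
for t in (0.8, 0.5, 0.3):
    print(t, len(greedy_cluster(genomes, t)))
```

With a similarity threshold, raising it toward 1 produces more, smaller clusters (mostly singletons), and lowering it merges more items; a distance threshold behaves the opposite way, since distance is 1 - J for this Jaccard-based similarity.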

Hello Daniel,
nohup time dashing2 sketch --threads 24 --greedy 0.069 --pminhash -k28 -S 12000 --cmpout deduplicated_k28S12000g0069.txt *.fna &
I am using it for dereplication of 47894 microbial genomes in FASTA...

Hello Daniel, as you suggested, with about 700 GB of memory, dereplication of the genomes in nucleotide format works! About 400 GB of memory was needed for the entire GTDB database (47804 genomes): dashing2...