
Option to save and reload index table

Open sysimm opened this issue 2 years ago • 0 comments

Hello, I was wondering whether it is possible to save the index table of k-mers generated from the input sequences to disk and reload it later, in order to speed up clustering. I would like to do this for large datasets with cd-hit-2d: one input dataset would be provided by the user (so its index table would always be computed on the fly), and the other would come from a prepared selection of datasets. For the latter, I would like to precompute the index tables to speed up the overall comparison. I don't know how much of the total runtime is spent building the index tables, but I would expect it to be considerable for large datasets; please correct me if I'm wrong. Could you advise whether this is possible at all, or whether it could be done by tweaking the code? Thank you.
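
To make the idea concrete, here is a minimal Python sketch of the workflow I have in mind: build a k-mer index for a prepared dataset once, save it to disk, and reload it later to compare against a user-supplied sequence. This is only an illustration and not how CD-HIT stores its tables internally; the function names (`build_kmer_index`, `save_index`, `load_index`, `shared_kmer_counts`), the `pickle` file format, and the default `k=5` are all assumptions made for the example.

```python
import pickle
from collections import defaultdict


def build_kmer_index(sequences, k=5):
    """Map each k-mer to a sorted list of (sequence_id, count) pairs.

    Hypothetical layout for illustration; not CD-HIT's internal format.
    """
    index = defaultdict(dict)
    for seq_id, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            index[kmer][seq_id] = index[kmer].get(seq_id, 0) + 1
    return {kmer: sorted(hits.items()) for kmer, hits in index.items()}


def save_index(index, path):
    """Write the precomputed index to disk."""
    with open(path, "wb") as fh:
        pickle.dump(index, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Reload a previously saved index instead of rebuilding it."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def shared_kmer_counts(query, index, k=5):
    """Count the k-mers a query shares with each indexed sequence."""
    query_kmers = defaultdict(int)
    for i in range(len(query) - k + 1):
        query_kmers[query[i:i + k]] += 1

    counts = defaultdict(int)
    for kmer, query_count in query_kmers.items():
        for seq_id, db_count in index.get(kmer, ()):
            counts[seq_id] += min(query_count, db_count)
    return dict(counts)
```

The prepared datasets would be indexed and saved once; each later run would call `load_index` and only process the user's sequences on the fly. Whether this saves meaningful time depends on how the cost of building the table compares with the cost of reading it back from disk, which I have not measured.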

sysimm, Dec 08 '21 09:12