MMseqs2
How is the index loaded into memory when performing multiple queries?
Hi,
I just want to ask a quick question.
Say I create an index for the uniref30_2103_db database with 3 splits: mmseqs createindex uniref30_2103_db tmp --split 3
and I run 50 queries (in a single .fasta file) against it using the colabfold_search.sh
script provided on https://colabfold.mmseqs.com. Will each of the three partial indices be loaded into memory ~50 times? Assume my RAM cannot hold more than one partial index and I don't use the colabfold_envdb.
In other words, I'm wondering whether mmseqs works like 1)

    for query in queries_in_fasta:
        for partial_index_file in indices:
            search(query, partial_index_file)

or 2)

    for partial_index_file in indices:
        for query in queries_in_fasta:
            search(query, partial_index_file)

In the first case, I guess each partial index would be reloaded from storage into RAM once per query (num_of_queries times in total),
which is slow, whereas in the second case each partial index is loaded just once.
Thanks
We do the second version. We load one index split and then process all queries against that split before moving on to the next one.
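To put numbers on the difference between the two loop orders, here is a toy Python sketch that only counts how often a split would have to be read from storage. It is not MMseqs2 code: `count_loads`, `search`, and the "only one split fits in RAM" model are hypothetical names and assumptions made up for illustration.

```python
def count_loads(num_queries, num_splits, split_outer):
    """Count split loads from storage, assuming RAM holds only one split at a time."""
    loads = 0
    resident = None  # split currently held in memory (toy model, not MMseqs2 internals)

    def search(query, split):
        nonlocal loads, resident
        if resident != split:  # split is not in RAM, so it has to be (re)loaded
            resident = split
            loads += 1
        # the real search work for (query, split) would happen here

    if split_outer:
        # version 2: load a split, then run every query against it
        for split in range(num_splits):
            for query in range(num_queries):
                search(query, split)
    else:
        # version 1: for every query, walk through all splits
        for query in range(num_queries):
            for split in range(num_splits):
                search(query, split)
    return loads


print(count_loads(50, 3, split_outer=False))  # 150 loads (version 1)
print(count_loads(50, 3, split_outer=True))   # 3 loads (version 2)
```

Under these assumptions, 50 queries against 3 splits cost 150 loads with the query-outer order but only 3 with the split-outer order, which is the order described in the answer above.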