MMseqs2

How is the index loaded into memory when performing multiple queries?

Open · marcmk6 opened this issue 3 years ago · 1 comment

Hi,

I just want to ask a quick question. Say I create an index for the uniref30_2103_db database with 3 splits (`mmseqs createindex uniref30_2103_db tmp --split 3`) and run 50 queries (in a single .fasta file) against it using the colabfold_search.sh script provided at https://colabfold.mmseqs.com. Will each of the three partial indices be loaded into memory ~50 times? Assume my RAM cannot hold more than one partial index and I don't use the colabfold_envdb.

In other words, I'm wondering whether MMseqs2 works like either 1)

```python
for query in queries_in_fasta:
    for partial_index_file in indices:
        search(query, partial_index_file)
```

or 2)

```python
for partial_index_file in indices:
    for query in queries_in_fasta:
        search(query, partial_index_file)
```

In the first case, I guess each partial index would be read from storage into RAM num_of_queries times, which is slow; in the second case, each partial index is read only once.
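The cost difference between the two loop orders can be checked with a small counting model. This is a hypothetical sketch, not MMseqs2 code: it assumes RAM holds exactly one split at a time and that a split is re-read from storage whenever a different split is needed. The names `count_loads`, `queries`, and `splits` are illustrative only.

```python
def count_loads(queries, splits, query_outer):
    """Hypothetical model: count how often a split must be read from storage,
    assuming only one split fits in RAM at a time."""
    if query_outer:
        # Order 1): for each query, visit every split.
        order = [(q, s) for q in queries for s in splits]
    else:
        # Order 2): for each split, run every query.
        order = [(q, s) for s in splits for q in queries]

    resident = None  # split currently held in RAM
    loads = 0
    for _query, split in order:
        if split != resident:  # a different split is needed, so read it from disk
            resident = split
            loads += 1
    return loads


queries = [f"q{i}" for i in range(50)]
splits = ["split0", "split1", "split2"]
print(count_loads(queries, splits, query_outer=True))   # 150
print(count_loads(queries, splits, query_outer=False))  # 3
```

With 50 queries and 3 splits, order 1) reads a split 50 × 3 = 150 times, while order 2) reads each split once, 3 times in total.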

Thanks

marcmk6, Jan 26 '22 07:01

We do the second version: we load one index split and then process all queries against that split.
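A toy model of this split-outer strategy follows. It is a hypothetical sketch, not MMseqs2's implementation. `toy_search` is a made-up stand-in for searching one split, and it scores a hit by counting identical positions. The final per-query merge, which sorts hits from all splits by score, is also an assumption. The thread does not say how MMseqs2 combines the per-split results.

```python
from collections import defaultdict


def toy_search(query, split):
    """Hypothetical stand-in for a search: score = number of identical positions."""
    hits = []
    for target in split:
        score = sum(a == b for a, b in zip(query, target))
        if score > 0:
            hits.append((target, score))
    return hits


def search_split_outer(queries, splits, search):
    """Visit each split once (as if loading it into RAM) and run every query against it."""
    hits = defaultdict(list)
    for split in splits:          # outer loop: one "load" per split
        for query in queries:     # inner loop: all queries against the resident split
            hits[query].extend(search(query, split))
    # Assumed merge step: combine each query's hits from all splits, best score first.
    return {q: sorted(h, key=lambda t: t[1], reverse=True) for q, h in hits.items()}


queries = ["ACGT", "TTGA"]
splits = [["ACGA", "GGGG"], ["ACGT", "TTTA"]]
result = search_split_outer(queries, splits, toy_search)
print(result["ACGT"])  # [('ACGT', 4), ('ACGA', 3), ('GGGG', 1)]
```

Because the merge runs only after every split has been visited, each query's final hit list can draw on all splits, even though no more than one split was ever resident at a time.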

martin-steinegger, Feb 02 '22 06:02