pisa icon indicating copy to clipboard operation
pisa copied to clipboard

Disk-streamed indexing can result in non-deterministic indexes if there is insufficient /tmp storage

Open JMMackenzie opened this issue 2 years ago • 3 comments

Describe the bug As the title suggests; I had some issues with segfaults upon mmapping an index. It turned out that the index was being built (but incorrectly) because there was insufficient storage space on /tmp. No errors were thrown though.

Running the indexing command with the --check flag threw an error (demonstrating that the index built was wrong).

I will look into this in more detail soon, but am just marking it here for now.

JMMackenzie avatar Jun 05 '22 22:06 JMMackenzie

+1, just had the same error for a while, thanks @JMMackenzie for pointing this out. Going for a machine with more /tmp space solved the problem but without that I would have no idea that was the problem.

cadurosar avatar Aug 10 '22 10:08 cadurosar

@elshize I think we'd need to run a check somewhere here: https://github.com/pisa-engine/pisa/blob/196a6e6e0b2312fad5befa58e613be4485294aa6/include/pisa/block_freq_index.hpp#L154

I can never remember the correct way to do it, but I think we need to flush and then check for .bad() or something like that?

JMMackenzie avatar Aug 10 '22 23:08 JMMackenzie

@JMMackenzie yeah, something like this. Didn't have time to have a look today. I'll have to refresh my memory. There are many places like this where we should do those kind of checks tbh. We weren't very good about this. In our defense, this is way more complicated to do than it should be in C++. But it is what it is, we need to address that.

elshize avatar Aug 11 '22 00:08 elshize

@JMMackenzie I think this should be actually resolved, because the issue is about the disk-streamed indexing, which was addressed in #484

There are more places where checks should be done, but I don't see much sense in tracking all of them here. Let's close, and we'll open specific ones (or one tracking issue) instead.

elshize avatar Sep 08 '22 22:09 elshize