Index > 2GB unsupported

Open • flohoff opened this issue 2 years ago • 2 comments

Hi, I am running into an issue where mairix fails during indexing because the index grows larger than 2 GB. When that happens (on initial creation), the index file is left at 0 bytes.

It fails with an error at the "lseek" call, which only handles 32-bit offsets here:

writer.c (excerpt starting at line 106):

  if (sb.st_size < len) {
    /* Extend */
    if (lseek(fd, len - 1, SEEK_SET) < 0) {
      report_error("lseek", filename);
      unlock_and_exit(2);
    }
    /* ... */
  }
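For the lseek call alone, glibc can provide a 64-bit off_t even on 32-bit builds: defining _FILE_OFFSET_BITS as 64 before any system header is included widens off_t (on 64-bit Linux it is already 64 bits). The sketch below is a minimal illustration under those assumptions, not mairix's code: extend_file is a hypothetical helper, and it assumes the extension is completed by writing one byte at len - 1, since the excerpt stops after the lseek.

```c
/* Hypothetical sketch, not mairix's code.
 * Both macros must be defined before any system header is included:
 * _FILE_OFFSET_BITS=64 asks glibc for a 64-bit off_t on 32-bit builds. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* extend_file (hypothetical helper): grow the open file fd to at least
 * len bytes by seeking to len - 1 and writing a single zero byte.
 * len is an off_t, so it is 64 bits wide with the macro above.
 * Returns 0 on success, -1 on error. */
static int extend_file(int fd, off_t len)
{
    struct stat sb;

    if (fstat(fd, &sb) < 0)
        return -1;
    if (sb.st_size >= len)
        return 0;                   /* already large enough */

    if (lseek(fd, len - 1, SEEK_SET) < 0) {
        perror("lseek");
        return -1;
    }
    /* Writing one byte at the new end actually sets the file size;
     * on most filesystems the skipped range stays sparse. */
    if (write(fd, "", 1) != 1) {
        perror("write");
        return -1;
    }
    return 0;
}
```

On its own this only lets the file grow past 2 GB; as the reply explains, the offsets stored inside the index would still be 32 bits.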

Flo

flohoff, Mar 31 '22 13:03

Unfortunately, many of the offset values stored inside the index database, which point to data elsewhere in the index file, are fundamentally 32 bits wide. This is not just a matter of the lseek syscall; fixing it requires an overhaul of the index format.
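To illustrate why widening lseek is not enough: a format that stores offsets in fixed 4-byte fields cannot point past the range of those fields, however large off_t is. The sketch below uses a hypothetical little-endian encoding (put_offset32 and get_offset32 are illustrative names, not mairix's functions) and assumes the fields are interpreted as signed, which would match the 2 GB limit reported in the issue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-disk encoding, not mairix's actual format:
 * a file offset stored as a fixed 4-byte little-endian field.
 * Assumes the field is treated as signed, so the largest
 * representable offset is INT32_MAX (2 GiB - 1). */

/* Encode offset into out[0..3]; returns -1 if it does not fit. */
static int put_offset32(unsigned char out[4], int64_t offset)
{
    if (offset < 0 || offset > INT32_MAX)
        return -1;              /* needs a wider field: a format change */

    uint32_t v = (uint32_t)offset;
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
    return 0;
}

/* Decode a field written by put_offset32. */
static int64_t get_offset32(const unsigned char in[4])
{
    return (int64_t)in[0]
         | ((int64_t)in[1] << 8)
         | ((int64_t)in[2] << 16)
         | ((int64_t)in[3] << 24);
}
```

Supporting larger indexes would mean widening every such field (for example to 8 bytes) and updating every reader and writer, and existing databases would presumably have to be rebuilt.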

If we were to overhaul the whole database format, I think I'd want to take the opportunity to build it on something less error-prone, less custom, and more modern and high-level than pointer arithmetic, such as SQLite, perhaps combined with protobuf. It would hardly be the same piece of software.

You must have a lot of email, though! My personal mairix database, which indexes all of my sent and received email going back decades, is only 81 MiB. That excludes all mailing lists and (most) spam, to be sure; maybe that's the difference with yours.

vandry, Apr 24 '22 15:04

Excluding mailing lists, but including a compressed archive of old mail (gzip -9 on monthly archive folders since 1996):

flo@pax:~$ du -sh Mail
39G	Mail

I have now excluded old work email, which reduced the index:

flo@pax:~$ ls -la .mairix*
-rw------- 1 flo flo 876498632 Apr 24 05:05 .mairix_database
-rw-r--r-- 1 flo flo       185 Mar 31 15:21 .mairixrc
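For a sense of scale, the reduced index is 876,498,632 bytes. A quick check against an assumed signed 32-bit offset limit of 2^31 bytes (2 GiB); percent_of_limit is a hypothetical helper, and the limit is an assumption based on the 2 GB failure reported above:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limit: a signed 32-bit offset addresses at most 2^31 bytes (2 GiB). */
#define OFFSET32_LIMIT ((int64_t)INT32_MAX + 1)

/* Hypothetical helper: percentage of the assumed 2 GiB offset range
 * already used by an index file of size_bytes. */
static double percent_of_limit(int64_t size_bytes)
{
    return 100.0 * (double)size_bytes / (double)OFFSET32_LIMIT;
}
```

For the index above, percent_of_limit(876498632) comes out at about 40.8, i.e. roughly 836 MiB of the 2048 MiB available.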

Flo

flohoff, Apr 24 '22 15:04