Add Xapian Omega solution to haystack backend to fix long term issues
Reroll of #181 with @msapiro's tweak.
Thanks, could you please also add tests for this functionality?
well, i didn't write the original patch, and i have little familiarity with django development, let alone xapian or xapian-haystack...
where would i begin?
honestly, looking at the current test suite, it's going to be really hard for me to figure out what to do. i don't see anywhere where problematic corpus can be added, and even if there would be, i don't even know what our problematic corpus is.
part of the problem is this bug doesn't seem to have been properly reported against xapian-haystack. or, more accurately, all related reports were closed (#153, #77)...
I came from https://gitlab.com/mailman/hyperkitty/-/issues/408, which refered to those issues and the pull request in #181. I only submitted this PR because I was requested to rebase in #181, I didn't expect to be onboarded into writing unit tests here. :)
That said, it looks like indexing a simple 255+ character string is enough to trigger the bug.
In https://gitlab.com/mailman/hyperkitty/-/issues/408#note_1470192180, the following string is given as an example:
'#696969':'dimgray','#696969':'dimgrey','#1e90ff':'dodgerblue','#b22222':'firebrick','#fffaf0':'floralwhite','#228b22':'forestgreen',
I suspect this would be enough:
'x' * 255
but i don't know where to plug this...
Maybe @doctormo or any other follower could help with creating a test?
i mentioned it on hyperkitty's side too, i'm kind of hoping @msapiro will jump in to save the day again ;)
Actually, what's the situation here? The bugfix won't be accepted unless there are unit tests, even though it fixes real-world problems?
That would be quite inconvenient... :)
Just ran into this myself when indexing mailing lists with long lines/urls/... A fix for it would be really nice, and this solution looks sensible to me.