Add Xapian Omega solution to haystack backend to fix long term issues

Open anarcat opened this issue 1 year ago • 7 comments

Reroll of #181 with @msapiro's tweak.

anarcat avatar Feb 07 '25 02:02 anarcat

Thanks, could you please also add tests for this functionality?

claudep avatar Feb 07 '25 07:02 claudep

well, i didn't write the original patch, and i have little familiarity with django development, let alone xapian or xapian-haystack...

where would i begin?

anarcat avatar Feb 07 '25 16:02 anarcat

honestly, looking at the current test suite, it's going to be really hard for me to figure out what to do. i don't see anywhere a problematic corpus could be added, and even if there were, i don't even know what our problematic corpus is.

part of the problem is this bug doesn't seem to have been properly reported against xapian-haystack. or, more accurately, all related reports were closed (#153, #77)...

I came from https://gitlab.com/mailman/hyperkitty/-/issues/408, which referred to those issues and the pull request in #181. I only submitted this PR because I was requested to rebase in #181, I didn't expect to be onboarded into writing unit tests here. :)

That said, it looks like indexing a simple 255+ character string is enough to trigger the bug.

In https://gitlab.com/mailman/hyperkitty/-/issues/408#note_1470192180, the following string is given as an example:

'#696969':'dimgray','#696969':'dimgrey','#1e90ff':'dodgerblue','#b22222':'firebrick','#fffaf0':'floralwhite','#228b22':'forestgreen',

I suspect this would be enough:

'x' * 255

but i don't know where to plug this...
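
As a starting point for building such a corpus, here is a minimal sketch of a helper that flags tokens likely to trigger the problem. It is not part of xapian-haystack or its test suite: the function name `overlong_tokens` and the constant `ASSUMED_TERM_LIMIT` are hypothetical, and the 245-byte value is an assumption about Xapian's term length limit that may need adjusting for a given backend. It only inspects plain strings and does not touch Xapian itself.

```python
# Hypothetical helper for picking test inputs; not part of xapian-haystack.
# ASSUMED_TERM_LIMIT is an assumption about Xapian's maximum term length
# in bytes. Adjust it if your backend enforces a different limit.
ASSUMED_TERM_LIMIT = 245


def overlong_tokens(text, limit=ASSUMED_TERM_LIMIT):
    """Return whitespace-separated tokens whose UTF-8 form exceeds `limit` bytes."""
    return [tok for tok in text.split() if len(tok.encode("utf-8")) > limit]


if __name__ == "__main__":
    candidate = "x" * 255
    sample = "short words around " + candidate + " and more text"
    # Report how many tokens in the sample exceed the assumed limit.
    print(len(overlong_tokens(sample)))
```

The length is measured in UTF-8 bytes rather than characters, so a run of non-ASCII characters can exceed the limit with fewer than 255 characters. Any string this helper flags would be a reasonable candidate to feed into an indexing test, once someone identifies where such a test belongs in the suite.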

anarcat avatar Feb 07 '25 16:02 anarcat

Maybe @doctormo or any other follower could help with creating a test?

claudep avatar Feb 07 '25 16:02 claudep

i mentioned it on hyperkitty's side too, i'm kind of hoping @msapiro will jump in to save the day again ;)

anarcat avatar Feb 07 '25 16:02 anarcat

Actually, what's the situation here? The bugfix won't be accepted unless there are unit tests, even though it fixes real-world problems?

That would be quite inconvenient... :)

anarcat avatar Feb 10 '25 19:02 anarcat

Just ran into this myself when indexing mailing lists with long lines/urls/... A fix for it would be really nice, and this solution looks sensible to me.

BtbN avatar Aug 22 '25 01:08 BtbN