xapian-haystack
Add Xapian Omega solution to haystack backend to fix long term issues
The Xapian project has solved the "Term Too Long" error in its Omega side project by providing two options: truncating the terms or hashing them.
See: https://lists.xapian.org/pipermail/xapian-discuss/2007-March/003450.html for example.
This commit adds this capability to this haystack backend. It's been tested with ASCII, URLs, Japanese, and strings of emoji on Python 3.6 and Django 1.11.15, in both the management command and real-time updating.
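For readers unfamiliar with the two options, here is a minimal Python sketch of what truncating and hashing a term could look like. It is not the code from this commit or from Omega. The 245-byte limit, the function names `truncate_term` and `hash_term`, and the choice of SHA-1 are all assumptions made for illustration.

```python
import hashlib

# Assumed limit for illustration; check the limit of your Xapian backend.
MAX_TERM_BYTES = 245


def truncate_term(term, limit=MAX_TERM_BYTES):
    """Cut the term so its UTF-8 encoding fits within `limit` bytes."""
    data = term.encode("utf-8")
    if len(data) <= limit:
        return term
    # errors="ignore" drops a multi-byte character split by the cut.
    return data[:limit].decode("utf-8", errors="ignore")


def hash_term(term, limit=MAX_TERM_BYTES):
    """Keep a readable prefix and replace the rest with a digest of the
    whole term, so long terms sharing a prefix stay distinct."""
    data = term.encode("utf-8")
    if len(data) <= limit:
        return term
    digest = hashlib.sha1(data).hexdigest()  # 40 ASCII characters
    prefix = data[:limit - len(digest)].decode("utf-8", errors="ignore")
    return prefix + digest
```

Truncation is simpler but maps every long term with the same first 245 bytes to one term. Hashing keeps such terms apart, at the cost of a tail that is not human-readable.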
Xapian developers recommend a different method:
https://trac.xapian.org/wiki/FAQ/UniqueIds#Workingroundthetermlengthlimit
Based on the link above I did something different:
@alexsilva Isn't that documentation specific to unique IDs? In either case, you could add SPLIT as one of the methods in the above code instead of hard-coding it as the only approach.
Can one of these proposed solutions please be incorporated into the xapian-haystack package? I've had to hack a patch into our copy in order to allow Mailman 3 to successfully index emails containing long URLs, but it would be much better/cleaner if xapian-haystack incorporated a solution permanently.
I guess we'll have to wait for a rebased patch including tests.
@pcolmer What exactly did you do to enable Mailman 3 to successfully index the emails? I am stuck at the same point. TIA