Add Xapian Omega solution to haystack backend to fix long term issues

Open doctormo opened this issue 5 years ago • 5 comments

Within the Xapian project, the Omega side project solves the "Term Too Long" error with two options: truncating over-long terms or hashing them.

See: https://lists.xapian.org/pipermail/xapian-discuss/2007-March/003450.html for example.
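
As a rough illustration of the two options (not the code from this commit or from Omega), the sketch below truncates or hashes a term once its UTF-8 encoding exceeds a byte limit. The names `MAX_TERM_BYTES`, `truncate_term`, and `hash_term` are hypothetical, the 245-byte limit is an assumption about Xapian's disk backends, and SHA-1 is just one possible digest.

```python
import hashlib

# Assumed limit: Xapian's disk backends reject terms longer than about 245 bytes.
MAX_TERM_BYTES = 245


def truncate_term(term: str, limit: int = MAX_TERM_BYTES) -> bytes:
    """Cut the UTF-8 encoding at `limit` bytes without splitting a character."""
    data = term.encode("utf-8")
    if len(data) <= limit:
        return data
    # errors="ignore" drops a trailing, partially cut multi-byte sequence.
    return data[:limit].decode("utf-8", errors="ignore").encode("utf-8")


def hash_term(term: str, limit: int = MAX_TERM_BYTES) -> bytes:
    """Keep a readable prefix and append a digest of the whole term."""
    data = term.encode("utf-8")
    if len(data) <= limit:
        return data
    digest = hashlib.sha1(data).hexdigest().encode("ascii")  # 40 hex bytes
    room = limit - len(digest)
    prefix = data[:room].decode("utf-8", errors="ignore").encode("utf-8")
    return prefix + digest
```

Truncation maps every term sharing the same first 245 bytes to one term, while hashing keeps long terms with a common prefix (such as URLs on the same host) distinct at the cost of an opaque suffix.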

This commit adds this capability to this haystack backend. It has been tested with ASCII, URLs, Japanese, and strings of emoji on Python 3.6 and Django 1.11.15, both through the management command and with real-time updating.

doctormo, Dec 19 '18 04:12

Xapian developers recommend a different method:

https://trac.xapian.org/wiki/FAQ/UniqueIds#Workingroundthetermlengthlimit

Based on the link above I did something different:

Fix long term

alexsilva, Jan 28 '19 15:01

@alexsilva Isn't that documentation specific to unique IDs? In either case, you could add SPLIT as one of the methods in the code above instead of hard-coding it as a single special case.
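
The thread does not show what the linked fix does. Assuming SPLIT means breaking an over-long term into several shorter terms that are each indexed separately, a minimal sketch could look like this. The function name `split_term` and the 245-byte limit are hypothetical, not taken from either patch.

```python
from typing import List


def split_term(term: str, limit: int = 245) -> List[bytes]:
    """Break a term into consecutive UTF-8 chunks of at most `limit` bytes."""
    chunks: List[bytes] = []
    current = b""
    for char in term:
        encoded = char.encode("utf-8")
        # Start a new chunk before a character would push this one past the limit.
        if current and len(current) + len(encoded) > limit:
            chunks.append(current)
            current = b""
        current += encoded
    if current:
        chunks.append(current)
    return chunks
```

Unlike truncating or hashing, this keeps every byte of the original term, but a search for the full term only matches if the query is split the same way.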

doctormo, Jan 28 '19 16:01

Can one of these proposed solutions please be incorporated into the xapian-haystack package? I've had to hack a patch into our copy in order to allow Mailman 3 to successfully index emails containing long URLs but it would be much better/cleaner if xapian-haystack incorporated a solution permanently.

pcolmer, Jan 20 '22 08:01

I guess we'll have to wait for a rebased patch including tests.

claudep, Jan 20 '22 15:01

> Can one of these proposed solutions please be incorporated into the xapian-haystack package? I've had to hack a patch into our copy in order to allow Mailman 3 to successfully index emails containing long URLs but it would be much better/cleaner if xapian-haystack incorporated a solution permanently.

@pcolmer what exactly did you do to enable Mailman 3 to successfully index the emails? I am stuck at the same point. TIA

odhiambo, Mar 03 '23 14:03