onegov-cloud icon indicating copy to clipboard operation
onegov-cloud copied to clipboard

Feature/ogc 508 replace elastic search by postgres text search

Open Tschuppi81 opened this issue 2 years ago • 5 comments

INITIAL FEEDBACK TO TEXT SEARCH IN POSTGRES - WORK IN PROGRESS

Feature/ogc 508 replace elastic search by postgres text search

core: Replace elastic search by postgres full text search

With this changes elastic search (es) is still in place and productive while we introduce the postgres full text search accessible using 'search_postgres' keyword in the url. es url: http://localhost:8080/onegov_agency/bs/search?q=Arnold
psql url: http://localhost:8080/onegov_agency/bs/search_postgres?q=Arnold

hint: onegov-core upgrade

Main changes:

  • Re-indexing
    ../onegov-cloud/src/onegov/search/cli.py
  • Upgrade script (uses the same code as re-indexing)
    ../onegov-cloud/src/onegov/search/upgrade.py
  • Extension of mixin class Searchable
    .. /onegov-cloud/src/onegov/search/mixins.py
  • Adjusting all searchable models adding the fts index in postgres

Open Items:

  • unit tests are not reworked yet
  • language configuration
  • improve ranking

Checklist

  • [x] I have performed a self-review of my code
  • [x] I considered adding a reviewer
  • [x] I have added an upgrade hint such as data migration commands to be run
  • [x] I made changes/features for both org and town6
  • [ ] I have tested my code thoroughly by hand
    • [x] I have tested database upgrades
  • [ ] I have added tests for my changes/features

Tschuppi81 avatar Jul 13 '23 11:07 Tschuppi81

Codecov Report

Attention: 54 lines in your changes are missing coverage. Please review.

Comparison is base (fa0d2b9) 88.03% compared to head (9e454e3) 35.09%. Report is 16 commits behind head on master.

Files Patch % Lines
src/onegov/search/mixins.py 11.76% 45 Missing :warning:
src/onegov/user/models/user.py 0.00% 5 Missing :warning:
src/onegov/core/upgrade.py 33.33% 2 Missing :warning:
src/onegov/search/__init__.py 0.00% 1 Missing :warning:
src/onegov/search/utils.py 50.00% 1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #918       +/-   ##
===========================================
- Coverage   88.03%   35.09%   -52.95%     
===========================================
  Files        1177       71     -1106     
  Lines       77378     4591    -72787     
===========================================
- Hits        68118     1611    -66507     
+ Misses       9260     2980     -6280     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jul 20 '23 04:07 codecov-commenter

@Tschuppi81 As a side note you probably should merge master again (or at least cherry pick 25d7d0e8436c537c8a314c0b742ebe0385fafed3) to skip the currently broken browser tests.

Daverball avatar Jul 20 '23 06:07 Daverball

Escaping in postgres is done using quotes around column name ""

Tschuppi81 avatar Jul 25 '23 14:07 Tschuppi81

@Daverball In the above commit I am fighting with more complex hybrid expressions. Most of them gather data from other tables and I could not figure out how to properly join the corresponding tables. Currently the 'reindexing' fails in CourseEvent. Note: This commit to be reverted - just shows the join problem...

Tschuppi81 avatar Nov 27 '23 17:11 Tschuppi81

@Tschuppi81 This is going to be tricky in general and probably a bad idea in terms of performance anyways, you are going to be better off adding caches for the joined attributes and using @observes methods to keep the caches up to date.

Or we could give up on calculating the search index online, since we're still quite a ways off and just generate a dictionary offline of all the search values which then can be turned into a TSVECTOR directly if interpreted as JSONB. While somewhat inefficient, this would still be quicker than what we had before with elasticsearch, since we save ourselves from having to talk with the elasticsearch server. It would also get rid of the issues related to polymorphism, since SQLAlchemy is aware of what polymorphic identity we're talking about, while in postgres we would have to write CASE statements to distinguish the various versions in the same table.

Daverball avatar Nov 27 '23 18:11 Daverball