remark42 icon indicating copy to clipboard operation
remark42 copied to clipboard

Full text search

Open vdimir opened this issue 4 years ago • 38 comments

Resolves #734

Uses bleve.

vdimir avatar Aug 02 '20 16:08 vdimir

I have trouble reviewing this PR because of vendoring I suppose: my IDE just refuses to show a list of files. Would you be so kind as to split your repo into a branch with vendoring and branch on top of it with just the code changes, create a PR with just the code changes and ping me for review there?

paskal avatar Aug 02 '20 21:08 paskal

@paskal I removed vendor from this PR and pushed it to separate branch full-text-search-vendor. I haven't create PR for this separate branch yet, because maybe it is better to keep one PR and merge changes back after reviews? Or I'll create another PR later if we decide to do it.

vdimir avatar Aug 04 '20 06:08 vdimir

Sorry for not being clear, I wanted to have PR like one we have here to be a separate one on top of vendoring branch in your fork, so I would review it there. I'll try to look into this by tomorrow evening.

paskal avatar Aug 04 '20 21:08 paskal

I expect to have the ability to renew it this weekend.

paskal avatar Aug 08 '20 01:08 paskal

Another round, I reviewed everything but search module as it's a beast on its own. @vdimir please let me know if what I wrote something which makes sense or I'm looking into wrong things and that's not what you were looking for.

Thanks for feedback! I have read comments and understand how to fix most of them. I'll fix it or reply in conversation.

vdimir avatar Aug 11 '20 07:08 vdimir

I should have time tonight to review it thoughtfully.

paskal avatar Aug 24 '20 18:08 paskal

I should have time tonight to review it thoughtfully.

Thanks!

I think main issue that I haven't figure out is wrapping store.Interface. Wrapping store with proxy is simple way to handle all store calls and perform additional work to support search without modifying store. But I understand that this solution not so clear, but I don't know how to do it better.

vdimir avatar Aug 24 '20 18:08 vdimir

Very busy work week, I hope I'll have time for the review either on Friday or Sunday.

paskal avatar Aug 27 '20 20:08 paskal

Excellent job! Few very minor things to fix plus few tests to write, and also please check that it compiles as it doesn't now. After fixing that review notes ping Umputun directly, I don't think I have any more things I'm conserned about other than ones written in that review comments.

Thanks! A lot of comments connected with typos, and it is not the thing that is required checking from reviewer, sorry for that, I'll more careful and will try to setup spellchecker or something like that to not disturb reviewer with such minors.

I hope I am finishing fixing latest issues soon.

vdimir avatar Oct 08 '20 07:10 vdimir

I finally run service after all modification, load bunch of comments like in first post in this PR, seems it works. Should I merge vendor files? Now it is in separate branch full-text-search-vendor

vdimir avatar Oct 08 '20 09:10 vdimir

@umputun could you review the code, please?

vdimir avatar Oct 08 '20 09:10 vdimir

Sorry for the long delay and thx for the impressive work and detailed reviews.

I have tried to understand the general structure/flow before I dig deeper into implementation details and it got me confused a little bit on multiple levels. Somehow the overall structure doesn't look intuitive to me and I found myself looking at things implementing unexpected logic in unexpected places. Part of my confusion represented in the comments I've made, but I'm not really sure if my comments make much sense due to the general confusion.

Fundamentally all of this feels to me as two levels of abstraction, Servcie and Engine:

  • search.Service - this is a struct (interfaces could be defined by consumers) providing all high-level API for the search functionality the end consumer (i.e. REST) needs. I'd expect some kind of Search call here. Probably nothing else needed, at least logically.

  • both engines (bleve and elastic) are some implementations of search.Engine interface defined near search.Service

  • the logic is currently done by multiplexer, i.e. different sites support, probably should be the core part of the search.Service

  • the service should be used directly by REST at least to perform the search. It shouldn't be embedded into servcie.DataStore

  • the other part of search.Service (extracted interface) responsible for creating/updates/delete can be integrated into service.DataStore so, the current Create/Edit/Delete calls will affect search indexes properly

  • The other package structure could represent this as a couple of sub-packages - engine and service or search_engine and search_service to make imports clear and avoid conflicts. On the top level (i.e search.go) we can keep all shared things used by both packages and this way import loops could be prevented.

and just to reiterate the general flow from the consumer point of view: the only thing consumers care about and use are interfaces extracted from the service struct. Some consumers may need a different subset of those methods.

let me know what you think

umputun avatar Nov 01 '20 22:11 umputun

@umputun , thank you for review!

I think I've got general idea and I'm going to rewrite PR reusing my code with respect to your notes. If some additional question will occur during implementation, I'll ask here. I hope I'll start soon :)

vdimir avatar Nov 16 '20 07:11 vdimir

@vdimir I would love to have your work incorporated into the next release (1.7.0), please let us know if there is any way to assist you.

paskal avatar Nov 30 '20 09:11 paskal

@vdimir I would love to have your work incorporated into the next release (1.7.0), please let us know if there is any way to assist you.

Thanks for the concern. I'm going to work on it this weekend, I think after that we can discuss it more detailed. Also when new release is planned to happen?

vdimir avatar Dec 01 '20 17:12 vdimir

After these changes + documentation ready, + some UI changes if we'll find someone to make them in time.

paskal avatar Dec 01 '20 17:12 paskal

Sorry, I've started refactoring, but haven't enough time to work on it. I really want to finish, but I'll postpone it to the end of month or beginning of next month.

UPD 16-04-2021: still no progress, I'll convert it to draft

vdimir avatar Dec 12 '20 18:12 vdimir

@vdimir, feel free to ping me in the email or telegram if you need help with this one; I'm fairly interested in getting this merged.

paskal avatar May 25 '21 21:05 paskal

I tried to reorganize code according to this comment https://github.com/umputun/remark42/pull/757#issuecomment-720160517

Please, look at the code to check if current understating of search.Service and search.Engine correspond to your sense.

Currently I removed all stuff connected with batching and also it doesn't perform indexing of existing documents on stratup, I'll add it later if current structure is ok.

Also, I removed elastic engine, not sure that it should be supported.

cc @umputun @paskal

vdimir avatar Jun 13 '21 18:06 vdimir

looks like you missed some vendor updates

umputun avatar Jun 13 '21 18:06 umputun

The vendor update is not here because I asked that initially. Otherwise, it's harder to review.

paskal avatar Jun 13 '21 18:06 paskal

From my side only plans to add buffering of new documents before indexing is left. And I'm not sure that it's required now.

vdimir avatar Jul 16 '21 13:07 vdimir

@vdimir, I propose to get it to the MVP state with as limited functionality as you think would be enough to start, then we'll merge it, and it will be possible to improve it and to build a frontend part for it, which is way more challenging when it's not in master.

Could you please finalize the PR to that state and create follow-up issues for frontend support of the search and the parts you didn't complete here so that they won't be lost but will be documented?

paskal avatar Oct 07 '21 05:10 paskal

@paskal great! I decided not to do buffering for now, so I'll only resolve conflicts and add dependencies into this PR to run the CI

vdimir avatar Oct 07 '21 08:10 vdimir

Pull Request Test Coverage Report for Build 1315680963

  • 240 of 367 (65.4%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.9%) to 82.779%

Changes Missing Coverage Covered Lines Changed/Added Lines %
backend/app/store/search/engine.go 10 11 90.91%
backend/app/store/search/startup_index.go 24 35 68.57%
backend/app/store/search/service.go 40 54 74.07%
backend/app/rest/api/rest_public.go 41 58 70.69%
backend/app/cmd/server.go 13 31 41.94%
backend/app/store/search/bleve_engine.go 106 137 77.37%
backend/app/store/search/search.go 0 35 0.0%
<!-- Total: 240 367
Totals Coverage Status
Change from base Build 1313236598: -0.9%
Covered Lines: 6047
Relevant Lines: 7305

💛 - Coveralls

coveralls avatar Oct 07 '21 10:10 coveralls

@paskal I've rebased on master into two commits, fixed goimports, added info into README

vdimir avatar Oct 10 '21 17:10 vdimir

LGTM with few minor things here and there, tests part for search.go could be ommited if too difficult as it's tested from another location.

@paskal thanks for taking a look, all comments are resolved

vdimir avatar Oct 16 '21 14:10 vdimir

@paskal @umputun I've resumed work on the feature (thanks @paskal for the reminder about this PR) and pushed the branch with the updated code. I tried to simplify it and added more comments about the high-level API (please point me to the places that are still not documented). May you please take a look?

Fixing the issues found by CI is in progress, but I suppose it would be minor changes.

vdimir avatar Sep 21 '22 21:09 vdimir

Sometimes, there's a check with the test, but it disappears, and only build / Build Docker images (pull_request) is shown, so not sure about the current state.

It should be two checks:

build

https://github.com/umputun/remark42/blob/363e05d58065939b647ad199deeab44f677a7213/.github/workflows/ci-build.yml#L17-L20

and test

https://github.com/umputun/remark42/blob/363e05d58065939b647ad199deeab44f677a7213/.github/workflows/ci-backend.yml#L12-L16

vdimir avatar Oct 02 '22 15:10 vdimir

For some reason, when I request a review from one participant, it removes the request form another one :)

vdimir avatar Nov 21 '22 17:11 vdimir