edx-platform icon indicating copy to clipboard operation
edx-platform copied to clipboard

feat: update search index when course content is updated [FC-0040]

Open rpenido opened this issue 1 year ago • 14 comments
trafficstars

Description

This PR implements updating the search index with content metadata whenever they change

More Infomation

  • Closes https://github.com/openedx/modular-learning/issues/196
  • Depends on:
    • https://github.com/openedx/edx-platform/pull/34310

Testing Instructions

  • Run your local stack on this branch
  • Make sure you have meilisearch setup locally, follow the setup instructions here https://github.com/open-craft/tutor-contrib-meilisearch
  • Make sure you also have some sample taxonomies/tags, you can add some from here: https://github.com/open-craft/taxonomy-sample-data
  • Run tutor dev run cms bash and ./manage.py cms reindex_studio
  • View the resulting index at http://meilisearch.local.edly.io:7700/ (see tutor-contrib-meilisearch README for how to get the API key to log in)
  • Go back to studio, and add/update XBlocks from a course
  • Go back to meilisearch dashboard, and confirm that the XBlocks docs have been updated to reflect the changes you made
  • Repeat and confirm the above for both units and blocks
  • Also, try to create a Content Library with some components and check if the index is updated accordingly.

NOTE: Meilisearch seems to cache queries along with their results in the frontend, so if you simply search it might show you stale data (network tab shows no requests), especially if you're searching for the same query. Make sure you refresh the Meilisearch dashboard (http://meilisearch.local.edly.io:7700/) and then perform the search.


Private ref: FAL-3690

rpenido avatar Mar 19 '24 18:03 rpenido

Thanks for the pull request, @rpenido! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

openedx-webhooks avatar Mar 19 '24 18:03 openedx-webhooks

I realized after reviewing that a lot of the changes on this PR diff are from @bradenmacdonald 's PR since this PR wasn't updated with upstream yet.

We have this temp PR (https://github.com/open-craft/edx-platform/pull/645). It could be easier to review before we get #34310 merged.

However in terms of testing, I'm not sure if its ready or not, but after briefly tried I was faced with some errors

Sorry for that @yusuf-musleh! I probably messed up something when I merged the changes from @bradenmacdonald.

I will let you know when this is ready for testing again (probably for a final review).

Also, I started writing some tests for the new API, but I'm not happy with them. As meilisearch is an external service, I need to mock many functions and only really test if the external API is called with the expected parameters. Maybe we should fire up a Meileisearch instance during the tests to actually test the feature (as we do with external services, like the database), but I'm not sure if this is the right time to do this (and I don't know how to do this in our CI suit either).

I plan to continue with the current approach (mocking and checking calls) for the API and the handlers to get some coverage, but I think we should revisit this if the prototype becomes the default search engine. Let me know if you have a better idea for this!

rpenido avatar Mar 21 '24 01:03 rpenido

As meilisearch is an external service, I need to mock many functions and only really test if the external API is called with the expected parameters.

@rpenido I think this approach is fine, since we can make the assumption that Meilisearch will behave as expected after we manually tested it. However I can see the value of having integration tests to actually test that it behaves as expected in case there are breaking/unexpected changes on the Meilisearch side. Although that could be avoided by using the same working version vs updating to the latest version every time.

I think we should revisit this if the prototype becomes the default search engine

I agree, I think this can come at a later stage. As you suggested, we can potentially follow the footsteps of the mysqldb when the time comes.

yusuf-musleh avatar Mar 21 '24 05:03 yusuf-musleh

@rpenido I agree that we should eventually run Meilisearch during tests. However, it's probably not necessary to implement that for this first studio search project which is considered experimental. Maybe you could add that to the ADR that I wrote? That for the experiment, we won't use Meilisearch during tests, but we expect to add that in the future if we move forward with replacing Elasticsearch completely.

bradenmacdonald avatar Mar 22 '24 15:03 bradenmacdonald

Hi @yusuf-musleh ! There is a test failing here, but I can't find the logs. I will look into it further tomorrow.

Edit: fixed 5a6beafc86276aafa9a02ca895f49b9d691a2905

rpenido avatar Mar 26 '24 22:03 rpenido

Thank you for the review @yusuf-musleh!

* When I create a new empty library, nothing gets added to the index, is that expected?

I think that it is expected. I commented on the first PR but I think it got lost in the review comments. @bradenmacdonald, could you confirm that we are not indexing the Library and the Course entities?

* When I add a component/block to the library that gets added to the index, as expected, when I edit the blocks, eg: the name or the problem type (for problem blocks) the index does not get updated. However If I change the name of the library, thats when the blocks under it get updated in the index with the latest information. It's probably just a missing condition on how/when the docs should be updated in the index.

This is because the LIBRARY_BLOCK_UPDATED event is not dispatched when we edit an library XBlock. Not sure if we will fix it now or later.

rpenido avatar Mar 28 '24 13:03 rpenido

@yusuf-musleh I want to rebase and squash this PR, but you will need to rebase it and merge conflicts again after that too (it will happen anyway when this goes upstream). Let me know when you think it is a good time to do it!

rpenido avatar Mar 28 '24 13:03 rpenido

@rpenido @yusuf-musleh We aren't indexing the courses/libraries themselves. So it's expected that a newly created library won't appear in the results.

bradenmacdonald avatar Mar 28 '24 18:03 bradenmacdonald

I want to rebase and squash this PR, but you will need to rebase it and merge conflicts again after that too (it will happen anyway when this goes upstream). Let me know when you think it is a good time to do it!

@rpenido Go for it! I'll handle the conflicts on my end, no worries.

yusuf-musleh avatar Mar 29 '24 10:03 yusuf-musleh

I want to rebase and squash this PR, but you will need to rebase it and merge conflicts again after that too (it will happen anyway when this goes upstream). Let me know when you think it is a good time to do it!

@rpenido Go for it! I'll handle the conflicts on my end, no worries.

Thank you @yusuf-musleh! Rebase done!

rpenido avatar Mar 29 '24 13:03 rpenido

Hi @bradenmacdonald ! This is ready for CC review

rpenido avatar Apr 03 '24 22:04 rpenido

Would you mind just quickly testing this again to make sure it's working?

@rpenido @bradenmacdonald I've tested it again, it's working well as expected. There's just the library content issue from before, where the LIBRARY_BLOCK_UPDATED is not firing when library blocks are updated (hence the index is not updated). However it seems like its fine for now and it'll be addressed in a separate ticket/issue?

yusuf-musleh avatar Apr 07 '24 14:04 yusuf-musleh

Thanks @yusuf-musleh. Yeah we don't need to worry too much about library blocks for now.

bradenmacdonald avatar Apr 08 '24 16:04 bradenmacdonald

This is ready to merge but we're waiting on some unrelated issues with the NodeJS 18 upgrade to be solved first.

bradenmacdonald avatar Apr 09 '24 17:04 bradenmacdonald

Hi @bradenmacdonald ! I added this commit (612f32ff6355fa48f5af2c8fae50503706b70dc1) here. There are some old libraries (before the recent module store refactor) that crash our reindex.

I deleted these libraries locally, and the code was working. In the sandbox, the error appeared again, so I think it would be better to skip the library if we found an error instead of crashing the reindex.

rpenido avatar Apr 16 '24 21:04 rpenido

Thanks @rpenido - that's a great fix.

bradenmacdonald avatar Apr 17 '24 17:04 bradenmacdonald

@rpenido I was just about to merge this, but there are some conflicts with the "permissions" PR that I just merged. Could you please fix them?

bradenmacdonald avatar Apr 17 '24 18:04 bradenmacdonald

@rpenido I was just about to merge this, but there are some conflicts with the "permissions" PR that I just merged. Could you please fix them?

Working on it. The code changed a lot, so I will need to manually test it to make sure I did the merge right.

rpenido avatar Apr 17 '24 20:04 rpenido

@rpenido Take your time - it's important to get it right :) Thanks.

bradenmacdonald avatar Apr 17 '24 20:04 bradenmacdonald

This is ready for review and merge @bradenmacdonald. It also includes the changes from your PR:

  • https://github.com/openedx/edx-platform/pull/34534#pullrequestreview-2007408802

It may have impacted your PR @yusuf-musleh.

rpenido avatar Apr 17 '24 22:04 rpenido

@rpenido 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

openedx-webhooks avatar Apr 18 '24 16:04 openedx-webhooks

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

edx-pipeline-bot avatar Apr 18 '24 18:04 edx-pipeline-bot

2U Release Notice: This PR has been deployed to the edX production environment.

edx-pipeline-bot avatar Apr 18 '24 18:04 edx-pipeline-bot

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

edx-pipeline-bot avatar Apr 18 '24 19:04 edx-pipeline-bot

2U Release Notice: This PR has been deployed to the edX production environment.

edx-pipeline-bot avatar Apr 18 '24 19:04 edx-pipeline-bot