docs-scraper: Test the MeiliSearch implementation

These tests only check communication with MeiliSearch, not the relevance of the scraper's results.

Tests should be written to check that the MeiliSearch implementation works correctly.

meilisearch_helper.py does the following:

- Delete the scrape index if it already exists
- Create a new index with the same name
- Add default and custom settings to the index

Each of these steps should be tested to confirm it succeeded. You can check that they worked using the GET /indexes route.
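A minimal sketch of the index check, assuming MeiliSearch listens on http://localhost:7700 with the master key `masterKey` used in the CI workflow. The function names are hypothetical and not part of docs-scraper. Older MeiliSearch versions return GET /indexes as a plain JSON list, while newer ones wrap it in a `results` object, so the sketch accepts both shapes.

```python
import json
import urllib.request

MEILI_URL = "http://localhost:7700"  # assumption: local instance started by CI
MASTER_KEY = "masterKey"  # assumption: key passed via --master-key


def get_json(path, url=MEILI_URL, key=MASTER_KEY):
    """GET a MeiliSearch route and decode its JSON body."""
    # Recent MeiliSearch versions expect "Authorization: Bearer <key>";
    # older (2020-era) versions used the "X-Meili-API-Key" header instead.
    request = urllib.request.Request(
        url + path, headers={"Authorization": f"Bearer {key}"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def index_uids(payload):
    """Extract index uids from a GET /indexes body (list or {"results": [...]})."""
    indexes = payload["results"] if isinstance(payload, dict) else payload
    return {index["uid"] for index in indexes}


def index_exists(uid, payload=None):
    """Return True if GET /indexes lists an index with this uid."""
    if payload is None:
        payload = get_json("/indexes")  # live call only when no payload is given
    return uid in index_uids(payload)
```

After running the scraper with `"index_uid": "docs"`, a test would then simply assert `index_exists("docs")`.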

A test directory named meilisearch_*** should be created.

The following tests should be written in that directory:

- A simple MeiliSearch configuration with the right credentials and no settings (#154)
  - Check that the index was correctly added to MeiliSearch
  - Check that the default settings were added correctly
- A simple MeiliSearch configuration with the right credentials and settings
  - Check that the index was correctly added to MeiliSearch
  - Check that the default settings were added correctly
- A simple MeiliSearch configuration with the right credentials and bad settings
  - Check that the index was correctly added to MeiliSearch
  - Check that an error is raised

To run these tests, a MeiliSearch instance must be running.

.github/workflows/test-lint.yml

      - name: Docker setup
        run: docker run -d -p 7700:7700 getmeili/meilisearch:latest ./meilisearch --no-analytics=true --master-key='masterKey'
      - name: Run tests
        run: pipenv run pytest ./scraper/src -k "not _browser"
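Because `docker run -d` returns as soon as the container starts, the tests may begin before MeiliSearch accepts connections. A minimal sketch of a wait loop, assuming the instance exposes a GET /health route that answers with a 2xx status once it is ready; `wait_for_meilisearch` is a hypothetical helper, not part of docs-scraper or the workflow above.

```python
import http.client
import time
import urllib.request


def wait_for_meilisearch(url="http://localhost:7700", timeout=30.0, interval=0.5):
    """Poll GET {url}/health until it answers 2xx or the timeout expires.

    Returns True if the instance became ready, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url + "/health", timeout=interval) as response:
                if 200 <= response.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or non-2xx (URLError/HTTPError are
            # OSError subclasses): the server is not ready yet, so retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

It could be called once from a session-level fixture so every MeiliSearch test starts against a ready instance.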

Jun 11 '20 12:06 bidoubiwa

#77 "partially" fixed the issue. It does not provide any detailed checks, but it launches the scraper against a MeiliSearch instance and at least checks that the scraper does not crash.

Nov 04 '20 16:11 curquiza

I was looking into this, and I'm not sure whether I'm misunderstanding something or whether there is an actual issue. For this case:

A simple MeiliSearch configuration with the right credentials and no settings

- Check that the index was correctly added to MeiliSearch
- Check that the default settings were added correctly

If I use the following config:

{
  "index_uid": "docs",
  "start_urls": ["https://docs.meilisearch.com"]
}

then I get the error TypeError: argument of type 'NoneType' is not iterable, raised here. I'm wondering whether the JSON I'm using is not what you have in mind. What makes me question that, and wonder whether there could be an issue, is that the parser is called from ConfigLoader, and in that class selectors is initialized to None. That makes me think calling the parser with selectors set to None shouldn't throw an error. Should I use the basic config JSON file instead?
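This error is what Python raises for a membership test (`in`) against `None`, which fits `selectors` being initialized to `None`. A minimal sketch of the failure and of one possible guard; `normalize_selectors` and `has_level` are hypothetical helpers, `"lvl0"` is only an example selector key, and defaulting to an empty dict is an assumption, since which fields are mandatory is still an open question.

```python
def normalize_selectors(selectors):
    """Hypothetical guard: treat a missing (None) selectors entry as empty."""
    return {} if selectors is None else selectors


def has_level(selectors, level):
    """A membership test of the kind that fails when selectors is None."""
    return level in selectors


# Reproduce the reported error: `in` on None raises TypeError.
try:
    has_level(None, "lvl0")
    raised = None
except TypeError as error:
    raised = str(error)

# With the guard, a config without "selectors" no longer crashes.
safe = has_level(normalize_selectors(None), "lvl0")
```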

I also have a couple more questions. First, by default these tests will make network calls and actually scrape the MeiliSearch docs site. Is that what you want, or should we look into mocking the network calls? If real network calls are made, it would probably be a good idea to mark these tests as network tests so they can be skipped when running pytest, in case someone has no network connection at the time but still needs to run the tests. I'm thinking of marking them the same way as proposed for the chromedriver tests in #135.
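A minimal sketch of the marking idea, assuming a custom pytest marker named `network` (a hypothetical name; the convention proposed in #135 may differ). Registering the marker from the `pytest_configure` hook in `conftest.py` with `config.addinivalue_line` keeps pytest from warning about an unknown mark.

```python
# conftest.py (sketch): register a hypothetical "network" marker so tests
# that reach the live MeiliSearch docs can be deselected when offline.

NETWORK_MARKER = "network: test makes real network calls (deselect with -m 'not network')"


def pytest_configure(config):
    """pytest hook: declare the custom marker so --strict-markers accepts it."""
    config.addinivalue_line("markers", NETWORK_MARKER)
```

A test would then be decorated with `@pytest.mark.network`, and `pipenv run pytest ./scraper/src -m "not network"` would skip it; `-m` can be combined with the existing `-k "not _browser"` filter.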

Second, what do you have in mind for the check that the settings were added correctly? They are scraped from the MeiliSearch docs, but since those are live docs they will change over time so the tests could start failing because the docs change and not because the scraper isn't working properly.

Jun 05 '21 20:06 sanders41

@sanders41 Concerning your first question, I think it is important that we determine which fields are mandatory; see #154.

Concerning your second question, I was wondering whether we could either provide a mocked website or run the documentation locally on the CI and scrape it. I agree that no network calls should be made during CI. Nonetheless, those tests are not related to the tests suggested in this issue.
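One way to follow the mocked-website suggestion without any external network call is to serve a fixed HTML page from the test process itself and point `start_urls` at it. A minimal sketch using only the standard library; the page content and `serve_fake_docs` are hypothetical, and the scraper itself is not invoked here.

```python
import http.server
import threading

# Hypothetical fixture page standing in for the real documentation site.
FAKE_DOCS_HTML = b"""<html><body>
<h1>Getting started</h1>
<p>Install MeiliSearch and create an index.</p>
</body></html>"""


class _FakeDocsHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the same fixture page for every path.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(FAKE_DOCS_HTML)))
        self.end_headers()
        self.wfile.write(FAKE_DOCS_HTML)

    def log_message(self, *args):
        pass  # keep test output quiet


def serve_fake_docs():
    """Start the fake site on a free localhost port; return (server, base_url)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _FakeDocsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}/"
```

A test would build a config such as `{"index_uid": "docs", "start_urls": [base_url]}`, run the scraper, and call `server.shutdown()` afterwards, so the indexed content never depends on the live docs.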

Second, what do you have in mind for the check that the settings were added correctly? They are scraped from the MeiliSearch docs, but since those are live docs they will change over time so the tests could start failing because the docs change and not because the scraper isn't working properly.

We can simply call MeiliSearch back and check that the settings were added, using the GET settings route.
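A minimal sketch of that check, assuming the body of GET /indexes/{uid}/settings has already been fetched and decoded. Only the keys the configuration set are compared, since MeiliSearch also returns settings left at their defaults. `settings_mismatches` is a hypothetical helper, and treating `stopWords` as order-insensitive is an assumption.

```python
def settings_mismatches(expected, actual, unordered=("stopWords",)):
    """Return {key: (expected, actual)} for every expected setting that differs.

    `actual` is the decoded body of GET /indexes/{uid}/settings; keys the
    configuration did not set are ignored.
    """

    def normalize(key, value):
        # Set-like settings may come back in a different order (assumption).
        if key in unordered and isinstance(value, list):
            return sorted(value)
        return value

    mismatches = {}
    for key, value in expected.items():
        got = actual.get(key)
        if normalize(key, got) != normalize(key, value):
            mismatches[key] = (value, got)
    return mismatches
```

A test would assert `settings_mismatches(config_settings, response_body) == {}`; returning the differing pairs rather than a bare boolean makes a failing assertion show what went wrong.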

Sep 29 '21 08:09 bidoubiwa

I confirm: we should check that the settings have been imported correctly, see https://github.com/meilisearch/docs-scraper/pull/158 🙂

Oct 04 '21 17:10 curquiza

As this repo is now in low-maintenance mode, this PR is no longer very relevant. I'm closing it.

Sep 06 '23 10:09 alallema