jikan-rest

Improved v4 search via Laravel Scout

Open: pushrbx opened this pull request.

Fixes #189. This PR adds support for search indexing via Laravel Scout, using TypeSense or Elasticsearch. With Laravel Scout and a search index, the Jikan API can provide more accurate search results, closer to the ones on the MyAnimeList website.

TODOs:

  • Add tests

Known issues

The search results look reasonable with TypeSense and are somewhat similar to MAL's. With TypeSense, the results are sorted by popularity in ascending order (popularity is MAL's popularity rank, so 1 is the most popular entry) and then by text match score in descending order.
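As a rough illustration of what sort_by=popularity:asc,_text_match:desc does, the Python sketch below applies the same two-key ordering locally. It assumes popularity is MAL's popularity rank (1 = most popular); the Hit type, the rank function and the sample values are hypothetical, and in practice TypeSense does this sorting server-side.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    popularity: int  # MAL popularity rank: 1 is the most popular entry
    text_match: int  # relevance score for the query (higher is better)


def rank(hits: list[Hit]) -> list[Hit]:
    # popularity:asc is the primary key; _text_match:desc only breaks ties.
    return sorted(hits, key=lambda h: (h.popularity, -h.text_match))


# Hypothetical hits: a close but less popular match vs. a weak but popular one.
hits = [Hit("close match", 900, 100), Hit("weak match", 50, 10)]
print([h.title for h in rank(hits)])  # ['weak match', 'close match']
```

Because popularity is the primary key, the text match score only breaks ties between entries with the same rank, so a popular title with a weak match can still outrank a closer match.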
Example anime_index search: suisei

TypeSense Parameters

| Parameter name | Parameter value |
|---|---|
| q | suisei |
| query_by | title,title_english,title_japanese |
| per_page | 25 |
| page | 1 |
| highlight_start_tag | %3Cmark%3E |
| highlight_end_tag | %3C%2Fmark%3E |
| sort_by | popularity:asc,_text_match:desc |
| query_by_weights | 1,2,1 |
| exhaustive_search | true |

Search results

| TypeSense | MAL |
|---|---|
| Suisei no Gargantia | Suisei no Gargantia |
| Tensai Ouji no Akaji Kokka Saisei Jutsu | Suisei |
| Suisei no Gargantia: Meguru Kouro, Haruka | Muumindani no Suisei |
| Suisei no Gargantia Specials | Suisei no Gargantia: Meguru Kouro, Haruka |
| Yoku Wakaru Mahouka! - Saiseitte Nani? | Suisei no Gargantia Specials |

From these results it seems that TypeSense also returns matches that tolerate typos in the search query (e.g. "Saisei" for "suisei"), which we might want to disable. I wasn't sure at first how this should be done, but this behavior can be disabled by setting TypeSense's exhaustive_search parameter to false.
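The unexpected hits share a token that is one edit away from the query: "Saisei" differs from "suisei" by a single substitution, and "Saiseitte" starts with "Saisei". A plain Levenshtein distance in Python makes this visible. It is only an approximation of typo tolerance: TypeSense also does prefix matching and has its own typo-related settings, and comparing against a same-length prefix below is a crude stand-in for that.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


query = "suisei"
for token in ["suisei", "saisei", "saiseitte"]:
    # Crude stand-in for prefix matching: compare against a same-length prefix.
    print(token, levenshtein(query, token[:len(query)]))
```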


New framework

To translate query string parameters into ORM commands, I've added a new framework. Search query builder instances are resolved through the SearchQueryProvider instance by a string key, so instead of static query functions on classes, each builder is a singleton instance from the IoC container.

This results in DRYer code and simpler controllers. For example, the action methods in SearchController become much simpler:

```php
public function clubs(Request $request)
{
    return $this->preparePaginatedResponse(ClubCollection::class, "club", $request);
}
```
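The string-key-to-shared-builder pattern described above can be sketched outside Laravel. This Python version is only an analogy: in jikan-rest the instances come from Laravel's IoC container, and the names here (SearchQueryBuilder, register, get, prepare_paginated_response) are hypothetical, not the PR's actual API.

```python
from typing import Callable, Dict


class SearchQueryBuilder:
    """Hypothetical per-entity builder: turns query-string params into a query."""

    def __init__(self, entity: str) -> None:
        self.entity = entity

    def build(self, params: Dict[str, str]) -> Dict[str, object]:
        # A real builder would emit ORM calls; here we just describe the query.
        filters = {k: v for k, v in params.items() if k != "page"}
        return {"entity": self.entity, "filters": filters}


class SearchQueryProvider:
    """Resolves builders by string key, creating each one at most once."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], SearchQueryBuilder]] = {}
        self._instances: Dict[str, SearchQueryBuilder] = {}

    def register(self, key: str, factory: Callable[[], SearchQueryBuilder]) -> None:
        self._factories[key] = factory

    def get(self, key: str) -> SearchQueryBuilder:
        # Lazily create the singleton the first time the key is requested.
        if key not in self._instances:
            self._instances[key] = self._factories[key]()
        return self._instances[key]


provider = SearchQueryProvider()
for entity in ("anime", "manga", "club"):
    provider.register(entity, lambda entity=entity: SearchQueryBuilder(entity))


def prepare_paginated_response(key: str, params: Dict[str, str]) -> Dict[str, object]:
    # Shared by every action: look up the builder by key, then paginate.
    query = provider.get(key).build(params)
    return {"query": query, "page": int(params.get("page", "1"))}


# A controller action then shrinks to a single call, as in clubs().
print(prepare_paginated_response("club", {"q": "anime", "page": "2"}))
```

Keeping one builder per key means the parsing and pagination logic lives in one place, and each controller action only has to pick the key.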

Testing this yourself

Set up TypeSense:

```shell
docker run -d --name typesense-dev --restart no -p 8108:8108 -v typesense_data:/data typesense/typesense:0.22.2 --data-dir /data --api-key "yourapikeyhere"
```

Add the following to the jikan-rest .env file:

```shell
###
# Scout config
###
SCOUT_DRIVER=typesense
SCOUT_QUEUE=false

###
# TypeSense Config
###
TYPESENSE_HOST=localhost
TYPESENSE_PORT=8108
TYPESENSE_API_KEY="yourapikeyhere"
```

Run the indexer tasks for anime and manga for a while:

```shell
php artisan indexer:anime
php artisan indexer:manga
```

In theory, records should be indexed in TypeSense automatically, but to make sure they are there, run the following:

```shell
php artisan scout:import "App\Anime"
php artisan scout:import "App\Manga"
```

Manually verifying TypeSense

You can check the data in TypeSense by sending HTTP requests to it with Postman or curl.

List all indexes (collections):

```shell
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    "http://localhost:8108/collections"
```

Check the fields of an index:

```shell
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
     -X GET \
    "http://localhost:8108/collections/anime_index"
```

Search in an index:

```shell
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
     -X GET \
    "http://localhost:8108/collections/manga_index/documents/search?q=${SEARCH_QUERY}&query_by=title%2Ctitle_english%2Ctitle_japanese&per_page=25&page=1&highlight_start_tag=%3Cmark%3E&highlight_end_tag=%3C%2Fmark%3E&query_by_weights=1%2C2%2C1&sort_by=popularity:asc%2C_text_match%3Adesc"
```

For anime and manga, set the query_by parameter to title%2Ctitle_english%2Ctitle_japanese (the URL-encoded form of title,title_english,title_japanese). This value should match the return value of each Jikan model's typesenseQueryBy() function. query_by_weights sets the relative weight of each field listed in query_by; we want matches on title_english to rank lower. sort_by sorts on popularity and then on text match score, which makes the search results resemble MAL's.
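query_by_weights is positional: the n-th weight applies to the n-th field in query_by, so the two lists need the same length. A small Python helper (hypothetical, not part of jikan-rest) that decodes both values and pairs them up makes a mismatch obvious before the request is sent:

```python
from urllib.parse import unquote


def pair_query_by_weights(query_by: str, weights: str) -> list[tuple[str, int]]:
    """Pair each query_by field with its weight, accepting URL-encoded input."""
    fields = [f.strip() for f in unquote(query_by).split(",")]
    values = [int(w) for w in unquote(weights).split(",")]
    if len(fields) != len(values):
        raise ValueError(
            f"{len(fields)} query_by fields but {len(values)} query_by_weights"
        )
    return list(zip(fields, values))


print(pair_query_by_weights("title%2Ctitle_english%2Ctitle_japanese", "1%2C2%2C1"))
# [('title', 1), ('title_english', 2), ('title_japanese', 1)]
```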

TypeSense API docs: https://typesense.org/docs/0.22.0/api/documents.html#search
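For scripted checks, the curl search example can also be reproduced in Python with only the standard library. This sketch assumes a local instance on port 8108 and mirrors the curl parameters; the helper name typesense_search_request is hypothetical. It only constructs the request, so nothing is sent until you pass it to urlopen.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def typesense_search_request(collection: str, query: str, api_key: str,
                             host: str = "localhost", port: int = 8108) -> Request:
    """Build (but do not send) a TypeSense search request like the curl example."""
    params = {
        "q": query,
        "query_by": "title,title_english,title_japanese",
        "per_page": 25,
        "page": 1,
        "highlight_start_tag": "<mark>",
        "highlight_end_tag": "</mark>",
        "query_by_weights": "1,2,1",
        "sort_by": "popularity:asc,_text_match:desc",
    }
    # urlencode escapes the commas, colons and highlight tags for us.
    url = (f"http://{host}:{port}/collections/{quote(collection)}"
           f"/documents/search?{urlencode(params)}")
    # TypeSense reads the API key from this header.
    return Request(url, headers={"X-TYPESENSE-API-KEY": api_key}, method="GET")
```

Against a running instance, json.load(urlopen(typesense_search_request("manga_index", "suisei", key)))["hits"] should give the matching documents, with key read from the TYPESENSE_API_KEY environment variable.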

pushrbx, May 30 '22

I'm going to address the following things in the coming days:

  • The anime type filter doesn't work correctly: http://staging.jikan.moe/v4/anime?q=fate&type=movie
  • Pagination is broken

pushrbx, Jun 28 '22

TypeSense Parameters Updated

| Parameter name | Parameter value |
|---|---|
| sort_by | _text_match:desc, members:desc |
| query_by | title, title_transformed, title_english, title_english_transformed, title_japanese, title_japanese_transformed |
| query_by_weights | 2,2,1,1,2,2 |
  1. *_transformed values are simplified versions of the titles with every non-alphanumeric character removed. This helps with edge cases like "Fate/Zero".
  2. Removed title_synonyms from query_by because it was hurting search accuracy.
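A rough Python sketch of such a transformation. The PR only says non-alphanumeric characters are removed; this version assumes they are replaced by a space and whitespace is then collapsed, so word boundaries survive ("Fate/Zero" becomes "Fate Zero"). The function name transform_title is hypothetical, and the actual jikan-rest implementation (in PHP) may differ.

```python
def transform_title(title: str) -> str:
    """Simplify a title for search: drop punctuation and other symbols."""
    # Assumption: non-alphanumeric characters become spaces (not removed outright)
    # so that "Fate/Zero" keeps two searchable tokens.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title)
    # Collapse runs of whitespace left behind by the replacement.
    return " ".join(cleaned.split())


for raw in ["Fate/Zero", "Yoku Wakaru Mahouka! - Saiseitte Nani?"]:
    print(f"{raw!r} -> {transform_title(raw)!r}")
```

str.isalnum() is Unicode-aware, so Japanese titles keep their kana and kanji while punctuation is stripped.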

We can deal with flattening and indexing the titles array later, which should bring support for more languages. I guess the best way to do that would be to include each title as a separate key in the searchable array, e.g. title_german_transformed.
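One way the per-language keys could be derived, sketched in Python. It assumes each entry of the titles array looks like {"type": "German", "title": "..."}; the flatten_titles helper and the simplified normalization inside it are hypothetical. Synonyms are skipped, in line with removing title_synonyms from query_by.

```python
def simplify(title: str) -> str:
    # Same idea as the *_transformed fields: non-alphanumeric chars become spaces.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title)
    return " ".join(cleaned.split())


def flatten_titles(titles: list[dict]) -> dict[str, str]:
    """Turn a list of {"type", "title"} entries into flat searchable keys."""
    doc: dict[str, str] = {}
    for entry in titles:
        kind = entry["type"].strip().lower().replace(" ", "_")
        if kind == "synonym":
            continue  # synonyms were dropped from query_by because they hurt accuracy
        key = "title" if kind == "default" else f"title_{kind}"
        # Keep only the first title of each type.
        if key not in doc:
            doc[key] = entry["title"]
            doc[f"{key}_transformed"] = simplify(entry["title"])
    return doc
```

With this shape, "English" and "Japanese" entries map onto the existing title_english and title_japanese fields, and any other language gets its own title_<language> and title_<language>_transformed keys.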

irfan-dahir, Oct 03 '22

Manual test plan:

  • [x] Should return good search results for anime, manga, character, magazine, producers, person
  • [x] Producers endpoints should respond correctly
  • [x] Filters should work properly for all endpoints (order_by, limit, sort, and others)

pushrbx, Oct 03 '22

I've tested it again manually one by one, and I've found no more issues. 🥳 🎉 The next iteration should be about refactoring the rest of the endpoints to the new framework, and adding more tests.

pushrbx, Oct 20 '22

Is this actually done? What an auspicious day 😇🎉 When will it be pushed to the live REST API? Can't wait to try it out :)

Prid13, Oct 23 '22

I want to use this!

Wamy-Dev, Oct 24 '22

Any update on when this will go live? I'm eagerly waiting for this 😇

Prid13, Dec 31 '22

@Prid13 No ETA, I'm actively working on this.

pushrbx, Dec 31 '22

@pushrbx Any update on this? 😇 Sorry for asking so much -- I really appreciate all the effort ⭐

Prid13, Jun 02 '23

> @pushrbx Any update on this? 😇 Sorry for asking so much -- I really appreciate all the effort ⭐

You can see the progress under a different PR: https://github.com/jikan-me/jikan-rest/pull/346

pushrbx, Jun 02 '23