jikan-rest
Improved v4 search via Laravel Scout
Fixes #189. This PR adds support for search indexing via Laravel Scout, using TypeSense or Elasticsearch. With Laravel Scout and a search index, the Jikan API can return more accurate search results, closer to the ones on the MyAnimeList website.
TODOs:
- Add tests
Known issues
The search results seem OK with TypeSense; they are fairly similar to MAL's search results. With TypeSense the results are sorted by popularity in ascending order and then by the text match score in descending order.
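To make the combined sort order concrete, here is a minimal Python sketch of a two-key sort. It assumes the first `sort_by` field acts as the primary key and the second only breaks ties, and that a lower `popularity` value means a more popular title (as with MAL's popularity rank). The sample documents and their `text_match` scores are invented for illustration.

```python
# Hypothetical documents; titles, ranks, and scores are invented for illustration.
docs = [
    {"title": "A", "popularity": 300, "text_match": 90},
    {"title": "B", "popularity": 12, "text_match": 70},
    {"title": "C", "popularity": 12, "text_match": 95},
    {"title": "D", "popularity": 5, "text_match": 60},
]


def sort_like_sort_by(documents):
    """Order by popularity ascending, then by text match score descending."""
    return sorted(documents, key=lambda d: (d["popularity"], -d["text_match"]))


ranked = [d["title"] for d in sort_like_sort_by(docs)]
print(ranked)  # ['D', 'C', 'B', 'A']
```

With popularity as the primary key, a weak text match on a very popular title (D) still outranks a strong match on an obscure one (A).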
Example anime_index
search: suisei
TypeSense Parameters
Parameter name | Parameter value |
---|---|
q | suisei |
query_by | title,title_english,title_japanese |
per_page | 25 |
page | 1 |
highlight_start_tag | %3Cmark%3E |
highlight_end_tag | %3C%2Fmark%3E |
sort_by | popularity:asc,_text_match:desc |
query_by_weights | 1,2,1 |
exhaustive_search | true |
Search results
From these results it seems that TypeSense tries to show results even when the search query contains typos, which we might want to disable. At the moment I'm not sure how this should be done. This behavior can be disabled if we set the `exhaustive_search` parameter to `false` for TypeSense.
New framework
For translating query string parameters to ORM commands I've added a new framework. The search query builder instances are resolved through the `SearchQueryProvider` instance via a string key. So instead of static functions in classes for doing a query, there will be a singleton instance from the IoC container.
This resulted in more DRY code and simpler controllers. For example, the action methods in the `SearchController` become simpler:
public function clubs(Request $request)
{
    return $this->preparePaginatedResponse(ClubCollection::class, "club", $request);
}
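The lookup by string key can be pictured roughly as follows. This is a Python sketch of the general pattern only, not the PR's PHP code: `QueryBuilderRegistry`, `ClubQueryBuilder`, `register`, `resolve`, and the allowed parameter names are all hypothetical stand-ins for whatever the `SearchQueryProvider` and the IoC container actually do.

```python
class ClubQueryBuilder:
    """Hypothetical builder: turns query string parameters into a filter dict."""

    def build(self, params):
        # Keep only the parameters this (made-up) endpoint understands.
        allowed = {"q", "page", "limit"}
        return {k: v for k, v in params.items() if k in allowed}


class QueryBuilderRegistry:
    """Resolves one shared builder instance per string key."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, key, factory):
        self._factories[key] = factory

    def resolve(self, key):
        # Create the builder on first use, then reuse it (singleton per key).
        if key not in self._instances:
            self._instances[key] = self._factories[key]()
        return self._instances[key]


registry = QueryBuilderRegistry()
registry.register("club", ClubQueryBuilder)

builder = registry.resolve("club")
print(builder.build({"q": "anime", "page": "2", "foo": "bar"}))
# {'q': 'anime', 'page': '2'}
```

A controller action then only needs the key (here `"club"`) and the request, which is why the action methods can shrink to a single call.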
Testing this yourself
Set up TypeSense:
docker run -d --name typesense-dev --restart no -p 8108:8108 -v typesense_data:/data typesense/typesense:0.22.2 --data-dir /data --api-key "yourapikeyhere"
Update the jikan-rest env file with the following:
###
# Scout config
###
SCOUT_DRIVER=typesense
SCOUT_QUEUE=false
###
# TypeSense Config
###
TYPESENSE_HOST=localhost
TYPESENSE_PORT=8108
TYPESENSE_API_KEY="yourapikeyhere"
Run the indexer task for a while for anime/manga:
php artisan indexer:anime
php artisan indexer:manga
Theoretically, records should be indexed in TypeSense automatically, but to make sure they are there, run the following:
php artisan scout:import "App\Anime"
php artisan scout:import "App\Manga"
Manually verifying typesense
You can check the data in TypeSense by sending HTTP requests to it with Postman or curl.
List all indexes
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
"http://localhost:8108/collections"
Check fields of the index
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-X GET \
"http://localhost:8108/collections/anime_index"
Search in the index
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-X GET \
"http://localhost:8108/collections/manga_index/documents/search?q=${SEARCH_QUERY}&query_by=title%2Ctitle_english%2Ctitle_japanese&per_page=25&page=1&highlight_start_tag=%3Cmark%3E&highlight_end_tag=%3C%2Fmark%3E&query_by_weights=1%2C2%2C1&sort_by=popularity:asc%2C_text_match%3Adesc"
In this case make sure that you set the `query_by` parameter to the following value: `title%2Ctitle_english%2Ctitle_japanese` for anime and manga. The value of this field should match the return values of the `typesenseQueryBy()` function of each Jikan model. `query_by_weights` sets the weights of the fields specified in the `query_by` parameter. We want the `title_english` to be ranked lower. `sort_by` is set on `popularity` and the text match score; this way we make the search results similar to the ones on MAL.
TypeSense API docs: https://typesense.org/docs/0.22.0/api/documents.html#search
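If hand-encoding the query string gets tedious, the search URL can be assembled programmatically. The following is a small Python sketch, assuming the local TypeSense instance from the setup steps; `build_search_url` is a hypothetical helper, and the parameter values are copied from the curl example above.

```python
from urllib.parse import urlencode

# Assumed local setup from the instructions above.
BASE_URL = "http://localhost:8108"


def build_search_url(collection, query):
    """Build a TypeSense document search URL for the given collection."""
    params = {
        "q": query,
        "query_by": "title,title_english,title_japanese",
        "query_by_weights": "1,2,1",
        "sort_by": "popularity:asc,_text_match:desc",
        "per_page": 25,
        "page": 1,
        "highlight_start_tag": "<mark>",
        "highlight_end_tag": "</mark>",
    }
    # urlencode percent-encodes the commas, colons, and <mark> tags.
    return f"{BASE_URL}/collections/{collection}/documents/search?{urlencode(params)}"


url = build_search_url("manga_index", "suisei")
print(url)
```

The resulting URL can be passed to curl together with the `X-TYPESENSE-API-KEY` header, as in the commands above.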
I'm going to address the following things in the coming days:
- anime type filter doesn't work correctly: http://staging.jikan.moe/v4/anime?q=fate&type=movie
- pagination is broken
TypeSense Parameters Updated
Parameter name | Parameter value |
---|---|
sort_by | _text_match:desc,members:desc |
query_by | title,title_transformed,title_english,title_english_transformed,title_japanese,title_japanese_transformed |
query_by_weights | 2,2,1,1,2,2 |
- `*_transformed` values are simplified versions of the titles with every non-alphanumeric character removed. This helps search against edge cases like "Fate/Zero".
- Removed `title_synonyms` because it was messing up search accuracy.
We can deal with flattening and using `titles` later, which should bring support for more languages. I guess the best way to do that would be to include them as separate keys in the searchable array, e.g. `title_german_transformed`.
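As a rough illustration of what a `*_transformed` value might look like, here is a Python sketch. `transform_title` is a hypothetical name, and the sketch assumes that whitespace is stripped along with punctuation and that letter case is left untouched; the actual transformation in the PR may differ.

```python
def transform_title(title):
    """Drop every character that is not a letter or digit."""
    # str.isalnum() is Unicode-aware, so non-Latin titles keep their characters.
    return "".join(ch for ch in title if ch.isalnum())


print(transform_title("Fate/Zero"))  # FateZero
print(transform_title("Re:Zero"))    # ReZero
```

Indexing both the original and the transformed value lets a query such as "fatezero" or "fate zero" still reach "Fate/Zero".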
Manual test plan:
- [x] Should return good search results for anime, manga, character, magazine, producers, person
- [x] Producers endpoints should respond correctly
- [x] Filters should work properly for all endpoints. (order_by, limit, sort, and others)
I've tested it again manually one by one, and I've found no more issues. 🥳 🎉 The next iteration should be about refactoring the rest of the endpoints to the new framework, and adding more tests.
Is this actually done? What an auspicious day 😇🎉 When will it be pushed to the live REST API? Can't wait to try it out :)
I want to use this!
Any update on when this will go live? I'm eagerly waiting for this 😇
@Prid13 No ETA, I'm actively working on this.
@pushrbx Any update on this? 😇 Sorry for asking so much -- I really appreciate all the effort ⭐
You can see the progress under a different PR: https://github.com/jikan-me/jikan-rest/pull/346