
Fix bug in search.

Open FactorioBlueprints opened this issue 6 years ago • 17 comments

The database used by factorioprints is Firebase. Firebase allows a filter clause OR an order-by clause but not both.

The old strategy was to download ALL blueprint summaries, ordered, and do the filtering client side. This worked but got slower and consumed more bandwidth over time, until it got prohibitively expensive.

The current strategy is to order and paginate on the server and filter on the client. As a result, each page shows a different number of results, and some pages show none at all.
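A minimal sketch of why this happens, using made-up data rather than the site's real code. The summaries, `fetchPage`, and `filterPage` are all hypothetical stand-ins: the "server" hands back fixed-size pages of an ordered list, and the client filters each page on its own.

```javascript
// Hypothetical blueprint summaries, already ordered by the server.
const summaries = [
  { id: "a", title: "Main bus starter" },
  { id: "b", title: "Solar field" },
  { id: "c", title: "Rail junction" },
  { id: "d", title: "Rail signal tutorial" },
  { id: "e", title: "Green circuit build" },
  { id: "f", title: "Rail loader" },
];

// Stand-in for the server: returns one fixed-size page of the ordered list.
function fetchPage(pageNumber, pageSize) {
  const start = pageNumber * pageSize;
  return summaries.slice(start, start + pageSize);
}

// Client-side filter: case-insensitive substring match on the title.
function filterPage(page, searchTerm) {
  const needle = searchTerm.toLowerCase();
  return page.filter((s) => s.title.toLowerCase().includes(needle));
}

// Count how many results survive the filter on each of the three pages.
const pageSize = 2;
const resultsPerPage = [0, 1, 2].map(
  (n) => filterPage(fetchPage(n, pageSize), "rail").length
);
console.log(resultsPerPage); // [ 0, 2, 1 ]
```

Every page holds two summaries before filtering, but the search for "rail" leaves zero, two, and one result respectively.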

I’m working on migrating from Firebase to a relational database. But this is essentially a rewrite, and it would be good to explore other strategies.

FactorioBlueprints avatar Jan 30 '18 00:01 FactorioBlueprints

Could you please share a larger export of your DB as an example?

Do you have a place where we can plan the relational models?

perfectsine avatar Jun 23 '19 00:06 perfectsine

I just saw where you were willing to share a copy of the DB: https://github.com/FactorioBlueprints/factorio-prints/blob/master/CONTRIBUTING.md

I wouldn't mind that! Let me take a look at the application and pagination requirements. Moving to a relational database could affect costs.

perfectsine avatar Jun 23 '19 00:06 perfectsine

Since you liked the idea, in the other issue, of using an App Engine interface for the images on GCS, one solution for search is to deploy a very small service that exposes App Engine's Search API as a REST API for querying.

Then, on every save to the datastore, send the title and tags to the Search API. This could be done with Firebase Functions.

The Search API is expected to be turned down at some point in the not-so-near future, since it was made unavailable to the new language versions and the recommendation is to move to Elasticsearch on Compute Engine. Since there is no announced date for that, I think it is safe. Elasticsearch is also a really great idea, though: there is a ready-to-use deployment on GCP Marketplace that automatically configures the VM.

Fryuni avatar Jun 24 '19 02:06 Fryuni

have you considered removing the order by when a person uses the search?

also where in the code are you issuing the query currently?

BrettMoan avatar Jul 09 '19 04:07 BrettMoan

> have you considered removing the order by when a person uses the search?

@BrettMoan this would be a big change because I've always used order-by and never sent criteria to firebase.

> also where in the code are you issuing the query currently?

Here's the paginated, ordered query. https://github.com/FactorioBlueprints/factorio-prints/blob/stable/src/sagas/subscribeSaga.js#L68

Here's the client side filtering. https://github.com/FactorioBlueprints/factorio-prints/blob/stable/src/selectors.js#L162-L165

FactorioBlueprints avatar Jul 10 '19 00:07 FactorioBlueprints

I did some digging to confirm: for large projects, most people opt for third-party tools like Elasticsearch. I'm baffled that Firebase doesn't provide even a .contains() function, only ordering. This is because Firebase doesn't index text data the way a full-blown RDBMS does "out of the box", which is what lets you do things like the "LIKE" operator.

Since this is LIKELY a pet project for you, I also searched for "firebase search free" ;) This returned something promising, namely the following article:

https://medium.com/@ken11zer01/firebase-firestore-text-search-and-pagination-91a0df8131ef

Basically, for each description you're building your OWN index by splitting the strings and then storing an array of substrings. Then you're checking to see if that array "contains" your key (the search).
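A rough sketch of that idea, not the article's actual code. It assumes the index is an array of word prefixes stored on each record; `buildSearchTerms`, `matches`, the `searchTerms` field, and the minimum prefix length of 3 are all hypothetical choices. The plain `includes` call stands in for what a Firestore `array-contains` query would do on the server.

```javascript
// Build the per-record index: for every lowercase word, store each prefix
// of at least `minLength` characters (shorter words are stored whole).
function buildSearchTerms(text, minLength = 3) {
  const terms = new Set();
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  for (const word of words) {
    if (word.length <= minLength) {
      terms.add(word);
      continue;
    }
    for (let end = minLength; end <= word.length; end++) {
      terms.add(word.slice(0, end));
    }
  }
  return [...terms];
}

// The "contains" check: does the stored array hold the search key?
function matches(blueprint, query) {
  return blueprint.searchTerms.includes(query.toLowerCase());
}

// Hypothetical record; the array would be computed once, when it is saved.
const blueprint = { title: "Rail junction" };
blueprint.searchTerms = buildSearchTerms(blueprint.title);

console.log(blueprint.searchTerms);
console.log(matches(blueprint, "junc"));  // true: "junc" is a stored prefix
console.log(matches(blueprint, "solar")); // false
```

The trade-off is storage: every word contributes several entries, which is why this may suit short fields like titles or tags better than full descriptions.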

BrettMoan avatar Jul 10 '19 04:07 BrettMoan

I don't know how large your data set is, but while the approach in the Medium article may not work for the full description, it might work for enabling search by tags? That would be a smaller subset of data.

Otherwise you may indeed need to wait until you can port to an rdbms.

BrettMoan avatar Jul 10 '19 04:07 BrettMoan

@FactorioBlueprints Would it be possible to continue pulling pages of ordered results and adding those to the existing collection for this "page" until your filtered result collection was equal to your pageSize OR until the numberOfPages is equal to the page number (we're at the end of the results)?

I'm doing some reading on Redux but I don't quite know where to insert the logic that would fetch yet more results and append those to the current collection. Seems like it should be somewhere after line 150 in https://github.com/FactorioBlueprints/factorio-prints/blob/stable/src/selectors.js#L150
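To make the idea concrete, here is a minimal sketch of that loop outside of Redux. Everything in it is hypothetical (`fetchPage`, `collectFilteredPage`, the sample data), and the fetch is synchronous for simplicity, whereas the real query would be asynchronous.

```javascript
// Hypothetical ordered summaries and a stand-in for the server-side page query.
const orderedSummaries = [
  { id: "a", title: "Main bus starter" },
  { id: "b", title: "Solar field" },
  { id: "c", title: "Rail junction" },
  { id: "d", title: "Rail signal tutorial" },
  { id: "e", title: "Green circuit build" },
  { id: "f", title: "Rail loader" },
];

function fetchPage(pageNumber, pageSize) {
  const start = pageNumber * pageSize;
  return orderedSummaries.slice(start, start + pageSize);
}

// Keep pulling ordered pages until the filtered collection reaches pageSize
// or there are no more pages. Matches beyond pageSize are returned as
// `carryOver` so the next call can start with them instead of losing them.
function collectFilteredPage({ startPage, carryOver = [], pageSize, numberOfPages, predicate }) {
  const matched = [...carryOver];
  let page = startPage;
  while (matched.length < pageSize && page < numberOfPages) {
    for (const summary of fetchPage(page, pageSize)) {
      if (predicate(summary)) matched.push(summary);
    }
    page++;
  }
  return {
    results: matched.slice(0, pageSize),
    carryOver: matched.slice(pageSize),
    nextPage: page,
  };
}

const isRail = (s) => s.title.toLowerCase().includes("rail");
const first = collectFilteredPage({ startPage: 0, pageSize: 2, numberOfPages: 3, predicate: isRail });
const second = collectFilteredPage({
  startPage: first.nextPage,
  carryOver: first.carryOver,
  pageSize: 2,
  numberOfPages: 3,
  predicate: isRail,
});
console.log(first.results.map((s) => s.id));  // [ 'c', 'd' ]
console.log(second.results.map((s) => s.id)); // [ 'f' ]
```

The cost is that a rare search term can still force many page fetches before a single screen of results fills up.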

Fusty avatar Aug 03 '19 18:08 Fusty

Relational databases are good at storing relations, and the ability to perform full text searches on fields is really just one feature. They're not really optimized for that sort of thing, so even though it might be possible to migrate to an RDBMS to accomplish what you want, it's not really an ideal solution - you might find yourself needing to optimize the performance of full text search.

The tool you really need is one to create and maintain a search index - a map of search terms to records (some RDBMSs do this behind the scenes when you use full text searches, but it's not their specialty). Some popular tools for this are Solr and Elasticsearch. These days there are lots of SaaS providers offering free services if you are below a certain threshold of usage - you might do a little digging to see if you can find one. You might even consider some free platform services like Google Cloud Compute's free tier. Or you could throw some ads on factorio blueprints and use ad revenue to pay for Algolia.

Ultimately you need these things:

  • Something to derive search terms from individual blueprint records
  • A way to map each individual search term to a list of blueprint records containing that term
  • A way of querying this collection of search terms using boolean logic, search term precedence, pagination, ordering, etc.
  • A way to get full blueprint records after you've identified the records matching search terms
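The four pieces above can be sketched in a few lines. This is only an in-memory illustration under assumptions: the records, the `favorites` field used for ordering, and the function names are hypothetical, and the boolean logic is limited to AND across all query terms.

```javascript
// Hypothetical blueprint records keyed by ID.
const blueprints = new Map([
  ["bp1", { title: "Compact rail junction", favorites: 12 }],
  ["bp2", { title: "Rail loader station", favorites: 40 }],
  ["bp3", { title: "Solar field", favorites: 7 }],
  ["bp4", { title: "Rail junction with signals", favorites: 25 }],
]);

// 1. Derive search terms from a record (here: lowercase words of the title).
function deriveTerms(record) {
  return new Set(record.title.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// 2. Map each term to the set of record IDs that contain it.
function buildIndex(records) {
  const index = new Map();
  for (const [id, record] of records) {
    for (const term of deriveTerms(record)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// 3 and 4. Query with AND semantics, load the full records, order, paginate.
function search(index, records, query, { page = 0, pageSize = 10 } = {}) {
  const terms = [...deriveTerms({ title: query })];
  if (terms.length === 0) return [];
  const idSets = terms.map((term) => index.get(term) || new Set());
  const matchingIds = [...idSets[0]].filter((id) => idSets.every((ids) => ids.has(id)));
  return matchingIds
    .map((id) => ({ id, ...records.get(id) }))
    .sort((a, b) => b.favorites - a.favorites)
    .slice(page * pageSize, (page + 1) * pageSize);
}

const index = buildIndex(blueprints);
const hits = search(index, blueprints, "rail junction", { pageSize: 1 });
console.log(hits.map((h) => h.id)); // [ 'bp4' ]
```

Because filtering happens against the index before pagination, every page except the last is full, which is the property the current client-side filter lacks.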

One thing to consider is products like Solr and Elasticsearch have out-of-the-box solutions to this; however, your use case is somewhat simple. You might be able to just pre-compute search terms as part of uploading a blueprint and use some new Firestore collection as a search index you manage yourself.

@BrettMoan's suggestion to index the search terms yourself seems like a pragmatic solution. There are two approaches you can take if you do it yourself:

  • Create and manage a separate Firestore collection that helps you map search terms to blueprint records: a dictionary of individual search terms to an array of blueprint document IDs containing the term
  • Augment each existing blueprint record with a set of searchable terms that can be easily queried - definitely confirm you can write a Firestore query to get what you need before going down this path.
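For the first approach, the part that is easy to overlook is keeping the dictionary correct when a blueprint is edited. Below is a small in-memory sketch of that bookkeeping; the two maps stand in for the separate collection, and all names (`termToIds`, `idToTerms`, `indexBlueprint`) are hypothetical.

```javascript
const termToIds = new Map(); // search term -> Set of blueprint IDs
const idToTerms = new Map(); // blueprint ID -> Set of terms currently indexed for it

function termsOf(text) {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Call on every upload or edit: removes stale terms, then adds current ones.
function indexBlueprint(id, title) {
  const newTerms = termsOf(title);
  const oldTerms = idToTerms.get(id) || new Set();

  for (const term of oldTerms) {
    if (!newTerms.has(term)) {
      const ids = termToIds.get(term);
      ids.delete(id);
      if (ids.size === 0) termToIds.delete(term); // drop empty entries
    }
  }
  for (const term of newTerms) {
    if (!termToIds.has(term)) termToIds.set(term, new Set());
    termToIds.get(term).add(id);
  }
  idToTerms.set(id, newTerms);
}

indexBlueprint("bp1", "Rail junction");
indexBlueprint("bp2", "Rail loader");
indexBlueprint("bp1", "Compact junction"); // edit: "rail" no longer applies to bp1

console.log([...termToIds.get("rail")]);     // [ 'bp2' ]
console.log([...termToIds.get("junction")]); // [ 'bp1' ]
```

In a real database the two loops would become writes to the index collection, ideally done together with the blueprint save so the two cannot drift apart.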

johntron avatar Aug 22 '19 03:08 johntron

I have the REST rewrite largely working here: https://www.factorio.school/

It's read-only and I'll sync the Firebase database to the relational database periodically to keep it relatively up-to-date. Could you folks take a look and see if it works ok before I share it more widely?

FactorioBlueprints avatar Aug 22 '19 12:08 FactorioBlueprints

> I have the REST rewrite largely working here: https://www.factorio.school/
>
> It's read-only and I'll sync the Firebase database to the relational database periodically to keep it relatively up-to-date. Could you folks take a look and see if it works ok before I share it more widely?

@FactorioBlueprints it seemed to be working yesterday but I just opened the page and I got an error:

Application error An error occurred in the application and your page could not be served. If you are the application owner, check your logs for details. You can do this from the Heroku CLI with the command heroku logs --tail

asdkant avatar Aug 23 '19 13:08 asdkant

@asdkant Sorry about that, fixed now.

FactorioBlueprints avatar Aug 24 '19 20:08 FactorioBlueprints

I just saw the new work done with a REST API. I wasn't aware that this was being developed, and over the past two months I built my own pet project, https://www.fuelforfactorio.com, in which I mainly added more search options (entities contained / recipes produced) and replaced image hosting with a direct client-side renderer based on https://github.com/Teoxoy/factorio-blueprint-editor. I also left the API endpoints publicly available.

Even with two frontends, maybe it would be possible to maintain a single backend?

I will open the repository shortly (after a bit of cleanup/doc).

boobin avatar Aug 29 '19 21:08 boobin

Wow @boobin looks great! I'd like to integrate Teoxoy/factorio-blueprint-editor into factorio.school as well. Besides the broken stuff, the most common complaint is needing to upload a screenshot. I haven't had time to even investigate whether it was possible because I've been so focused on the backend. I think there's room for more than one UI and to collaborate on UI work.

FactorioBlueprints avatar Aug 29 '19 23:08 FactorioBlueprints

Thx! I'm not so concerned about UI right now; I'm really trying to get a stable and performant API, with the specs I want for the UI:

  • search by text, tags, entities and recipes
  • provide a summary of entities and recipes on the list page
  • display tags and like count on the list page
  • allow tagging with existing or new tags easily

I would understand if you refuse, but is there any chance of getting even a partial dump of your blueprint table? I'm considering generating random blueprints to test the services against a reasonably loaded database, but real data would be awesome.

In terms of technology, my backend is a Rust stack (actix-web + diesel) with a PostgreSQL database.

boobin avatar Aug 30 '19 12:08 boobin

@boobin Funny to see you post this here. Sadly, it seems your site link is dead; are you still working on this project? I started doing the same thing just this week, so I thought you might be open to discussing your progress so far. Please let me know here, or reach out on Discord if you use it: Barry#7827

barthuijgen avatar Oct 18 '20 07:10 barthuijgen

@barthuijgen I took the site down a few months ago at the end of the AWS free trial period; there was no traffic anymore. It was functional, but I lost interest when I realized I was no longer using public blueprint databases myself when playing.

The code is public on https://gitlab.com/lbobinet/fuelforfactorio.

boobin avatar Oct 18 '20 08:10 boobin