elasticstore
elasticstore copied to clipboard
A pluggable union between Firebase CloudFirestore + ElasticSearch
Elasticsearch + CloudFirestore = Elasticstore
NEW: See elasticstore fulfilling requests using firebase. [repo]
A pluggable integration with ElasticSearch to provide advanced content searches in Firestore.
This script can:
- monitor multiple firestore
collectionsandsubcollectionsand add/modify/remove indexed elasticsearch data in real time transform,filter,include,excludeandmappingfunctionality for each document- communicates with client completely via Firebase (no elasticsearch client required, though a query builder is recommended)
- clean up old requests
Heavily Inspired by the Realtime Database implementation (Flashlight) by the Firebase Team
NOTE
For large firebase datasets, particularly when initially starting the script, a queuing setup has been put in place to prevent elasticsearch from responding with 429 requests. The provided QUEUE_CONCURRENT and QUEUE_DELAY config options are in place to mitigate that issue. Please adjust them according to your hardware.
Getting Started:
- Install and run Elasticsearch either locally or with a service
git clone https://github.com/acupofjose/elasticstorenpm install- Supply
.envwith variables OR define them in your environment (see supplied.env.sample) - Edit
src/references.tsto include your configuration (see below) npm run start
Documentation:
How do I define a reference?
| Option | Type | Parameters | Return | Description |
|---|---|---|---|---|
collection |
string |
n/a | n/a | Represents a single collection in firestore |
subcollection |
string |
n/a | n/a | Represents a single subcollection of a document in firestore |
index |
string or function |
snap:Firestore.DocumentSnapshot, parentSnap:Firestore.DocumentSnapshot |
string |
Used by elasticsearch, index records will be placed under |
mappings |
object |
n/a | n/a | Used by elasticsearch, this should be an object containing {fieldName: {type: ELASTICSEARCH_FIELD_TYPE}} |
include |
Array<string> |
n/a | string[] |
Fields from firestore to be included in records passed to elasticsearch` |
exclude |
Array<string> |
n/a | string[] |
Fields from firestore to be excluded in records passed to elasticsearch` |
builder |
function |
Firestore.CollectionReference |
query | Builds the collection query that firestore will bind to and insert records to elasticsearch from |
subBuilder |
function |
Firestore.CollectionReference |
query | Builds the subcollection query that firestore will bind to and insert records to elasticsearch from |
filter |
function |
Firestore.DocumentData |
boolean | Run on an individual firestore record, if it returns false, the record will not be inserted |
transform |
function |
data: {[key: string]: any}, parentSnap:Firestore.DocumentSnapshot |
object | Transform data recieved from firestore to an object passed along to elasticsearch (run after filtering) |
onItemUpserted |
function or Promise |
data: {[key: string]: any}, parentSnap:Firestore.DocumentSnapshot, client: Elasticsearch.Client |
void | Callback after an item has been upserted to Elasticsearch |
So for instance, maybe I want to index a collection called groups that does a tranformation on the data received from firestore, and maps a firestore geopoint to an elasticsearch geo_point
// firestore (in the console)
groups: {
12341235: {
title: "Group Name", // string
description: "I'm a group", // string
location: "32,-74" // geo_point
createdAt: "9/4/2018 00:00:00 GMT-0" // date
}
}
// references (in ./src/references.ts)
{
....
{
collection: "groups",
index: "groups",
mappings: {
location: {
type: "geo_point" // elasticsearch's definition of a geopoint
}
},
transform: (data, parent) => ({
...data,
location: `${data.location._latitude},${data.location._longitude}` // transform from firestore's geopoint to elasticsearch's
})
},
....
}
Subcollections
When this repo was first made, Firestore had some limitations on how collections worked. You could have a root collection and every root Collection could have a Subcollection. Subcollections could not not have Subcollections.
This repo supports the use of subcollections with the note that this is expensive. Why? Because when Firestore returns a Collection Query it does not return the Subcollection's data, meaning, to listen to changes on a Subcollection requires a listener per Subcollection instance.
Have a need for a Subcollection from 400 Collection instances? That's 400 listeners. AFAIK there's not a better way to do this given Firebase's current API.
So with those caveats, the repo allows you to specify some ways to narrow how many listeners you register by narrowing your queries.
For Example
// Firestore Data Model
{
// Collection
"users" : {
"UUID-1" : {
email: "[email protected]",
isPremium: true,
// Subcollection
"profile": {
"firstName": "Romeo",
"public": false
....
}
}
"UUID-2" : {
email: "[email protected]",
isPremium: false,
// Subcollection
"profile": {
"firstName": "Juliet"
"public": true,
...
}
}
}
To listen to the profile Subcollection on user where the profile is public , you'd create a reference like this:
{
collection: "users",
subcollection: "profile",
index: "user-profiles",
subBuilder: (ref) => ref.where('public', '==', true)
}
Or only profiles where the user isPremium and public (note that the index and type are changed, but that the change is arbitrary):
{
collection: "users",
subcollection: "profile",
index: "user-premium-profiles",
builder: (ref) => ref.where('isPremium', '==', false),
subBuilder: (ref) => ref.where('public', '==', true)
}
Making Searches (Client Side)
Elasticstore will listen to the search/ root collection for a new document containing a request object key and a null response object key. Upon finding a request that is 'unfulfilled' (a null response).
* Note that these keys are defined in the .env file
Requests should be formed as new documents in the search collection.
Assuming you're using the node.js Firebase SDK, making an Elasticsearch request through Firebase would look something like this:
const result = await firebase
.firestore()
.collection("search")
.add({
request: {
index: "users",
q: "John", // Shorthand query syntax
},
response: null,
})
result.ref.onSnapshot((doc) => {
if (doc.response !== null) {
// Do things
}
})
Or with the normally expected Elasticsearch syntax body:
const result = await firebase.firestore().collection('search').add({
request: {
index: 'users',
body: {
query: {
match: {
"_all": "John"
}
}
},
response: null
})
result.ref.onSnapshot(doc => {
if (doc.response !== null) {
// Do things
}
})
Restrictions / Caveats
Be aware that on large collections, this will need some tuning. Upon starting (and restarting) ALL data is re-indexed unless you choose to filter it yourself. This is a VERY expensive operation, as you will have to perform reads on every document you have in your collection.
When dealing with subcollections, a listener is added for each collection which then adds a listener to the specified subcollection. If you don't filter these, you may end up with a large number of listeners for data that doesn't get changed very often.
Contributors ✨
Thanks goes to these wonderful people (emoji key):
Ruslan Petrov 💻 |
StoryStar 🐛 |
Y 🐛 |
Teju Nareddy 💻 |
This project follows the all-contributors specification. Contributions of any kind welcome!