GeoFlutterFire

Document reads

Open manafire opened this issue 4 years ago • 19 comments

One thing I'm not clear on from the documentation: is GeoFlutterFire incurring a read for each document in the entire collection to return points within a radius, or is it leveraging indexing on the geohash to only incur document reads on a subset of the collection? It seems like the latter, but I'm just not 100% sure. Thanks!

Oct 10 '20 15:10 manafire

It seems to me like the latter too (i.e. so the reads are only from 9 geohashes) but would be great to have someone confirm this as well!

Nov 22 '20 07:11 seranotannason

It reads the entire collection. If the collection you are querying has 10,000 documents, then all of them are read. This is quite inefficient, because you are billed for each document read, i.e. 10,000 reads every time you query Firestore.

The geo-data filtering within the given radius then happens on the client side, which is extra overhead.

To avoid this, you can pass a Firestore query reference to the geo query as shown below; the query requires an index:

var queryRef = _firestore.collection('locations').where('city', isEqualTo: 'bangalore');

var stream = geo
               .collection(collectionRef: queryRef)
               .within(center: center, radius: rad, field: 'position');

But suppose you have 10,000 documents for India, and Bangalore, the city named in queryRef, accounts for 2,000 of them. You would still read all 2,000 documents, which means you are billed for 2,000 document reads each time you query.

Now decide whether or not to use this library! Still, I highly appreciate the developers of this library for their contribution.

Dec 17 '20 14:12 karthikreddi

This is indeed very limiting and does not scale at all unfortunately. Anyone have a good scalable solution for this in Firestore?

Jan 05 '21 08:01 samu-developments

Is there a way to limit the number of documents read by geohash value? For example, if I know for a fact that my user only cares about locations within 4 km of them, I could limit my reads to only the documents whose geohash shares its first 5 characters (which correspond to an area of about 4.8 x 4.8 km) with my user's geohash. I am not familiar enough with Firebase billing to know whether this would reduce the cost. Please feel free to comment.

Jan 08 '21 23:01 yulkin2002

@yulkin2002 This would limit it, yes, but any location outside the immediate geohash square would be missed. Say you are in the upper-left corner of your geohash square: many nearby locations could then be in a totally different geohash (e.g. the lower-right corner of a neighboring square).
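To make the edge problem concrete, here is a minimal sketch of a standard geohash encoder in plain Dart. The function name encodeGeohash and the sample coordinates are hypothetical and not taken from GeoFlutterFire. It shows two points only a few tens of metres apart, on either side of the prime meridian, whose geohashes differ from the very first character, so a prefix match alone would miss one of them.

```dart
// Standard geohash base32 alphabet (no a, i, l, o).
const _base32 = '0123456789bcdefghjkmnpqrstuvwxyz';

/// Encodes a latitude/longitude pair as a geohash of [precision] characters.
/// Hypothetical helper for illustration only.
String encodeGeohash(double lat, double lng, int precision) {
  var latMin = -90.0, latMax = 90.0;
  var lngMin = -180.0, lngMax = 180.0;
  final out = StringBuffer();
  var evenBit = true; // even bits refine longitude, odd bits refine latitude
  var bits = 0;
  var value = 0;
  while (out.length < precision) {
    if (evenBit) {
      final mid = (lngMin + lngMax) / 2;
      if (lng >= mid) {
        value = value * 2 + 1;
        lngMin = mid;
      } else {
        value = value * 2;
        lngMax = mid;
      }
    } else {
      final mid = (latMin + latMax) / 2;
      if (lat >= mid) {
        value = value * 2 + 1;
        latMin = mid;
      } else {
        value = value * 2;
        latMax = mid;
      }
    }
    evenBit = !evenBit;
    // Every 5 bits become one base32 character.
    if (++bits == 5) {
      out.write(_base32[value]);
      bits = 0;
      value = 0;
    }
  }
  return out.toString();
}

void main() {
  // Two points roughly 22 m apart, just west and just east of longitude 0.
  final west = encodeGeohash(10.0, -0.0001, 5);
  final east = encodeGeohash(10.0, 0.0001, 5);
  print('west: $west, east: $east');
  print('same first character: ${west[0] == east[0]}'); // false
}
```

This is why a radius search has to query the neighboring squares as well as the one that contains the user.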

Jan 09 '21 09:01 samu-developments

@oyvindsam Ah yes, good point. It may be possible to leverage the proximity_hash package to identify the nearest squares within a given radius.

Jan 09 '21 17:01 yulkin2002

I think that for working with geo data, especially in connection with realtime updates, Cloud Firestore is not the solution, since it does not scale very well (from a price point of view, I mean). If you don't want to burn money, you are better off with the Firebase Realtime Database. However, this library won't work then, and I don't know of any other library that would work with the Realtime Database. If you do know of one, or hear about one some day, please tag me.

Jan 27 '21 12:01 lensbreak

@karthikreddi Regarding your comment above that the library reads the entire collection: is this still valid?

Feb 06 '21 18:02 t-kietzmann

@karthikreddi's response above (that the entire collection is read) made me want to check with the author of this plugin.

The reason is that this plugin mainly focuses on finding points within a radius using geohashes, so it should already be pulling only the documents that match the nearby geohashes.

@author, please confirm from your end.

Feb 18 '21 15:02 deepaknssd

I'm interested to know this too

Apr 08 '21 18:04 pumuckelo

@karthikreddi The claim above that it reads the entire collection is not true. I just checked the code.

Query _queryPoint(String geoHash, String field) {
    final end = '$geoHash~';
    final temp = _collectionReference;
    return temp.orderBy('$field.geohash').startAt([geoHash]).endAt([end]);
  }

This is the actual Firestore query. The first parameter is the geohash of the approximate area around the center. All documents are ordered alphabetically by their geohash field (this is a free operation), and then ONLY the ones that start with the area geohash are queried and read.

Example:

Area geohash (first parameter): "GBS"

All docs ordered by their geohash field:

..
...
FFG72DA12
GA2411231
GBS241241         <--------
GBS446322         <--------
HA92KA122
...
..

Only these two documents of this collection would be read.
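The range logic in the example above can be sketched in plain Dart without Firestore. This is only an in-memory illustration: prefixRange is a hypothetical helper, and the hashes are the fictional ones from the example. It applies the same inclusive bounds as startAt([geoHash]) and endAt(['$geoHash~']) to a sorted list of strings.

```dart
/// Returns the hashes in the inclusive range [prefix, prefix + '~'].
/// '~' sorts after every character of the geohash alphabet, so the range
/// covers every string that starts with [prefix].
/// Hypothetical in-memory stand-in for orderBy + startAt + endAt.
List<String> prefixRange(List<String> sortedHashes, String prefix) {
  final end = '$prefix~';
  return sortedHashes
      .where((h) => h.compareTo(prefix) >= 0 && h.compareTo(end) <= 0)
      .toList();
}

void main() {
  final hashes = [
    'HA92KA122',
    'GBS446322',
    'FFG72DA12',
    'GBS241241',
    'GA2411231',
  ]..sort(); // the index keeps documents ordered by geohash

  print(prefixRange(hashes, 'GBS')); // [GBS241241, GBS446322]
}
```

In Firestore the ordered scan is served from the index, so only the documents inside the range count as reads; the filter over a list here just mimics the bounds.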

Apr 19 '21 22:04 sentd94

@sentd94 Thank you so much for your explanation! I am really glad after reading your reply.

Apr 20 '21 01:04 deepaknssd

@sentd94 This is not the case. Evaluate it using a proper set of data.

A proper set in this case would be points in two or more different cities, chosen so that each city contributes points from different parts of the city.

Then evaluate by setting the radius within which you need the data. When executed, this package actually reads all the points in the dataset and then filters them based on the radius you supplied.

The code as written wouldn't segregate by the first three letters of the geohash. If it did, data in adjacent geohashes (boxes) would be missing in some cases.

Even if it did, and only three letters of the geohash were considered while reading the data, it would read all the data within a 78 km radius and then filter it by the radius you supplied.

Geohash precision: number of characters in the geohash, with the maximum X-axis error in km.

1    ± 2500
2    ± 630
3    ± 78
4    ± 20
5    ± 2.4
6    ± 0.61
7    ± 0.076
8    ± 0.019
9    ± 0.0024
10   ± 0.00060
11   ± 0.000074

This is a limitation of Firestore itself, which doesn't yet support geo-based queries as far as I know. I appreciate the authors of this plugin for minimizing the read cost of these queries.

But the problem isn't solved yet. It requires changes from Firestore.

Apr 20 '21 03:04 karthikreddi

Firestore has an article about that here:

What they describe is pretty much what this library does. My example was a fictional one with a huge radius (156km² in western France), and yes, you are right about the adjacent neighbor boxes in this example. But even when the neighbor boxes are included, it follows the same pattern I described. Say you look at all 8 neighbor boxes as well: the library then generates 9 different geohashes (8 neighbors plus the center square) and performs 9 queries.

Each of the 9 geohashes is then used in this query:

return temp.orderBy('$field.geohash').startAt([geoHash]).endAt([end]);

So in this worst-case scenario you read all documents that are within these 9 geohashes/boxes (the startAt and endAt methods ensure this).

The result sets are joined together, and the actual distance between the center and each result is calculated client-side to get rid of false positives.

What you said sounded like the entire collection is read regardless of geohashes, and this is not true because of the startAt/endAt methods.
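As a rough sketch of that client-side step, the following plain Dart filters merged candidates by great-circle distance using the haversine formula. The Doc class, the withinRadius helper and the sample coordinates are all hypothetical; the library's own distance calculation may differ in detail.

```dart
import 'dart:math';

/// Minimal stand-in for a document carrying a position (hypothetical).
class Doc {
  final String id;
  final double lat;
  final double lng;
  const Doc(this.id, this.lat, this.lng);
}

/// Great-circle distance in km between two points (haversine formula).
double haversineKm(double lat1, double lng1, double lat2, double lng2) {
  const earthRadiusKm = 6371.0;
  double rad(double deg) => deg * pi / 180;
  final dLat = rad(lat2 - lat1);
  final dLng = rad(lng2 - lng1);
  final a = pow(sin(dLat / 2), 2) +
      cos(rad(lat1)) * cos(rad(lat2)) * pow(sin(dLng / 2), 2);
  return 2 * earthRadiusKm * asin(sqrt(a));
}

/// Keeps only the candidates that really lie within [radiusKm] of the center,
/// dropping false positives that merely share a geohash box.
List<Doc> withinRadius(
    List<Doc> candidates, double centerLat, double centerLng, double radiusKm) {
  return candidates
      .where((d) => haversineKm(centerLat, centerLng, d.lat, d.lng) <= radiusKm)
      .toList();
}

void main() {
  // Candidates as if merged from the nine per-geohash queries.
  const candidates = [
    Doc('near', 12.9716, 77.5946),
    Doc('close', 12.9352, 77.6245), // roughly 5 km from the center
    Doc('far', 13.0827, 80.2707), // hundreds of km away
  ];
  final hits = withinRadius(candidates, 12.9716, 77.5946, 10);
  print(hits.map((d) => d.id).toList()); // [near, close]
}
```

Note that every candidate has already been billed as a read by the time this filter runs; the filter only improves accuracy, not cost.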

Apr 20 '21 08:04 sentd94

@sentd94 it looks like you tried to link to a Firestore article, but it did not show up in your post. Could you retry posting the link, please? It would be beneficial to the discussion.

@karthikreddi do you have any input on sentd94's last message?

Thank you

Jun 10 '21 00:06 RodoHS

@RodoHS This is the article: https://firebase.google.com/docs/firestore/solutions/geoqueries

They recently updated the article and now provide their own firebase library for GeoHashes.

Jun 10 '21 06:06 sentd94

In conclusion, is there a straightforward answer?

Does it query every single document in the collection whenever the .within is called (to find the documents / entries in the collection within a specific radius)?

On a side note, I want to thank @DarshanGowda0 for this library. It would also be incredible if you can provide some clarification on this issue (since you are the author of the library 😆)

Mar 14 '22 08:03 suptejas

@suptejas Upon examining the package's code: no, not every document. It appears that before reading from Firestore, GeoFlutterFire orders the documents by their geohash and only gets the documents whose geohash starts with the queried geohash.

/// construct a query for the [geoHash] and [field]
  Query<T> _queryPoint(String geoHash, String field) {
    final end = '$geoHash~';
    final temp = _collectionReference;
    return temp.orderBy('$field.geohash').startAt([geoHash]).endAt([end]);
  }

The tricky bit is that the number of documents returned depends on the radius you provide: the radius determines how far the geohash is truncated to a prefix of itself, and a shorter prefix matches more documents. So a larger radius = more documents read.

    ...
    // precision is what determines how many documents will be read:
    final precision = MathUtils.setPrecision(radius);
    // centerHash is the geohash of the center point you supplied, truncated to
    // `precision` characters so that it matches more documents:
    final centerHash = center.hash.substring(0, precision);
    // area is a List of surrounding geoHashes needed to determine what documents to read:
    final area = GeoFirePoint.neighborsOf(hash: centerHash)..add(centerHash);

    final queries = area.map((hash) {
      // tempQuery is what orders and limits the documents to get:
      final tempQuery = _queryPoint(hash, field);
      // _createStream is when Firestore is read from:
      return _createStream(tempQuery).map((querySnapshot) {
        return querySnapshot.docs;
      });
    });
    ...
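To illustrate how the radius shortens the prefix, here is a small sketch in plain Dart. precisionForRadius is a hypothetical stand-in for MathUtils.setPrecision; its thresholds are commonly cited approximate geohash cell widths and are an assumption, not values read from the library source. The center hash is also made up.

```dart
/// Picks the longest geohash prefix whose cell is still at least as wide as
/// the radius. Hypothetical stand-in for the library's MathUtils.setPrecision;
/// the widths below are approximate and assumed, not taken from the package.
int precisionForRadius(double radiusKm) {
  // Approximate cell width in km for geohash lengths 1 through 9.
  const cellWidthKm = [
    5000.0, 1250.0, 156.0, 39.1, 4.89, 1.22, 0.153, 0.0382, 0.00477,
  ];
  // Walk from the finest precision to the coarsest.
  for (var p = cellWidthKm.length; p >= 1; p--) {
    if (radiusKm <= cellWidthKm[p - 1]) return p;
  }
  return 1;
}

void main() {
  const centerHash = 'tdr1v9qkp'; // hypothetical 9-character geohash
  for (final radiusKm in [0.5, 4.0, 50.0]) {
    final precision = precisionForRadius(radiusKm);
    final prefix = centerHash.substring(0, precision);
    print('radius $radiusKm km -> precision $precision, prefix $prefix');
  }
}
```

Under these assumed thresholds a 0.5 km radius keeps 6 characters while a 50 km radius keeps only 3, so each of the nine range queries covers a far larger box and reads correspondingly more documents.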

None of this is very clear, as it is quite confusing in and of itself, so some documentation on this would be nice. Anyway, I hope that helps others out a bit.

Sep 20 '22 21:09 AdamBridges

Please note that the project description at https://pub.dev/packages/geoflutterfire also talks about scalability:

Scale to Massive Collections

It's possible to build Firestore collections with billions of documents. One of the main motivations of this project was to make geoqueries possible on a queried subset of data. You can pass a Query instead of a CollectionReference into the collection(), then all geoqueries will be scoped with the constraints of that query.

Note: This query requires a composite index, which you will be prompted to create with an error from Firestore on the first request.

Oct 16 '22 18:10 giorgio79