firebase-query is very slow when loading an array with more than just a few items
Description
I am loading a firebase array with around 12000 items, using a Polymer element like:
<link rel="import" href="/bower_components/polymer/polymer.html">
<link rel="import" href="/bower_components/polymerfire/firebase-query.html">
<dom-module id="loader">
<template>
<firebase-query id="query"
path="/data"
data="{{data}}">
</firebase-query>
</template>
<script>
(function () {
'use strict';
Polymer ({
is: 'loader',
properties: {
data: {
type: Object,
value: [],
notify: true
}
},
ready: function () {
var self = this;
setTimeout (function () {
console.log ('Data load took ' + (self.$.query.__loadEnd - self.$.query.__loadStart) + 'ms');
}, 60000);
}
});
}) ();
</script>
</dom-module>
The setTimeout () callback is for my instrumentation, which in the current version of firebase-query looks like:
--- firebase-query.html.orig 2017-04-28 12:13:59.221771214 +0100
+++ firebase-query.html.instr 2017-04-28 14:31:49.199146378 +0100
@@ -308,6 +308,10 @@
},
__onFirebaseChildAdded: function(snapshot, previousChildKey) {
+ if (!this.__loadStart) {
+ this.__loadStart = performance.now ();
+ }
+
var key = snapshot.key;
var value = snapshot.val();
var previousChildIndex = this.__indexFromKey(previousChildKey);
@@ -318,6 +322,7 @@
this.__map[key] = value;
this.splice('data', previousChildIndex + 1, 0, value);
+ this.__loadEnd = performance.now ();
},
__onFirebaseChildRemoved: function(snapshot) {
Expected outcome
My app should load the data quickly, without interrupting the user experience too severely.
Actual outcome
My instrumentation timings for several loads are (times in ms):
37324.61 37408.37 36633.60 37129.21 38887.08
Yes, that is nearly 40 seconds to load just 12000 items! Worse, the browser is completely frozen while this is happening.
Steps to reproduce
- Have a decent-sized array in Firebase
- Load it as above
- Try to interact with page elements while the load happens
Browsers Affected
- [x] Chrome
- [ ] Firefox
- [ ] Safari 9
- [ ] Safari 8
- [ ] Safari 7
- [ ] Edge
- [ ] IE 11
- [ ] IE 10
I've tested with Chrome, but I doubt this is browser specific.
Instrumented timings for my patch in #213 (times in ms again):
46.03 31.91 44.01 33.36 47.47 33.52
All I did was switch firebase-query and hit reload.
Instrumentation patch for the PR version:
--- firebase-query.html.orig 2017-04-28 17:04:14.717298479 +0100
+++ firebase-query.html 2017-04-28 17:04:18.429219494 +0100
@@ -323,6 +323,7 @@
},
__onFirebaseValue: function(snapshot) {
+ this.__loadStart = performance.now ();
this.query.off('value', this.__onFirebaseValue, this);
if (snapshot.hasChildren()) {
@@ -343,6 +344,7 @@
this.set('data', data);
}
+ this.__loadEnd = performance.now ();
/* Now start listening for changes */
this.query.on('child_added', this.__onFirebaseChildAdded, this.__onError, this);
},
@@ -360,6 +362,7 @@
this.__map[key] = value;
this.splice('data', previousChildIndex + 1, 0, value);
+ this.__loadEnd = performance.now ();
}
},
Just a question, why would you need to load 12000 items?
I dunno if changing the child_added to value is the right way here. The thing with using child_added in query is because it adds the children one by one from the query, producing an array. This allows us to just add specific listeners to each of the child elements (when it is edited, it fires a child_changed, when it is deleted, it fires a child_deleted)
Although I see that once it has get the values using the 'value' event, it turns the listener off. Then fromt there, it adds the values in the data array. The thing is at the end, you start listening to the child_added. Doesn't this just re-do the one done in the value?
I don't really think you should be using firebase-query for anything larger than a couple hundred items. Something that large ought to have a more custom-adapted solution.
On Wed, May 3, 2017, 9:12 PM Toni-Jan Keith Monserrat < [email protected]> wrote:
I dunno if changing the child_added to value is the right way here. The thing with using child_added in query is because it adds the children one by one from the query, producing an array. This allows us to just add specific listeners to each of the child elements (when it is edited, it fires a child_changed, when it is deleted, it fires a child_deleted)
Although I see that once it has get the values using the 'value' event, it turns the listener off. Then fromt there, it adds the values in the data array. The thing is at the end, you start listening to the child_added. Doesn't this just re-do the one done in the value?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/firebase/polymerfire/issues/212#issuecomment-299093276, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAD_j0YaMqB8wFUz-o_KrqTvXzJjToWks5r2VAagaJpZM4NLvYA .
The thing with using child_added in query is because it adds the children one by one from the query
I think this is why it's so slow. I believe Polymer runs all the observers synchronously before set () or splice () returns. When you do that for every single child, the time soon mounts up.
Doesn't this just re-do the one done in the value?
Yes, which is why it tests the __map property before doing anything with it.
I don't really think you should be using firebase-query for anything larger than a couple hundred items.
It would be nice to see the limitations documented then, before anyone else runs into this.
Yes, which is why it tests the __map property before doing anything with it. Then there goes the problem. It is hidden but it still downloads all 12000 x2 even if you checked it with this.__map before saving it.
If all 12000 items has a size of 2MB, you are downloading another 2MB because of the child_added listener. That is I think a no-no in web performance.
It would be nice to see the limitations documented then, before anyone else runs into this. Yes it is nice to have this documented but I think it is sort of good practice not to download more than a 100 in one call in all kinds of documentations on calling database apis
in essence, it is not really good practice to download a large set of items over the network. Especially if the user will just need to see about 20 - 50 at a time.
If the reason for 12000 items is to crunch it, try using cloud functions instead.
If all 12000 items has a size of 2MB, you are downloading another 2MB because of the child_added listener. That is I think a no-no in web performance.
The timings I posted at the top of this thread would beg to differ. 40 seconds using child_added on its own, including 10s of seconds of frozen browser UI, vs with value it's 40 milliseconds to the last child_added event even if the data is loaded twice.
Whatever. The patch works for me, take it or leave it. If you don't want a 1000x speed improvement it makes no difference to me.
It's not about 40 seconds over 10 seconds. Yes I agree it is fast enough. But you are still downloading additional 2MB on the network side: That is still part of web performance - you are wasting a person's bandwidth by hiding the fact that you added a 'child_added' listener event at the end. Yes it is not running a javascript function to save the redundant 2MB (which makes it 'look-like' it is faster), but again, just wasting a redundant 2MB is a problem.
Just look at the network tab of the developers tool and see if it is loading the extra 2MB.
The patch works, but I am pointing out that problem which is not production friendly.
@dickp I didn't catch that the difference was 1000x. I'm curious as to what exactly is causing this massive difference in performance.
@mbleigh I think it is because the function this.__onFirebaseChildAdded will be called 12000 times, which will take quite a number of function calls instead of just one call of this.__onFirebaseValue when using the event listener value
Also the child-added handler invokes Polymer data-binding path notifications for the array each time.
I agree with @mbleigh 's assessment that the number of items being considered is unrealistic. Even with the speed up, there will be problems rendering this much data in the client for most non-trivial usages.
That said, seems 40s is pretty bad. It seems likely that the problem is at a higher level than the Firebase client library. However, it would be nice to confirm that with profiling.
@cdata do you think this could be solved just by making __onFirebaseChildAdded called with a requestAnimationFrame or setTimeout(..., 0)?
@mbleigh and just queuing it in the event stack before running it in the main stack? Isn't the 'child_added' event doing it already?
I think child_added events may be called synchronously when a big chunk of data arrives over the wire all at once. Would need to confirm with Firebase DB team.
But I think the requestAnimationFrame (or using Polymer.RenderStatus.afterNextRender) would at least not hang the browser... but it will still make the call distributed over a long period. Might be more than 40 seconds but at least it will not hang. The other thing that I can see is that you can see the data being added visually by chunks.
Related to @tjmonsi 's comment, I kind of wish that the Firebase SDK would help with this so that as the SDK user all I have to worry about is handling the events as they come. That said, I don't think it's easy for the SDK to know how best to chunk up work for the user.
A simple requestAnimationFrame will amortize the 40s of work over a longer stretch of time. Importantly, if there are other events in-between (remove, change etc), they would somehow need to be coordinated with the deferred child-added events so that everything happens in the correct order.
We can try to design some kind of queue / scheduler for change operations to the data. I'm thinking something along the lines of: all change events are pushed into a queue, queued work is batched by a scheduler that yields after a determined amount of time. My main concern with this approach is that it further complicates an already complicated dance between the Firebase SDK and Polymer's data-binding.
@dickp 's solution is much simpler at the cost of being less general. My main concern with it is that if there is an orders-of-magnitude penalty for the way we are doing things, we should probably fix that in a general way. Ordered queries are a very common use case for firebase-query, after all.
@cdata the problem I got with @dickp 's solution is that it requests the same data again. The performance on javascript was fast I agree. But it is wasting the bandwidth twice.
I did a simple profiling and it was really slow. The data feed was fast though: about 1.2 seconds and all the data is in my browser even when using the child_added event

But I was right when it comes to javascript profiling:

It will get stuck because the main caller, which is firebase-database.js is hogging the whole main stack. Each purple in my stack is the call on this.__onFirebaseChildAdded, which is called 12000 times. That will be the reason why it will make the browser hang, even if the data has been loaded 1.2 seconds.
I tried encapsulating this.__onFirebaseChildAdded inside setTimeout(..., 0) and it made it look fast. The calls to this.__onFirebaseChildAdded is now put in the event queue waiting for the main stack to be freed. The only problem is you can see the data being put into the dom at a time (although it is fast, it is taking sometime).

Now for each purple function (this.__onFirebaseChildAdded), it will run the corresponding changes in the template.
You can see the profile here (90MB because of the pictures): https://dl.dropboxusercontent.com/u/22287668/Snapshot%20of%20child_added%20experiment%20on%20setTImeout.profile
If without putting it on template though, You'll have this:

You are still calling this.__onFirebaseChildAdded 12000 times, but now not hogging the main stack. But if you do a button click or something that will put a function at the event queue (like a button click or a callback), it will still wait for all the this.__onFirebaseChildAdded to finish before processing the function put in the end of the event queue, even if it was done while the main stack is free.
You can see the profile here (20 MB): https://dl.dropboxusercontent.com/u/22287668/Snapshot%20of%20child_added%20experiment%20on%20setTImeout%20without%20template
Just look at the network tab of the developers tool and see if it is loading the extra 2MB.
I did, and it isn't. I suspect the firebase layer is caching the data. It's hard to be sure as the firebase code is minified and I really don't have time to pick it apart. Other circumstantial evidence is, 1) that my internet connection won't shift 2MB in 40ms; and 2) there is roughly 7s between the firebase-query element being ready and the first value event happening - which is much more plausible for downloading the data.
I didn't catch that the difference was 1000x. I'm curious as to what exactly is causing this massive difference in performance.
@mbleigh I think the other responses here have nailed it - the trip through data-binding notifications for every element is where my money is. So an alternative solution might involve batching the calls to splice () inside __onFirebaseChildAdded ().
Even with the speed up, there will be problems rendering this much data in the client for most non-trivial usages.
@cdata You'd be surprised :) I'm actually loading the array into a vaadin-combobox and it's pretty snappy, even on Android.
@dickp 's solution is much simpler at the cost of being less general.
I'd be happy to have an opt-in property if it would make everyone else less averse to the fix.
@dickp Ok, I'll try to look it up on your side and do a snapshot of the network. As far as I checked, the child_added event already serves the 12000 items in 1.2 seconds. Will try it out if the changes you made are spot on
The Firebase SDK definitely caches locally in-memory, so an on('value') will start a listener and you can separately do on('child_added'). This will not incur double network cost.
On Fri, May 5, 2017 at 11:55 AM Toni-Jan Keith Monserrat < [email protected]> wrote:
@dickp https://github.com/dickp Ok, I'll try to look it up on your side and do a snapshot of the network. As far as I checked, the child_added event already serves the 12000 items in 1.2 seconds. Will try it out if the changes you made are spot on
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/firebase/polymerfire/issues/212#issuecomment-299547667, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAD_gSQk_KmaTfqQpQof-Gtm1Zn7L8Uks5r23CkgaJpZM4NLvYA .
@dickp I sit corrected. You are right. The Firebase SDK does save the data somewhere and caches it, which will not make any redundant calls.

What I saw though is that __onFirebaseValue is called twice.
@dickp ahh wait, maybe it is the off function?
Anyway, the only thing here is when there's sorting involved, then we still revert to default, which is not a special case (and used often). How can we fix that?
~~I didn't find anything in the Firebase documentation that suggested it's fixable by the consumer of the library.~~
~~Maybe raise a feature request?~~
Actually, my suggestion above of batching splice () inside __onFirebaseChildAdded () would fix that.
@dickp maybe, i was thinking of sorting it on the fly just like what this.__onFirebaseChildAdded is doing (doing some splicing or inserting into the right order), but without the previousChild. That means looking at the values of orderByChild or orderByValue, and also looking at limitToLast or limitToFirst if ascending or descending...
@dickp ahhh nevermind, without a previousChild, the insertion technique would just be O(n^2) on worst case.
Anyway, I left a feature request to the Firebase team if it is possible to add a $previousChildKey or __previousChildKey on the attributes of each child element of the on('value') query if and only if there's a orderByChild / orderByKey / orderByValue sorting method present in the query. Hopefully they respond
If you do snapshot.forEach they will be emitted in query order.
On Fri, May 5, 2017 at 1:08 PM Toni-Jan Keith Monserrat < [email protected]> wrote:
Anyway, I left a feature request to the Firebase team if it is possible to add a $previousChildKey or __previousChildKey on the attributes of each child element of the on('value') query if and only if there's a orderByChild / orderByKey / orderByValue sorting method present in the query. Hopefully they respond
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/firebase/polymerfire/issues/212#issuecomment-299563522, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAD_hIrBdwdxAy1TyXqZN9TQPkvdVcaks5r24GhgaJpZM4NLvYA .