mathdown
mathdown copied to clipboard
User-visible warning for disconnected/not saving state
Yesterday I pushed broken CM/firepad combo such that pages loaded, you could edit but edits weren't saved! There was no user-visible indication this is the case. But simply being offline gives a similar result.
- [x] Visible indication that firebase is offline and not syncing.
- [ ] Close the loop verifying edits really made it into firebase (REST api?), such that unsaved edits — for any reason — for several seconds give the warning. It's not enough to check firepad's idea of version made it to the server; I suspect in this case there was an exception early in firepad such that it's version. Ideal might be comparing full text (not too frequently)?
Local storage backup wouldn't hurt too, but that's another story.
Asked for advice at firepad group: https://groups.google.com/forum/#!topic/firepad-io/s1aTZXZniAM
This is partially implemented, and works very nicely when typing and when internet goes offline/online.
But I haven't yet tested how it behaves in the face of bugs like #85 — I'm relying on Firepad knowing when it's synced but there might be a situation when Firepad itself isn't aware there was a change in the editor.
I think I should use headless firepad API to periodically (e.g. once a minute) do a clean text reconstruction of what is currently saved in firebase and compare to the editor.
Also, cross-compare with Firebase's connected status for more useful reports e.g. "unsaved because offline" vs "online but not saved — BUG?"...
Bah. What I pushed was somewhat broken (interleaved for extra karma with pushing a an extremely sloppy "redesign" breaking the [New document] button and the whole display on IE)
- The status indication overlaps the upper part of the header line, interfering with clicking stuff like [bugs]. e0f1b36ccfc2287670c2e24a7da8318c7db6db5e fixed this for most of the width, but the transparent "saved" or "done" text is still there, making it very confusing why you sometimes can't click the underlying [bugs] link (if your window width is such that they overlap).
- On IE8, not only the transitions don't animate (which I expected an am fine with) but
opacity
doesn't work at all, so "done" and "saved" texts are visible all the time.
I'm somewhat fond of the fading out to "done" / "saved" so looking at solutions that don't give them up. http://viget.com/inspire/css-pointer-events-and-a-pure-css3-animating-tooltip and http://www.greywyvern.com/?post=337 sound promising...
BTW, pointer-events: none
initially looked like a good direction, except it doesn't work on IE<11,
and also didn't work on my Chrome until I set visibility: hidden
on #status_container
.
Setting that also helped visibility
transition on status
to work, and visibility support is robust so I'm going with only that.
Back to the real goal: would current indication based on firepad's synced
event catch #85? => NO.
(Reproducing #85 took some effort. Going back to the buggy CM-MJ wasn't enough; it seems firebase-debug.js not only reports but also tolerates exceptions better, and I only managed to reproduce with the non-debug firepad.js)
- [ ] If we detect this kind of error, report it somehow in firebase (with full base version + doc text?) so future analysis can at least know how much this happened (and retroactively merge?).
Aha moment!
After sendOperation() called with invalid operation.
, firepad goes back to report that it's "synced".
I can — and should — fix this upstream.
Progress: Headless firepad works. (For some reason it didn't for ~ a day when I initially played in the JS console but now it works both from script and console) And it does seem to reliably warn when firepad gets borked and doesn't save. There are race conditions leading to false positives, should debounce it after document stops changing.
But the much bigger issue is performance. As commented on 0e8a2d41cd8082115a6164796758ebe6d5b886df , I'm seeing a memory leak and slowdown; after about a day of holding 2 tabs open (with a tiny document), memory grew to 1.5GB and CPU approached 100%. I think there was also high GC churn but didn't look at numbers. [This is with new Headless -> getText -> comparison every 1sec, for debugging.] Memory leak doesn't seem to happen in current prod version (stays ~50MB / tab, half of which is JS memory). CPU is explained by Headless getText() latency growing to several seconds, with the timer piling up another Headless -> getText every second.
Is memory leak root cause of slowness? I did experince whole computer slowdown. Is getText pileup root cause of leak? Easy to imagine - Headless objects would pile up - but unlikely, as leak is observable immediately in a fresh tab. Are they independent problems?
Ideas:
- Dig deeper - heap profile, CPU profile on getText, ...
- Lower checking interval to something like 1 minute. This is necessary in any case, as I won't have perfect understanding of performance ramifications. But first let's debug 1sec which amplifies problems.
- Reuse one Headless instance instead of new one every time?
new Firepad.Headless
returns instantly but getText on new instance takes >200ms (>400ms after significant edits), while on existing one it's ~1ms. On my paper-length doc, getText is ~20ms but still 200~600ms after new. But I'd like to keep thenew
as more bullet-proof. - Prevent pileup by scheduling new check only once getText completes?
Do I still care about pileup if it's done once per minute? Maybe, I'm worried getText might take unbounded time when offline.
Can this lead to losing one check and never resuming? OTOH, if it's possible for getText to get lost, we might in theory get stuck in false
testedSynced = true
forever. But here is a concrete worrying scenario: data in firebase gets really corrupt (never happened AFAIK, just unsaved) causing getText logic to crash every time. Idea: assume the worst, settestedSynced = false
every time until getText returns? [perhaps better to have separate less worrying "Veryfying save..." message.] - Debouncing should mitigate a lot. Do checks only when we've been online, firepad believes it's
synced
, and no incoming changes for some time (5sec?). But let's debug root causes first while it's bad.
Oh, dev consoles leak a lot as well. Quickly reach hundreds of megabytes. This is easily explained by my very verbose logging, including JS objects and DOM elements. Chrome and FF do throw away old messages when console is closed for a long time, but is that true for all browsers? Perhaps it's time to default logging to off in prod (with function and/or URL param turn it on)?
https://groups.google.com/forum/#!topic/firebase-talk/AEDekJ_B480 is interesting.
It's likely that new Firepad.Headless()
registers new firebase callbacks that never go away (and keep the Headless itself alive and updating...)
Just reducing CPU usage to say once/minute is not good enough. Consider a person with 30 mathdown tabs — now one of the tabs wakes up (and largely swaps in) every 2 seconds, even if nothing changed. I should either avoid checks entirely when nothing changed, dynamically backoff frequency, or both.
I should also check whether this whole feature noticably increases a tab's memory usage. At best, I'll still be constructing 2 copies of the text (codemirror getValue, firepad getText). But that's negligible compared to the full usage I see.
Using heap profiling on the ~100K document. Without headless checking it's 70MB on load, 170MB after renderingm math. With checking, after being open for 1000s, it's 360MB. During 50s heap allocations profile, the biggest allocation is the string with the full text (~200K due to UTF-16) => 7MB over 50s.
All these strings are retained! They're retained in TextOp.text
, which is retained in TextOperation.ops[0]
array, retained as FirebaseAdapter.document_
, which is retained in fb.core.view.ChildEventRegistration
under callbacks.child_added
closure's self
.
Over many allocations, these accumulate in an array of callbacks (I think on the history/
ref, not 100% sure).
Which is exactly what I assumed above - new Firepad.Headless() retains everything via callbacks.
Let's try with single Headless now. Then I'll mail Firepad folks (about fixing synced
and about destroying a Headless — I also need that for #7 long lived server) and possibly proceed.
Oh, and the accumulating callbacks also explain slowdown :-)
Brilliant - 8d031b74 fixed the leak according to Chrome allocations profiler :-)
Process RAM still grows to >400MB but is then garbage-collected back to 250MB, same as the mathdown.net non-checking tab stabilized. This happens every 30~40s but remember that's with getText every 1s!
The stable heap dump usage mainly consists of 50~60MB of DOM and ~40MB of CodeMirror.Line
s arranged in a b-tree. It's similar with checking and without.
[P.S. One would expect the partial DOM to be much smaller than CodeMirror's full b-tree, but this is a math-heavy document.]
Looked at firebase analytics and saw this strange bandwidth graph:
It's never been this sharp! And why is it suddenly flat? These are exactly the days when I was testing this feature! (and yes, it's sad that I alone can dominate Mathdown's bandwidth.)
I really hope that the high bandwidth was specific to the fresh-Headless-every second version and the one-Headless is saner. Need to test before landing. (of course, it's not gonna be every second. Good that I'm stress-testing this way or I wouldn't notice it.)