nupic.visualizations

I want to be able to visualize (streaming) data online, as the (NuPIC) model is running

Open breznak opened this issue 9 years ago • 11 comments

We've discussed this in the initial issue; there seem to be two approaches:

  • modifying the (NuPIC) framework so that the model can push "update" events to the visualization tool.
  • simply re-reading the raw CSV file (which is updated as the model runs) in some smart way, e.g. by remembering the last line position.

I find the latter better for two reasons: it does not tie us to NuPIC (it works for any growing CSV file), and it would not require (complex) changes to the NuPIC ModelRunner framework.

UI changes to enable this could be:

  • [ ] checkbox 'Online updates'
  • [ ] a (text) entry to set the update size, as a time interval or a number of new lines

Blocked by: ~~#16~~, #61

breznak avatar Nov 22 '15 16:11 breznak

:+1: :100:

rhyolight avatar Dec 04 '15 23:12 rhyolight

@rhyolight looking forward to getting back to this project soon, as soon as I finish up some other responsibilities.

jefffohl avatar Dec 05 '15 03:12 jefffohl

So, with the two open PRs, the speed bottleneck should be mostly resolved and internal support for streaming is in place.

What is left is a mechanism to monitor updates to the data file (e.g. periodically check its size) and update with only the newly added chunk of data (ideally a non-polling, on-update notification rather than a poll; operating systems handle this well). I've checked with upstream and it's a known problem with no ideal solution: https://github.com/mholt/PapaParse/issues/49#issuecomment-163164936

These are the ideas that we have collected:

  • sliding window: for truly (infinite) streaming data, implement what @rhyolight did in RiverView: read the last WINDOW_SIZE rows, append new rows and drop old ones, crop, and plot. A fixed-size FIFO (deque) can implement the sliding window. (Is this what you did?)
  • interactive appending: for large data, where we can (and want to) see the whole file in the end, but its creation (e.g. running an HTM model) takes a long time and we want to see the results in progress. We could a) just re-read the whole file, or b) remember the last rowId, seek to that position, read the new rows, and append them to our values.
  • sampled: we'd set POINTS_PER_GRAPH=10,000; data is read, subsampled, and rendered. On data update or zoom, we re-read the interesting section, subsample again, and so on.
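The fixed-size FIFO behind the sliding-window option can be sketched in a few lines; the capacity value is illustrative (the thread later mentions a 10,000-row window), and the names are hypothetical:

```javascript
// Sketch of a sliding window: a fixed-capacity FIFO buffer of rows.
// Pushing past capacity drops the oldest row, so the buffer always
// holds the most recent `capacity` rows for plotting.
function makeSlidingWindow(capacity) {
  const rows = [];
  return {
    push(row) {
      rows.push(row);
      if (rows.length > capacity) rows.shift(); // drop the oldest row
    },
    snapshot() {
      return rows.slice(); // copy handed to the plotting layer
    },
  };
}
```

An array with `shift()` is fine at these sizes; a ring buffer would avoid the O(n) shift if the window grew much larger.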

Any ideas or preferences about these (or other) options?

breznak avatar Dec 09 '15 10:12 breznak

@breznak @jefffohl @rhyolight This is rad, thanks for the hard work.

brev avatar Dec 11 '15 18:12 brev

Thanks @brev !

jefffohl avatar Dec 11 '15 18:12 jefffohl

Thanks @brev! It would be nice to get your feedback and possible use cases, if you like :)

breznak avatar Dec 12 '15 00:12 breznak

Upcoming fix from Jeff for #56 further ensures the speed is OK.

breznak avatar Jan 06 '16 03:01 breznak

@jefffohl with fixes in #64 and #66 I'd like to continue working on this functionality.

  • can we change the loading behavior in windowed mode to read the file from the end, so that only 5MB is ever read? If I understand it correctly, right now the whole file is parsed and only the last 5MB is kept for rendering, correct? I'd like to avoid that waste.
  • my plan to implement this feature is:
    • keep loading normally until the end of the file is reached,
    • then switch to a monitoring mode that polls the file every N milliseconds (~5000 by default) and computes a hash of the last row to see if the file has changed. If it has, re-read and render. Does that sound reasonable?

breznak avatar Jan 06 '16 16:01 breznak

@breznak - I was imagining that we could periodically check the file to see if it has been modified, not actually read the file. If the file has been modified, then read.

Note also that for windowing, there are two things to be aware of:

  1. The file size limit is what is used to determine if windowing will be used or not. Right now, that limit is set to 5MB. We can add a feature that allows this to be manually set.
  2. The number of rows in the window is not related to the file size. Right now, the window buffer size is set to 10,000 rows.

The reason the file size is not explicitly tied to the number of rows in the buffer is that we need to decide whether to window before we know how many rows there are.
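That decision logic is simple enough to state directly. A sketch, using the 5 MB limit and 10,000-row window from this thread as illustrative defaults (both could be made user-configurable, as suggested above):

```javascript
// Sketch: windowing is decided from the byte size alone, which is known
// up front, not from the row count, which is only known after parsing.
const SIZE_LIMIT_BYTES = 5 * 1024 * 1024; // current 5MB threshold
const WINDOW_ROWS = 10000;                // current window buffer size

function shouldWindow(fileSizeBytes, limitBytes = SIZE_LIMIT_BYTES) {
  return fileSizeBytes > limitBytes;
}
```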

jefffohl avatar Jan 06 '16 21:01 jefffohl

I was imagining that we could periodically check the file to see if it has been modified, not actually read the file. If the file has been modified, then read.

Yes, I think that's the idea. Will this work for remote files as well? (Though it doesn't have to be supported from the start.) I think I saw some code that gets a file size; that would be what we want, I guess?

The number of rows in the window is not related to the file size. Right now, it is set to 10,000 rows.

I know, and I think that's OK.

The file size limit is what is used to determine if windowing will be used or not. We can add a feature that allows this to be manually set.

I think it can stay that way; the monitoring will just switch to windowing if needed.

breznak avatar Jan 06 '16 21:01 breznak

Most servers should send back a "Last-Modified" header, so we could check that for remote servers. We can also just check the size (which we are already doing), and if that has changed, assume that new data has been added.
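A sketch of that remote check, assuming the HEAD response headers have been collected into a plain object with lowercase keys (the function name is hypothetical):

```javascript
// Sketch: detect remote-file changes from response headers.
// Prefer Last-Modified when both sides have it; otherwise fall back
// to comparing Content-Length (the size check already in use).
function remoteFileChanged(prevHeaders, headers) {
  if (headers['last-modified'] && prevHeaders['last-modified']) {
    return headers['last-modified'] !== prevHeaders['last-modified'];
  }
  return headers['content-length'] !== prevHeaders['content-length'];
}
```

One caveat: a size-only fallback misses in-place edits that keep the byte count the same, which is probably acceptable for append-only CSV logs.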

jefffohl avatar Jan 06 '16 22:01 jefffohl