Transfer to notebook

Open • martindurant opened this issue 3 years ago • 1 comment

This simply copies the script code into a notebook; it could use some cleaning and prose. For some reason, we are getting a lookback on the cluster centres (earlier centres are still shown), whereas we should only be showing the most recent ones.

cc @maximlt

martindurant, Dec 30 '21 21:12

Thanks for starting to gather the code in a notebook 👍

If you want to display only the most recent clusters (n=3), then this change is required (it also fixes a typo):

diff --git a/examples/river_kmeans.py b/examples/river_kmeans.py
index 4acbaf1..f204e9e 100644
--- a/examples/river_kmeans.py
+++ b/examples/river_kmeans.py
@@ -58,7 +58,7 @@ def main(viz=True):
         return concat([previous, new]).iloc[-last_lines:, :]
 
     partition_obs = 10
-    particion_clusters = 10
+    partition_clusters = 10
     backlog_obs = 100
 
     # .partition is used to gather x number of points
@@ -74,8 +74,8 @@ def main(viz=True):
     )
     (
         clusters
-        .partition(particion_clusters)
-        .map(pd.concat)
+        .partition(partition_clusters)
+        .map(lambda t: t[-1])
         .sink(pipe_out.send)
     )

Instead of concatenating the clusters (n=3*partition), the stream now takes only the last clusters (n=3) that were accumulated in .partition. Note that this means the remaining clusters (3*(partition-1) of them) are not displayed at all.
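
To make those counts concrete, here is a minimal sketch in plain Python rather than the streamz pipeline itself. The `partition` helper is a hypothetical stand-in that only mimics how `.partition(n)` groups incoming items into tuples of n, and the update data is made up for illustration (3 clusters per update, partition size 10, as in the diff above).

```python
from itertools import islice


def partition(items, n):
    """Hypothetical stand-in: yield tuples of n consecutive items,
    mimicking the grouping done by .partition(n)."""
    it = iter(items)
    while True:
        chunk = tuple(islice(it, n))
        if len(chunk) < n:
            return  # incomplete trailing chunk is not emitted
        yield chunk


K = 3                    # clusters produced per update (assumed, n=3 above)
PARTITION_CLUSTERS = 10  # same value as partition_clusters in the diff

# Made-up data: each update is a list of K (update_index, cluster_id) pairs.
updates = [[(u, c) for c in range(K)] for u in range(20)]

# Old behaviour: concatenate everything gathered in one partition.
concatenated = [sum(chunk, []) for chunk in partition(updates, PARTITION_CLUSTERS)]

# New behaviour: keep only the last update of each partition.
latest_only = [chunk[-1] for chunk in partition(updates, PARTITION_CLUSTERS)]

rows_concat = len(concatenated[0])   # K * PARTITION_CLUSTERS rows per emission
rows_latest = len(latest_only[0])    # K rows per emission
dropped = rows_concat - rows_latest  # K * (PARTITION_CLUSTERS - 1) never shown

print(rows_concat, rows_latest, dropped)
```

With these values, each emission goes from 30 rows to 3, and the 27 rows from the earlier updates in the partition are never sent to the plot.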

Even though the script now runs at the required cadence thanks to the use of .partition, I've opened an issue in holoviews (https://github.com/holoviz/holoviews/issues/5178) to report the error observed when holoviews (or panel, bokeh, or tornado...) couldn't cope with a high update frequency.

maximlt, Jan 08 '22 14:01