streamz
Transfer to notebook
This simply copies the script code into a notebook. It could use some cleaning and prose. For some reason, we are getting lookback on the cluster centres, whereas we should only be showing the most recent ones.
cc @maximlt
Thanks for starting to gather the code in a notebook 👍
If you want to display the most recent clusters (n=3) then this change would be required (fixing a typo too):
diff --git a/examples/river_kmeans.py b/examples/river_kmeans.py
index 4acbaf1..f204e9e 100644
--- a/examples/river_kmeans.py
+++ b/examples/river_kmeans.py
@@ -58,7 +58,7 @@ def main(viz=True):
return concat([previous, new]).iloc[-last_lines:, :]
partition_obs = 10
- particion_clusters = 10
+ partition_clusters = 10
backlog_obs = 100
# .partition is used to gather x number of points
@@ -74,8 +74,8 @@ def main(viz=True):
)
(
clusters
- .partition(particion_clusters)
- .map(pd.concat)
+ .partition(partition_clusters)
+ .map(lambda t: t[-1])
.sink(pipe_out.send)
)
Instead of concatenating the clusters (n=3*partition), the stream just gets the last clusters (n=3) that were accumulated in .partition. Note that this means that some clusters (=3*(partition-1)) are not displayed at all.
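To make the counts above concrete, here is a minimal sketch in plain Python. It does not use streamz or pandas: the `partition` helper is a hypothetical stand-in for the stream's `.partition`, the list flattening stands in for `.map(pd.concat)`, and the snapshot data is made up for illustration.

```python
from itertools import islice


def partition(items, n):
    """Yield tuples of n consecutive items.

    Hypothetical stand-in for the stream's .partition; in this sketch an
    incomplete trailing chunk is never emitted.
    """
    it = iter(items)
    while True:
        chunk = tuple(islice(it, n))
        if len(chunk) < n:
            return
        yield chunk


n_clusters = 3           # cluster centres per snapshot
partition_clusters = 10  # same value as in the diff

# Made-up stream: 20 snapshots, each holding 3 centres as (step, cluster_id).
snapshots = [[(step, k) for k in range(n_clusters)] for step in range(20)]

for batch in partition(snapshots, partition_clusters):
    # Old behaviour, analogous to .map(pd.concat): every centre in the batch.
    concatenated = [centre for snapshot in batch for centre in snapshot]
    # New behaviour, .map(lambda t: t[-1]): only the most recent snapshot.
    latest = batch[-1]
    # Centres that are never sent to the plot with the new behaviour.
    dropped = len(concatenated) - len(latest)
    print(len(concatenated), len(latest), dropped)
```

With `partition_clusters = 10`, each batch holds 30 centres when concatenated, while taking the last element keeps 3 and skips the other 27, which matches 3*(partition-1).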
Even though the script is working at the required cadence thanks to the use of .partition, I've opened an issue in holoviews (https://github.com/holoviz/holoviews/issues/5178) to report the error observed when holoviews (or panel or bokeh or tornado...) couldn't cope with a high frequency.