
tc.visualization.scatter(x,y) hangs when `x` and `y` are `SArray`s of 3 million integers each

Open shantanuchhabra opened this issue 6 years ago • 5 comments

Repro on a Debug build on macOS High Sierra:

import turicreate as tc
x = tc.SArray(range(1,3000000))
y = tc.SArray(range(5,3000004))
scplt = tc.visualization.scatter(x,y)
scplt.show()

This hangs.

FYI, with tc.visualization.heatmap(x,y) on the same data, the plot appears in less than a second.

shantanuchhabra, Apr 09 '18 21:04

@shantanuchhabra Please provide:

  • Repro steps?
  • Expected vs. actual behavior?
  • How long does it take? How long should it take?
  • Debug or release build?
  • What platform/hardware?

znation, Apr 09 '18 21:04

I'm experiencing the same hang with a scatter plot in a Jupyter notebook. I'm trying to scatter-plot the `roc_curve` of a classifier, with 13K data points on x and y, and it took several minutes for the plot to display. Any suggestions?

VlamV, Apr 24 '18 12:04
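For an ROC curve specifically, one option (an assumption here, not something proposed in the thread) is to thin the curve before plotting. The (FPR, TPR) points of an ROC curve are non-decreasing in both coordinates and often form long horizontal or vertical runs, and the interior points of a straight run add nothing to the curve's drawn shape. The hypothetical `drop_collinear` helper below removes them with a cross-product test. It preserves the line through the points, not the set of markers, so it suits a line plot of the curve rather than a per-threshold scatter.

```python
def drop_collinear(points):
    """Remove interior points lying on the straight segment between neighbours.

    Hypothetical helper. Assumes `points` is a list of (x, y) tuples that move
    monotonically, as ROC (FPR, TPR) pairs do, so a collinear point always lies
    between its neighbours. Endpoints are always kept.
    """
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    for cur, nxt in zip(points[1:-1], points[2:]):
        prev_x, prev_y = kept[-1]
        cur_x, cur_y = cur
        nxt_x, nxt_y = nxt
        # Cross product of (cur - prev) and (nxt - prev); zero means collinear.
        # Exact for horizontal/vertical runs even with float coordinates.
        cross = (cur_x - prev_x) * (nxt_y - prev_y) - (cur_y - prev_y) * (nxt_x - prev_x)
        if cross != 0:
            kept.append(cur)
    kept.append(points[-1])
    return kept
```

For example, the staircase [(0,0),(0,1),(0,2),(1,2),(2,2),(2,3),(3,3)] reduces to its corners [(0,0),(0,2),(2,2),(2,3),(3,3)]. Diagonal runs (from tied scores) with general float coordinates may not cancel exactly and would need a tolerance.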

@VlamV This is a known issue, but not a high priority for us right now. If you specifically need scatter plots with this much data, you could try another visualization library with scatter-plot support and see whether it copes. Alternatively, try a different plot: tc.visualization.heatmap should comfortably handle much more data, and shows density over ranges rather than individual points.

znation, Apr 25 '18 01:04
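The heatmap suggestion can be illustrated with a minimal pure-Python sketch of 2D binning, which counts how many (x, y) pairs fall into each cell of a fixed grid. This is an assumption about the general technique behind density plots, not Turi Create's actual heatmap code, and the helper name `bin_2d` and its `nx`/`ny` parameters are hypothetical. The relevant property is that the result always has nx × ny cells no matter how many points go in, which is one plausible reason a density view scales where per-point rendering does not.

```python
def bin_2d(xs, ys, nx=10, ny=10):
    """Count (x, y) pairs falling into an nx-by-ny grid of equal-width bins.

    Hypothetical helper. Returns (counts, x_edges, y_edges), where counts[i][j]
    is the number of points whose x is in the i-th x bin and y in the j-th y bin.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if not xs:
        raise ValueError("need at least one point")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid zero-width ranges when all values are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    counts = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        # Map each value to a bin index; the maximum value is clamped into the last bin.
        i = min(int((x - x_min) / x_span * nx), nx - 1)
        j = min(int((y - y_min) / y_span * ny), ny - 1)
        counts[i][j] += 1
    x_edges = [x_min + k * x_span / nx for k in range(nx + 1)]
    y_edges = [y_min + k * y_span / ny for k in range(ny + 1)]
    return counts, x_edges, y_edges
```

For example, bin_2d(list(range(1, 1001)), list(range(5, 1005)), nx=4, ny=4) puts 250 points in each diagonal cell and none elsewhere, because those points lie on a straight line, much like the repro data in this issue.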

@shantanuchhabra, were you able to find a workaround for this? I have the same issue with 20K data points.

subodhchhabra, Nov 12 '19 07:11
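A common workaround when a plotting tool cannot cope with the number of points (not one confirmed in this thread) is to plot a random subsample. Below is a minimal stdlib sketch; `sample_pairs` and its `max_points` cap are hypothetical, and the default of 5,000 is a guess rather than a measured limit, given that 13K and 20K points were reported as slow here.

```python
import random


def sample_pairs(xs, ys, max_points=5000, seed=0):
    """Return at most max_points (x, y) pairs drawn uniformly without replacement.

    Hypothetical helper; the default cap is a guess, not a measured limit.
    Indices are sampled once and applied to both sequences, so each x stays
    matched with its own y. Original order is preserved.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n <= max_points:
        return list(xs), list(ys)
    rng = random.Random(seed)  # fixed seed gives a reproducible subsample
    idx = sorted(rng.sample(range(n), max_points))
    return [xs[i] for i in idx], [ys[i] for i in idx]
```

Sampling indices once, rather than sampling x and y separately, keeps each x paired with its own y. The resulting lists could be wrapped back into tc.SArray before calling tc.visualization.scatter, though a subsample can hide outliers that the full scatter plot would show.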

This is still an issue in 6.4.

TobyRoseman, Sep 01 '20 22:09