vscode-debug-visualizer

Issue plotting large dataset from Python

Open · aganders3 opened this issue 5 years ago · 5 comments

This is a great extension! I was considering starting to work on something similar when I found this. I was hoping to plot some rather large datasets using the plotly visualizer, but ran into an issue.

Plotting with 10k points looks OK:

import json
import numpy as np

plot_json = json.dumps({
    "kind": {"plotly": True},
    "data": [
        {"y": np.arange(10000).tolist()},
    ],
})

But it breaks at 10944 points:

plot_json = json.dumps({
    "kind": {"plotly": True},
    "data": [
        {"y": np.arange(10944).tolist()},
    ],
})

The error I get in the Debug Visualizer pane is (truncated):

Could not parse evaluation result as JSON:

Unexpected token ' in JSON at position 0

Evaluation result was:
'{"kind": {"plotly": true}, "data": [{"y": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
 ...
 10939, 10940, 10941, 10942, 10943]}]}'

Used debug adapter: python
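To put numbers on the two snippets above, here is a small stdlib sketch that measures the length of the serialized payload. It uses list(range(n)), which yields the same plain ints as np.arange(n).tolist(), and the helper name plotly_payload is hypothetical. It also compares json.dumps's default separators with compact ones.

```python
import json


def plotly_payload(n: int, compact: bool = False) -> str:
    """Serialize the same figure as the snippets above with n y-values.

    Hypothetical helper: list(range(n)) gives the same plain ints as
    np.arange(n).tolist(), so numpy is not needed for the measurement.
    """
    figure = {
        "kind": {"plotly": True},
        "data": [{"y": list(range(n))}],
    }
    # separators=None keeps json.dumps's default (", ", ": ");
    # the compact form drops the spaces after commas and colons.
    separators = (",", ":") if compact else None
    return json.dumps(figure, separators=separators)


if __name__ == "__main__":
    for n in (10000, 10944):
        print(n, len(plotly_payload(n)), len(plotly_payload(n, compact=True)))
```

With the default separators, the 10,000-point payload is 58,935 characters and the 10,944-point payload is 65,543. If the failure comes from a plain character limit, that limit lies somewhere in between. Compact separators cut roughly a sixth off the payload, which would only move such a limit, not remove it.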

For reference, I would like to plot millions of points. I know plotly can handle this (if a bit slowly at times), and I have done it before by serializing the data to JSON and sending it to a basic web server.

Thanks!

aganders3, Aug 28 '20 02:08

Thanks for your issue report ;)

I guess this is because the Python debug adapter truncates the result, which leads to a JSON syntax error. Can you figure out the maximum string length (in characters) before the debug adapter shortens the string? Maybe this constant can be found in the adapter's source. The context parameter of the evaluate request indicates watch, REPL, or copy (clipboard) mode, and depending on the context, different truncation strategies are used. Each debug adapter handles this differently, though.
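For reference, the context mentioned above is a field of the Debug Adapter Protocol's evaluate request. Below is a minimal sketch of how such a request is framed on the wire: a JSON body preceded by a Content-Length header. The expression plot_json and the frame id are placeholders, and how debugpy actually truncates results for each context is not modeled here.

```python
import json


def encode_evaluate_request(seq: int, expression: str, frame_id: int, context: str) -> bytes:
    """Frame a Debug Adapter Protocol 'evaluate' request as bytes.

    DAP messages are a JSON body preceded by a Content-Length header
    (the byte length of the UTF-8 body) and a blank line.
    """
    message = {
        "seq": seq,
        "type": "request",
        "command": "evaluate",
        "arguments": {
            "expression": expression,
            "frameId": frame_id,
            # Spec-defined values include "watch", "repl", "hover" and "clipboard".
            "context": context,
        },
    }
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# Placeholder values: expression name and frame id are hypothetical.
raw = encode_evaluate_request(1, "plot_json", 1000, "clipboard")
```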

Also, I'm unsure whether this is the right solution if you want to plot millions of points :D This extension is not (yet) optimized for large visualization data, and there are many inefficiencies: the data is first extracted by the Python debugger, then sent to VS Code, then sent to the VS Code extension, then serialized, sent to the webview, deserialized, validated, and finally processed by plotly.

May I ask what your use case is?

hediet, Aug 28 '20 13:08

Never mind, it is reasonably quick (less than 10 seconds):

[screenshot]

hediet, Aug 28 '20 13:08

Thanks for the quick response.

May I ask what your use case is?

I can't go into too many details, but basically we use matplotlib to plot long time series of USB commands (5-10 minutes' worth, with many commands per second) during debugging. This is clumsy when using the VS Code Remote - Containers extension with Docker as a local development environment, because you have to set up some kind of X forwarding from inside the Docker container to the host. We can save static images using matplotlib but

The workaround I had was to run a simple Python HTTP server in the container and write JSON files via the debugger, which I then plotted using plotly.js (while also trying to avoid adding dependencies). About 10 s is roughly the performance I was getting with this method as well, but it also required opening a browser and navigating to localhost:8888/basic_plot.html or something similar. Having this in the debugger is cleaner even if the speed is the same, and the ability to use it for visualizing other objects will come in handy.
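Here is a rough sketch of the kind of workaround described above, using only the standard library. It writes a figure as JSON into a directory and serves that directory over HTTP on a background thread. The helper names, the file name, and binding to 0.0.0.0 (so a published container port can reach it) are assumptions. The HTML page that would fetch the JSON and hand it to plotly.js (for example via Plotly.newPlot) is left out.

```python
import json
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def write_figure(directory: Path, name: str, figure: dict) -> Path:
    """Hypothetical helper: dump a plotly-style figure dict to directory/name as JSON."""
    path = directory / name
    path.write_text(json.dumps(figure), encoding="utf-8")
    return path


def start_static_server(directory: Path, port: int = 8888, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Hypothetical helper: serve `directory` on a daemon thread (port=0 picks a free port).

    Binding to 0.0.0.0 is an assumption so a port published from the
    container is reachable from the host; call shutdown() to stop it.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=str(directory))
    server = ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

From the debugger, a call such as write_figure(Path("/tmp/plots"), "plot.json", {"data": [{"y": ys}]}) would refresh the file that the browser page reads. Here ys stands for whatever list is being plotted.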

aganders3, Aug 28 '20 16:08

You could significantly speed up rendering by rendering the data to a PNG image that you then load as a base64-encoded string in Python. The extension can visualize base64-encoded PNG strings ;)
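The suggestion above could look roughly like this. The data is rendered off-screen with matplotlib's Figure class, which needs no pyplot state and no X forwarding, and the PNG bytes are then wrapped as base64 in JSON. The JSON shape {"kind": {"imagePng": true}, "base64Data": ...} is an assumption about what the extension expects and should be checked against its documentation. Both helper names are hypothetical.

```python
import base64
import io
import json


def image_png_payload(png_bytes: bytes) -> str:
    """Wrap raw PNG bytes as JSON for the visualizer.

    ASSUMPTION: the extension's image view accepts
    {"kind": {"imagePng": true}, "base64Data": "<base64 PNG>"}.
    """
    return json.dumps({
        "kind": {"imagePng": True},
        "base64Data": base64.b64encode(png_bytes).decode("ascii"),
    })


def render_series_png(y, dpi: int = 100) -> bytes:
    """Render a line plot of `y` to PNG bytes in memory (requires matplotlib)."""
    # Imported lazily so the wrapper above works without matplotlib installed.
    from matplotlib.figure import Figure

    fig = Figure()
    ax = fig.subplots()
    ax.plot(y)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    return buffer.getvalue()
```

In the visualizer's expression box, something like image_png_payload(render_series_png(data)) would then be evaluated, where data is the series in scope. This trades away the interactivity of a plotly chart for a much smaller payload.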

hediet, Aug 28 '20 16:08

I'm planning to use that feature as well! For this case, though, we really like having interactive plotting, because it's useful to zoom in to the ms/µs level. I could also implement this by applying a time window before serializing to JSON, but that's not as nice as zooming in the GUI.
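The time-window alternative mentioned above can be sketched with NumPy. It assumes the timestamps t are sorted in ascending order. It uses np.searchsorted to find the slice covering [t_start, t_end) and serializes only that slice in the same plotly JSON shape as the original snippets, with an added x array. The function name is hypothetical.

```python
import json

import numpy as np


def windowed_plotly_json(t, y, t_start: float, t_end: float) -> str:
    """Keep only samples with t_start <= t < t_end, then build the plotly JSON.

    Assumes `t` is sorted ascending, as timestamps of a recorded trace usually are.
    """
    t = np.asarray(t)
    y = np.asarray(y)
    # Binary search for the window bounds instead of masking the whole array.
    lo = np.searchsorted(t, t_start, side="left")
    hi = np.searchsorted(t, t_end, side="left")
    return json.dumps({
        "kind": {"plotly": True},
        "data": [{"x": t[lo:hi].tolist(), "y": y[lo:hi].tolist()}],
    })
```

Each new window means evaluating the expression again with different bounds. That is the drawback raised above compared to zooming in the GUI.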

aganders3, Aug 28 '20 16:08