add json(s), Excel and arrow data formats support

Open RandomFractals opened this issue 4 years ago • 12 comments

see the Data Preview 🈸 vscode extension for an example of how to integrate those data formats: https://dev.to/tarasnovak/vscode-data-preview-for-devs-around-the-39mn

You can use or peruse my custom Data Manager API & src/data.providers folder for data loading and saving implementation details to enrich SandDance with more data source type choices ...

RandomFractals avatar Dec 15 '19 17:12 RandomFractals

I am surprised this only had one vote and it was a downvote, at least for the Arrow support, as that would clearly be a good way to share data from Jupyter implementations.

gramster avatar Aug 10 '20 20:08 gramster

@gramster - is your Arrow data stored as a file? Wondering if there was some other mechanism from Jupyter you had in mind.

danmarshall avatar Aug 10 '20 20:08 danmarshall

ha! I forgot about logging this.

@danmarshall would be nice to have both, from file & pipe as described in #213

RandomFractals avatar Aug 10 '20 21:08 RandomFractals

@RandomFractals - does your extension support piping?

danmarshall avatar Aug 10 '20 22:08 danmarshall

no, but you can call it with data file uri to open data preview similar to how I suggested you integrate vega viewer with SandDance in #153

so, you'd just call it with:

commands.executeCommand('data.preview', dataFileUri)

RandomFractals avatar Aug 10 '20 22:08 RandomFractals

and you can check if data preview is installed via get commands:

// execute requested data preview command
let viewDataCommand: string = 'vscode.open'; // default
commands.getCommands().then(availableCommands => {
  if (availableCommands.includes(this.dataPreviewCommand)) {
    viewDataCommand = this.dataPreviewCommand;
  }
  commands.executeCommand(viewDataCommand, dataUri);
});

see how I do it in vega viewer: https://github.com/RandomFractals/vscode-vega-viewer/blob/master/src/vega.preview.ts#L279

RandomFractals avatar Aug 10 '20 22:08 RandomFractals
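
The fallback logic in the snippet above can be pulled into a small pure function, which makes it easy to check without a running extension host. This is only a sketch: `pickViewCommand` is a hypothetical helper name, not something from Data Preview or Vega Viewer, and the `'data.preview'` and `'vscode.open'` command ids are taken from the comments above.

```typescript
// Hypothetical helper (not part of any extension mentioned in this thread):
// return the preferred command if it is registered, otherwise the fallback.
function pickViewCommand(
  availableCommands: readonly string[],
  preferredCommand: string,
  fallbackCommand: string = 'vscode.open',
): string {
  return availableCommands.includes(preferredCommand)
    ? preferredCommand
    : fallbackCommand;
}

// Inside an extension, assuming `import { commands } from 'vscode'`:
//   const available = await commands.getCommands();
//   await commands.executeCommand(pickViewCommand(available, 'data.preview'), dataUri);
```

Keeping the choice separate from the `commands` calls means the decision can be unit tested with a plain string array.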

I was thinking of data in the Plasma object store. We had an intern prototype viewing dataframes from the Jupyter notebook in VS Code in SandDance, but that involved (IIRC) serializing the data as CSV and passing it in a URL, which clearly won't scale well. I'm wondering what we could do for large datasets (obviously writing to a file on disk is an option too, and maybe that's all we really need).

gramster avatar Aug 10 '20 22:08 gramster

yeah, I think to have it scale, writing to disk in raw arrow data format rather than CSV might be a better option, and then have SandDance or some other extension load a user friendly data frame/grid view.

would be nice if vscode had some IPC api for extension integrations and sharing data in memory and arrow is perfect for it. I just don't think we have a vscode api for that yet.

RandomFractals avatar Aug 10 '20 23:08 RandomFractals
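
A rough sketch of the write-to-disk handoff discussed above, using only Node built-ins. It assumes the bytes are already serialized by the producer (for example an Arrow IPC payload created elsewhere); no Arrow library is used here. `writeDataToTempFile`, `readDataFromUri`, and the `data-handoff-` directory prefix are hypothetical names for illustration only.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { fileURLToPath, pathToFileURL } from 'node:url';

// Producer side (hypothetical helper): persist already-serialized bytes in a
// fresh temp directory and return a file URI instead of inlining data in a URL.
function writeDataToTempFile(bytes: Uint8Array, fileName: string): string {
  const dir = mkdtempSync(join(tmpdir(), 'data-handoff-'));
  const filePath = join(dir, fileName);
  writeFileSync(filePath, bytes);
  return pathToFileURL(filePath).href;
}

// Consumer side (hypothetical helper): resolve the file URI and load the bytes.
function readDataFromUri(fileUri: string): Uint8Array {
  return new Uint8Array(readFileSync(fileURLToPath(fileUri)));
}
```

The returned URI string is the only thing that needs to cross between extensions, so the payload size no longer depends on URL length limits. Cleaning up the temp directory is left out of this sketch.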

@danmarshall have you looked into this yet?

RandomFractals avatar Oct 15 '20 18:10 RandomFractals

@RandomFractals no I haven't.

danmarshall avatar Oct 15 '20 18:10 danmarshall

Sorry, I missed this. We aren’t doing anything with Arrow yet, but have been talking about using it in the future for sharing data between kernels in polyglot notebooks.

gramster avatar Oct 15 '20 18:10 gramster

yeah, @gramster: that's the one scenario where I think we are close to getting it to work once you go ga ;)

still, that's only in the context of .net interactive notebooks, or .dib's as you call them :)

I brought it up with vscode team in our last authors feedback monthly call & their stance on this is that extensions can devise their own ways of sharing data, i.e. no plans to provide a built-in vscode api for that anytime soon. It did come up a few times in convos with other extension authors in vscode dev community slack.

I think if they added some channels pub/sub, we could see a lot of clever integrations for extensions sharing data beyond notebooks.

RandomFractals avatar Oct 15 '20 18:10 RandomFractals
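
To make the channels pub/sub idea concrete, here is a minimal in-memory sketch. It is not a VS Code API, and per the comment above no such built-in API was planned; `ChannelHub`, `subscribe`, and `publish` are hypothetical names, and a real cross-extension version would also need a transport between extension hosts.

```typescript
type Listener<T> = (message: T) => void;

// Hypothetical in-memory channel hub; illustration only, not a VS Code API.
class ChannelHub<T> {
  private readonly listeners = new Map<string, Set<Listener<T>>>();

  // Register a listener on a channel and return a function that removes it.
  subscribe(channel: string, listener: Listener<T>): () => void {
    const existing = this.listeners.get(channel);
    const set = existing ?? new Set<Listener<T>>();
    if (!existing) {
      this.listeners.set(channel, set);
    }
    set.add(listener);
    return () => {
      set.delete(listener);
    };
  }

  // Deliver a message to every listener on the channel and
  // return how many listeners received it.
  publish(channel: string, message: T): number {
    const set = this.listeners.get(channel);
    if (!set) {
      return 0;
    }
    set.forEach(listener => listener(message));
    return set.size;
  }
}
```

One extension could publish a data file URI on a named channel while a viewer subscribes to it, which keeps producers and consumers from needing to know each other's command ids.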