datproject-discussions

Graphical editor for sketching pipelines

Open finnp opened this issue 10 years ago • 6 comments

Hey everyone,

The other night I was inspired by how interactively OpenRefine lets you work with a data workflow, so I created a small prototype of something comparable for Gasket / UNIX pipelines.

The main idea is simply to embed the command-line interface in a browser/nodeshell and show the output of the commands as an HTML table below it.
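A minimal sketch of the "output as an HTML table" part, assuming the last command in the pipeline prints NDJSON (one JSON object per line). `parseNdjson`, `escapeHtml` and `renderTable` are hypothetical helper names for illustration, not part of the prototype or of gasket:

```typescript
// Sketch only: render NDJSON pipeline output as an HTML table.
// Assumes the final command prints one JSON object per line; helper names are hypothetical.

type Row = Record<string, unknown>;

// Parse NDJSON text into rows, skipping blank lines.
function parseNdjson(text: string): Row[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Row);
}

// Convert a cell value to text and escape HTML-significant characters.
function escapeHtml(value: unknown): string {
  let text: string;
  if (value === undefined || value === null) {
    text = "";
  } else if (typeof value === "object") {
    text = JSON.stringify(value); // nested objects/arrays are shown as JSON
  } else {
    text = String(value);
  }
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a table whose columns are the union of all keys, in order of first appearance.
function renderTable(rows: Row[]): string {
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (columns.indexOf(key) === -1) columns.push(key);
    }
  }
  const head = columns.map((c) => `<th>${escapeHtml(c)}</th>`).join("");
  const body = rows
    .map((row) => `<tr>${columns.map((c) => `<td>${escapeHtml(row[c])}</td>`).join("")}</tr>`)
    .join("");
  return `<table><thead><tr>${head}</tr></thead><tbody>${body}</tbody></table>`;
}

// Example: two records with partially overlapping keys; missing cells render empty.
const preview = renderTable(parseNdjson('{"a":1,"b":"x"}\n{"a":2,"c":true}\n'));
```

Taking the union of keys means records with differing fields still line up in one table, which matters when a transform step adds or drops fields.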

I have a rough prototype over here: https://github.com/finnp/pipeline-editor

Ideas

  • The tool runs in a web browser (or Atom Shell?)
  • There is a field where you can enter the UNIX / gasket pipeline
    • e.g. curl random.org | parse | transform
    • However, there should be an abstraction layer around it, so that it feels more like point-and-click for the user, similar to http://gulpfiction.divshot.io/
  • Every step in the pipeline might be cached so that it is possible to rerun commands from a certain point (similar to IPython notebook).
    • This would be handy if the source of the data is slow
  • The output of the last command in the pipeline should be NDJSON or CSV, which then gets displayed in the UI table
  • Since it is mainly for sketching a pipeline, it could by default limit the input stream, so that it's quick to use
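The caching and input-limit ideas above could be sketched as a linear pipeline that stores every step's output, so editing step i reruns only steps i..n, much like re-executing cells in an IPython notebook. Big assumption: steps are modeled here as in-memory functions over arrays of rows rather than real UNIX/gasket processes, and `SketchPipeline`, `setStep`, `result` and `inputLimit` are hypothetical names:

```typescript
// Sketch only: a linear pipeline that caches each step's output so a change to
// step i reruns only steps i..n (notebook-style). Steps are modeled as in-memory
// functions over rows, standing in for real UNIX/gasket commands.

type Row = Record<string, unknown>;
type Step = { name: string; run: (input: Row[]) => Row[] };

class SketchPipeline {
  private readonly fetchSource: () => Row[];
  private readonly steps: Step[];
  private readonly inputLimit: number;
  private sourceRows: Row[] | null = null; // fetched once, then reused
  private outputs: Row[][] = []; // outputs[i] caches the result of steps[i]
  stepRuns = 0; // how many step executions have happened so far

  constructor(fetchSource: () => Row[], steps: Step[], inputLimit = 100) {
    this.fetchSource = fetchSource;
    this.steps = steps.slice();
    this.inputLimit = inputLimit;
  }

  // Swap in a new command for step `index`; cached outputs from there on are stale.
  setStep(index: number, step: Step): void {
    this.steps[index] = step;
    this.outputs.length = Math.min(this.outputs.length, index);
  }

  // Run only the steps that have no cached output and return the final rows.
  result(): Row[] {
    if (this.sourceRows === null) {
      // Limit the (possibly slow) source so sketching stays quick.
      this.sourceRows = this.fetchSource().slice(0, this.inputLimit);
    }
    let current = this.outputs.length > 0 ? this.outputs[this.outputs.length - 1] : this.sourceRows;
    for (let i = this.outputs.length; i < this.steps.length; i++) {
      current = this.steps[i].run(current);
      this.outputs.push(current);
      this.stepRuns++;
    }
    return current;
  }
}

// Example: a 1000-row source limited to 10 rows, then "keep even" and "double".
let fetches = 0;
const source = (): Row[] => {
  fetches++;
  const rows: Row[] = [];
  for (let n = 0; n < 1000; n++) rows.push({ n });
  return rows;
};
const pipeline = new SketchPipeline(source, [
  { name: "even", run: (rows) => rows.filter((r) => (r.n as number) % 2 === 0) },
  { name: "double", run: (rows) => rows.map((r) => ({ n: (r.n as number) * 2 })) },
], 10);
const first = pipeline.result(); // n = 0, 4, 8, 12, 16
// Editing the last step reuses the cached source and the cached "even" output.
pipeline.setStep(1, { name: "triple", run: (rows) => rows.map((r) => ({ n: (r.n as number) * 3 })) });
const second = pipeline.result(); // n = 0, 6, 12, 18, 24
```

In this sketch the source is fetched exactly once and the edit reruns a single step, which is the behaviour that helps when the data source is slow.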

I feel like there have already been ideas for something similar. Is someone already working on this? What do you think?

Best, Finn

finnp, Oct 14 '14 15:10

Hey @finnp this is rad!

I think this is probably one of the more important pieces to dat that we haven't thought of yet, good on you for going for it. I think having a visual pipeline from "my data" to "dat data" is going to be key to onboarding and sustaining engagement with the dat registry.

We should probably think really critically and focus on the nuts and bolts first, then add more UI abstraction later. I like the IPython notebook idea, and drawing from gulpfiction sounds like a good plan. Do you have any mockup ideas?

okdistribute, Oct 18 '14 07:10

This is my current basic UI sketch: [image]

One important thing that is not included here is how the commands for each step get chosen. Currently I am assuming someone already has the command-line tools installed or knows how to install them. I am not sure yet how to make that easier; I thought searching npm or a given list of streaming modules could help.

finnp, Oct 28 '14 17:10

Cool, this looks like a great simple design. What happens when you add new steps? Does the table go off your screen?

I'm thinking about perhaps showing the transformed data at each step. What do you think? Perhaps something more like this, with a sidebar on the left that shows the history of transformations from beginning to end. I think it's a little overwhelming to see them in a tree. What do you think about nailing the linear case first?

[image: wireframe1]

okdistribute, Oct 28 '14 18:10

Yes, right now adding new steps would move the table. This is how it looks at the moment; I haven't added the plus button yet, but hitting Enter adds a new step. [screenshot] Having the steps in the sidebar makes a lot of sense!

Clicking through the different steps is also something I would like to include, though I am not exactly sure how to do that yet. I guess I would cache the data for each step and, if the data is too big, cut it off after a few rows.

finnp, Oct 28 '14 18:10

Yeah, I think caching maybe the first 20 rows could work. Tables with a very high number of columns could become a problem, so we'd have to set some sort of upper bound on how big each snapshot can be.
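A bounded per-step snapshot along these lines might look like the sketch below. The 20-row limit comes from the suggestion above; the cell cap (`maxCells`, default 1000), the policy of dropping trailing columns, and the names `takeSnapshot` and `Snapshot` are hypothetical choices for illustration:

```typescript
// Sketch only: a bounded per-step preview ("snapshot").
// maxRows = 20 follows the suggestion in this thread; maxCells is a hypothetical
// cap that trims trailing columns so very wide tables stay small.

type Row = Record<string, unknown>;

interface Snapshot {
  rows: Row[];
  truncated: boolean; // true if any rows or columns were dropped
}

function takeSnapshot(rows: Row[], maxRows = 20, maxCells = 1000): Snapshot {
  const preview = rows.slice(0, maxRows);

  // Collect column names in order of first appearance.
  const columns: string[] = [];
  for (const row of preview) {
    for (const key of Object.keys(row)) {
      if (columns.indexOf(key) === -1) columns.push(key);
    }
  }

  // Keep at most floor(maxCells / rowCount) columns so rows x columns <= maxCells.
  const maxColumns = preview.length === 0 ? columns.length : Math.floor(maxCells / preview.length);
  const kept = columns.slice(0, maxColumns);

  const trimmed = preview.map((row) => {
    const out: Row = {};
    for (const c of kept) {
      if (c in row) out[c] = row[c];
    }
    return out;
  });

  return { rows: trimmed, truncated: rows.length > maxRows || kept.length < columns.length };
}

// Example: 50 rows x 100 columns is cut to 20 rows x 50 columns (1000 cells).
const wide: Row[] = [];
for (let i = 0; i < 50; i++) {
  const row: Row = {};
  for (let j = 0; j < 100; j++) row["c" + j] = j;
  wide.push(row);
}
const snap = takeSnapshot(wide);
```

Dropping trailing columns is only one option; truncating long cell values or storing a byte budget per snapshot would be reasonable alternatives.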

To be clear, I don't think clicking through the steps is really necessary. But if you agree it belongs on the feature list for later, it would make sense to build the sidebar so that it can easily be added later.

okdistribute, Oct 28 '14 18:10

@finnp want to move this issue to http://github.com/datproject/projects?

okdistribute, Jun 17 '16 16:06