
Interest in Intake?

Open martindurant opened this issue 4 years ago • 7 comments

(Dask core member here - I found your project because you requested to be included in "powered by dask")

The Intake project provides a data cataloguing and loading layer over many data formats and services. It also contains a rudimentary GUI for browsing those catalogues, and interactively plotting the contents of the contained data sources.

I thought you might be interested in seeing whether there is a possibility for integration of your "connectors" into an Intake catalogue, and of your data exploration tools into the Intake GUI.

martindurant · Aug 17 '20 13:08

Hi @martindurant,

Thanks for your great suggestion! We have been looking into it over the last few days and will get back to you once we have a good answer. In the meantime, if you could elaborate on the technical details of the integration, that would be much appreciated.

jnwang · Aug 19 '20 15:08

I see two main aspects:

  • dataprep gives you access to specific data sources, with optional arguments. These could be wrapped into an intake catalogue, so that if you have dataprep installed, intake.cat will include an entry which is a catalogue of those sources, with the same description, options and metadata as already available in your API. This would be a convenience shim, so that people used to the Intake world can read your data in a familiar way.

  • dataprep provides interactive graphics, somewhat similar to the dfviz dataframe viz plugin in the Intake GUI. It would be possible to make your viz an alternative to or replacement for dfviz (which is functional, but very young), either as something that lives within the Intake GUI or as an output of source.plot.

martindurant · Aug 19 '20 16:08
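
The first aspect above (a catalogue whose entries wrap connector queries, keeping their descriptions and options) can be sketched roughly as follows. This is a library-free illustration only: `ConnectorEntry`, `ConnectorCatalog` and `fake_business_query` are hypothetical names, not part of Intake or DataPrep, and a real shim would presumably build on Intake's own catalogue and data source classes and return pandas/dask dataframes rather than lists of dicts.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


@dataclass
class ConnectorEntry:
    """Hypothetical catalogue entry wrapping a single connector query."""

    name: str
    description: str
    fetch: Callable[..., List[Record]]  # stand-in for a connector query call
    default_args: Dict[str, Any] = field(default_factory=dict)

    def read(self, **overrides: Any) -> List[Record]:
        # User-supplied arguments take precedence over the entry's defaults.
        args = {**self.default_args, **overrides}
        return self.fetch(**args)


class ConnectorCatalog:
    """Hypothetical catalogue: a name -> entry mapping that can be listed."""

    def __init__(self, entries: List[ConnectorEntry]) -> None:
        self._entries = {entry.name: entry for entry in entries}

    def __iter__(self) -> Iterator[str]:
        return iter(sorted(self._entries))

    def __getitem__(self, name: str) -> ConnectorEntry:
        return self._entries[name]


def fake_business_query(term: str, location: str = "Vancouver") -> List[Record]:
    # Assumption: stands in for a real REST-backed connector query.
    return [{"term": term, "location": location}]


catalog = ConnectorCatalog(
    [
        ConnectorEntry(
            name="businesses",
            description="Business search (illustrative only)",
            fetch=fake_business_query,
            default_args={"term": "coffee"},
        )
    ]
)
```

Listing the catalogue with `list(catalog)` gives the entry names, and `catalog["businesses"].read(location="Burnaby")` runs the wrapped query with one option overridden, which is the kind of familiar access pattern the shim is meant to offer.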

Hi @martindurant, thanks for the suggestions. While reading the documentation, I was actually thinking of a bi-directional integration with Intake. Basically, there would be a shim that lets DataPrep.connector read data from Intake and also lets Intake read data from connector.

dovahcrow · Aug 19 '20 22:08

I think that's what I meant by referring to using your interactive features with an Intake dataset :) I'm not sure whether the shim would need to be in connector, since Intake will provide you with pandas/dask dataframes already.

martindurant · Aug 20 '20 13:08

We have an intake community meeting on the first Thursday of each month, if anyone here would like to drop by https://github.com/intake/intake/issues/472

martindurant · Aug 20 '20 19:08

> I think that's what I meant by referring to using your interactive features with an Intake dataset :) I'm not sure whether the shim would need to be in connector, since Intake will provide you with pandas/dask dataframes already.

Currently, DataPrep only supports sending RESTful API requests to a URL endpoint, so I think there should be a shim that lets Connector use Intake as a data source. In the other direction, I think we can also provide an Intake plugin for loading data from DataPrep.Connector.

dovahcrow · Aug 24 '20 01:08

> We have an intake community meeting on the first Thursday of each month, if anyone here would like to drop by intake/intake#472

Thanks for the invitation! I will personally join the meeting, and other team members may join as well.

dovahcrow · Aug 24 '20 03:08