dataprep
Interest in Intake?
(Dask core member here - I found your project because you requested to be included in "powered by dask")
The Intake project provides a data cataloguing and loading layer over many data formats and services. It also contains a rudimentary GUI for browsing those catalogues, and interactively plotting the contents of the contained data sources.
I thought you might be interested in seeing whether there is a possibility for integration of your "connectors" into an Intake catalogue, and of your data exploration tools into the Intake GUI.
Hi @martindurant ,
Thanks for your great suggestion! We have been looking into it in the last few days. We will get back to you once we have a good answer. At the same time, if you can elaborate on the technical details about the integration, that will be much appreciated.
I see two main aspects:
- dataprep gives you access to specific data sources, with optional arguments. These could be wrapped into an Intake catalogue, so that if you have dataprep installed, intake.cat will include an entry which is a catalogue of those sources, with the same description, options and metadata as already available in your API. This would be a convenience shim, so that people used to the Intake world can read your data in a familiar way.
- dataprep provides interactive graphics, somewhat similar to the dfviz dataframe viz plugin in the Intake GUI. It would be possible to make your viz an alternate or replacement (dfviz is functional, but very young), as something that can live within the Intake GUI or as an output of source.plot.
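To make the first point a bit more concrete, here is a rough sketch of how connector descriptions might be mapped onto the layout an Intake YAML catalogue uses (sources, each with description, driver, args and metadata). The descriptor list, the function name and the driver name "dataprep_connector" are all hypothetical, made up for illustration; no such driver exists today, and the real shim would read the descriptions from your API rather than from a hard-coded list.

```python
# Hypothetical descriptors standing in for the sources a connector package exposes.
CONNECTOR_SOURCES = [
    {
        "name": "example_api",
        "description": "Example REST endpoint",
        "options": {"limit": 100},
    },
]


def to_catalogue_entries(sources, driver="dataprep_connector"):
    """Map source descriptors onto the layout of an Intake YAML catalogue.

    The driver name is an assumption: a real shim would have to register
    a driver under whatever name it chooses.
    """
    entries = {}
    for src in sources:
        entries[src["name"]] = {
            "description": src.get("description", ""),
            "driver": driver,
            "args": dict(src.get("options", {})),
            "metadata": dict(src.get("metadata", {})),
        }
    return {"sources": entries}


catalogue = to_catalogue_entries(CONNECTOR_SOURCES)
```

Dumping that dictionary to YAML would give a file shaped like a regular Intake catalogue, which is roughly what the shim would need to generate or register.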
Hi @martindurant, thanks for the suggestions. Actually, I was thinking of a bi-directional integration with Intake while reading the documentation. Basically, there would be a shim to let DataPrep.connector read data from Intake and also let Intake read data from connector.
I think that's what I meant by referring to using your interactive features with an Intake dataset :) I'm not sure whether the shim would need to be in connector, since Intake will provide you with pandas/dask dataframes already.
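As a minimal sketch of why no connector-side shim may be needed for the viz direction: anything with a read() method that returns a dataframe, as Intake sources provide, can be passed straight to a plotting function. The explore helper and the FakeSource stand-in are hypothetical names used only so the example runs on its own.

```python
def explore(source, plot_fn):
    """Load a source into memory and hand the result to a plotting function."""
    df = source.read()  # Intake sources expose read() for an in-memory result
    return plot_fn(df)


class FakeSource:
    """Stand-in for an Intake source, used only to keep the sketch self-contained."""

    def __init__(self, rows):
        self._rows = rows

    def read(self):
        return self._rows


# len stands in for a real plotting function here.
report = explore(FakeSource([{"a": 1}, {"a": 2}]), plot_fn=len)
```

With both packages installed, the same call would presumably look like explore(cat.some_entry, dataprep.eda.plot), where cat comes from intake.open_catalog and some_entry is a placeholder entry name.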
We have an intake community meeting on the first Thursday of each month, if anyone here would like to drop by https://github.com/intake/intake/issues/472
> I think that's what I meant by referring to using your interactive features with an Intake dataset :) I'm not sure whether the shim would need to be in connector, since Intake will provide you with pandas/dask dataframes already.
Currently, DataPrep only supports sending RESTful API requests to a URL endpoint, so I think there should be a shim to enable Connector to use Intake as a data source. On the other hand, I think we can also provide an Intake plugin to load data from DataPrep.Connector.
> We have an intake community meeting on the first Thursday of each month, if anyone here would like to drop by intake/intake#472
Thanks for the invitation! I will join the meeting myself, and other team members may join too.