PyDataset Allow usage of pydataset with no external dependencies

PyDataset is a fantastic tool for learning Python, but requiring pandas (and hence numpy) is a big barrier to entry. What's more, you may want to load the data with another tool in order to process it.

To make your lib more flexible and more newcomer-friendly, I'd advise:

create a toolbox that lets you define the data index and load it in a generic way. It should not rely on a particular technology for downloading or for the result format, and it should provide hooks to plug in your own;
then build adapters for your downloader and pandas;
then build an adapter for regular Python data structures.
It should default to pandas if it's installed, or to regular Python lists/dicts if it's not.
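
To make the idea concrete, here is a rough sketch of what I have in mind. Nothing in it is PyDataset's actual API: `INDEX`, `load`, `to_records`, `to_dataframe` and `default_adapter` are all names I made up for illustration, and the index holds inline CSV text only so that the example runs without any download.

```python
import csv
import io

# Hypothetical index: dataset name -> raw CSV text. A real index would map
# names to files or URLs; inline text keeps the sketch self-contained.
INDEX = {
    "tiny": "x,y\n1,2\n3,4\n",
}


def to_records(text):
    """Adapter producing plain Python data: a list of dicts (string values)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_dataframe(text):
    """Adapter producing a pandas DataFrame; pandas is imported only here."""
    import pandas as pd
    return pd.read_csv(io.StringIO(text))


def default_adapter():
    """Prefer pandas when it is installed, else fall back to plain Python."""
    try:
        import pandas  # noqa: F401
    except ImportError:
        return to_records
    return to_dataframe


def load(name, fetch=INDEX.__getitem__, adapter=None):
    """Get raw text through the `fetch` hook, then convert it with `adapter`."""
    if adapter is None:
        adapter = default_adapter()
    return adapter(fetch(name))


# Explicitly ask for plain Python structures, whether or not pandas is present.
rows = load("tiny", adapter=to_records)
print(rows)
```

Calling `load("tiny")` with no adapter would return a DataFrame on a machine with pandas and a list of dicts everywhere else, which is the default behaviour described above.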

This will:

let beginners use it without needing to learn or install pandas;
let external tools embed it and adapt it easily;
make it easy to adapt for use with other data processing tools;
make it easy to adapt for use with other ways of downloading data (gevent, asyncio, thread pools, etc.).
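
As an example of the last point, if the download step is just a callable, running it on a thread pool takes a few lines. This is only a sketch under my own assumptions: `fake_fetch` stands in for a real downloader so that nothing touches the network, and `load_many` is a hypothetical helper, not something PyDataset provides.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(name):
    """Stand-in for a real downloader: returns raw CSV text for `name`."""
    return "name\n" + name + "\n"


def load_many(names, fetch, max_workers=4):
    """Call the `fetch` hook for every name on a thread pool.

    `pool.map` yields results in the same order as `names`, so zipping
    the two back together is safe.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(names, pool.map(fetch, names)))


# Dataset names here are purely illustrative.
raw = load_many(["iris", "titanic"], fake_fetch)
print(sorted(raw))
```

Swapping the thread pool for gevent or asyncio would only change `load_many`; the index and the format adapters would stay untouched.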

Feb 03 '16 10:02 sametmax

Feb 10 '16 11:02 NiklasRosenstein

Pandas layer could be on pandas-ml side. See https://github.com/pandas-ml/pandas-ml/issues/68

Feb 21 '16 08:02 s-celles

@NiklasRosenstein: you can now vote using the smiley icon on the right; no need to create a comment just to +1.

Mar 25 '16 07:03 sametmax

Much appreciated @sametmax! But the feature was added after my comment, duh.

Mar 25 '16 08:03 NiklasRosenstein