Allow usage of pydataset with no external dependencies
PyDataset is a fantastic tool for learning Python. But requiring pandas (and hence numpy) is a big barrier to entry. What's more, you may want to load the data with another tool to process it.
To make your lib more flexible and more newcomer friendly, I'd advise:
- to create a toolbox that lets you define the data index and load it in a generic way. It should not rely on a particular technology for downloading or for the result format, and it should provide hooks to plug in your own;
- then build adapters for your downloader and pandas;
- then build an adapter for regular Python data structures.
- It should default to pandas if it's installed, or to regular Python lists/dicts if it's not.
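
To make the suggestion concrete, here is a minimal sketch of what such adapters could look like. Everything in it is hypothetical and not part of pydataset: `DATA_INDEX`, `to_records`, `to_dataframe`, `default_adapter` and `load` are names I made up, and the index holds inline CSV text only to keep the example self-contained.

```python
import csv
import io

# Hypothetical index: dataset name -> raw CSV text. A real index would
# map names to files or URLs instead.
DATA_INDEX = {
    "tiny": "x,y\n1,2\n3,4\n",
}


def to_records(csv_text):
    """Adapter for plain Python structures: a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_dataframe(csv_text):
    """Adapter for pandas, imported lazily so it stays optional."""
    import pandas as pd
    return pd.read_csv(io.StringIO(csv_text))


def default_adapter():
    """Pick pandas when it is installed, plain Python otherwise."""
    try:
        import pandas  # noqa: F401
    except ImportError:
        return to_records
    return to_dataframe


def load(name, adapter=None):
    """Load a dataset by name through the chosen adapter."""
    if adapter is None:
        adapter = default_adapter()
    return adapter(DATA_INDEX[name])


# Explicitly ask for plain Python structures, no pandas needed.
rows = load("tiny", adapter=to_records)
```

Note that the plain adapter keeps every value as a string, since `csv` does no type inference. Whether to convert types there is a design choice I'm leaving open.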
This will allow:
- beginners to use it without needing to learn or install pandas;
- external tools to embed it and adapt it easily;
- easy adaptation to other data processing tools;
- easy adaptation to other ways of downloading data (gevent, asyncio, thread pool, etc.).
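
As a rough illustration of the download hook, the sketch below lets the caller decide both how a dataset is fetched and how the calls are scheduled. `fetch_all`, `fake_fetch` and the `map_fn` parameter are hypothetical names, and the fetcher is a stand-in that does no real network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(names, fetch, map_fn=map):
    """Fetch several datasets; map_fn decides how the calls are run."""
    return dict(zip(names, map_fn(fetch, names)))


def fake_fetch(name):
    """Stand-in fetcher; a real one might read a file or call urllib."""
    return "payload for " + name


names = ["iris", "titanic"]

# Default: plain sequential calls using the built-in map.
sequential = fetch_all(names, fake_fetch)

# Same code path, but scheduled on a thread pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = fetch_all(names, fake_fetch, map_fn=pool.map)
```

A gevent or asyncio backend would need its own small wrapper, but the loading code itself would not have to change.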
The pandas layer could live on the pandas-ml side. See https://github.com/pandas-ml/pandas-ml/issues/68
@NiklasRosenstein: you can now vote using the smiley icon on the right. Don't create a comment just to +1.
Much appreciated @sametmax! But the feature was added after my comment, d'uh.