dfply data still missing after installation

I'm still having the same issue described here: https://github.com/kieferk/dfply/issues/8

I am using conda to install dfply, which I need to do because conda is the package manager used by the computing cluster I have access to.

conda install -c tallic dfply

That's the command I use to install the package from https://anaconda.org/tallic/dfply.

But when I go to use dfply, it still says the diamonds.csv data is missing.

Traceback (most recent call last):
  File "ACH_nested_anova.py", line 1, in <module>
    import dfply
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/__init__.py", line 11, in <module>
    from .data import diamonds
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/__init__.py", line 5, in <module>
    diamonds = pd.read_csv(os.path.join(root, "diamonds.csv"))
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv' does not exist: b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv'

2019-03-15 13:25:11 ⌚ gateway-03 in ~/ACH_Development/ACH_tests/ACH_quiz3/python_scripts/Analysis
○ → python ACH_nested_anova.py
Traceback (most recent call last):
  File "ACH_nested_anova.py", line 2, in <module>
    from dfply import group_by as group_by, summarize as summarize, select as select
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/__init__.py", line 11, in <module>
    from .data import diamonds
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/__init__.py", line 5, in <module>
    diamonds = pd.read_csv(os.path.join(root, "diamonds.csv"))
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv' does not exist: b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv'

2019-03-15 13:25:41 ⌚ gateway-03 in ~/ACH_Development/ACH_tests/ACH_quiz3/python_scripts/Analysis
○ → pip install dfply
Requirement already satisfied: dfply in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (0.3.1)
Requirement already satisfied: numpy in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from dfply) (1.16.2)
Requirement already satisfied: pandas in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from dfply) (0.24.2)
Requirement already satisfied: python-dateutil>=2.5.0 in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from pandas->dfply) (2.8.0)
Requirement already satisfied: pytz>=2011k in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from pandas->dfply) (2018.9)
Requirement already satisfied: six>=1.5 in /mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages (from python-dateutil>=2.5.0->pandas->dfply) (1.12.0)

2019-03-15 13:26:59 ⌚ gateway-03 in ~/ACH_Development/ACH_tests/ACH_quiz3/python_scripts/Analysis
○ → python ACH_nested_anova.py
Traceback (most recent call last):
  File "ACH_nested_anova.py", line 2, in <module>
    from dfply import group_by as group_by, summarize as summarize, select as select
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/__init__.py", line 11, in <module>
    from .data import diamonds
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/__init__.py", line 5, in <module>
    diamonds = pd.read_csv(os.path.join(root, "diamonds.csv"))
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv' does not exist: b'/mnt/home/bundyjas/anaconda3/envs/ACH_environment/lib/python3.6/site-packages/dfply/data/diamonds.csv'

I can substitute the import line with any of the following and the result is still the same:
- import dfply
- from dfply import group_by as group_by, summarize as summarize, select as select
- from dfply import *

Please help. I cannot seem to use git or pip to correct the problem: pip tells me the package is already installed, yet the error persists, and git is not available to me.

Mar 15 '19 17:03 jaybundy

I'm not 100% sure, but I guess this issue comes from building from source. If you run pip download --no-binary :all: --no-dependencies dfply, you'll find the same issue: the diamonds.csv file is missing from the dfply/data/ folder. However, if you download the wheel with pip download --no-dependencies dfply and inspect it, you'll find that the diamonds.csv file is there.
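One way to do that comparison, as a rough Python sketch: it assumes both archives have already been fetched with the pip download commands above. The helper name find_in_archive and the paths in the usage comment are made up for illustration; they are not part of dfply or pip.

```python
import tarfile
import zipfile


def find_in_archive(archive_path, filename="diamonds.csv"):
    """Return the archive members whose path ends with `filename`."""
    if zipfile.is_zipfile(archive_path):
        # Wheels (.whl) and .zip sdists are ordinary zip archives.
        with zipfile.ZipFile(archive_path) as zf:
            names = zf.namelist()
    elif tarfile.is_tarfile(archive_path):
        # .tar.gz sdists are handled by tarfile.
        with tarfile.open(archive_path) as tf:
            names = tf.getnames()
    else:
        raise ValueError(f"not a zip or tar archive: {archive_path}")
    return [n for n in names if n == filename or n.endswith("/" + filename)]


# Usage (placeholder paths; substitute the files pip actually downloaded):
# print(find_in_archive("path/to/downloaded_wheel.whl"))
# print(find_in_archive("path/to/downloaded_sdist.tar.gz"))
```

An empty list for the source archive and a non-empty list for the wheel would match what I saw.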

I don't know anything about conda package management, but perhaps they take the result of python setup.py sdist, which would omit the data file. According to this random SO post, a MANIFEST file should fix things.

https://stackoverflow.com/questions/7522250/how-to-include-package-data-with-setuptools-distribute

Sep 19 '19 23:09 andrewkho

I ran into the same missing diamonds.csv error when the dfply library was installed using conda (conda install -c tallic dfply). To resolve this, remove the library and its dependencies using conda, then install it with pip in the same conda environment.

The library works when installed with pip install.
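To confirm that the reinstall actually put the data file in place, without triggering the failing import, a small check along these lines may help. This is only a sketch: packaged_file_exists is a made-up helper name, and the data/diamonds.csv location is taken from the path shown in the traceback earlier in this thread.

```python
import importlib.util
import os


def packaged_file_exists(package, relative_path):
    """Check for a file inside an installed package without importing the package.

    Returns None if the package cannot be located, otherwise True or False.
    """
    # find_spec locates a top-level package without executing its __init__.py,
    # so this works even when `import dfply` itself raises.
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    package_dir = os.path.dirname(spec.origin)
    return os.path.isfile(os.path.join(package_dir, relative_path))


if __name__ == "__main__":
    # Assumed location of the data file, based on the traceback path.
    print(packaged_file_exists("dfply", os.path.join("data", "diamonds.csv")))
```

Run it with the python from the same conda environment; False means the file is still missing from that install.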

Mar 06 '20 21:03 complexgenome
