datafusion-cli not installed
Describe the bug
After following https://arrow.apache.org/datafusion/user-guide/cli.html, the CLI cannot be run:
datafusion-cli
-bash: datafusion-cli: command not found
To Reproduce
pip install datafusion
datafusion-cli
Expected behavior
the CLI runs
Additional context
No response
python module works
python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datafusion
>>> datafusion.__version__
'35.0.0'
Hmm, I think this issue may move to https://github.com/apache/arrow-datafusion-python?
Thanks. I also posted it there, but I saw that the source code in https://github.com/apache/arrow-datafusion/tree/main/datafusion-cli is not in https://github.com/apache/arrow-datafusion-python
datafusion-cli is in this repo. I meant that its PyPI packaging and release should be done at https://github.com/apache/arrow-datafusion-python.
I think the issue is not datafusion-cli itself or its functionality but the PyPI packaging.
Anyway, it is okay to keep it open to get visibility. :)
I think there is a brew package if you are on Mac:
brew install datafusion
Actually, does the datafusion pypi package even include datafusion-cli in the first place? :thinking:
I did a quick search through https://github.com/apache/arrow-datafusion-python and found no actual mention of datafusion-cli (though in fairness I'm not familiar with that repo or the process of packaging a Python package)
Maybe the documentation on that CLI user guide page is mistaken on that account? Relevant PR: https://github.com/apache/arrow-datafusion/pull/8389
cc @Weijun-H
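One way to check the question above locally is to look at what an installed distribution actually ships. The following is a minimal sketch using only the standard library; the helper names `shipped_files_named` and `diagnose` are made up for illustration, and matching on the file stem is an assumption about how an executable would be recorded by pip.

```python
import shutil
from importlib import metadata


def shipped_files_named(dist_name, command):
    """List files recorded for an installed distribution whose stem equals `command`.

    Returns None when the distribution is not installed.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # `files` may be None when the installer left no file record.
    return [str(f) for f in (files or []) if f.stem == command]


def diagnose(dist_name="datafusion", command="datafusion-cli"):
    """Summarize whether `dist_name` ships `command` and whether it is on PATH."""
    matches = shipped_files_named(dist_name, command)
    return {
        "installed": matches is not None,
        "ships_command": bool(matches),
        "on_path": shutil.which(command) is not None,
    }


if __name__ == "__main__":
    print(diagnose())
```

If `installed` is true but `ships_command` is false, the package itself does not provide the executable, which would point at the documentation rather than at the local environment.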
After checking the documentation in apache/arrow-datafusion-python, I discovered that the current PyPI installation instructions for the CLI are incorrect, @Jefffrey. Perhaps it's time to implement pip install datafusion-cli? 🤔
Perhaps it's time to implement pip install datafusion-cli
I am not sure how the DataFusion release procedure works, but if you want to automate it in CI, maturin can help.
I have done this for Topgrade; take a look at this PR if you want to see a real-world example of what it looks like.
pip install datafusion-cli
works now, thanks.
Hey @l1t1, the current datafusion-cli on PyPI is meant to be a test; it's not automated for future releases, as the PR is not yet merged. I'd appreciate it if you could reopen the issue so that it can be closed after the PR is merged.
I left a comment on https://github.com/apache/arrow-datafusion/pull/9452#issuecomment-2027555738
DataFusion is a Rust project and datafusion-cli is already available via cargo, which is the default package manager for Rust. If we want to use Python packaging for datafusion-cli, it seems logical to do that in the DataFusion Python repository.
version 37.1.0 still has the issue
D:\>pip install datafusion -U
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: datafusion in d:\python38\lib\site-packages (36.0.0)
Collecting datafusion
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/90/7e/09877d816952ff90f2bdcd49c45b199e20b226708068fa6a5bfb7d8ed51a/datafusion-37.1.0-cp38-abi3-win_amd64.whl (16.8 MB)
---------------------------------------- 16.8/16.8 MB 40.9 MB/s eta 0:00:00
Requirement already satisfied: pyarrow>=11.0.0 in d:\python38\lib\site-packages (from datafusion) (15.0.0)
Requirement already satisfied: numpy<2,>=1.16.6 in d:\python38\lib\site-packages (from pyarrow>=11.0.0->datafusion) (1.21.0)
Installing collected packages: datafusion
Attempting uninstall: datafusion
Found existing installation: datafusion 36.0.0
Uninstalling datafusion-36.0.0:
Successfully uninstalled datafusion-36.0.0
Successfully installed datafusion-37.1.0
D:\mathhigh>datafusion-cli
DataFusion CLI v36.0.0
❯
\q
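The transcript above shows the datafusion package at 37.1.0 while the CLI banner reports v36.0.0. A minimal sketch for comparing the two follows; it assumes the CLI was installed from a separate PyPI distribution named datafusion-cli (as mentioned earlier in the thread), and the helper names are made up for illustration.

```python
import re
from importlib import metadata


def installed_versions(names=("datafusion", "datafusion-cli")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def parse_banner(banner):
    """Extract the version from a banner line such as 'DataFusion CLI v36.0.0'."""
    match = re.search(r"DataFusion CLI v(\d+(?:\.\d+)*)", banner)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(installed_versions())
    print(parse_banner("DataFusion CLI v36.0.0"))
```

If the two distributions report different versions, upgrading datafusion alone would not be expected to change the version printed by the CLI.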
Hey @l1t1, as per Andy's comments on #9452, datafusion-cli releases should be handled in the python repo.