
datafusion-cli not installed

Open l1t1 opened this issue 1 year ago • 14 comments

Describe the bug

Following https://arrow.apache.org/datafusion/user-guide/cli.html, the CLI cannot be run:

datafusion-cli
-bash: datafusion-cli: command not found

To Reproduce

pip install datafusion

datafusion-cli

Expected behavior

the CLI runs

Additional context

No response

l1t1 avatar Feb 20 '24 22:02 l1t1

python module works

python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datafusion                                                                          
>>> datafusion.__version__
'35.0.0'
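
To separate the two symptoms (the module imports fine, but the command is missing), a minimal sketch along these lines could be used. The helper name `describe_install` is hypothetical; it relies only on the standard library (`importlib.metadata` and `shutil.which`) and assumes the executable would be called `datafusion-cli`, as in the user guide.

```python
import shutil
from importlib import metadata


def describe_install(dist_name, exe_name):
    """Report the installed version of a distribution and where (if anywhere)
    an executable with the given name is found on PATH."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this Python environment.
        version = None
    # shutil.which returns None when the executable is not on PATH.
    return {"version": version, "executable": shutil.which(exe_name)}


if __name__ == "__main__":
    # On the setup described above, one would expect a version string for the
    # package and None for the executable.
    print(describe_install("datafusion", "datafusion-cli"))
```

A `None` for `executable` together with a version string for the package would match what is reported here: the Python package is installed, but nothing named `datafusion-cli` is on the PATH.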

l1t1 avatar Feb 20 '24 22:02 l1t1

Hmm, I think this issue should perhaps be moved to https://github.com/apache/arrow-datafusion-python?

viirya avatar Feb 20 '24 22:02 viirya

Thanks. I also posted it there, but I saw that the source code in https://github.com/apache/arrow-datafusion/tree/main/datafusion-cli is not in https://github.com/apache/arrow-datafusion-python

l1t1 avatar Feb 20 '24 23:02 l1t1

datafusion-cli is in this repo. I meant that its PyPI packaging and release should be done at https://github.com/apache/arrow-datafusion-python.

I think the issue is not with datafusion-cli itself or its functionality but with the PyPI package.

viirya avatar Feb 21 '24 00:02 viirya

Anyway, it is okay to keep it open to get visibility. :)

viirya avatar Feb 21 '24 00:02 viirya

I think there is a brew package if you are on Mac:

brew install datafusion

alamb avatar Feb 21 '24 08:02 alamb

Actually, does the datafusion PyPI package even include datafusion-cli in the first place? 🤔

I did a quick search through https://github.com/apache/arrow-datafusion-python and found no actual mention of datafusion-cli (though in fairness I'm not familiar with that repo or the process of packaging a Python package)

Maybe the documentation on that CLI user guide page is mistaken on that account? Relevant PR: https://github.com/apache/arrow-datafusion/pull/8389

cc @Weijun-H

Jefffrey avatar Feb 21 '24 10:02 Jefffrey

> Actually, does the datafusion PyPI package even include datafusion-cli in the first place? 🤔
>
> I did a quick search through apache/arrow-datafusion-python and found no actual mention of datafusion-cli (though in fairness I'm not familiar with that repo or the process of packaging a Python package)
>
> Maybe the documentation on that CLI user guide page is mistaken on that account? Relevant PR: #8389
>
> cc @Weijun-H

@Jefffrey After checking the documentation in apache/arrow-datafusion-python, I found that the current PyPI installation instructions for the CLI are incorrect. Perhaps it's time to implement pip install datafusion-cli? 🤔

Weijun-H avatar Feb 21 '24 11:02 Weijun-H

> Perhaps it's time to implement pip install datafusion-cli

I am not sure how the DataFusion release procedure works, but if you want to automate it in CI, maturin can help.

I have done this for Topgrade; take a look at this PR if you want to see a real-world example of what it looks like.

SteveLauC avatar Feb 22 '24 06:02 SteveLauC

pip install datafusion-cli works now, thanks.

l1t1 avatar Mar 08 '24 04:03 l1t1

Hey @l1t1, the current datafusion-cli on PyPI is meant to be a test; it's not automated for future releases, as the PR is not yet merged. I'd appreciate it if you could re-open the issue so that it can be closed after the PR is merged.

MohamedAbdeen21 avatar Mar 10 '24 09:03 MohamedAbdeen21

I left a comment on https://github.com/apache/arrow-datafusion/pull/9452#issuecomment-2027555738

DataFusion is a Rust project and datafusion-cli is already available via cargo, which is the default package manager for Rust. If we want to use Python packaging for datafusion-cli, it seems logical to do that in the DataFusion Python repository.
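
As a rough sketch of the install routes that have come up in this thread, the helper below lists the commands whose tool is available on the PATH. The name `suggest_install_commands` and the `INSTALL_ROUTES` table are hypothetical, and the pip route reflects the test upload mentioned earlier, so it may not track future releases.

```python
import shutil

# Install routes mentioned in this thread, keyed by the tool each one needs.
INSTALL_ROUTES = [
    ("cargo", "cargo install datafusion-cli"),
    ("brew", "brew install datafusion"),
    # Described in this thread as a test upload, not an automated release.
    ("pip", "pip install datafusion-cli"),
]


def suggest_install_commands(which=shutil.which):
    """Return the install commands whose required tool is found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [command for tool, command in INSTALL_ROUTES if which(tool)]


if __name__ == "__main__":
    for command in suggest_install_commands():
        print(command)
```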

andygrove avatar Mar 29 '24 18:03 andygrove

version 37.1.0 still has the issue

D:\>pip install datafusion -U
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: datafusion in d:\python38\lib\site-packages (36.0.0)
Collecting datafusion
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/90/7e/09877d816952ff90f2bdcd49c45b199e20b226708068fa6a5bfb7d8ed51a/datafusion-37.1.0-cp38-abi3-win_amd64.whl (16.8 MB)
     ---------------------------------------- 16.8/16.8 MB 40.9 MB/s eta 0:00:00
Requirement already satisfied: pyarrow>=11.0.0 in d:\python38\lib\site-packages (from datafusion) (15.0.0)
Requirement already satisfied: numpy<2,>=1.16.6 in d:\python38\lib\site-packages (from pyarrow>=11.0.0->datafusion) (1.21.0)
Installing collected packages: datafusion
  Attempting uninstall: datafusion
    Found existing installation: datafusion 36.0.0
    Uninstalling datafusion-36.0.0:
      Successfully uninstalled datafusion-36.0.0
Successfully installed datafusion-37.1.0

D:\mathhigh>datafusion-cli
DataFusion CLI v36.0.0
❯
\q
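
The mismatch in the log above (pip reports datafusion 37.1.0, while the CLI banner says v36.0.0) can be checked mechanically. This is only a sketch: the function names are hypothetical, and the banner format is assumed to be the one printed above.

```python
import re


def parse_cli_version(banner):
    """Extract the dotted version from a banner such as 'DataFusion CLI v36.0.0'.

    Returns None when no 'v<digits>.<digits>...' token is present.
    """
    match = re.search(r"\bv(\d+(?:\.\d+)+)", banner)
    return match.group(1) if match else None


def cli_matches_package(banner, package_version):
    """True only if the CLI banner reports exactly the given package version."""
    return parse_cli_version(banner) == package_version


# Values taken from the log above.
print(cli_matches_package("DataFusion CLI v36.0.0", "37.1.0"))  # False
```

A `False` here suggests that the executable on the PATH did not come from the datafusion 37.1.0 wheel that was just installed; presumably it belongs to a separately installed datafusion-cli package.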

l1t1 avatar May 15 '24 01:05 l1t1

Hey @l1t1, as per Andy's comments on #9452, datafusion-cli releases should be handled in the python repo.

MohamedAbdeen21 avatar May 18 '24 15:05 MohamedAbdeen21