turbodbc
Installation does not recognize pyarrow using pip (windows)
I get this error with version 3.0.0 even when installing pyarrow first.
This installation of turbodbc does not support Apache Arrow extensions. Please install the pyarrow package. If you have built turbodbc from source, you may also need to reinstall turbodbc to compile the extensions.
Hi Daniel! Which operating system are you using?
Also, did you install using pip or conda?
Thanks for the response. I'm on Windows 10, by the way. I am using pip; I can't create conda environments with turbodbc after successfully running:
conda install turbodbc
The error I get is this:
PackagesNotFoundError: The following packages are not available from current channels:
- turbodbc
So I can't use either :(
You can use turbodbc with conda when you create a conda environment with only packages from conda-forge, i.e.: conda create -n turbodbc-env turbodbc pyarrow python=3.7.
For turbodbc Arrow support on Windows, you need to install turbodbc 3.1.0; it is not available in 3.0.0.
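To tell which situation in this thread applies, a small diagnostic can report the installed turbodbc and pyarrow versions and whether turbodbc's compiled Arrow extension is present. This is a minimal sketch using only the standard library. The 3.1.0 threshold comes from the comment above about Windows. The extension module name `turbodbc_arrow_support` is an assumption about turbodbc's internals, so verify it against your installed package.

```python
import importlib.util
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. '3.0.0rc1' -> (3, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def at_least(version: str, minimum: str) -> bool:
    """Numeric comparison of release segments, padding the shorter one with zeros."""
    a, b = release_tuple(version), release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def module_available(name: str) -> bool:
    """True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    turbodbc_version = installed_version("turbodbc")
    print("turbodbc:", turbodbc_version)
    print("pyarrow:", installed_version("pyarrow"))
    if turbodbc_version and not at_least(turbodbc_version, "3.1.0"):
        print("turbodbc < 3.1.0: per this thread, no Arrow support on Windows")
    # ASSUMPTION: hypothetical name of turbodbc's compiled Arrow extension module.
    print("Arrow extension present:", module_available("turbodbc_arrow_support"))
```

If pyarrow is reported but the extension is not found, a plausible explanation is that turbodbc was built before pyarrow was importable, which is what the "reinstall turbodbc to compile the extensions" hint in the error message points at.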
> You can use turbodbc with conda when you create a conda environment with only packages from conda-forge, i.e.: conda create -n turbodbc-env turbodbc pyarrow python=3.7.
That is what raises:
PackagesNotFoundError: The following packages are not available from current channels:
- turbodbc
> For turbodbc Arrow support on Windows, you need to install turbodbc 3.1.0; it is not available in 3.0.0.
Interesting that you brought that up. When I run pip3 install turbodbc==3.1.0, I get something like:
src\cpp_odbc\level2\level1_connector.cpp(17): fatal error C1083: ...: 'boost/locale.hpp': No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
The boost library has been successfully compiled and installed globally as instructed here.
This is, however, an entirely different error for which I was going to open a new issue. I can do that if you find it convenient.
As @xhochy mentioned, turbodbc is available from the conda-forge channel:
(base) C:\> conda search 'turbodbc>=3.1'
Loading channels: done
# Name                       Version           Build  Channel
turbodbc                       3.1.0  py36h51f579c_0  conda-forge
turbodbc                       3.1.0  py37h51f579c_0  conda-forge
The error message implies that you haven't got conda-forge as a specified channel in your ~\.condarc config file.
You can add the conda-forge channel manually on the command line by passing -c conda-forge, e.g.:
conda create -n turbodbc-env -c conda-forge turbodbc pyarrow python=3.7
As an aside, to make it easier to help you, you shouldn't just paste the final exception but should include the command you ran, the unabridged output of that command and the full traceback as that provides the context necessary to understand what the problem might be without resorting to guessing what "That is what raises" might mean.
Also, you should never mix pip and conda packages, particularly for packages such as turbodbc with complex, compiled dependencies.
@dhirschfeld thanks for the input.
> Also, you should never mix pip and conda packages, particularly for packages such as turbodbc with complex, compiled dependencies.
Agreed. That is not the case here, however: the two attempts ran in different VMs (one with Anaconda, one with pip). I tried Anaconda out of curiosity after being asked.
> As an aside, to make it easier to help you, you shouldn't just paste the final exception but should include the command you ran, the unabridged output of that command and the full traceback as that provides the context necessary to understand what the problem might be without resorting to guessing...
Thanks for pointing that out. In fact, my mistake in the first place was not specifying exactly how I was using this library. I have already edited the title of this issue to make it clearer. Given that you should never mix pip and conda, note that I was considering using turbodbc for ETL processes in a production environment where conda is not used at all. I opened this issue because I installed the library as shown here; I don't think I should give more details about the problems encountered in an Anaconda environment, since that is not the issue in the first place.
The 'Getting started' section of the docs uses pip install, but if this library is meant to be used with conda rather than pip, then this thread should not be considered an issue after all. Otherwise, any help would be greatly appreciated.
In short, the problem is that installing pyarrow and then turbodbc, in that order, doesn't guarantee that turbodbc will be compatible with pyarrow. I hope everything is clearer now, and apologies for the confusion.
Is this an issue of installing from different conda channels? My pyarrow comes from conda default and turbodbc from conda-forge. It seems turbodbc does not see pyarrow even if pyarrow is working fine for other use cases (e.g. pyspark, pandas, etc...)
> Is this an issue of installing from different conda channels? My pyarrow comes from conda default and turbodbc from conda-forge. It seems turbodbc does not see pyarrow even if pyarrow is working fine for other use cases (e.g. pyspark, pandas, etc...)
No, this issue is about using pip, not conda. But in general, don't mix and match packages from defaults and conda-forge: either use all packages from defaults or all from conda-forge. Both package repositories are built such that they only work well when all packages come from a single source.
It is quite a pity that it is not possible to use turbodbc with pyarrow from pip, since our company has both libraries in a local pip repository behind the firewall, but not in conda :(
> It is quite a pity that it is not possible to use turbodbc with pyarrow from pip, since our company has both libraries in a local pip repository behind the firewall, but not in conda :(
It's possible but quite fiddly, as you need to make sure that pyarrow is installed before calling pip install turbodbc. In conda it just works.
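The ordering constraint can be scripted so that every step uses the same interpreter. This is a sketch, not an official install procedure: it assumes, as the error message in this issue suggests, that turbodbc only compiles its Arrow extension when pyarrow is importable at build time. The `--no-cache-dir` flag is our own precaution so pip does not reuse a locally cached turbodbc wheel that was built earlier without pyarrow.

```python
import subprocess
import sys
from typing import List


def install_commands(turbodbc_spec: str = "turbodbc") -> List[List[str]]:
    """pip invocations in the order discussed here: numpy, pyarrow, then turbodbc."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + ["numpy"],
        pip + ["pyarrow"],
        # Install turbodbc last, without reusing a cached wheel (assumed precaution).
        pip + ["--no-cache-dir", turbodbc_spec],
    ]


def run(commands: List[List[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd)


# Only touches the environment when explicitly asked to with --run.
if __name__ == "__main__" and "--run" in sys.argv:
    run(install_commands())
```

Invoked as, for example, `python install_turbodbc.py --run` (the file name is hypothetical). On Linux the ordering alone may not be enough, as the linker failure reported further down in this thread shows.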
The CI builds of turbodbc actually used a pip-based installation of pyarrow in the past: https://github.com/blue-yonder/turbodbc/blob/11dfd7006a5694ce0c03a9327fa1fd1f408bfa67/.travis.yml
It is definitely possible. You just need to properly install pyarrow beforehand and make sure that you have all the necessary OS-level packages installed, see https://github.com/blue-yonder/turbodbc/blob/11dfd7006a5694ce0c03a9327fa1fd1f408bfa67/.travis.yml.
A workaround for this issue would be to add a pyproject.toml and enforce PEP 517 builds in which pyarrow is a strict build requirement. Then the wheel builds would always include pyarrow support, while pyarrow would remain an optional install requirement.
I am running into the same error, not on Windows but on Ubuntu. Even though the logs show that the pyarrow wheel was installed, turbodbc is not able to detect it and raises the same error as the one reported by the creator of this issue.
Same experience as @felipebormann here.
Same issue here. If this isn't going to work with pip, maybe it shouldn't be on PyPI...
I am unable to install turbodbc[arrow]. I have installed the libraries in this order: numpy -> pyarrow -> turbodbc. It fails with the error below:
#13 214.2 /usr/bin/ld: cannot find -larrow
#13 214.2 /usr/bin/ld: cannot find -larrow_python
#13 214.2 collect2: error: ld returned 1 exit status
#13 214.2 error: command 'g++' failed with exit status 1
#13 214.2 ----------------------------------------
#13 214.2 ERROR: Command errored out with exit status 1:
I have installed all the libraries mentioned in the docs on top of my python:3.7-slim base image, like this:
RUN apt-get update && apt-get install --assume-yes \
build-essential \
gcc \
g++ \
libboost-all-dev \
libpq-dev \
unixodbc \
unixodbc-dev \
python3.7-dev \
curl \
perl \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /var/cache/apt/*
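The `cannot find -larrow` failure resembles a pitfall documented by Apache Arrow for building C++ extensions against pyarrow wheels: the bundled libraries carry an ABI suffix (e.g. `libarrow.so.<N>`), so linking with `-larrow` fails until `pyarrow.create_library_symlinks()` has been run once with write access to pyarrow's install directory. Whether that is the cause here is an assumption. The sketch below checks for the unversioned `lib<name>.so` files using Linux naming; `missing_unversioned_libs` and `ensure_arrow_symlinks` are hypothetical helpers, while `get_library_dirs`, `get_libraries`, and `create_library_symlinks` are pyarrow functions.

```python
import os
import sys
from typing import Iterable, List


def missing_unversioned_libs(lib_dirs: Iterable[str], libs: Iterable[str]) -> List[str]:
    """Names in `libs` with no lib<name>.so in any of `lib_dirs` (Linux naming)."""
    dirs = list(lib_dirs)
    return [
        name
        for name in libs
        if not any(os.path.exists(os.path.join(d, f"lib{name}.so")) for d in dirs)
    ]


def ensure_arrow_symlinks() -> List[str]:
    """Create pyarrow's unversioned symlinks if needed; return what is still missing."""
    import pyarrow  # imported here so the helper above also works without pyarrow

    dirs, libs = pyarrow.get_library_dirs(), pyarrow.get_libraries()
    if missing_unversioned_libs(dirs, libs):
        pyarrow.create_library_symlinks()  # needs write access to pyarrow's directory
    return missing_unversioned_libs(dirs, libs)


# Only modifies the pyarrow installation when explicitly asked to with --run.
if __name__ == "__main__" and "--run" in sys.argv:
    print("still missing:", ensure_arrow_symlinks())
```

In a Dockerfile like the one above, this would run as an extra step after installing pyarrow and before installing turbodbc, for example `RUN python ensure_arrow_symlinks.py --run` (the script name is hypothetical). This only helps if turbodbc's build actually searches pyarrow's library directory, which is also an assumption here.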