Installation does not recognize pyarrow using pip (windows)

Open danielurencio opened this issue 6 years ago • 17 comments

I get this error with version 3.0.0 even when installing pyarrow first.

This installation of turbodbc does not support Apache Arrow extensions. Please install the pyarrow package. If you have built turbodbc from source, you may also need to reinstall turbodbc to compile the extensions.
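One way to see what this message is reacting to is to check which modules can be located in the active environment. The sketch below assumes that turbodbc's Arrow extension is a separate compiled module named turbodbc_arrow_support; that name is an assumption about turbodbc's internals and may differ between versions.

```python
import importlib.util


def arrow_support_report():
    """Report which of the relevant top-level modules can be located."""
    # "turbodbc_arrow_support" is an assumed name for the compiled extension
    # that turbodbc builds when pyarrow is available at build time.
    names = ("pyarrow", "turbodbc", "turbodbc_arrow_support")
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in arrow_support_report().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If pyarrow is found but the extension module is missing, turbodbc was probably built without Arrow support and would need to be rebuilt.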

danielurencio avatar Mar 11 '19 19:03 danielurencio

Hi Daniel! Which operating system are you using?

MathMagique avatar Mar 12 '19 08:03 MathMagique

Also, did you install using pip or conda?

xhochy avatar Mar 12 '19 08:03 xhochy

Thanks for the response. I'm on Windows 10, by the way. I am using pip; I can't create conda environments using turbodbc after successfully running: conda install turbodbc

The error I get is this:

PackagesNotFoundError: The following packages are not available from current channels:
  - turbodbc

So, I can't use either :(

danielurencio avatar Mar 12 '19 17:03 danielurencio

You can use turbodbc with conda when you create a conda environment with only packages from conda-forge, i.e.: conda create -n turbodbc-env turbodbc pyarrow python=3.7.

For turbodbc Arrow support on Windows, you need to install turbodbc 3.1.0; it is not available with 3.0.0.

xhochy avatar Mar 12 '19 17:03 xhochy

You can use turbodbc with conda when you create a conda environment with only packages from conda-forge, i.e.: conda create -n turbodbc-env turbodbc pyarrow python=3.7.

That is what raises:

PackagesNotFoundError: The following packages are not available from current channels:
  - turbodbc

For turbodbc Arrow support on Windows, you need to install turbodbc 3.1.0; it is not available with 3.0.0.

Interesting that you brought that up. When I run pip3 install turbodbc==3.1.0 I get something like:

    src\cpp_odbc\level2\level1_connector.cpp(17): fatal error C1083: ...: 'boost/locale.hpp': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

The Boost library has been successfully compiled and installed globally as instructed here.

This is, however, an entirely different error for which I was going to open a new issue. I can do that if you find it convenient.

danielurencio avatar Mar 12 '19 18:03 danielurencio

As @xhochy mentioned, turbodbc is available from the conda-forge channel:

(base) C:\> conda search 'turbodbc>=3.1'
Loading channels: done
# Name                       Version           Build  Channel
turbodbc                       3.1.0  py36h51f579c_0  conda-forge
turbodbc                       3.1.0  py37h51f579c_0  conda-forge

The error message implies that you haven't got conda-forge as a specified channel in your ~\.condarc config file.

You can add the conda-forge channel manually on the command line by specifying -c conda-forge, e.g.:

conda create -n turbodbc-env -c conda-forge turbodbc pyarrow python=3.7

As an aside, to make it easier to help you, you shouldn't just paste the final exception. Include the command you ran, the unabridged output of that command, and the full traceback; that provides the context necessary to understand what the problem might be without resorting to guessing what "That is what raises" might mean.

Also, you should never mix pip and conda packages, particularly for packages such as turbodbc with complex, compiled dependencies.

dhirschfeld avatar Mar 12 '19 23:03 dhirschfeld

@dhirschfeld thanks for the input.

Also, you should never mix pip and conda packages, particularly for packages such as turbodbc with complex, compiled dependencies.

Agreed. That is not the case here, however; the two attempts were run in different VMs (one with Anaconda, one with pip). I tried Anaconda out of curiosity after being asked.

As an aside, to make it easier to help you, you shouldn't just paste the final exception but should include the command you ran, the unabridged output of that command and the full traceback as that provides the context necessary to understand what the problem might be without resorting to guessing...

Thanks for pointing that out. In fact, my mistake in the first place was not specifying exactly how I was using this library. I have already edited the title of this issue to make it clearer. Given that you should never mix pip and conda: I was considering using turbodbc for ETL processes in a production environment where conda is not used at all. The reason for opening this issue was that I installed the library as shown here; I don't think I should give more details about the problems encountered in an Anaconda environment if that is not the issue in the first place.

Despite the 'getting started' specs from the docs (where pip install is the way to get started), if this library should be used with conda instead of pip then this thread should not be considered as an issue after all. Otherwise, any help would be greatly appreciated.

In short, the problem is that installing pyarrow and then turbodbc, in that order, doesn't guarantee that turbodbc will be compatible with pyarrow. I hope everything is clearer now, and my apologies for the confusion.

danielurencio avatar Mar 13 '19 00:03 danielurencio

Is this an issue of installing from different conda channels? My pyarrow comes from conda default and turbodbc from conda-forge. It seems turbodbc does not see pyarrow even if pyarrow is working fine for other use cases (e.g. pyspark, pandas, etc...)

andreapiso avatar Oct 09 '19 08:10 andreapiso

Is this an issue of installing from different conda channels? My pyarrow comes from conda default and turbodbc from conda-forge. It seems turbodbc does not see pyarrow even if pyarrow is working fine for other use cases (e.g. pyspark, pandas, etc...)

No, the issue here is about using pip, not conda. But in general, don't mix and match packages from defaults and conda-forge. Either use all packages from defaults or all from conda-forge. Both package repositories are built such that they work well only when all packages come from a single source.
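To check whether an environment mixes channels, the output of conda list --json can be grouped by channel. This is a minimal sketch: it assumes each entry in that JSON output carries "name" and "channel" fields, and that the conda executable is on PATH. The function names are illustrative, not part of conda.

```python
import json
import subprocess
from collections import defaultdict


def packages_by_channel(conda_list_entries):
    """Group package names by the channel each one was installed from."""
    grouped = defaultdict(list)
    for entry in conda_list_entries:
        # Fall back to "unknown" if an entry has no channel field.
        grouped[entry.get("channel", "unknown")].append(entry["name"])
    return dict(grouped)


def current_environment_entries():
    """Read the active environment's package list (assumes conda is on PATH)."""
    result = subprocess.run(
        ["conda", "list", "--json"], check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hand-written sample entries standing in for real `conda list --json` output.
    sample = [
        {"name": "pyarrow", "channel": "pkgs/main"},
        {"name": "turbodbc", "channel": "conda-forge"},
    ]
    print(packages_by_channel(sample))
```

An environment where pyarrow and turbodbc end up under different channel keys is the mixed setup described above.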

xhochy avatar Oct 09 '19 08:10 xhochy

It's quite a pity that it's not possible to use turbodbc with pyarrow from pip. Our company has both libraries in a local pip repository behind the firewall, but not in conda :(

4mitch avatar Oct 17 '19 11:10 4mitch

It's quite a pity that it's not possible to use turbodbc with pyarrow from pip. Our company has both libraries in a local pip repository behind the firewall, but not in conda :(

It's possible but quite fiddly, as you need to make sure that pyarrow is installed before calling pip install turbodbc. With conda it just works.

xhochy avatar Oct 17 '19 11:10 xhochy

The CI builds of turbodbc actually used a pip-based installation of pyarrow in the past: https://github.com/blue-yonder/turbodbc/blob/11dfd7006a5694ce0c03a9327fa1fd1f408bfa67/.travis.yml

It is definitely possible. You just need to properly install pyarrow beforehand and make sure that you have all the necessary OS-level packages installed, see https://github.com/blue-yonder/turbodbc/blob/11dfd7006a5694ce0c03a9327fa1fd1f408bfa67/.travis.yml.
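The install order described above can be sketched as a short script. The uninstall step and the --no-cache-dir flag are assumptions added here so that pip does not reuse a turbodbc build made before pyarrow was present; they are not taken from the linked CI configuration, and the function names are illustrative.

```python
import subprocess
import sys


def pip_install_steps(python=sys.executable):
    """Return pip commands that install pyarrow before building turbodbc."""
    pip = [python, "-m", "pip"]
    return [
        # 1. Install the build-time dependencies first.
        pip + ["install", "numpy", "pyarrow"],
        # 2. Remove any turbodbc that was built without Arrow support.
        pip + ["uninstall", "--yes", "turbodbc"],
        # 3. Build turbodbc again, bypassing pip's cache of built wheels.
        pip + ["install", "--no-cache-dir", "turbodbc"],
    ]


def run_steps(steps):
    """Run each command in order and stop at the first failure."""
    for command in steps:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run_steps(pip_install_steps())
```

The OS-level packages (Boost, unixODBC, a C++ compiler) still have to be installed separately before the last step can succeed.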

MathMagique avatar Oct 18 '19 09:10 MathMagique

A workaround for this issue would be to add a pyproject.toml and enforce PEP517 builds where pyarrow is a strict requirement. Then the wheel builds would always work with pyarrow while still being an optional install requirement.
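As a rough illustration of that idea, the snippet below writes a pyproject.toml whose [build-system] table lists pyarrow as a build requirement. The requirement list is hypothetical and not taken from turbodbc's repository; only the [build-system], requires, and build-backend keys come from the packaging PEPs.

```python
from pathlib import Path

# Hypothetical build requirements; the real list would have to match
# whatever turbodbc's setup.py actually needs at build time.
PYPROJECT_TOML = """\
[build-system]
requires = ["setuptools", "wheel", "pybind11", "numpy", "pyarrow"]
build-backend = "setuptools.build_meta"
"""


def write_pyproject(project_dir):
    """Write the sketch pyproject.toml into project_dir and return its path."""
    path = Path(project_dir) / "pyproject.toml"
    path.write_text(PYPROJECT_TOML, encoding="utf-8")
    return path
```

With such a file, pip would install the listed packages into the isolated build environment before compiling, so the Arrow extension could always be built.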

xhochy avatar Oct 18 '19 10:10 xhochy

I am running into the same error, not on Windows but on Ubuntu. Even though the logs show that the pyarrow wheels were installed, turbodbc is not able to detect them and raises the same error the creator of this issue reported.

felipebormann avatar Jun 29 '20 19:06 felipebormann

Same experience as @felipebormann here.

spitz-dan-l avatar Jul 15 '20 17:07 spitz-dan-l

Same issue here. If this isn't going to work with pip, maybe it shouldn't be on PyPI...

bsplosion avatar Nov 04 '20 14:11 bsplosion

I am unable to install turbodbc[arrow]. I have installed the libraries in this order: numpy -> pyarrow -> turbodbc. It fails with the error below:

#13 214.2 /usr/bin/ld: cannot find -larrow
#13 214.2 /usr/bin/ld: cannot find -larrow_python
#13 214.2 collect2: error: ld returned 1 exit status
#13 214.2 error: command 'g++' failed with exit status 1
#13 214.2 ----------------------------------------
#13 214.2 ERROR: Command errored out with exit status 1:

I have installed all the libs mentioned in the docs on top of my base image python:3.7-slim like this

RUN apt-get update && apt-get install --assume-yes \
  build-essential \
  gcc \
  g++ \
  libboost-all-dev \
  libpq-dev \
  unixodbc \
  unixodbc-dev \
  python3.7-dev \
  curl \
  perl \
  && rm -rf /var/lib/apt/lists/* \
  && rm -rf /var/cache/apt/*
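The two "cannot find -larrow" lines mean the linker could not locate the Arrow shared libraries. A possible starting point for debugging, assuming pyarrow was installed from a PyPI wheel, is to ask pyarrow where its headers and libraries live. get_include, get_libraries, and get_library_dirs are pyarrow helper functions; whether they resolve this particular build failure is not confirmed in this thread.

```python
import importlib


def describe_pyarrow_link_settings():
    """Return pyarrow's include and library locations, or None if it is absent."""
    try:
        pyarrow = importlib.import_module("pyarrow")
    except ImportError:
        return None
    # pyarrow also provides create_library_symlinks(), which its documentation
    # describes for building extensions against wheels; it is not called here.
    return {
        "include": pyarrow.get_include(),
        "libraries": pyarrow.get_libraries(),
        "library_dirs": pyarrow.get_library_dirs(),
    }


if __name__ == "__main__":
    settings = describe_pyarrow_link_settings()
    if settings is None:
        print("pyarrow is not installed in this environment")
    else:
        for key, value in settings.items():
            print(f"{key}: {value}")
```

If the reported library directories contain only versioned files and no plain libarrow.so, that would be consistent with the linker error above.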

amritha-jayadev-by avatar Jun 17 '21 15:06 amritha-jayadev-by