arrow icon indicating copy to clipboard operation
arrow copied to clipboard

[Python] create_library_symlinks() does not fix broken symlinks

Open robomics opened this issue 1 year ago • 2 comments

Describe the enhancement requested

Consider the following scenario:

# Create a venv and install some version of pyarrow
python3 -m venv /tmp/venv
/tmp/venv/bin/pip install pyarrow==16

# Create symlinks to versioned libs
/tmp/venv/bin/python -c 'import pyarrow; pyarrow.create_library_symlinks()'
ls -lah /tmp/venv/lib/python*/site-packages/pyarrow/*.so

# The above should list several symlinks, such as
# /tmp/venv/lib/python3.12/site-packages/pyarrow/libarrow.so -> /tmp/venv/lib/python3.12/site-packages/pyarrow/libarrow.so.1600

# Update pyarrow
/tmp/venv/bin/pip install pyarrow==17

# Try to update the symlinks
/tmp/venv/bin/python -c 'import pyarrow; pyarrow.create_library_symlinks()'

# This will result in an error, as the symlinks already exists (even though they are broken due to the update)

I think it would be more appropriate for pyarrow.create_library_symlinks() to detect and fix broken symlinks.

Component(s)

Python

robomics avatar Sep 25 '24 17:09 robomics

Thanks for filing this @robomics. I'm not familiar with the create_library_symlinks function yet but my initial take is that we might be better off giving the user a clear warning with steps they can run manually rather than remove the symlinks for them. My main reasoning is that I think it's best to (1) detect unusual states and let the user know rather than work around it and (2) not delete anything on the user's behalf unless the user opts in.

I also note that the docstring currently says the function should only be run once whereas you've run it twice here:

This function must only be invoked once and only when the shared libraries are bundled with the Python package,...

What do you think about just improving the error instead of removing the symlinks? Feel free to add any more context from your use case if you feel it'd be useful here.

amoeba avatar Oct 01 '24 05:10 amoeba

Hi @amoeba, apologies for not coming back to you earlier.

I understand your concerns regarding deleting things on behalf of users without their explicit consent. That being said, I still believe it would be useful to provide an automated way for users to fix their python environment when the environment is broken due to updating pyarrow.

These are my reasons:

  • While it is true that the docs state that create_library_symlinks() should be run only once, they do not make it clear that installing pyarrow -> create_library_symlinks() -> upgrading pyarrow will lead to broken setups on *NIX systems.
  • Manually fixing broken environments cannot be done easily (with e.g. rm /tmp/venv/lib/python3.*/site-packages/pyarrow/libarrow*.so), because that would delete some shared libs that are not symlinks (e.g. libarrow_python.so). The simplest way I can think of to safely remove dangling symlinks with shell commands is something like find "$(python -c 'import os, pyarrow; print(os.path.dirname(pyarrow.__file__), end="")')" -type l -delete. This works fine, but I am not sure it is a good recommendation to give to every user.
  • Manually uninstalling and re-installing pyarrow with pip will not fix the broken links (because pip refuses to delete files that it has not created).
  • When linking against e.g. libarrow shipped with the wheels, dangling links can lead to misleading errors. When I stumbled on this issue for the first time I was relying on CMake's find_library() to get the path to the wheels' libarrow (see here for more context). find_library() was failing claiming that libarrow.so didn't exist, even though a file with that name indeed existed on my machine. My first reaction was deleting and re-creating my dev venv, which solved the problem for a few hours. Then something upgraded the pyarrow version installed in my venv and I was back at square zero. Figuring out that the problem were a few dangling symlinks took more time than I'd like to admit :D.

How about we change create_library_symlinks() signature to something like:

def create_library_symlinks(repair_broken_symlinks: bool = False):
    # impl ...

and then we update the implementation to unlink dangling symlinks only when repair_broken_symlinks=True.

It may also be worth updating the docs explaining that up/downgrading pyarrow after calling create_library_symlinks() leads to broken symlinks that can cause all sorts of problems, and that this can be fixed by calling create_library_symlinks(repair_broken_symlinks=True) (or by manually uninstalling pyarrow, deleting its package folder, reinstalling pyarrow, and finally calling create_library_symlinks()).

robomics avatar Oct 19 '24 16:10 robomics

This issue has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this issue will be closed in 14 days. If this improvement is still desired but has no current owner, please add the 'Status: needs champion' label.

thisisnic avatar Nov 18 '25 12:11 thisisnic