
.shiv artifact filenames too long for Windows

Open Ninjef opened this issue 3 years ago • 3 comments

I'm using shiv for a project that, unfortunately, will mostly run on Windows. Without long paths enabled in Windows, the directory names of the cached artifacts under .shiv are at least 64 characters long, because each one is a 64-character hash appended to the zipapp's name. That's a huge chunk of Windows' default maximum path length (260 characters). On top of that, these directories contain deep paths for every package in their site-packages. I've often run into stack traces like the following:

  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\papermill\__init__.py", line 4, in <module>
    from .execute import execute_notebook
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\papermill\execute.py", line 10, in <module>
    from .engines import papermill_engines
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\papermill\engines.py", line 12, in <module>
    from .clientwrap import PapermillNotebookClient
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\papermill\clientwrap.py", line 4, in <module>
    from nbclient import NotebookClient
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\nbclient\__init__.py", line 3, in <module>
    from .client import NotebookClient, execute  # noqa: F401
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\nbclient\client.py", line 21, in <module>
    from jupyter_client import KernelManager
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\jupyter_client\__init__.py", line 4, in <module>
    from .connect import *
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\jupyter_client\connect.py", line 21, in <module>
    import zmq
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\zmq\__init__.py", line 125, in <module>
    from zmq import backend
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\zmq\backend\__init__.py", line 32, in <module>
    raise original_error from None
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\zmq\backend\__init__.py", line 27, in <module>
    _ns = select_backend(first)
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\zmq\backend\select.py", line 32, in select_backend
    mod = import_module(name)
  File "C:\Users\jarnold\AppData\Roaming\Alteryx\Tools\Moonbuggy_venv\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38\test_it_will_save_shiv_cache_i0\mbShiv\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602\site-packages\zmq\backend\cython\__init__.py", line 6, in <module>
    from . import (
ImportError: DLL load failed while importing _proxy_steerable: The filename or extension is too long.

Worse, I used to hit a much more obscure issue that turned out to be caused by excessive path length. I believe it happened at zipapp runtime, and I would see an error saying a file didn't exist. Tracking down the root cause was a fun time, lol. I resolved it by making my zipapp names much shorter, but I worry that certain site-packages will push things over the edge for some users and produce headache-inducing errors.
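To see how much of the path budget the cache directory alone consumes, the arithmetic can be sketched in Python. This is illustrative only: it assumes the classic Win32 MAX_PATH limit of 260 characters (259 usable plus the terminating null), uses the cache path from the traceback above, and `remaining_budget` is a hypothetical helper, not part of shiv.

```python
from pathlib import PureWindowsPath

# Classic Win32 MAX_PATH is 260 characters including the terminating NUL,
# leaving 259 usable characters when long paths are not enabled.
MAX_PATH = 260


def remaining_budget(path: str, limit: int = MAX_PATH) -> int:
    """Return how many characters remain before `path` reaches the limit."""
    return (limit - 1) - len(path)


# Cache directory taken from the traceback above.
cache_dir = (
    r"C:\Users\jarnold\AppData\Local\Temp\pytest-of-jarnold\pytest-38"
    r"\test_it_will_save_shiv_cache_i0\mbShiv"
    r"\5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602"
    r"\site-packages"
)

# The "<name>_<hash>" directory that shiv created for this build.
hashed_dir = PureWindowsPath(cache_dir).parent.name

print(len(hashed_dir))  # 80: 15-char name + "_" + 64-char hash
print(remaining_budget(cache_dir))
```

Whatever `remaining_budget` reports is all that's left for every path inside site-packages, such as the zmq backend modules that failed to load above.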

This issue can be resolved on each user's machine by changing a Windows registry value (LongPathsEnabled) that enables long paths. However, this requires manual intervention that I can't expect every user of my software to perform, and it makes my product harder to use.
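Short of asking every user to edit the registry, an application could at least detect the setting at startup and show a clear warning instead of a cryptic import error. A minimal sketch, assuming the standard `HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled` value; `long_paths_enabled` is a hypothetical helper that only reads the value and never changes it.

```python
import sys


def long_paths_enabled():
    r"""Return True/False for LongPathsEnabled, or None if it can't be read.

    Reads HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled.
    Returns None on non-Windows platforms or when the value is missing.
    """
    if sys.platform != "win32":
        return None
    import winreg  # Only available on Windows.

    try:
        with winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE,
            r"SYSTEM\CurrentControlSet\Control\FileSystem",
        ) as key:
            value, _value_type = winreg.QueryValueEx(key, "LongPathsEnabled")
    except OSError:  # Includes FileNotFoundError for a missing key/value.
        return None
    return value == 1
```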

Is there any way y'all could shorten these folder names? What about a directory tree like this:

-- .shiv
--- 1
---- site-packages
--- 2
---- site-packages
--- 5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602.cache_1.txt
--- 82aae3216cffb1a_17811ffc937b3fee3e116c8fe09cbc8c78be00c853b8d64b7ac340063ecb2700.cache_2.txt

To look up a site-packages cache, you would search for the name_hash, which exists as part of a text file name (e.g. 5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602.cache_1.txt), and parse out the number after .cache_ (in this example, 1). That number names the folder holding the site-packages for environment build 5cbde5206cff8b2_f031f9bb937b16a73ecc6c0fe90cbc8c78be00c853b8d64b7ac271d44ecb2602.
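The lookup described above can be sketched in Python. This is a hypothetical illustration of the proposal, not shiv code: `find_site_packages` and `register_build` are invented names, and the sketch assumes a single process at a time (no locking).

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Matches the "<index>.txt" tail of a "<name_hash>.cache_<index>.txt" marker.
_INDEX_RE = re.compile(r"(\d+)\.txt")


def find_site_packages(shiv_root: Path, name_hash: str) -> Optional[Path]:
    """Return the numbered site-packages dir recorded for `name_hash`, if any."""
    prefix = f"{name_hash}.cache_"
    for entry in shiv_root.iterdir():
        if entry.is_file() and entry.name.startswith(prefix):
            match = _INDEX_RE.fullmatch(entry.name[len(prefix):])
            if match:
                return shiv_root / match.group(1) / "site-packages"
    return None


def register_build(shiv_root: Path, name_hash: str) -> Path:
    """Reuse the folder recorded for `name_hash`, or allocate the next number."""
    existing = find_site_packages(shiv_root, name_hash)
    if existing is not None:
        return existing
    used = [int(p.name) for p in shiv_root.iterdir()
            if p.is_dir() and p.name.isdecimal()]
    index = max(used, default=0) + 1
    site_packages = shiv_root / str(index) / "site-packages"
    site_packages.mkdir(parents=True)
    # Record the name_hash -> number mapping only once the folder exists.
    (shiv_root / f"{name_hash}.cache_{index}.txt").touch()
    return site_packages


def demo() -> list:
    """Register two builds in a throwaway root, then look the first one up."""
    build_a = "appA_" + "a" * 64
    build_b = "appB_" + "b" * 64
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        paths = [
            register_build(root, build_a),
            register_build(root, build_b),
            find_site_packages(root, build_a),
        ]
        return [p.relative_to(root).as_posix() for p in paths]


print(demo())  # ['1/site-packages', '2/site-packages', '1/site-packages']
```

The long hash survives only in the marker file's name, so the deep package paths start under a one- or two-character folder instead of an 80-character one. A real implementation would also need locking for concurrent launches.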

I'm not sure how many people share this problem, but it seems like a pretty big gotcha when shipping anything production-worthy to Windows users.

Hopefully there's already a workaround. If so, please hit me with it!

Thanks for your time.

Ninjef, May 18 '21 23:05

Hi @Ninjef,

Thanks for your detailed report, and sorry that it's taken me this long to acknowledge it 😥

I really like your proposal. We actually drop lock files into ~/.shiv named with the same 64-char hash, which we could use to record a mapping of build_id hash -> friendly directory name. The only trouble is that the hashed directories exist to ensure uniqueness within the shiv cache dir (~/.shiv). Imagine two versions of the same CLI using the same friendly directory name: the newer CLI would find the previously extracted site-packages directory and reuse it. That would likely create a confusing experience where someone thinks they have the newer version of the CLI, but it's actually running the older version's code. I've previously outlined the problem here.

That said, your proposal of a reference file would give us a way to detect the above scenario and either error out or otherwise handle it (for example, on a collision we could simply force extraction to overwrite the stale cache). I'm happy to try this idea, but I'm not sure when I'll be able to devote time to it. I'll try to carve some out, as it would solve a number of active issues on the tracker! Thanks again!
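The collision handling described above can be sketched as follows: a reference file inside each friendly-named directory records which build populated it, and a mismatch triggers a fresh extraction over the stale cache. Everything here is hypothetical (the `.build_id` file name, `ensure_extracted`, the fake extractor); it is not how shiv currently works, and it ignores concurrent launches, which would still need shiv-style lock files.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical reference file recording which build populated a cache dir.
BUILD_ID_FILE = ".build_id"


def ensure_extracted(cache_dir: Path, build_id: str,
                     extract: Callable[[Path], None]) -> bool:
    """Make `cache_dir` hold the extraction for `build_id`.

    Returns True if `extract` was called, False if the cache was reused.
    """
    marker = cache_dir / BUILD_ID_FILE
    if marker.is_file() and marker.read_text(encoding="utf-8").strip() == build_id:
        return False  # Same build already extracted here: safe to reuse.
    if cache_dir.exists():
        # Collision: a different (stale) build owns this friendly name.
        shutil.rmtree(cache_dir)
    cache_dir.mkdir(parents=True)
    extract(cache_dir)
    # Record ownership only after extraction succeeded.
    marker.write_text(build_id, encoding="utf-8")
    return True


def demo():
    """Run v1, v1 again, then v2 against the same friendly directory name."""
    extracted = []

    def fake_extract(target: Path) -> None:
        extracted.append(target.name)
        (target / "site-packages").mkdir()

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "mycli"
        results = [
            ensure_extracted(cache, "build-v1", fake_extract),  # first run: extract
            ensure_extracted(cache, "build-v1", fake_extract),  # same build: reuse
            ensure_extracted(cache, "build-v2", fake_extract),  # new build: overwrite
        ]
    return results, len(extracted)


print(demo())  # ([True, False, True], 2)
```

Writing the reference file only after `extract` returns means an interrupted extraction is never mistaken for a usable cache on the next launch.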

lorencarvalho, Oct 12 '21 14:10

Hi again @Ninjef,

Sorry for the long delay in replying. While I haven't been able to play around with a reference file / local database solution, I wanted to mention that the 1.0.0 release of shiv may offer a mitigation for this issue: you can provide your own build ID when you create the pyz: https://shiv.readthedocs.io/en/latest/cli-reference.html#cmdoption-shiv-build-id

The only caveat is that the ID must be unique, so keep that in mind. But you could use this feature to shrink the 64-character hash down to something more manageable.
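One way to produce a shorter ID for `--build-id` is to hash every input that affects the bundle (pinned requirements, your own source files) and keep only a prefix of the digest. A sketch under those assumptions: `short_build_id` and the 12-character default are hypothetical choices, not shiv behavior. Twelve hex characters is 48 bits, so accidental collisions are unlikely for a modest number of builds but not impossible, which is exactly the uniqueness caveat above.

```python
import hashlib


def short_build_id(*contents: bytes, length: int = 12) -> str:
    """Derive a short, deterministic build ID from the bytes that go into a pyz."""
    digest = hashlib.sha256()
    for chunk in contents:
        # Length-prefix each chunk so (b"ab", b"c") and (b"a", b"bc") differ.
        digest.update(len(chunk).to_bytes(8, "big"))
        digest.update(chunk)
    return digest.hexdigest()[:length]


# Hypothetical inputs: in practice, read your lock file and source files.
build_id = short_build_id(b"example-requirements\n", b"example-source\n")
print(build_id, len(build_id))
```

Assuming the cache directory keeps the `<name>_<build_id>` shape seen in the traceback, this would cut that directory name from the zipapp name plus 65 characters to the name plus 13.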

lorencarvalho, Dec 20 '21 16:12

Excellent! Thanks @lorencarvalho, I may get the chance to try that out sometime soon. It sounds pretty promising.

Ninjef, Dec 28 '21 15:12