import: fails with ModuleNotFoundError (win32timezone) when importing private repository using ssh using dvc from Windows Installer

Open W1M0R opened this issue 3 years ago • 2 comments

Bug Report

Description

According to the guide, we need to use the git/ssh protocol when working with private repositories. When I follow this recommendation, dvc import fails with ModuleNotFoundError: No module named 'win32timezone'. Trying again over https, I get dulwich.client.HTTPUnauthorized: No valid credentials provided (as expected according to your guide).

Reproduce

  1. Install dvc 2.10.1 using the Windows Installer.
  2. Clone a dvc project.
  3. Run dvc import --verbose [email protected]:YourOrg/YourPrivateRepo.git AFolder/ADvcFolder -o AFolder/ADvcFolder

YourOrg/YourPrivateRepo.git is a private repo initialised with dvc.
ADvcFolder is a dvc-tracked folder (e.g. added via dvc add AFolder/ADvcFolder in the private repo).

Expected

Import should succeed without error.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.10.1 (exe)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19044-SP0
Supports:
        azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.9.0),
        gdrive (pydrive2 = 1.10.0),
        gs (gcsfs = 2022.3.0),
        hdfs (fsspec = 2022.3.0, pyarrow = 7.0.0),
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
        ssh (sshfs = 2022.3.1),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.5),
        webdavs (webdav4 = 0.9.5)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: local
Workspace directory: NTFS on D:\
Repo: dvc, git

Output of dvc import:

$ dvc import --verbose [email protected]:YourOrg/YourPrivateRepo.git AFolder/ADvcFolder -o AFolder/ADvcFolder
2022-04-15 23:40:52,341 DEBUG: Removing output 'AFolder\ADvcFolder' of stage: 'AFolder\ADvcFolder.dvc'.
2022-04-15 23:40:52,343 DEBUG: Removing 'D:\TestProject\AFolder\ADvcFolder'
Importing 'AFolder/ADvcFolder ([email protected]:YourOrg/YourPrivateRepo.git)' -> 'AFolder\ADvcFolder'
2022-04-15 23:40:52,352 DEBUG: Computed stage: 'AFolder\ADvcFolder.dvc' md5: '0f2babe69537617c1ffa8526e9d72c0a'
2022-04-15 23:40:52,353 DEBUG: 'md5' of stage: 'AFolder\ADvcFolder.dvc' changed.
2022-04-15 23:40:52,355 DEBUG: Creating external repo [email protected]:YourOrg/YourPrivateRepo.git@None
2022-04-15 23:40:52,356 DEBUG: erepo: git clone '[email protected]:YourOrg/YourPrivateRepo.git' to a temporary dir
2022-04-15 23:40:53,433 ERROR: failed to import 'AFolder/ADvcFolder from '[email protected]:YourOrg/YourPrivateRepo.git'. - Failed to clone repo '[email protected]:YourOrg/YourPrivateRepo.git' to 'C:\Users\user\AppData\Local\Temp\tmpbeadgsg7dvc-clone'
------------------------------------------------------------
Traceback (most recent call last):
  File "scmrepo\git\backend\dulwich\__init__.py", line 193, in clone
  File "dulwich\porcelain.py", line 443, in clone
  File "dulwich\client.py", line 535, in clone
  File "dulwich\client.py", line 601, in fetch
  File "dulwich\client.py", line 1088, in fetch_pack
  File "dulwich\client.py", line 1756, in _connect
  File "fsspec\asyn.py", line 85, in wrapper
  File "fsspec\asyn.py", line 65, in sync
  File "fsspec\asyn.py", line 25, in _runner
  File "scmrepo\git\backend\dulwich\asyncssh_vendor.py", line 149, in _run_command
  File "asyncssh\connection.py", line 7687, in connect
  File "asyncio\tasks.py", line 455, in wait_for
  File "asyncssh\connection.py", line 429, in _connect
  File "asyncio\base_events.py", line 1050, in create_connection
  File "asyncio\base_events.py", line 1068, in _create_connection_transport
  File "asyncssh\connection.py", line 7678, in conn_factory
  File "asyncssh\connection.py", line 3064, in __init__
  File "asyncssh\gss_win32.py", line 168, in __init__
  File "sspi.py", line 200, in __init__
ModuleNotFoundError: No module named 'win32timezone'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "dvc\scm.py", line 106, in clone
  File "scmrepo\git\__init__.py", line 143, in clone
  File "scmrepo\git\backend\dulwich\__init__.py", line 196, in clone
scmrepo.exceptions.CloneError: Failed to clone repo '[email protected]:YourOrg/YourPrivateRepo.git' to 'C:\Users\user\AppData\Local\Temp\tmpbeadgsg7dvc-clone'     

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dvc\commands\imp.py", line 15, in run
  File "dvc\repo\imp.py", line 6, in imp
  File "dvc\repo\__init__.py", line 48, in wrapper
  File "dvc\repo\scm_context.py", line 152, in run
  File "dvc\repo\imp_url.py", line 83, in imp_url
  File "funcy\decorators.py", line 45, in wrapper
  File "dvc\stage\decorators.py", line 36, in rwlocked
  File "funcy\decorators.py", line 66, in __call__
  File "dvc\stage\__init__.py", line 533, in run
  File "funcy\decorators.py", line 45, in wrapper
  File "dvc\stage\decorators.py", line 36, in rwlocked
  File "funcy\decorators.py", line 66, in __call__
  File "dvc\stage\__init__.py", line 557, in _sync_import
  File "dvc\stage\imports.py", line 47, in sync_import
  File "dvc\dependency\repo.py", line 66, in download
  File "dvc\dependency\repo.py", line 107, in _get_used_and_obj
  File "contextlib.py", line 113, in __enter__
  File "dvc\external_repo.py", line 36, in external_repo
  File "dvc\external_repo.py", line 162, in _cached_clone
  File "funcy\decorators.py", line 45, in wrapper
  File "funcy\flow.py", line 274, in wrap_with
  File "funcy\decorators.py", line 66, in __call__
  File "dvc\external_repo.py", line 232, in _clone_default_branch
  File "dvc\scm.py", line 108, in clone
dvc.scm.CloneError: Failed to clone repo '[email protected]:YourOrg/YourPrivateRepo.git' to 'C:\Users\user\AppData\Local\Temp\tmpbeadgsg7dvc-clone'
------------------------------------------------------------
2022-04-15 23:40:53,453 DEBUG: Analytics is enabled.
2022-04-15 23:40:53,456 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', 'C:\\Users\\user\\AppData\\Local\\Temp\\tmpnnqcpnag']'
2022-04-15 23:40:53,461 DEBUG: Spawned '['daemon', '-q', 'analytics', 'C:\\Users\\user\\AppData\\Local\\Temp\\tmpnnqcpnag']'

W1M0R avatar Apr 15 '22 22:04 W1M0R

I uninstalled dvc (which was previously installed using the dvc Windows Installer - to get the automated symlink permissions setup). Then I installed dvc using choco. This resolved the issue. It could be that the problem then lies with the dvc Windows Installer, or that I missed a manual installation step.

Output of dvc doctor:

$ dvc doctor
DVC version: 2.10.1 (choco)
---------------------------------
Platform: Python 3.9.0 on Windows-10-10.0.19041-SP0
Supports:
        azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.9.0),
        gdrive (pydrive2 = 1.10.0),
        gs (gcsfs = 2022.3.0),
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
        ssh (sshfs = 2022.3.1),
        oss (ossfs = 2021.8.0)
Cache types: hardlink
Cache directory: NTFS on D:\
Caches: local
Remotes: local
Workspace directory: NTFS on D:\
Repo: dvc, git

W1M0R avatar Apr 16 '22 17:04 W1M0R

I'm having the same missing win32timezone issue with Windows installer 2.18.0 (in both PowerShell and Git Bash). My workaround was installing dvc via pip in a venv. I think issues #2754 and #7505, along with this problem discussion, are connected to this. The merged PR #2763 seems to address it, but we're still seeing the win32timezone issue.
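For anyone using the pip/venv workaround, a quick check that the module named in the traceback can be found in that environment might look like the sketch below. It assumes it is run with the venv's own Python interpreter on Windows; the helper name missing_modules is made up for illustration. It does not apply to the binary installer, which bundles its own interpreter.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # win32timezone ships with pywin32; it is the module the traceback
    # reports as missing.
    missing = missing_modules(["win32timezone"])
    print("missing:", missing if missing else "none")
```

An empty result only means the module can be located in that environment; it says nothing about what the installer-built executable contains.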

jruehle avatar Aug 24 '22 09:08 jruehle

Any idea if this will get attention? I ran into the same issue with the 2.11.0 Windows installer (pip-installing 2.29.0 failed with a different error, so I had to revert to 2.9.5). It's much more convenient for our team to distribute internally via the binary installer.

Edit: In case anyone else has this issue, the bug is not present in the 2.9.5 Windows binary installer.

edwardwbarber avatar Oct 06 '22 20:10 edwardwbarber

@HaddocktheHorrible have you tried the windows installer for the latest DVC release?

pmrowla avatar Oct 11 '22 12:10 pmrowla

Hi @pmrowla - not yet. Since I ran into #7702 from the pip version of 2.29.0, I did not bother with the windows installer. But you're right that I can see if I get the win32timezone error on the latest installer too. I will report back...

Update: Confirmed I see the same ModuleNotFoundError: No module named 'win32timezone' error when trying to call dvc import from the 2.30.0 release version.

Here's the traceback leading to that error:

Traceback (most recent call last):
  File "scmrepo\git\backend\dulwich\__init__.py", line 200, in clone
  File "dulwich\porcelain.py", line 538, in clone
  File "dulwich\client.py", line 760, in clone
  File "dulwich\client.py", line 837, in fetch
  File "dulwich\client.py", line 1146, in fetch_pack
  File "dulwich\client.py", line 1792, in _connect
  File "fsspec\asyn.py", line 111, in wrapper
  File "fsspec\asyn.py", line 96, in sync
  File "fsspec\asyn.py", line 53, in _runner
  File "scmrepo\git\backend\dulwich\asyncssh_vendor.py", line 163, in _run_command
  File "asyncssh\connection.py", line 7834, in connect
  File "asyncio\tasks.py", line 442, in wait_for
  File "asyncssh\connection.py", line 437, in _connect
  File "asyncio\base_events.py", line 1090, in create_connection
  File "asyncio\base_events.py", line 1108, in _create_connection_transport
  File "asyncssh\connection.py", line 7825, in conn_factory
  File "asyncssh\connection.py", line 3097, in __init__
  File "asyncssh\gss_win32.py", line 168, in __init__
  File "sspi.py", line 200, in __init__
ModuleNotFoundError: No module named 'win32timezone'

And dvc doctor output:

$ dvc doctor
DVC version: 2.30.0 (exe)
---------------------------------
Platform: Python 3.9.13 on Windows-10-10.0.19043-SP0
Subprojects:

Supports:
        azure (adlfs = 2022.10.0, knack = 0.10.0, azure-identity = 1.10.0),
        gdrive (pydrive2 = 1.10.0),
        gs (gcsfs = 2022.1.0),
        hdfs (fsspec = 2022.1.0, pyarrow = 7.0.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        oss (ossfs = 2021.8.0),
        s3 (s3fs = 2022.1.0, boto3 = 1.20.24),
        ssh (sshfs = 2021.11.2),
        webdav (webdav4 = 0.9.4),
        webdavs (webdav4 = 0.9.4),
        webhdfs (fsspec = 2022.1.0)
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: s3, s3
Workspace directory: NTFS on C:\
Repo: dvc, git

edwardwbarber avatar Oct 11 '22 21:10 edwardwbarber

Hi @pmrowla - last week I tried with 2.30.0 but had no success (see the logs in my previous comment). Sorry if this is a double-ping -- I realized comment edits might not send notifications...

edwardwbarber avatar Oct 18 '22 15:10 edwardwbarber

I'm not able to reproduce, but it looks like we were simply missing a hiddenimport for pyinstaller. Added one in https://github.com/iterative/dvc/commit/ab3f8fb1a4b12ae61f0f04f2821bfdf9ddcb6fd4

@W1M0R @HaddocktheHorrible @jruehle folks, please give 2.35.2 a try (the packages should be out later today/tomorrow): https://dvc.org/download/win/dvc-2.35.2
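For readers unfamiliar with hidden imports: PyInstaller bundles only the modules its import analysis detects, so a module it does not detect has to be listed explicitly, for example with the --hidden-import option. The sketch below is a generic illustration of that option, not the content of the linked commit; the entry script app.py and the helper pyinstaller_args are hypothetical.

```python
def pyinstaller_args(entry_script, hidden_imports):
    """Build a PyInstaller argument list that force-bundles the given modules."""
    args = [entry_script]
    for module in hidden_imports:
        # --hidden-import names a module that should be bundled even though
        # PyInstaller's analysis did not find an import of it.
        args.append(f"--hidden-import={module}")
    return args


if __name__ == "__main__":
    args = pyinstaller_args("app.py", ["win32timezone"])
    print(args)
    # To actually build (requires PyInstaller to be installed):
    #   import PyInstaller.__main__
    #   PyInstaller.__main__.run(args)
```

The same list of modules can alternatively be declared in a PyInstaller hook file via a hiddenimports variable; which mechanism the DVC build uses is not stated in this thread.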

efiop avatar Nov 24 '22 21:11 efiop

@efiop Sorry for the delay - just had a chance to install and try it. Can confirm dvc import is working again as expected for the 2.35.2 binary install. Thanks for the fix!

edwardwbarber avatar Dec 01 '22 16:12 edwardwbarber

I had the same problem in version 2.34.0, but after installing 2.35.2 it works without problems. Thank you.

ho9science avatar Dec 15 '22 05:12 ho9science