dvc
import: fails with ModuleNotFoundError (win32timezone) when importing a private repository over ssh with dvc from the Windows Installer
Bug Report
Description
According to the guide, we need to use the git/ssh protocol when working with private repositories. Following this recommendation, a dvc import fails with ModuleNotFoundError: No module named 'win32timezone'. Trying again, but this time using https, I get dulwich.client.HTTPUnauthorized: No valid credentials provided (as expected according to your guide).
Reproduce
- Install dvc 2.10.1 using the Windows Installer.
- Clone a dvc project.
- Run
dvc import --verbose [email protected]:YourOrg/YourPrivateRepo.git AFolder/ADvcFolder -o AFolder/ADvcFolder
YourOrg/YourPrivateRepo.git - a private repo initialised with dvc.
ADvcFolder - a dvc-tracked folder (e.g. added via dvc add AFolder/ADvcFolder in the private repo).
Expected
Import should succeed without error.
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.10.1 (exe)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19044-SP0
Supports:
azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.9.0),
gdrive (pydrive2 = 1.10.0),
gs (gcsfs = 2022.3.0),
hdfs (fsspec = 2022.3.0, pyarrow = 7.0.0),
webhdfs (fsspec = 2022.3.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
ssh (sshfs = 2022.3.1),
oss (ossfs = 2021.8.0),
webdav (webdav4 = 0.9.5),
webdavs (webdav4 = 0.9.5)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: local
Workspace directory: NTFS on D:\
Repo: dvc, git
Output of dvc import:
$ dvc import --verbose [email protected]:YourOrg/YourPrivateRepo.git AFolder/ADvcFolder -o AFolder/ADvcFolder
2022-04-15 23:40:52,341 DEBUG: Removing output 'AFolder\ADvcFolder' of stage: 'AFolder\ADvcFolder.dvc'.
2022-04-15 23:40:52,343 DEBUG: Removing 'D:\TestProject\AFolder\ADvcFolder'
Importing 'AFolder/ADvcFolder ([email protected]:YourOrg/YourPrivateRepo.git)' -> 'AFolder\ADvcFolder'
2022-04-15 23:40:52,352 DEBUG: Computed stage: 'AFolder\ADvcFolder.dvc' md5: '0f2babe69537617c1ffa8526e9d72c0a'
2022-04-15 23:40:52,353 DEBUG: 'md5' of stage: 'AFolder\ADvcFolder.dvc' changed.
2022-04-15 23:40:52,355 DEBUG: Creating external repo [email protected]:YourOrg/YourPrivateRepo.git@None
2022-04-15 23:40:52,356 DEBUG: erepo: git clone '[email protected]:YourOrg/YourPrivateRepo.git' to a temporary dir
2022-04-15 23:40:53,433 ERROR: failed to import 'AFolder/ADvcFolder from '[email protected]:YourOrg/YourPrivateRepo.git'. - Failed to clone repo '[email protected]:YourOrg/YourPrivateRepo.git' to 'C:\Users\user\AppData\Local\Temp\tmpbeadgsg7dvc-clone'
------------------------------------------------------------
Traceback (most recent call last):
File "scmrepo\git\backend\dulwich\__init__.py", line 193, in clone
File "dulwich\porcelain.py", line 443, in clone
File "dulwich\client.py", line 535, in clone
File "dulwich\client.py", line 601, in fetch
File "dulwich\client.py", line 1088, in fetch_pack
File "dulwich\client.py", line 1756, in _connect
File "fsspec\asyn.py", line 85, in wrapper
File "fsspec\asyn.py", line 65, in sync
File "fsspec\asyn.py", line 25, in _runner
File "scmrepo\git\backend\dulwich\asyncssh_vendor.py", line 149, in _run_command
File "asyncssh\connection.py", line 7687, in connect
File "asyncio\tasks.py", line 455, in wait_for
File "asyncssh\connection.py", line 429, in _connect
File "asyncio\base_events.py", line 1050, in create_connection
File "asyncio\base_events.py", line 1068, in _create_connection_transport
File "asyncssh\connection.py", line 7678, in conn_factory
File "asyncssh\connection.py", line 3064, in __init__
File "asyncssh\gss_win32.py", line 168, in __init__
File "sspi.py", line 200, in __init__
ModuleNotFoundError: No module named 'win32timezone'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc\scm.py", line 106, in clone
File "scmrepo\git\__init__.py", line 143, in clone
File "scmrepo\git\backend\dulwich\__init__.py", line 196, in clone
scmrepo.exceptions.CloneError: Failed to clone repo '[email protected]:YourOrg/YourPrivateRepo.git' to 'C:\Users\user\AppData\Local\Temp\tmpbeadgsg7dvc-clone'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "dvc\commands\imp.py", line 15, in run
File "dvc\repo\imp.py", line 6, in imp
File "dvc\repo\__init__.py", line 48, in wrapper
File "dvc\repo\scm_context.py", line 152, in run
File "dvc\repo\imp_url.py", line 83, in imp_url
File "funcy\decorators.py", line 45, in wrapper
File "dvc\stage\decorators.py", line 36, in rwlocked
File "funcy\decorators.py", line 66, in __call__
File "dvc\stage\__init__.py", line 533, in run
File "funcy\decorators.py", line 45, in wrapper
File "dvc\stage\decorators.py", line 36, in rwlocked
File "funcy\decorators.py", line 66, in __call__
File "dvc\stage\__init__.py", line 557, in _sync_import
File "dvc\stage\imports.py", line 47, in sync_import
File "dvc\dependency\repo.py", line 66, in download
File "dvc\dependency\repo.py", line 107, in _get_used_and_obj
File "contextlib.py", line 113, in __enter__
File "dvc\external_repo.py", line 36, in external_repo
File "dvc\external_repo.py", line 162, in _cached_clone
File "funcy\decorators.py", line 45, in wrapper
File "funcy\flow.py", line 274, in wrap_with
File "funcy\decorators.py", line 66, in __call__
File "dvc\external_repo.py", line 232, in _clone_default_branch
File "dvc\scm.py", line 108, in clone
dvc.scm.CloneError: Failed to clone repo '[email protected]:YourOrg/YourPrivateRepo.git' to 'C:\Users\user\AppData\Local\Temp\tmpbeadgsg7dvc-clone'
------------------------------------------------------------
2022-04-15 23:40:53,453 DEBUG: Analytics is enabled.
2022-04-15 23:40:53,456 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', 'C:\\Users\\user\\AppData\\Local\\Temp\\tmpnnqcpnag']'
2022-04-15 23:40:53,461 DEBUG: Spawned '['daemon', '-q', 'analytics', 'C:\\Users\\user\\AppData\\Local\\Temp\\tmpnnqcpnag']'
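The traceback above is a three-part exception chain. "The above exception was the direct cause of the following exception" is what Python prints for explicit chaining (`raise ... from exc`), and "During handling of the above exception, another exception occurred" is what it prints when a new exception is raised inside an `except` block without `from`. The sketch below reproduces that shape with hypothetical stand-in classes (not DVC's or scmrepo's actual code) to show that the real failure is the innermost `ModuleNotFoundError`, and the two `CloneError`s are wrappers around it.

```python
import traceback


class CloneError(Exception):
    """Hypothetical stand-in for scmrepo.exceptions.CloneError."""


class DvcCloneError(Exception):
    """Hypothetical stand-in for dvc.scm.CloneError."""


def backend_clone(url):
    try:
        # Simulates the innermost failure: sspi needing win32timezone.
        raise ModuleNotFoundError("No module named 'win32timezone'")
    except ModuleNotFoundError as exc:
        # Explicit chaining (`from exc`) sets __cause__ and prints
        # "The above exception was the direct cause of the following exception".
        raise CloneError(f"Failed to clone repo '{url}'") from exc


def scm_clone(url):
    try:
        backend_clone(url)
    except CloneError as exc:
        # Raising inside `except` without `from` sets only __context__ and prints
        # "During handling of the above exception, another exception occurred".
        raise DvcCloneError(str(exc))


if __name__ == "__main__":
    try:
        scm_clone("YourOrg/YourPrivateRepo.git")
    except DvcCloneError:
        traceback.print_exc()  # prints the same three-part chain as the log
```

When reading such a log, the first traceback in the chain is usually the one worth reporting.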
I uninstalled dvc (which I had previously installed using the dvc Windows Installer, to get the symlink permissions set up automatically). Then I installed dvc using choco. This resolved the issue. So the problem may lie with the dvc Windows Installer, or I may have missed a manual installation step.
Output of dvc doctor:
$ dvc doctor
DVC version: 2.10.1 (choco)
---------------------------------
Platform: Python 3.9.0 on Windows-10-10.0.19041-SP0
Supports:
azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.9.0),
gdrive (pydrive2 = 1.10.0),
gs (gcsfs = 2022.3.0),
webhdfs (fsspec = 2022.3.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
ssh (sshfs = 2022.3.1),
oss (ossfs = 2021.8.0)
Cache types: hardlink
Cache directory: NTFS on D:\
Caches: local
Remotes: local
Workspace directory: NTFS on D:\
Repo: dvc, git
Having the same issue with the missing win32timezone module using the Windows installer 2.18.0 (in both PowerShell and Git Bash). My workaround was installing dvc via pip in a venv.
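The pip/venv workaround presumably works because win32timezone is installed as an ordinary module in that environment. As a quick diagnostic (a sketch, not a DVC command), the script below reports whether the top-level modules named in the traceback above can be found by whichever Python interpreter runs it, e.g. the venv's `python`. It cannot inspect the frozen `dvc.exe` from the Windows installer, whose bundled modules are not visible to an outside interpreter. `sspi` and `win32timezone` come from pywin32, so they are only expected to be found on Windows.

```python
import importlib.util
import sys

# Top-level modules from the failing frames of the traceback.
MODULES = ["asyncssh", "sspi", "win32timezone"]


def check_modules(names):
    """Return {name: bool}: whether each top-level module can be found."""
    # find_spec returns None (without importing anything) when a
    # top-level module is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    for name, found in check_modules(MODULES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```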
I think other issues, like #2754, #7505 and this discussion, are connected to this. The merged PR #2763 seems to address it, but we're still seeing the win32timezone issue after all?!
Any idea if this will get attention? I ran into the same issue with the 2.11.0 Windows installer (pip-installing 2.29.0 also failed, with a different error, so I had to revert to 2.9.5). It's much more convenient for our team to distribute internally via the binary installer.
Edit: In case anyone else has this issue, the bug is not present in the 2.9.5 Windows binary installer.
@HaddocktheHorrible have you tried the windows installer for the latest DVC release?
Hi @pmrowla - not yet. Since I ran into #7702 with the pip version of 2.29.0, I did not bother with the Windows installer. But you're right, I can check whether I get the win32timezone error with the latest installer too. I will report back...
Update: Confirmed I see the same ModuleNotFoundError: No module named 'win32timezone' error when trying to call dvc import from the 2.30.0 release version.
Here's the traceback leading to that error:
Traceback (most recent call last):
File "scmrepo\git\backend\dulwich\__init__.py", line 200, in clone
File "dulwich\porcelain.py", line 538, in clone
File "dulwich\client.py", line 760, in clone
File "dulwich\client.py", line 837, in fetch
File "dulwich\client.py", line 1146, in fetch_pack
File "dulwich\client.py", line 1792, in _connect
File "fsspec\asyn.py", line 111, in wrapper
File "fsspec\asyn.py", line 96, in sync
File "fsspec\asyn.py", line 53, in _runner
File "scmrepo\git\backend\dulwich\asyncssh_vendor.py", line 163, in _run_command
File "asyncssh\connection.py", line 7834, in connect
File "asyncio\tasks.py", line 442, in wait_for
File "asyncssh\connection.py", line 437, in _connect
File "asyncio\base_events.py", line 1090, in create_connection
File "asyncio\base_events.py", line 1108, in _create_connection_transport
File "asyncssh\connection.py", line 7825, in conn_factory
File "asyncssh\connection.py", line 3097, in __init__
File "asyncssh\gss_win32.py", line 168, in __init__
File "sspi.py", line 200, in __init__
ModuleNotFoundError: No module named 'win32timezone'
And dvc doctor output:
$ dvc doctor
DVC version: 2.30.0 (exe)
---------------------------------
Platform: Python 3.9.13 on Windows-10-10.0.19043-SP0
Subprojects:
Supports:
azure (adlfs = 2022.10.0, knack = 0.10.0, azure-identity = 1.10.0),
gdrive (pydrive2 = 1.10.0),
gs (gcsfs = 2022.1.0),
hdfs (fsspec = 2022.1.0, pyarrow = 7.0.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
oss (ossfs = 2021.8.0),
s3 (s3fs = 2022.1.0, boto3 = 1.20.24),
ssh (sshfs = 2021.11.2),
webdav (webdav4 = 0.9.4),
webdavs (webdav4 = 0.9.4),
webhdfs (fsspec = 2022.1.0)
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: s3, s3
Workspace directory: NTFS on C:\
Repo: dvc, git
Hi @pmrowla - last week I tried 2.30.0 but had no success (see the logs in my previous comment). Sorry if this is a double-ping; I realized comment edits might not send notifications...
I'm not able to reproduce, but it looks like we were simply missing a hiddenimport for pyinstaller. Added one in https://github.com/iterative/dvc/commit/ab3f8fb1a4b12ae61f0f04f2821bfdf9ddcb6fd4 @W1M0R @HaddocktheHorrible @jruehle folks, please give 2.35.2 a try (the packages should be out later today/tomorrow) https://dvc.org/download/win/dvc-2.35.2
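For context on what a "hiddenimport" is: PyInstaller decides what to bundle by statically analysing imports, so a module that is only imported dynamically at runtime is left out of the frozen executable. Here win32timezone is likely imported from pywin32's compiled code while `sspi` runs (the traceback only shows `sspi.py` failing), which static analysis cannot see. Below is a minimal sketch of a PyInstaller hook that declares it, assuming a hypothetical hook file name; the linked commit is the actual fix, and it may declare the import elsewhere.

```python
# hook-sspi.py: hypothetical PyInstaller hook file name, for illustration only.
#
# PyInstaller runs hook files and reads the module-level `hiddenimports` list.
# Every module listed here is bundled even though static analysis never saw
# an import statement for it.
hiddenimports = ["win32timezone"]
```

Equivalent alternatives are passing `hiddenimports=["win32timezone"]` to `Analysis(...)` in a `.spec` file, or running `pyinstaller --hidden-import win32timezone ...`.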
@efiop Sorry for the delay - just had a chance to install and try it. Can confirm dvc import is working again as expected for the 2.35.2 binary install. Thanks for the fix!
I had the same problem in version 2.34.0, but after installing 2.35.2 it works without problems. Thank you!