rethinkdb-python

Rethinkdb-import impossible to use on windows

Open. ShadowJonathan opened this issue 4 years ago • 26 comments

Describe the bug

On Windows 10, with Python 3.8, rethinkdb-import simply refuses to run because of obscure Python multiprocessing errors.

To Reproduce

Steps to reproduce the behavior:

  1. pip install rethinkdb under Python 3.8 (on Windows)
  2. rethinkdb import [options]

Expected behavior

Normal operation: the data import starts.

System info

  • OS: Windows 10 (build 19041.329)
  • RethinkDB Version: 2.4.0~0buster (docker container)
  • RethinkDB Python adapter Version: 2.4.7

Additional context

PS D:\k8smig\docker\mongodb\_local> rethinkdb-import --file .\tumblr.posts.json --table tumblr.posts -c vanguard --force
Traceback (most recent call last):
  File "c:\python\3.8\lib\runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\python\3.8\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Python\3.8\Scripts\rethinkdb-import.exe\__main__.py", line 7, in <module>
  File "c:\python\3.8\lib\site-packages\rethinkdb\_import.py", line 1716, in main
    import_tables(options, sources)
  File "c:\python\3.8\lib\site-packages\rethinkdb\_import.py", line 1359, in import_tables
    progress_bar.start()
  File "c:\python\3.8\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "c:\python\3.8\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "c:\python\3.8\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "c:\python\3.8\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "c:\python\3.8\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread._local' object
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\python\3.8\lib\multiprocessing\spawn.py", line 107, in spawn_main
    new_handle = reduction.duplicate(pipe_handle,
  File "c:\python\3.8\lib\multiprocessing\reduction.py", line 79, in duplicate
    return _winapi.DuplicateHandle(
PermissionError: [WinError 5] Access is denied

ShadowJonathan, Jun 29 '20

Can you check with a simple script whether multiprocessing works fine? I guess on Windows you have to change the multiprocessing start method, maybe to 'spawn'; check the module documentation. Your error is related to the way you are passing an object that is NOT PICKLABLE (not serializable) to another process or thread. So, try this:

Start the process first, then create the object and connection inside it, not before.

dpineiden, Jul 26 '20
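A minimal sketch of the suggestion above, assuming (as both tracebacks suggest) that spawn-based multiprocessing pickles the `Process` object and everything it references. `FakeConnection`, `EagerWorker`, and `LazyWorker` are hypothetical names, not the rethinkdb driver's API; `FakeConnection` holds a `threading.local` only to reproduce the unpicklable `_thread._local` from the traceback. The check uses plain `pickle.dumps`, which the `ForkingPickler` seen in the traceback subclasses:

```python
import pickle
import threading


class FakeConnection:
    """Hypothetical stand-in for a driver connection with per-thread state."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._local = threading.local()  # unpicklable, like the traceback's object


class EagerWorker:
    """Anti-pattern: the connection exists before the process is started."""

    def __init__(self, host, port):
        self.conn = FakeConnection(host, port)


class LazyWorker:
    """Suggested pattern: keep only plain config; connect inside the child."""

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def run(self):
        # This would run in the child process, after it has been started,
        # so the connection never has to be pickled.
        conn = FakeConnection(self.host, self.port)
        return conn.host


def can_send_to_spawned_child(obj):
    """Mimic the pickling step that a 'spawn' start performs on the worker."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


print(can_send_to_spawned_child(EagerWorker("localhost", 28015)))  # False
print(can_send_to_spawned_child(LazyWorker("localhost", 28015)))   # True
```

The only difference between the two workers is when the connection is created: `LazyWorker` carries a host and port (plain strings and integers) across the process boundary and builds the unpicklable object on the other side.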

Ping @ShadowJonathan

gabor-boros, Aug 11 '20

This isn't the Python driver library itself; these are CLI commands, which I honestly expect to work regardless of platform.

Yes, I'd wager that manually hacking the startup process of such a tool would fix it, but this issue is about fixing the source code so that it also works on Windows, not about monkey-patching it.

ShadowJonathan, Aug 11 '20

Any updates on this? I'm having the same problem on macOS with RethinkDB v2.4.0.

If not, does anyone know another way to import JSON files?

htbrown, Oct 09 '20

@htbrown it seems odd to me that this only happens on Windows, and that the stack trace shows the error originating in the built-in multiprocessing library.

Since I cannot test this on Windows, I'd like some more details from you:

What happens if you execute the import script directly, not through the rethinkdb wrapper? (By that I mean find the Python file and call it, rather than using the "rethinkdb import" command.)

gabor-boros, Oct 13 '20

To clarify: I was having the same issue on macOS. I haven't tried it on my Windows box.

If you'd like me to, I can try it later; give me a while.

htbrown, Oct 13 '20

it seems odd to me that this only happens on Windows, and that the stack trace shows the error originating in the built-in multiprocessing library.

@gabor-boros if you look closely at the top stack trace, this is caused by pickling a _thread._local: the pickling engine (used for "sending" values across to the new process) comes across it and throws an exception. The cause lies both in how that pickling engine is set up and in whatever values are handed over to the other processes.

(If I recall correctly, though I could be misremembering, when I originally investigated this I found some kind of multiprocessing jank in there that was causing it, which would only work nicely under Linux/Unix but not under Windows.)

ShadowJonathan, Oct 13 '20
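The Linux/Unix versus Windows difference recalled above most likely comes down to the multiprocessing start method. With 'spawn' (the only method on Windows, and the macOS default since Python 3.8) or 'forkserver', the `Process` object is pickled and sent to the child, which is exactly the `reduction.dump` step in both tracebacks in this thread; with 'fork' (traditionally the Linux default, and what Python 2's multiprocessing always did on Unix) the child inherits a copy of the parent's memory and nothing is pickled. That would also be consistent with the reports here that Python 2 and Python 3.7 work on macOS. A small sketch to check what a given interpreter does; `start_method_report` is a hypothetical helper name:

```python
import multiprocessing
import sys


def start_method_report():
    """Return the default start method and whether it pickles the Process object."""
    method = multiprocessing.get_start_method()
    # 'spawn' and 'forkserver' serialize the Process object (and everything
    # it references) with pickle; 'fork' copies the parent's memory instead.
    needs_pickling = method in ("spawn", "forkserver")
    return method, needs_pickling


if __name__ == "__main__":
    method, needs_pickling = start_method_report()
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} on {sys.platform}: "
        f"start method {method!r}, pickles the Process object: {needs_pickling}"
    )
```

If this reports `True`, any unpicklable attribute reachable from a `Process` object will fail in the same way as the tracebacks above.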

If anyone needs it, I've made my own RethinkDB importer in Node. Got fed up with faffing around with the built-in Python one. https://github.com/htbrown/rethinkdb-import

No documentation yet, so if you need help, submit an issue.

htbrown, Oct 22 '20

I am also having this issue on a Mac while trying to do an export: RethinkDB 2.4.0 with Python 3.8.2, database running in a Docker container.

Update: I had this issue while following the official docs, using venv and Python 3. I just tried it outside of venv, using Python 2, and it worked.

qualitymanifest, Nov 05 '20

Right, so there's something wrong with exporting too. Hmm.

htbrown, Nov 06 '20

I'm also having the same problem on macOS with Python 3.9.0 and RethinkDB 2.4.0.

daprieto1, Jan 02 '21

I'm getting these errors as well. I'm on a Mac mini M1. I'm not a Python expert, so I can't tell whether rethinkdb is using Python 2 or Python 3, but when I type "python" it seems to use Python 2. Is that good? Bad? I have both on my system.

giro@geoffs-mac-mini:~/rethinkdb-import$python --version
Python 2.7.16
giro@geoffs-mac-mini:~/rethinkdb-import$python3 --version
Python 3.9.1
giro@geoffs-mac-mini:~/rethinkdb-import$rethinkdb --version
rethinkdb 2.4.1 (CLANG 12.0.0 (clang-1200.0.32.28))

Is there a simple non-Python binary I can use for import/export instead?

GeoffreyPlitt, Feb 08 '21

Or a way to disable multiprocessing?

GeoffreyPlitt, Feb 08 '21
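On the question of disabling multiprocessing: one possible workaround on macOS and Linux (not Windows, which has no 'fork') is to force the 'fork' start method before calling the tool's entry point, so the `Process` object is never pickled. The tracebacks show the tools start processes through multiprocessing's default context (`_default_context.get_context()`), so `set_start_method` should apply to them, but this is an untested sketch under that assumption. `run_with_fork_start_method` is a hypothetical helper, not part of the rethinkdb package:

```python
import multiprocessing


def run_with_fork_start_method(entry_point):
    """Force the 'fork' start method, then call a zero-argument CLI entry point.

    Intended targets are entry points such as rethinkdb._dump.main or
    rethinkdb._import.main (both visible in the tracebacks in this thread).
    With 'fork', the child gets a copy of the parent's memory, so nothing
    reachable from the Process object has to be pickled.
    """
    if "fork" not in multiprocessing.get_all_start_methods():
        # Windows only supports 'spawn', so this workaround cannot apply there.
        raise RuntimeError("the 'fork' start method is not available on this platform")
    # force=True replaces a start method that may already have been fixed.
    multiprocessing.set_start_method("fork", force=True)
    return entry_point()
```

For example, a small wrapper script could do `from rethinkdb import _dump` and then `sys.exit(run_with_fork_start_method(_dump.main))`; since console-script entry points are called with no arguments, `main` should read its options from `sys.argv` as the command-line tool would. This only sidesteps the pickling step rather than fixing the bug, and forking on macOS can crash the child when certain system frameworks are in use, which is why Python 3.8 stopped using it by default there.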

I'm not a Python expert, so I can't tell whether rethinkdb is using Python 2 or Python 3, but when I type "python" it seems to use Python 2. Is that good? Bad? I have both on my system.

I suggest removing or avoiding Python 2; it's been deprecated for a while.

ShadowJonathan, Feb 08 '21

I tried to remove Python 2 on my Mac before and seem to remember finding it incredibly difficult for a while and then just giving up because it was more effort than it was worth. Do they still package it all in with the Python 3 installer?

htbrown, Feb 08 '21

Remove it only if possible; sometimes Python 2 is deeply embedded in system stuff (and people don't care enough to update it to 3), but try to find ways to make Python 3 the default for your own use.

ShadowJonathan, Feb 08 '21

Remove it only if possible; sometimes Python 2 is deeply embedded in system stuff (and people don't care enough to update it to 3), but try to find ways to make Python 3 the default for your own use.

Yeah, I think parts of macOS use it. I'm just trying to avoid it as much as possible.

htbrown, Feb 08 '21

On macOS 11.2.1, the dump command does the same thing.

rethinkdb dump -c my-host-name

cannot pickle '_thread._local' object
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/rethinkdb/_dump.py", line 200, in main
    _export.run(options)
  File "/usr/local/lib/python3.9/site-packages/rethinkdb/_export.py", line 641, in run
    run_clients(options, working_dir, db_table_set)
  File "/usr/local/lib/python3.9/site-packages/rethinkdb/_export.py", line 526, in run_clients
    new_process.start()
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread._local' object
Error: export failed, cannot pickle '_thread._local' object

jtwebb, Feb 13 '21

I have created this tool as a workaround (using NodeJS): https://github.com/GeoffreyPlitt/rethinkdb-import

GeoffreyPlitt, Feb 13 '21

If it's any help, it looks like redis-py had a similar issue, where the parent process's objects might not be serializable. Another similar issue was fixed by handing out a different connection.

jtwebb, Feb 13 '21
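A minimal sketch of the kind of source-level fix being discussed, assuming the unpicklable `_thread._local` lives on an object that gets handed to a worker process (the tracebacks do not say which object that is). `PicklableClient` is hypothetical, not a class from the rethinkdb driver or from redis-py; it uses Python's standard `__getstate__`/`__setstate__` hooks to leave the thread-local out of the pickled state and create a fresh one on the receiving side:

```python
import pickle
import threading


class PicklableClient:
    """Hypothetical client that keeps per-thread state in a threading.local."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._local = threading.local()  # would otherwise make pickling fail

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_local"]  # drop the unpicklable per-thread state
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._local = threading.local()  # rebuild it on the receiving side


original = PicklableClient("localhost", 28015)  # 28015: RethinkDB's default driver port
clone = pickle.loads(pickle.dumps(original))
print(clone.host, clone.port)  # localhost 28015
```

The clone gets its own empty thread-local, which matches the intent of per-thread state: nothing stored in it by the parent's threads is meaningful in a different process anyway.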

If anyone needs it, I've made my own RethinkDB importer in Node. Got fed up with faffing around with the built-in Python one. https://github.com/htbrown/rethinkdb-import

No documentation yet, so if you need help, submit an issue.

You saved my day. Thanks

Dav2015, May 14 '21

PermissionError: [WinError 5] Access is denied

@ShadowJonathan Did you close Kaspersky Antivirus when you started your Python script? Not suppressed for n minutes, but completely closed. It helped in my case... a bit...

red-scorp, Jun 28 '21

I didn't have any antivirus on when I tried this script, but that's not the point.

ShadowJonathan, Jun 28 '21

Any update on this? Same issue with dump (Mac, RethinkDB 2.4, Python 3.9).

lkovesdi, Oct 21 '21

OK, so downgrading to Python 3.7.9 works, if anyone still needs it!

lkovesdi, Oct 21 '21

@lkovesdi Thanks for posting this. I wasted some time on this problem, then found this suggestion, and indeed downgrading my Python (via Homebrew) let me work around the issue. At least temporarily, because there is no escaping newer Python versions.

jwr, Dec 15 '21