rethinkdb-python
Rethinkdb-import impossible to use on windows
Describe the bug
On Windows 10, with Python 3.8, rethinkdb-import simply refuses to function due to obscure Python multiprocessing errors.
To Reproduce
Steps to reproduce the behavior:
- pip install rethinkdb from Python 3.8 (on Windows)
- rethinkdb import [options]
Expected behavior
Normal operation: the data import starts.
System info
- OS: Windows 10 (build 19041.329)
- RethinkDB Version: 2.4.0~0buster (docker container)
- RethinkDB Python adapter Version: 2.4.7
Additional context
PS D:\k8smig\docker\mongodb\_local> rethinkdb-import --file .\tumblr.posts.json --table tumblr.posts -c vanguard --force
Traceback (most recent call last):
File "c:\python\3.8\lib\runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\python\3.8\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Python\3.8\Scripts\rethinkdb-import.exe\__main__.py", line 7, in <module>
File "c:\python\3.8\lib\site-packages\rethinkdb\_import.py", line 1716, in main
import_tables(options, sources)
File "c:\python\3.8\lib\site-packages\rethinkdb\_import.py", line 1359, in import_tables
progress_bar.start()
File "c:\python\3.8\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "c:\python\3.8\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "c:\python\3.8\lib\multiprocessing\context.py", line 326, in _Popen
return Popen(process_obj)
File "c:\python\3.8\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
reduction.dump(process_obj, to_child)
File "c:\python\3.8\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread._local' object
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:\python\3.8\lib\multiprocessing\spawn.py", line 107, in spawn_main
new_handle = reduction.duplicate(pipe_handle,
File "c:\python\3.8\lib\multiprocessing\reduction.py", line 79, in duplicate
return _winapi.DuplicateHandle(
PermissionError: [WinError 5] Access is denied
Can you check whether a simple script using multiprocessing works fine? I guess on Windows you have to change the multiprocessing start method, maybe to 'spawn'; check the module documentation. Your error comes from the way you are passing an object that is NOT PICKLABLE (not serializable) to another process or thread. So, try this:
Initialize the process then create the object and connection, not before
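A minimal sketch of that suggestion, assuming a hypothetical `FakeConnection` class as a stand-in for a real driver connection (it is not the rethinkdb API, and `worker` and `run` are illustrative names). Only a plain host string crosses the process boundary; the object that holds a `threading.local` is built inside the child:

```python
import multiprocessing
import pickle
import threading


class FakeConnection:
    """Hypothetical stand-in for a driver connection; not the rethinkdb API."""

    def __init__(self, host):
        self.host = host
        # threading.local cannot be pickled, like the object in the traceback.
        self._local = threading.local()


def worker(host, results):
    # The connection is created inside the child process, so only the
    # plain host string has to be pickled and sent across.
    conn = FakeConnection(host)
    results.put(conn.host)


def run(host):
    # "spawn" is the only start method available on Windows.
    ctx = multiprocessing.get_context("spawn")
    results = ctx.Queue()
    proc = ctx.Process(target=worker, args=(host, results))
    proc.start()
    value = results.get()
    proc.join()
    return value


# Usage, from a script file. The guard matters under "spawn", because the
# child re-imports this module:
#
# if __name__ == "__main__":
#     print(run("localhost"))
```

Passing a `FakeConnection` instance in `args` instead of the host string would fail at `proc.start()` with the same kind of `TypeError` as in the report.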
Ping @ShadowJonathan
This isn't the actual python library driver, these are CLI commands, which honestly I expect to work regardless of platform
Yes, I'd wager that manually hacking the boot process of such a system would fix it, but this issue is specific to fixing the source code to work with windows as well, not monkey patch it
Any updates on this? Having the same problem on macOS. RethinkDB v2.4.0
If not, anyone know any other way to import JSON files?
@htbrown to me it is weird that this only happens on Windows and that the stack trace shows the error originating from the built-in multiprocessing lib.
Since I cannot test this on windows, I'd ask some more details from you:
What happens in case you execute the import script manually and not through the rethinkdb wrapper? (By that I mean find the python file and call that, not the "rethinkdb import" command)
To clarify - I was having the same issues with macOS. I haven't tried it on my Windows box.
If you'd like me to, I can later. Give me a while.
to me it is weird that this only happens on Windows and that the stack trace shows the error originating from the built-in multiprocessing lib.
@gabor-boros if you look closely at the top stack trace, this is caused by pickling _thread._local: the pickling engine (for "sending" values across) comes across it and throws an exception. This is caused by things set in place both for that pickling engine and for whatever values are passed to other threads.
(iirc (but i am not sure on this), when i originally was investigating this, i could remember some kind of multiprocess jank in there that was causing this, and would only fly nicely under linux/unix, but not under windows, i'm not sure of that, as i could be misremembering)
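The pickling failure can be reproduced without multiprocessing at all. The sketch below uses a hypothetical `Holder` class (not taken from the rethinkdb code) that keeps a `threading.local` attribute, and a small helper, `try_pickle`, that reports the error message instead of raising:

```python
import pickle
import threading


class Holder:
    """Hypothetical object that keeps per-thread state in an attribute."""

    def __init__(self):
        self.local = threading.local()


def try_pickle(obj):
    """Return None if obj pickles, or the error message if it does not."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_pickle({"table": "tumblr.posts"}))
    print(try_pickle(Holder()))
```

Start methods that pickle the Process object hit this error as soon as anything reachable from that object holds a `threading.local`.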
If anyone needs it, I've made my own RethinkDB importer in Node. Got fed up with faffing around with the built in Python one. https://github.com/htbrown/rethinkdb-import
No documentation yet so if you need help submit an issue.
I am also having this issue on a mac, while trying to do an export. Rethinkdb 2.4.0 with Python 3.8.2, database running in a Docker container
Update: So I had this issue while following the official docs using venv and Python 3. Just now tried it outside of venv, using Python 2, and it worked.
Right so there's even something wrong with exporting. Hmm.
I'm also having the same problem on macOS with Python 3.9.0 and rethinkdb 2.4.0.
I'm getting these errors as well. I'm on Mac Mini M1. I'm not a python expert, so I can't tell if rethinkdb is using python2 or python3, but when I type "python" it seems to use python2. Is that good? Bad? I have both on my system.
giro@geoffs-mac-mini:~/rethinkdb-import$ python --version
Python 2.7.16
giro@geoffs-mac-mini:~/rethinkdb-import$ python3 --version
Python 3.9.1
giro@geoffs-mac-mini:~/rethinkdb-import$ rethinkdb --version
rethinkdb 2.4.1 (CLANG 12.0.0 (clang-1200.0.32.28))
Is there a non-python simple binary I can use for import/export instead?
Or a way to disable multiprocessing?
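On the question of which Python the tools use: a pip-installed command generally runs under the interpreter whose pip installed it, so one way to check is to run a small script with each candidate (`python`, `python3`) and see which one can import the package. A rough sketch, where `interpreter_report` is a hypothetical helper name:

```python
import sys


def interpreter_report(package="rethinkdb"):
    """Describe the running interpreter and where it finds `package`, if at all."""
    try:
        module = __import__(package)
        location = getattr(module, "__file__", None)
    except ImportError:
        # The package is not installed for this interpreter.
        location = None
    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:3]),
        "package_location": location,
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        print("%s: %s" % (key, value))
```

An interpreter that reports `package_location: None` is not the one the RethinkDB Python tools were installed for.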
I'm not a python expert, so I can't tell if rethinkdb is using python2 or python3, but when I type "python" it seems to use python2. Is that good? Bad? I have both on my system.
I suggest removing or avoiding using python 2, it's been deprecated for a while.
I tried to remove Python 2 on my Mac before and seem to remember finding it incredibly difficult for a while and then just giving up because it was more effort than it was worth. Do they still package it all in with the Python 3 installer?
Remove it only if possible; sometimes Python 2 is deeply embedded for system stuff (and people don't care enough to update it to 3), but try to find ways to make Python 3 the default for your usage.
Yeah I think portions of macOS use it. I'm just trying to avoid it as much as possible.
On macOS 11.2.1 the dump command is doing the same thing.
rethinkdb dump -c my-host-name
cannot pickle '_thread._local' object
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/rethinkdb/_dump.py", line 200, in main
_export.run(options)
File "/usr/local/lib/python3.9/site-packages/rethinkdb/_export.py", line 641, in run
run_clients(options, working_dir, db_table_set)
File "/usr/local/lib/python3.9/site-packages/rethinkdb/_export.py", line 526, in run_clients
new_process.start()
File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread._local' object
Error: export failed, cannot pickle '_thread._local' object
I have created this tool as a workaround (using NodeJS): https://github.com/GeoffreyPlitt/rethinkdb-import
If it's any help, it looks like there was a similar issue with redis-py. It looks like something in the parent process might not be serializable. Another similar issue was fixed by serving up a different connection.
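One general pattern for "serving up a different connection" is to drop the unpicklable part when the object is pickled and rebuild it on the other side. The sketch below shows that pattern on a hypothetical `Client` class. It is an assumption about how such a fix could look, not what redis-py or the rethinkdb tools actually do:

```python
import pickle
import threading


class Client:
    """Hypothetical client; illustrates the pattern only."""

    def __init__(self, host):
        self.host = host
        self._local = threading.local()  # per-thread state, not picklable

    def __getstate__(self):
        # Send everything except the thread-local state.
        state = self.__dict__.copy()
        del state["_local"]
        return state

    def __setstate__(self, state):
        # Rebuild fresh per-thread state in the receiving process.
        self.__dict__.update(state)
        self._local = threading.local()


if __name__ == "__main__":
    copy = pickle.loads(pickle.dumps(Client("localhost")))
    print(copy.host)
```

With these two methods in place, an instance can be passed to a child process under any start method, and the child ends up with its own fresh thread-local state.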
If anyone needs it, I've made my own RethinkDB importer in Node. Got fed up with faffing around with the built in Python one. https://github.com/htbrown/rethinkdb-import
No documentation yet so if you need help submit an issue.
You saved my day. Thanks
PermissionError: [WinError 5] Access is denied
@ShadowJonathan Did you close Kaspersky Antivirus when you started your Python script? Not suppressed for n minutes, but completely closed. It helped in my case... a bit...
I didn't have any antivirus on when I tried this script, but that's not the point.
Any update on this? Same issue on dump (Mac, rethinkdb 2.4, Python 3.9).
OK, so downgrading to Python 3.7.9 works, if anyone still needs it!
@lkovesdi Thanks for posting this. I wasted some time on this problem, then found this suggestion, and indeed downgrading my python (via homebrew) let me work around the issue. At least temporarily, because there is no escaping newer Python versions.
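A likely explanation for why the downgrade helps: Python 3.8 changed the default multiprocessing start method on macOS from "fork" to "spawn", and "spawn" (the only method on Windows) pickles the Process object before starting the child. The sketch below reports what the running interpreter uses; `describe_runtime` is a hypothetical helper name:

```python
import multiprocessing
import platform
import sys


def describe_runtime():
    """Collect the facts that decide whether Process objects get pickled."""
    method = multiprocessing.get_start_method()
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "start_method": method,
        "available": multiprocessing.get_all_start_methods(),
        # "fork" copies the parent's memory; "spawn" and "forkserver"
        # pickle the Process object and send it to the child.
        "pickles_process_object": method != "fork",
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

`multiprocessing.set_start_method("fork")` exists on macOS and Linux, but whether forcing it makes the RethinkDB tools work is untested here, and the CPython documentation considers "fork" unsafe on macOS.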