vaex
Vaex not exporting files properly inside a multiprocessing pool
I am having trouble working with `vaex` inside Python's `multiprocessing` `Pool`. The expected behavior of `pool.map()` is to iterate through the entire list supplied to it, but that does not seem to be the case when working with vaex DataFrame objects. The code works, but only for the first 16 items, 16 being the number of cores on my machine.
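For comparison, here is a minimal stdlib-only sketch of what `Pool.map` normally does: it hands out every item of the iterable, not one item per worker, and returns one result per item in input order. The `square` function, `run_demo`, and the item and process counts are illustrative stand-ins, not part of the original code. The explicit `"fork"` context is an assumption chosen to mirror the default start method on Linux for the Python 3.7 build used here.

```python
import multiprocessing
from typing import List


def square(x: int) -> int:
    # Defined at module top level so it can be pickled and sent to workers.
    return x * x


def run_demo(n_items: int = 600, n_procs: int = 16) -> List[int]:
    # "fork" is the default start method on Linux for Python 3.7.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=n_procs) as pool:
        # map() splits range(n_items) into chunks; idle workers keep taking
        # new chunks until every item has been processed.
        return pool.map(square, range(n_items))


if __name__ == "__main__":
    results = run_demo()
    print(len(results))  # 600: one result per item, not one per process
```

If a script like this returns all 600 results on the same machine, the pool itself is behaving as documented and the problem is more likely in what the tasks carry or do.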
So, for code set up as follows:

```python
def export_task(item):  # item is a tuple
    subject, outputPathChunk = item  # subject is the vaex DataFrame and outputPathChunk is the path
    subject.export_hdf5(outputPathChunk)
```
And then:

```python
import multiprocessing

pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
pool.map(export_task, subs)
pool.close()
```
Here `subs` is a list of 600 tuples, each with two items: the first is a vaex DataFrame and the second is an output path.
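One hypothesis worth ruling out is that each task carries a whole DataFrame, which has to be pickled to reach a worker. A hedged alternative is to send only small picklable descriptions of the work (a source file path, a row range and an output path) and let each worker open the data itself. This sketch assumes the 600 subsets can be rebuilt as row ranges of a single HDF5 file; `make_tasks`, `export_chunk`, the file names, and the use of `df[start:end]` row slicing are illustrative assumptions, not the reporter's code.

```python
from typing import List, Tuple

# (source_path, start_row, end_row, output_path): all plain, picklable values.
Task = Tuple[str, int, int, str]


def make_tasks(source_path: str, n_rows: int, chunk_size: int,
               out_pattern: str) -> List[Task]:
    # Split [0, n_rows) into consecutive row ranges of at most chunk_size rows.
    tasks = []
    for i, start in enumerate(range(0, n_rows, chunk_size)):
        end = min(start + chunk_size, n_rows)
        tasks.append((source_path, start, end, out_pattern.format(i)))
    return tasks


def export_chunk(task: Task) -> str:
    # Runs inside a worker: open the source there instead of receiving a
    # pickled DataFrame. Assumes vaex is installed and that slicing with
    # df[start:end] returns a DataFrame holding those rows.
    import vaex

    source_path, start, end, out_path = task
    df = vaex.open(source_path)
    df[start:end].export_hdf5(out_path)
    return out_path
```

It would be driven the same way as the original code, e.g. `pool.map(export_chunk, make_tasks("data.hdf5", n_rows, chunk_size, "chunk_{}.hdf5"))`, where `"data.hdf5"`, `n_rows` and `chunk_size` are placeholders.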
There is a vaex-related warning for the first 16 executions of `export_task`, and I am wondering whether that is choking `pool.map`. That would be a simple issue to work around, but a simple `sample_table.export_hdf5(sample_path)` sanity check does not produce the same warning.
The warning from vaex is: `vaex/dataframe.py:2756: UserWarning: The state wants to rename newMass to __newMass, but __newMass was not found, ignoring the rename`
- vaex-core 4.14.0 py37hca0595d_0 conda-forge
- vaex-hdf5 0.14.1 pyhd8ed1ab_0 conda-forge
- Vaex was installed via: mamba-forge
- OS: Amazon Linux