
Traceback (most recent call last): File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed send(obj) IOError: bad message length

Open mlukasik opened this issue 7 years ago • 12 comments

Hi!

When I run training on 10M examples (each described by a small subset of 100K features), it breaks with the error:

....
Splitting 2033
Training classifier
Splitting 1201
Training classifier
Splitting 1323
Training classifier
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
    send(obj)
IOError: bad message length

Do you know what is the reason and how it could be fixed?

I tried smaller datasets (100K, 1M examples) and the training worked for them.

Cheers, Michal

mlukasik avatar Jan 25 '18 09:01 mlukasik

Curious. I wonder if we're running into pickle's maximum serialization size. Does the problem still appear if you increase the max leaf size to, say, 100?

Can you also check to see if you received an out of memory error around that time?
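
As a reference point, this error on Python 2.7 usually means the pickled payload pushed through the multiprocessing queue exceeded the connection's message size limit (roughly 2**31 bytes). Below is a minimal sketch for sanity-checking that hypothesis, assuming you can get hold of a trained tree object in-process; it is not part of fastxml, and the pickle protocol used here is an assumption.

# Minimal sketch (not fastxml code): estimate how large an object's pickled
# payload is before it goes through a multiprocessing queue. Python 2.7's
# connection layer rejects messages around the 2**31-byte mark with
# "IOError: bad message length".
import cPickle as pickle  # Python 2.7; plain `pickle` on Python 3

def pickled_size(obj, protocol=2):
    # protocol=2 is an assumption; the protocol multiprocessing actually
    # uses may differ, but only the order of magnitude matters here.
    return len(pickle.dumps(obj, protocol))

# Hypothetical usage, e.g. after building one tree locally:
#   print(pickled_size(tree))   # compare against 2**31 - 1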

Refefer avatar Jan 25 '18 16:01 Refefer

Any updates?

Refefer avatar Jan 27 '18 20:01 Refefer

Thanks for the reply! I am rerunning with max_leaf_size set to 100 to see if it passes, though I think it might hurt classification accuracy. I didn't see any out-of-memory error around that time.

mlukasik avatar Jan 28 '18 20:01 mlukasik

It's certainly possible it will; this is intended to test whether the tree is too large to serialize correctly. How many labels are you predicting?

Refefer avatar Jan 28 '18 20:01 Refefer

I got 100K labels (and 100K features).

mlukasik avatar Jan 28 '18 20:01 mlukasik

Any updates?

Refefer avatar Feb 03 '18 19:02 Refefer

Thanks for following up. I am trying to run the training with --max_leaf_size 100 and --threads 5, but it seems to be training forever...

mlukasik avatar Feb 03 '18 20:02 mlukasik

--threads 5 is going to hurt if you're using the default set of trees, which is 50. You might ramp that down to 5 trees for debugging purposes for the time being.
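
For reference, the equivalent through the Python interface would look roughly like the sketch below; the Trainer constructor and the parameter names (n_trees, n_jobs, max_leaf_size) are assumptions taken from the project's README and may not match this exact version.

# Rough sketch, assuming the Trainer API described in the README;
# parameter names are not verified against fastxml 2.0.0.
from fastxml import Trainer

trainer = Trainer(
    n_trees=5,          # down from the default of 50 while debugging
    n_jobs=5,           # analogous to --threads 5
    max_leaf_size=100   # analogous to --max_leaf_size 100
)
# trainer.fit(X, y)    # X: sparse feature rows, y: lists of label ids
# trainer.save(model_path)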

Refefer avatar Feb 03 '18 20:02 Refefer

Sounds good, I'll do that!

mlukasik avatar Feb 03 '18 20:02 mlukasik

When running with 5 threads and 5 trees, I got this error message:

9790000 docs encoded
9800000 docs encoded
Traceback (most recent call last):
  File "/usr/local/bin/fxml.py", line 4, in <module>
    __import__('pkg_resources').run_script('fastxml==2.0.0', 'fxml.py')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 750, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1534, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/fastxml-2.0.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/fxml.py", line 646, in <module>
  File "/usr/local/lib/python2.7/dist-packages/fastxml-2.0.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/fxml.py", line 453, in train
  File "build/bdist.linux-x86_64/egg/fastxml/trainer.py", line 468, in fit
  File "build/bdist.linux-x86_64/egg/fastxml/trainer.py", line 410, in _build_roots
  File "build/bdist.linux-x86_64/egg/fastxml/proc.py", line 50, in f2
  File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

mlukasik avatar Feb 04 '18 14:02 mlukasik

There we have it. How much memory does the machine have?

You'll want to try increasing the regularization coefficient to increase the sparsity of the linear classifiers. You can also use the --subset flag to send only a subset of the data to each tree (a la random forests).
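
To illustrate what the subset idea buys you (conceptually; this is not fastxml's implementation): each tree trains on a random fraction of the examples, bagging-style, so no single worker has to hold or serialize anything close to the full 10M-example set.

# Conceptual sketch only, not fastxml internals. Each tree sees a random
# fraction of the training examples, as in bagging/random forests, which
# caps per-tree memory and serialization cost.
import random

def subset_indices(n_examples, fraction, seed=None):
    rng = random.Random(seed)
    k = max(1, int(n_examples * fraction))
    return rng.sample(xrange(n_examples), k)  # xrange: Python 2.7

# Hypothetical usage: with 10M examples and a 0.2 subset,
# each tree would train on roughly 2M rows:
#   idxs = subset_indices(10000000, 0.2, seed=42)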

Refefer avatar Feb 05 '18 19:02 Refefer

My machine actually has quite a lot of memory:

mlukasik@mlukasik:~/workspace/fastxml_py$ cat /proc/meminfo
MemTotal:       65865896 kB

Is it because we try to load all data at once?

mlukasik avatar Feb 14 '18 19:02 mlukasik