
Meta-Dataset in TFDS: [F tensorflow/core/platform/default/env.cc:73] Check failed: ret == 0 (11 vs. 0)Thread tf_data_iterator_resource creation via pthread_create() failed.

Open jfb54 opened this issue 3 years ago • 4 comments

When training on Meta-Dataset episodes (with all the training datasets) using the TFDS APIs, the reader fails after only a few tasks with the following error: [F tensorflow/core/platform/default/env.cc:73] Check failed: ret == 0 (11 vs. 0)Thread tf_data_iterator_resource creation via pthread_create() failed. This is on Linux with the latest versions of TensorFlow and TensorFlow Datasets installed.

Is there some limit that needs to be increased to accommodate all the thread usage?

jfb54, Feb 13 '22 19:02

The TFDS implementation unfortunately creates lots of threads due to there being one dataset per class. I'm not sure what the best solution would be, but I'll look into it and report back.

vdumoulin, Mar 09 '22 15:03

https://github.com/tensorflow/tensorflow/issues/41532#issuecomment-759075803 suggests that TF may use more than the number of available threads, and lists things to check. You could try using ulimit -u, as explained here (in another context), to raise that limit if it's the issue. If that doesn't work, could you share the limits you see?
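For anyone gathering those limits, here is a minimal sketch that prints them in one place. It assumes Linux; the helper names are hypothetical, and the cgroup path is an assumption that only holds for some cgroup v2 setups. The value 11 in the error is an errno, which on Linux normally corresponds to EAGAIN, the code pthread_create() returns when a thread-related limit is hit.

```python
import errno
import resource


def read_proc_value(path):
    """Return the stripped contents of a /proc or /sys file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def format_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def collect_thread_limits():
    """Gather limits that may cause pthread_create() to fail (hypothetical helper)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        # Symbolic name of errno 11 on this platform (EAGAIN on Linux).
        "errno 11": errno.errorcode.get(11, "unknown"),
        # The soft limit is what `ulimit -u` reports.
        "RLIMIT_NPROC soft": format_limit(soft),
        "RLIMIT_NPROC hard": format_limit(hard),
        "kernel.threads-max": read_proc_value("/proc/sys/kernel/threads-max"),
        "kernel.pid_max": read_proc_value("/proc/sys/kernel/pid_max"),
        # Thread stacks are memory-mapped, so this may matter too (assumption).
        "vm.max_map_count": read_proc_value("/proc/sys/vm/max_map_count"),
        # Assumed cgroup v2 location; the real path depends on the process's cgroup.
        "cgroup pids.max": read_proc_value("/sys/fs/cgroup/pids.max"),
    }


if __name__ == "__main__":
    for name, value in collect_thread_limits().items():
        print(f"{name}: {value}")
```

Entries that print None could not be read on that machine, which is itself worth including when sharing the output.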

Unfortunately I'm not aware of a way to ask TF to be more frugal.

lamblin, Mar 24 '22 17:03

For me, ulimit -u gives 'unlimited'. I also checked /etc/security/limits.conf (everything is commented out) and this:

cat /proc/sys/kernel/threads-max
3976018

Still getting the same error as reported above.
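Since the configured limits look generous, it may help to see how many threads the process actually holds when it fails. A minimal sketch, assuming Linux, where /proc/self/status exposes a Threads: field; the function name is hypothetical:

```python
def current_thread_count(status_path="/proc/self/status"):
    """Return the kernel's thread count for this process, or None if unavailable."""
    try:
        with open(status_path) as f:
            for line in f:
                # The line looks like "Threads:\t12".
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


if __name__ == "__main__":
    print(f"threads in this process: {current_thread_count()}")
```

Calling this every few episodes inside the training loop would show whether the count keeps growing until the crash or stays well below the limits reported above.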

lehrig, Jun 10 '22 19:06

@lehrig Hi, I ran into the same problem in a different setting. Have you found a solution? I would be grateful if you could share it.

kiriharulxh, May 14 '23 04:05