FATE

Private set intersection fails on a 1-billion-record dataset

Open jsuper opened this issue 2 years ago • 2 comments

Describe the bug: Used FATE-TEST to generate a 1-billion-record dataset for each of the two parties, with a 50% intersection.

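Hardware configuration: each party has three servers. Per-server hardware: CPU 40 cores, memory 256 GB, disk 1.8 TB SSD. Per-party deployment: rollsite, nodemanager, and clustermanager run on the same server; the other two servers run nodemanager.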

To Reproduce
1. Run the intersection with either the DH or the RSA algorithm; the error can occur with both.
2. Configure task_cores: 96, compute_partition: 96

The error log is as follows:

==== detail start, at 20220726.203003.988 ====
Traceback (most recent call last):
  File "/home/fate/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/home/fate/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task
    value=self.functor_serdes.serialize(f(task)))
  File "/home/fate/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize
    return cloudpickle.dumps(_obj)
  File "/home/fate/data/projects/fate/common/python/venv/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps
    cp.dump(obj)
  File "/home/fate/data/projects/fate/common/python/venv/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump
    return Pickler.dump(self, obj)
  File "/home/fate/data/projects/fate/common/miniconda3/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/home/fate/data/projects/fate/common/miniconda3/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/fate/data/projects/fate/common/miniconda3/lib/python3.6/pickle.py", line 606, in save_reduce
    save(args)
  File "/home/fate/data/projects/fate/common/miniconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/fate/data/projects/fate/common/miniconda3/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/home/fate/data/projects/fate/common/miniconda3/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/fate/data/projects/fate/common/miniconda3/lib/python3.6/pickle.py", line 631, in save_reduce
    self._batch_setitems(dictitems)
  File "/home/fate/data/projects/fate/common/miniconda3/lib/python3.6/pickle.py", line 841, in _batch_setitems
    tmp = list(islice(it, self._BATCHSIZE))
RuntimeError: dictionary changed size during iteration
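For readers unfamiliar with this error class: the last frame of the traceback shows pickle walking a dict's items with `islice` when the dict's size changes underneath it. The sketch below reproduces only that generic Python behavior in isolation, and shows that iterating over a snapshot avoids it. It is not a diagnosis of the FATE or eggroll code path; the function names and the `state` dict are hypothetical, and which object is being mutated in the real job (and by what) is not established in this thread.

```python
from itertools import islice


def iterate_while_mutating():
    """Reproduce the RuntimeError message seen at the end of the traceback."""
    state = {"a": 1, "b": 2}          # hypothetical stand-in for the pickled dict
    items = iter(state.items())
    next(items)                        # iteration has started
    state["c"] = 3                     # something adds a key mid-iteration
    try:
        list(islice(items, 1000))      # same pattern as pickle's batching loop
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_over_snapshot():
    """Iterate a shallow copy so later writes to the original are harmless."""
    state = {"a": 1, "b": 2}
    snapshot = dict(state)             # copy taken before iteration begins
    items = iter(snapshot.items())
    next(items)
    state["c"] = 3                     # mutates the original, not the snapshot
    return list(islice(items, 1000))


if __name__ == "__main__":
    print(iterate_while_mutating())
    print(iterate_over_snapshot())
```

If the same pattern applies here, the thing to look for is an object reachable from the task result that is still being written to (for example by another thread) while it is serialized.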

jsuper, Jul 26 '22 12:07

Can you see which part of the DH or RSA call path triggers this error?

dylan-fan, Jul 27 '22 03:07

Can you see which part of the DH or RSA call path triggers this error?

So far I have gone through the eggroll logs and found no other exception logs. @dylan-fan

jsuper, Jul 29 '22 07:07