PSI on a small dataset fails with OOM: 10,000-by-10,000 intersection, ID column of type string

Open · matrix-bin opened this issue 1 year ago · 1 comment

Describe the bug: PSI fails with an out-of-memory (OOM) error on a small dataset: a 10,000-row by 10,000-row intersection where the ID column is of type string. Kuscia container configuration: [image]

When the OOM occurs, the job log shows:

2024-03-22T17:21:22.056401564+08:00 stderr F The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.

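Expected behavior: According to the PSI documentation on the official website, PSI can handle a 20-million-row by 2-billion-row intersection, although the join ID there is of type int. With string IDs, a 10,000-by-10,000 intersection should at least complete normally.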

Actual behavior: Installed from the SecretPad tar package released in February 2024; deployed in P2P mode, with the nodes on the same physical machine.

matrix-bin, Mar 22 '24 10:03

The issue has been resolved through another channel. The cause was that the data had no header row.

Chrisdehe, Apr 02 '24 07:04
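
Since the reported root cause was an input file without a header row, a quick pre-flight check on each party's CSV may catch this before a PSI job is submitted. The following is a minimal sketch using only the Python standard library. It is not part of SecretPad, Kuscia, or SecretFlow, and the function name `first_row_has_columns` and the join-key name `id` are hypothetical. The thread does not explain why a missing header leads to an OOM, so this only checks for the header itself.

```python
import csv
import io


def first_row_has_columns(stream, required_columns):
    """Return True if the first CSV row contains every required column name."""
    reader = csv.reader(stream)
    first_row = next(reader, None)
    if first_row is None:
        return False  # empty input: no header at all
    header = {cell.strip() for cell in first_row}
    return set(required_columns).issubset(header)


# Hypothetical sample data; "id" is an assumed join-key column name.
with_header = io.StringIO("id,value\nu001,1\nu002,2\n")
without_header = io.StringIO("u001,1\nu002,2\n")

print(first_row_has_columns(with_header, ["id"]))     # True
print(first_row_has_columns(without_header, ["id"]))  # False
```

For a real file, the same function can be called on a handle opened with `open(path, newline="")`, passing the join-key column names configured for the PSI task.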