
JobMaster leaks memory when running too many distributedLoad jobs

Open liiuzq-xiaobai opened this issue 8 months ago • 1 comment

Alluxio Version: 2.9.3

Describe the bug: After a large number of distributedLoad jobs are submitted in a production environment, the job master leaks memory and eventually hits an OOM.

To Reproduce:
1. Set up one Alluxio cluster with 1 master and 3 workers.
2. Mock a large number of small files in the under file system.
3. Submit a large number of distributedLoad jobs. Note: use batchsize=1 as the load argument.
4. Observe the memory changes and GC activity in JobMaster.

Expected behavior: The memory usage keeps increasing until the maximum memory size is reached, finally causing an OOM.

Urgency: yes

Are you planning to fix it: yes

Additional context: The cause of this bug is that residual job information in mInfoMap is never deleted, so entries for finished jobs accumulate.
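To illustrate the kind of leak described here, below is a minimal sketch in Java. It is not Alluxio's actual code: the class `JobInfoRegistrySketch` and its methods are hypothetical, and only the field name `mInfoMap` is borrowed from this issue. It assumes the leak pattern is a map keyed by job id whose entries are never removed when a job finishes, and shows how removing the entry on completion keeps the map bounded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-job tracker; NOT Alluxio's real class.
final class JobInfoRegistrySketch {
  // Plays the role of mInfoMap from the issue: job id -> job info.
  private final Map<Long, String> mInfoMap = new ConcurrentHashMap<>();

  // Record info for a newly submitted job.
  void submit(long jobId, String info) {
    mInfoMap.put(jobId, info);
  }

  // Leaky completion path: the job is done, but its entry stays in the map.
  void completeLeaky(long jobId) {
    // Intentionally no removal; this models the residual job information.
  }

  // Completion path that drops the entry once the job is finished.
  void completeAndEvict(long jobId) {
    mInfoMap.remove(jobId);
  }

  // Number of job entries currently retained.
  int size() {
    return mInfoMap.size();
  }

  public static void main(String[] args) {
    JobInfoRegistrySketch leaky = new JobInfoRegistrySketch();
    JobInfoRegistrySketch evicting = new JobInfoRegistrySketch();
    // With batch size 1, every file becomes its own job, so the job count is large.
    for (long id = 0; id < 10_000; id++) {
      leaky.submit(id, "load-" + id);
      leaky.completeLeaky(id);
      evicting.submit(id, "load-" + id);
      evicting.completeAndEvict(id);
    }
    System.out.println("entries retained without eviction: " + leaky.size());
    System.out.println("entries retained with eviction: " + evicting.size());
  }
}
```

Under this assumption, the retained entry count grows linearly with the number of submitted jobs, which would match the steadily increasing memory observed in JobMaster. The real fix may need to keep finished jobs around for a while (for example for status queries), so eviction on completion is only one possible design.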

liiuzq-xiaobai, Jun 26 '24 03:06