
setting useSingleDatasetMode to True gives: java: malloc.c:4033: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.

Open Nitinsiwach opened this issue 2 years ago • 2 comments

setting useSingleDatasetMode to True gives

java: malloc.c:4033: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.

I keep everything else the same, set that flag to False, and the code runs just fine. On the surface the error looks like some kind of memory overflow, but single dataset mode was supposed to reduce the memory burden, among other things.

Could you please help me understand what is going wrong?

I am using SynapseML 0.9.4. My Spark version is 3.1.2.

A few more details that might help you:

cluster: 1 master, 2 worker machines (64 GB, 16 cores each)
training data on disk: 1.37 GB

spark-submit config:

```
spark.executor.memory=10g
spark.executor.instances=10
spark.executor.cores=3
spark.driver.memory=10g
spark.default.parallelism=54
spark.driver.cores=3
spark.driver.memoryOverhead=1024m
spark.executor.memoryOverhead=1024m
spark.dynamicAllocation.enabled=false
```
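To make the sizing above easier to reason about, here is a small sanity check of how the requested executors map onto the cluster described earlier. The numbers come straight from the question; the "fit" arithmetic and the assumption that the two 64 GB / 16-core machines are the worker nodes are mine:

```python
# Sanity-check of the spark-submit sizing against the cluster described above.
# Assumption: the two 64 GB / 16-core machines host the executors.

workers = 2                # worker machines
cores_per_worker = 16
mem_per_worker_gb = 64

executors = 10             # spark.executor.instances
cores_per_executor = 3     # spark.executor.cores
executor_mem_gb = 10       # spark.executor.memory
overhead_gb = 1            # spark.executor.memoryOverhead (1024m)

cores_requested = executors * cores_per_executor
cores_available = workers * cores_per_worker
print(f"cores: {cores_requested} requested / {cores_available} available")

per_executor_gb = executor_mem_gb + overhead_gb
executors_per_worker = executors // workers
per_worker_gb = executors_per_worker * per_executor_gb
print(f"memory per worker: {per_worker_gb} GB used / {mem_per_worker_gb} GB available")
```

So the layout fits on paper (30 of 32 cores, 55 of 64 GB per worker), which suggests the malloc assertion is not a simple "executors don't fit" misconfiguration.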

model parameters:

```json
{
  "numIterations": 2500,
  "learningRate": 0.01,
  "maxDepth": 30,
  "earlyStoppingRound": 50,
  "chunkSize": 800000,
  "parallelism": "voting_parallel",
  "useSingleDatasetMode": true,
  "numThreads": 14
}
```
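One detail in these parameters worth double-checking (my observation, not something raised in the thread): numThreads is well above spark.executor.cores. As I understand the SynapseML docs, single dataset mode fuses the partitions on each executor into one LightGBM dataset, so numThreads governs the native threads per executor. A tiny check, using the submit config quoted above:

```python
import json

# Model parameters exactly as posted above.
params = json.loads("""
{
  "numIterations": 2500,
  "learningRate": 0.01,
  "maxDepth": 30,
  "earlyStoppingRound": 50,
  "chunkSize": 800000,
  "parallelism": "voting_parallel",
  "useSingleDatasetMode": true,
  "numThreads": 14
}
""")

executor_cores = 3  # spark.executor.cores from the submit config above

# With useSingleDatasetMode, numThreads is (as I understand it) the number of
# native LightGBM threads per executor, so 14 threads on a 3-core executor
# oversubscribes the cores reserved for that executor.
if params["numThreads"] > executor_cores:
    print(f"numThreads={params['numThreads']} oversubscribes the "
          f"{executor_cores} cores reserved per executor")
```

Whether or not this causes the assertion failure, aligning numThreads with the per-executor core count is a cheap thing to try.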

Nitinsiwach avatar Feb 25 '22 10:02 Nitinsiwach

@imatiach-msft for visibility

mhamilton723 avatar Feb 28 '22 18:02 mhamilton723

@Nitinsiwach I wonder if you should instead try setting:

```
spark.executor.instances=32   # 2 machines * 16 cores?
spark.executor.cores=1
spark.executor.memory=4g      # 64 GB per machine / 16 executors per machine?
```

You can also try the opposite extreme:

```
spark.executor.instances=2
spark.executor.cores=16
spark.executor.memory=64g
```

I'm not sure about the second case, but the first one looks more similar to Databricks, which is what I've tested on the most.
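For concreteness, this is how the two suggested layouts would map onto the 2-machine cluster from the question. The arithmetic and the headroom caveat at the end are mine, not part of the comment above:

```python
# Per-machine breakdown of the two suggested executor layouts,
# assuming 2 machines with 64 GB / 16 cores each (from the question).

machines, cores_per_machine, mem_per_machine_gb = 2, 16, 64

# Option 1: many small one-core executors.
opt1_instances, opt1_mem_gb = 32, 4
opt1_per_machine = opt1_instances // machines       # executors per machine
opt1_mem_used = opt1_per_machine * opt1_mem_gb      # GB of heap per machine
print(f"option 1: {opt1_per_machine} executors/machine, {opt1_mem_used} GB heap")

# Option 2: one large executor per machine.
opt2_instances, opt2_mem_gb = 2, 64
opt2_per_machine = opt2_instances // machines       # executors per machine
print(f"option 2: {opt2_per_machine} executor/machine, {opt2_mem_gb} GB heap")

# Caveat (mine): both options budget the machine's full 64 GB for executor
# heap, leaving nothing for memoryOverhead or the OS, so in practice the
# memory values may need trimming before the executors actually schedule.
```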

imatiach-msft avatar Mar 01 '22 05:03 imatiach-msft