ImplicitCAD
Slow performance
Rendering nontrivial meshes takes forever at small resolutions.
Currently, almost all of our time is being spent in getImplicitShared. We can shave off a lot of that by specializing it:
{-# SPECIALIZE getImplicitShared :: SharedObj SymbolicObj2 ℝ2 -> ℝ2 -> ℝ #-}
{-# SPECIALIZE getImplicitShared :: SharedObj SymbolicObj3 ℝ3 -> ℝ3 -> ℝ #-}
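For anyone unfamiliar with the pragma, here is a minimal sketch of what SPECIALIZE does for a class-polymorphic function. The Metric class, the P2/P3 types and sphereDist are hypothetical stand-ins made up for illustration, not ImplicitCAD's actual definitions. The idea is that GHC emits a monomorphic copy of the function for each listed type, so calls at those types no longer go through a class dictionary.

```haskell
module SpecializeSketch where

-- Hypothetical vector-like class; stands in for whatever constraints
-- the real polymorphic function carries.
class Metric v where
  norm  :: v -> Double
  minus :: v -> v -> v

data P2 = P2 Double Double
data P3 = P3 Double Double Double

instance Metric P2 where
  norm (P2 x y) = sqrt (x * x + y * y)
  minus (P2 a b) (P2 c d) = P2 (a - c) (b - d)

instance Metric P3 where
  norm (P3 x y z) = sqrt (x * x + y * y + z * z)
  minus (P3 a b c) (P3 d e f) = P3 (a - d) (b - e) (c - f)

-- Signed distance from point p to a sphere of radius r centred at c.
-- Without the pragmas, every call passes a Metric dictionary at runtime.
sphereDist :: Metric v => Double -> v -> v -> Double
sphereDist r c p = norm (p `minus` c) - r
-- Ask GHC for dedicated copies at the two types we actually use.
{-# SPECIALIZE sphereDist :: Double -> P2 -> P2 -> Double #-}
{-# SPECIALIZE sphereDist :: Double -> P3 -> P3 -> Double #-}
```

The pragmas do not change results, only the generated code, so the gain has to be confirmed by profiling as above.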
Enabling StrictData and -funbox-strict-fields reduces the allocation by ~7x.
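As a rough illustration of what these two settings do, here is a per-module sketch. Vec2, addV and xOf are hypothetical names, and setting the flags at the top of a file is only one option: they could equally go in the cabal file's default-extensions and ghc-options fields. How much allocation this saves depends on the data types involved.

```haskell
{-# LANGUAGE StrictData #-}
{-# OPTIONS_GHC -funbox-strict-fields #-}
module StrictSketch where

-- Hypothetical vector type. Under StrictData both fields are strict,
-- exactly as if they had been declared as !Double. With the
-- -funbox-strict-fields flag GHC may then store the raw doubles inside
-- the constructor instead of pointers to separately allocated boxes.
data Vec2 = Vec2 Double Double

-- Component-wise addition; with strict fields no thunks are stored
-- in the result.
addV :: Vec2 -> Vec2 -> Vec2
addV (Vec2 a b) (Vec2 c d) = Vec2 (a + c) (b + d)

-- Project out the first component.
xOf :: Vec2 -> Double
xOf (Vec2 x _) = x
```

Note that StrictData only affects data types defined in modules where it is enabled; types imported from other packages keep whatever strictness their authors chose.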
After both of the optimizations above, we're still spending a huge chunk of time inside getImplicitShared:
Sat Dec 19 18:22 2020 Time and Allocation Profiling Report (Final)
cad-exe +RTS -N -p -RTS
total time = 72.96 secs (270323 ticks @ 1000 us, 8 processors)
total alloc = 119,409,782,968 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
getImplicitShared Graphics.Implicit.ObjectUtil.GetImplicitShared Graphics/Implicit/ObjectUtil/GetImplicitShared.hs:(52,1)-(93,77) 20.5 13.5
shared_translate Graphics.Implicit.ObjectUtil.GetImplicitShared Graphics/Implicit/ObjectUtil/GetImplicitShared.hs:82:43-91 10.4 6.9
fmap Linear.V3 src/Linear/V3.hs:128:3-42 8.8 15.1
getImplicit3 Graphics.Implicit.ObjectUtil.GetImplicit3 Graphics/Implicit/ObjectUtil/GetImplicit3.hs:(30,1)-(156,69) 7.3 6.5
shared_union Graphics.Implicit.ObjectUtil.GetImplicitShared Graphics/Implicit/ObjectUtil/GetImplicitShared.hs:60:39-66 6.6 10.0
- Linear.V3 src/Linear/V3.hs:200:3-18 6.0 5.5
dot Linear.V3 src/Linear/V3.hs:270:3-51 5.0 6.0
reflect Graphics.Implicit.MathUtil Graphics/Implicit/MathUtil.hs:159:1-56 4.6 4.9
shared_translate_subtract Graphics.Implicit.ObjectUtil.GetImplicitShared Graphics/Implicit/ObjectUtil/GetImplicitShared.hs:82:86-90 3.8 0.0
<*> Linear.V3 src/Linear/V3.hs:170:3-46 3.6 4.2
getImplicit2 Graphics.Implicit.ObjectUtil.GetImplicit2 Graphics/Implicit/ObjectUtil/GetImplicit2.hs:(41,1)-(69,50) 2.1 3.2
shared_mirror Graphics.Implicit.ObjectUtil.GetImplicitShared Graphics/Implicit/ObjectUtil/GetImplicitShared.hs:86:31-61 2.0 1.8
* Linear.Quaternion src/Linear/Quaternion.hs:(275,3)-(276,73) 2.0 3.0
shared_intersection Graphics.Implicit.ObjectUtil.GetImplicitShared Graphics/Implicit/ObjectUtil/GetImplicitShared.hs:64:36-79 1.4 1.0
rmaximum Graphics.Implicit.MathUtil Graphics/Implicit/MathUtil.hs:(79,1)-(90,38) 1.4 0.6
bf Graphics.Implicit.Export.TextBuilderUtils Graphics/Implicit/Export/TextBuilderUtils.hs:38:1-64 1.1 3.5
fmap Linear.V2 src/Linear/V2.hs:156:3-34 1.1 2.3
rminimum Graphics.Implicit.MathUtil Graphics/Implicit/MathUtil.hs:(97,1)-(108,25) 1.1 0.0
+ Linear.V3 src/Linear/V3.hs:198:3-18 1.0 2.8
- Linear.V2 src/Linear/V2.hs:226:3-18 0.9 1.4
getMesh Graphics.Implicit.Export.Render Graphics/Implicit/Export/Render.hs:(77,1)-(188,50) 0.8 1.8
I've added cost centers to each case in getImplicitShared, which is what the shared_union et al. CCs are.
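For reference, per-case cost centres like these can be added with SCC pragmas. The following is a minimal sketch on a toy evaluator; Shape, eval and the shape_* names are hypothetical and only mirror the shape of the real code. When the module is built with profiling enabled, each named cost centre appears as its own row in the report.

```haskell
module SccSketch where

-- Toy 2D object language, standing in for the real symbolic objects.
data Shape
  = Circle Double
  | Union [Shape]
  | Translate (Double, Double) Shape

-- Signed distance evaluator with one manual cost centre per case.
-- Each annotated expression is parenthesised so the pragma's scope
-- is unambiguous.
eval :: Shape -> (Double, Double) -> Double
eval (Circle r) (x, y) =
  {-# SCC "shape_circle" #-} (sqrt (x * x + y * y) - r)
eval (Union ss) p =
  -- Minimum over the children; an empty union is infinitely far away.
  {-# SCC "shape_union" #-} (foldr (min . (`eval` p)) (1 / 0) ss)
eval (Translate (dx, dy) s) (x, y) =
  {-# SCC "shape_translate" #-} (eval s (x - dx, y - dy))
```

Without profiling the pragmas are ignored, so they can stay in the source at no cost to normal builds.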
profile:
flame graph:
I think the reason for the massive costs of translate and union in this case is twofold:
- we're calling getImplicit a lot
- my model has lots of constructions of the form
translate _ $ union [translate _ $ union [...], translate _ $ union [...]]
I think in general we want to add a simplification pass that pushes translates through union/intersection/difference/shell/outset/complement. Sounds crazy, I admit, but this gives us drastically more opportunities to simplify translate . translate.
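To make the idea concrete, here is a minimal sketch of such a pass on a toy object type. Obj, simplify and eval are hypothetical and much simpler than ImplicitCAD's real symbolic objects (no rounding parameters, no shell or outset), so treat it as an illustration of the rewrite rather than a proposed implementation. The pass carries the accumulated offset downwards, merging nested translates on the way, and re-emits a single translate at each leaf.

```haskell
module PushTranslate where

type V3 = (Double, Double, Double)

-- Toy 3D object language (hypothetical, for illustration only).
data Obj
  = Sphere Double
  | Translate V3 Obj
  | Union [Obj]
  | Intersect [Obj]
  | Complement Obj
  deriving (Eq, Show)

addV :: V3 -> V3 -> V3
addV (a, b, c) (d, e, f) = (a + d, b + e, c + f)

-- Push translates towards the leaves, fusing translate . translate.
simplify :: Obj -> Obj
simplify = go (0, 0, 0)
  where
    go v (Translate w o) = go (addV v w) o       -- fuse nested offsets
    go v (Union os)      = Union (map (go v) os)
    go v (Intersect os)  = Intersect (map (go v) os)
    go v (Complement o)  = Complement (go v o)
    go v leaf            = wrap v leaf
    wrap (0, 0, 0) o = o                         -- drop identity translates
    wrap v o         = Translate v o

-- Reference evaluator, used to check that the pass preserves meaning.
eval :: Obj -> V3 -> Double
eval (Sphere r) (x, y, z) = sqrt (x * x + y * y + z * z) - r
eval (Translate (dx, dy, dz) o) (x, y, z) = eval o (x - dx, y - dy, z - dz)
eval (Union os) p      = foldr (min . (`eval` p)) (1 / 0) os
eval (Intersect os) p  = foldr (max . (`eval` p)) (negate (1 / 0)) os
eval (Complement o) p  = negate (eval o p)
```

After the pass, every path from the root to a leaf contains at most one translate, so evaluating a point performs one vector subtraction per leaf instead of one per nesting level. Whether that outweighs the cost of duplicating the offset across many leaves is something the profile would have to show.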