containers icon indicating copy to clipboard operation
containers copied to clipboard

Remove short circuiting for various IntMap functions causes some allocations regressions.

Open mpickering opened this issue 2 years ago • 2 comments

Commit e6fbb98a71c5a72e0805285cbbb799c5f978aceb causes GHCs performance tests to fail. (https://gitlab.haskell.org/ghc/ghc/-/jobs/958207)

           InstanceMatching(normal) ghc/alloc  5,089,763,632  5,091,492,392  +0.0%     
          InstanceMatching1(normal) ghc/alloc 27,713,862,216 27,731,961,608  +0.1%     
                LargeRecord(normal) ghc/alloc  6,041,807,976  6,167,383,288  +2.1%  BAD
           ManyAlternatives(normal) ghc/alloc    731,728,384    732,205,472  +0.1%     
           ManyConstructors(normal) ghc/alloc  3,906,708,768  3,907,866,576  +0.0%     
      MultiComponentModules(normal) ghc/alloc  1,792,077,072  1,791,894,352  -0.0%     
MultiComponentModulesRecomp(normal) ghc/alloc    739,981,312    739,973,248  -0.0%     
          MultiLayerModules(normal) ghc/alloc  3,108,214,752  3,107,719,088  -0.0%     
    MultiLayerModulesRecomp(normal) ghc/alloc    854,776,280    854,763,816  -0.0%     
   MultiLayerModulesTH_Make(normal) ghc/alloc    628,756,288    631,253,840  +0.4%     
MultiLayerModulesTH_OneShot(normal) ghc/alloc  2,708,437,104  2,708,683,336  +0.0%     
                  PmSeriesG(normal) ghc/alloc     43,846,344     44,058,552  +0.5%     
                  PmSeriesS(normal) ghc/alloc     53,097,680     53,147,336  +0.1%     
                  PmSeriesT(normal) ghc/alloc     76,295,096     76,708,768  +0.5%     
                  PmSeriesV(normal) ghc/alloc     52,503,896     52,625,760  +0.2%     
                     T10421(normal) ghc/alloc    110,969,584    111,078,784  +0.1%     
                    T10421a(normal) ghc/alloc     77,539,344     77,582,480  +0.1%     
                     T10547(normal) ghc/alloc     27,836,328     27,878,480  +0.2%     
                     T10858(normal) ghc/alloc    126,696,896    126,758,272  +0.0%     
                     T11195(normal) ghc/alloc    234,317,584    234,768,200  +0.2%     
                     T11276(normal) ghc/alloc     93,173,784     93,327,368  +0.2%     
                    T11303b(normal) ghc/alloc     40,094,728     40,074,024  -0.1%     
                     T11374(normal) ghc/alloc    191,132,936    191,134,640  +0.0%     
                     T11545(normal) ghc/alloc    939,163,880    939,193,288  +0.0%     
                     T11822(normal) ghc/alloc    107,042,504    107,212,856  +0.2%     
                     T12150(optasm) ghc/alloc     77,408,880     77,454,568  +0.1%     
                     T12227(normal) ghc/alloc    473,925,064    480,160,096  +1.3%  BAD
                     T12234(optasm) ghc/alloc     55,823,640     55,835,792  +0.0%     
                     T12425(optasm) ghc/alloc     87,115,072     87,270,448  +0.2%     
                     T12545(normal) ghc/alloc  1,667,437,672  1,720,649,048  +3.2%     
                     T12707(normal) ghc/alloc    943,607,728    945,187,400  +0.2%     
                     T13035(normal) ghc/alloc    101,286,568    101,366,424  +0.1%     
                     T13056(optasm) ghc/alloc    362,261,640    363,079,832  +0.2%     
                     T13253(normal) ghc/alloc    338,486,424    339,347,752  +0.3%     
                 T13253-spj(normal) ghc/alloc    123,347,384    123,449,552  +0.1%     
                     T13379(normal) ghc/alloc    352,877,984    352,973,008  +0.0%     
                     T13386(normal) ghc/alloc    867,199,000    880,838,576  +1.6%  BAD
                     T13701(normal) ghc/alloc  2,392,939,488  2,392,479,344  -0.0%     
                     T13719(normal) ghc/alloc  4,971,686,840  4,933,841,200  -0.8%     
                       T14052(ghci) ghc/alloc  3,613,484,344  3,616,095,496  +0.1%     
                   T14052Type(ghci) ghc/alloc  6,323,987,176  6,324,274,824  +0.0%     
                     T14683(normal) ghc/alloc  2,986,938,912  3,008,927,472  +0.7%     
                     T14697(normal) ghc/alloc    342,160,192    342,064,080  -0.0%     
                     T15164(normal) ghc/alloc  1,285,202,776  1,288,030,896  +0.2%     
                     T15304(normal) ghc/alloc  1,246,949,792  1,250,268,336  +0.3%     
                     T15630(normal) ghc/alloc    157,893,456    157,940,168  +0.0%     
                     T15703(normal) ghc/alloc    514,823,488    522,608,456  +1.5%  BAD
                     T16577(normal) ghc/alloc  7,567,314,120  7,649,470,920  +1.1%     
                     T16875(normal) ghc/alloc     34,586,464     34,578,160  -0.0%     
                     T17096(normal) ghc/alloc    217,397,568    217,925,448  +0.2%     
                     T17516(normal) ghc/alloc  1,717,467,456  1,721,208,832  +0.2%     
                     T17836(normal) ghc/alloc    837,839,944    839,389,344  +0.2%     
                    T17836b(normal) ghc/alloc     45,019,864     45,008,160  -0.0%     
                     T17977(normal) ghc/alloc     39,827,664     39,800,408  -0.1%     
                    T17977b(normal) ghc/alloc     37,605,824     37,571,680  -0.1%     
                     T18140(normal) ghc/alloc     75,405,160     75,410,736  +0.0%     
                     T18223(normal) ghc/alloc  1,117,577,272  1,148,818,384  +2.8%  BAD
                     T18282(normal) ghc/alloc    148,271,824    148,390,728  +0.1%     
                     T18304(normal) ghc/alloc     90,673,016     90,676,992  +0.0%     
                     T18478(normal) ghc/alloc    507,089,832    508,140,656  +0.2%     
                    T18698a(normal) ghc/alloc    199,865,624    200,035,808  +0.1%     
                    T18698b(normal) ghc/alloc    221,130,064    221,332,784  +0.1%     
                     T18923(normal) ghc/alloc     70,802,584     70,873,328  +0.1%     
                      T1969(normal) ghc/alloc    749,857,072    750,288,264  +0.1%     
                     T19695(normal) ghc/alloc  1,422,947,528  1,426,109,104  +0.2%     
                     T20049(normal) ghc/alloc     89,814,808     89,759,520  -0.1%     
                     T20261(normal) ghc/alloc    616,696,800    621,478,856  +0.8%     
                      T3064(normal) ghc/alloc    202,526,136    205,106,112  +1.3%     
                      T3294(normal) ghc/alloc  1,623,091,088  1,622,053,560  -0.1%     
                      T4801(normal) ghc/alloc    302,406,944    303,090,288  +0.2%     
                      T5030(normal) ghc/alloc    351,294,664    359,386,352  +2.3%  BAD
                    T5321FD(normal) ghc/alloc    249,192,352    249,317,824  +0.1%     
                   T5321Fun(normal) ghc/alloc    276,394,384    276,466,184  +0.0%     
                      T5631(normal) ghc/alloc    565,954,304    577,675,064  +2.1%  BAD
                      T5642(normal) ghc/alloc    462,959,792    468,391,080  +1.2%     
                      T5837(normal) ghc/alloc     35,721,056     35,682,016  -0.1%     
                      T6048(optasm) ghc/alloc    100,822,496    100,887,408  +0.1%     
                       T783(normal) ghc/alloc    380,353,976    380,735,504  +0.1%     
                      T8095(normal) ghc/alloc  3,124,233,344  3,243,822,096  +3.8%  BAD
                      T9020(optasm) ghc/alloc    242,711,104    245,175,384  +1.0%     
                      T9198(normal) ghc/alloc     46,243,688     46,223,936  -0.0%     
                      T9233(normal) ghc/alloc    715,117,200    714,845,768  -0.0%     
                      T9630(normal) ghc/alloc  1,510,533,960  1,520,912,432  +0.7%     
                      T9675(optasm) ghc/alloc    437,013,336    436,703,880  -0.1%     
                     T9872a(normal) ghc/alloc  1,718,750,472  1,685,475,544  -1.9% GOOD
                     T9872b(normal) ghc/alloc  1,963,440,192  1,945,595,120  -0.9%     
               T9872b_defer(normal) ghc/alloc  2,996,925,768  2,981,205,512  -0.5%     
                     T9872c(normal) ghc/alloc  1,681,215,352  1,663,474,808  -1.1% GOOD
                     T9872d(normal) ghc/alloc    396,884,424    397,837,272  +0.2%     
                      T9961(normal) ghc/alloc    351,621,368    352,289,600  +0.2%     
       TcPlugin_RewritePerf(normal) ghc/alloc  2,168,137,720  2,150,244,128  -0.8%     
                      WWRec(normal) ghc/alloc    588,926,384    589,396,496  +0.1%     
             hard_hole_fits(normal) ghc/alloc    480,547,232    480,925,096  +0.1%     
                     hie002(normal) ghc/alloc  9,747,820,752  9,380,086,064  -3.8%     
                 parsing001(normal) ghc/alloc    452,303,240    452,305,024  +0.0%   

The issue for us in Data.IntMap.Internal.lookup, the modification of this function makes it small enough that Worker/Wrapper doesn't trigger on it:

https://gist.github.com/mpickering/74590e7ababb7a82abc4b04749456ccd

The consequence is that different things get inlined into lookupVarEnv (a function in GHC)

https://gist.github.com/mpickering/cd5a3c7baf1ae678d1a830cf76b2c56e

Then lookupVarEnv doesn't get inlined (because it is now too big) and every time it's called a function is allocated.

OTOH, if the function is like how it was before then the wrapper gets inlined into lookupVarEnv, lookupVarEnv is the inlined and the worker subsequently inlined into the right place.

mpickering avatar Mar 03 '22 16:03 mpickering

Great analysis - but my conclusion is not

Commit https://github.com/haskell/containers/commit/e6fbb98a71c5a72e0805285cbbb799c5f978aceb causes GHCs performance tests to fail.

but this instead:

GHC's (quite unpredictable) inlining mechanics causes GHCs performance tests to fail.

It's not clear to me that rolling back the commit here would bring a stable improvement for GHC. The discussion https://github.com/haskell/containers/issues/794 https://github.com/haskell/containers/pull/800 seemed to show that there were some good reasons, backed up with measurements.

@nomeata perhaps saw something coming: https://github.com/haskell/containers/pull/800#issuecomment-922498670 Now would "offer both" help?

jwaldmann avatar Mar 11 '22 14:03 jwaldmann

Agreed; this commit can't be blamed for subtle inlining interactions, unless it does something wrong with inlining.

treeowl avatar Mar 11 '22 14:03 treeowl