silver
silver copied to clipboard
Silver (compiler) performance meta-bug
There's a bunch of different categories of things that could use improvement. This is just a thought-collection list.
Startup time
I'm retiring issue #30 and reformatting those ideas here:
- Many classes (e.g. productions) have information stored in static lists. This information gets filled in at class-load time. However, SOME of this information is not necessary until (e.g.) an error has occurred and stack traces are being printed. We could add a
debuggingInit
function, and move this static list initialization step until later, when it's needed. (Major culprit: names of attributes.) (Alternative possibility: In future JVMs, class files may gain the ability to store "lists of strings" as constants. So if we just do nothing long enough...) (This fix already is done for terminal lexer classes. This is initialized to null, and then an accessor can fill it in later, in the unlikely event it's needed. Perhaps we could do the same for things on productions, functions, and nonterminals.) - Autocopy requires dynamically examining the types of every production at start-up time to fill in copy equations. This is awful. I believe I had some notes about doing this statically in my PhD thesis, as part of verifying the MWDA was sound. Perhaps we could begin enforcing those rules, and doing it statically instead.
Silver in general
- We could probably pretty desperately use arrays in many locations we currently use lists. Arrays aren't automatically better than lists, but they'll be good probably for anything that gets iterated over more than once. Adding arrays (and good array syntax) to Silver would probably require a fair bit of work, though.
- We need strictness analysis. Even a naive simple approach (i.e. non-dataflow) would be helpful. Perhaps even JUST being able to annotate strictness on critical functions/productions in the standard library.
- Traditional AG caching concerns. e.g. not caching certain attributes, or finding a way to better garbage collect them.
Issue #206 tracks some thoughts on memory usage. That issue should possibly be folded into this one.
Interface files
- See issue #260
- Copper parsing (or rather, probably lexing?) performance could possible use some improvement? These files take a lot longer to parse than the old ones, and likely because it's just injesting 10 times as many bytes, since the new format is less terse.
- I haven't put Copper performance on this list otherwise, but that's something to consider, too.
Compilation process
- Silver builds "all-at-once" instead of building "strongly connected component at a time, in topological sort order." This means we use significantly more memory than we could otherwise manage.
MWDA
- Algorithm is very slow. Likely would benefit from arrays, other small Java-native enhancements.
- More investigation is needed.
- Would also benefit from "strongly connected component" build order: flow inference can be done on smaller parts at a time.
Type checking
- Most of the silver error checking process is, presumably, type checking. So it must be slow.
- Once possible culprit is the inefficient way we do unification and track unification substitutions.
- More investigation is needed.
Generated files
- We generate way too many files. This creates more files to create/delete/jar/etc, and it increasing memory usage. We should find ways to optimize them lower.
- We have some "standard thunks" that we create (see
DecoratedNode.java
), but we should profile thunk generation more to find additional candidates for moving into the runtime. - We should probably have some kinds of standard
Lazy
s too. Here are some examples from real code:
We copy inherited attributes down (manually) a lot:
// ims.grammarName = top.grammarName
childInheritedAttributes[i_ims][Init.grammarName__ON__ImportStmts] =
new common.Lazy() { public final Object eval(final common.DecoratedNode context) {
return ((common.StringCatter)context.inherited(Init.grammarName__ON__Root)); } };
Or the synthesized version, too:
// top.declaredName = gdcl.declaredName
synthesizedAttributes[Init.declaredName__ON__Root] =
new common.Lazy() { public final Object eval(final common.DecoratedNode context) {
return ((common.StringCatter)((common.DecoratedNode)context.childDecorated(Proot.i_gdcl)).synthesized(Init.declaredName__ON__GrammarDcl)); } };
So this would be a simple Lazy to create an abstract version of in the standard library. Then this line could become:
synthesizedAttributes[Init.declaredName__ON__Root] =
new common.Lazy.ChildDecoratedAccess(context, Proot.i_gdcl, Init.declaredName__ON__GrammarDcl);
And spare us one more generated class file.
Some data on generated files:
cd generated/bin
SAMPLE=$(find -name "*.class" | shuf -n 1000)
javap -c $SAMPLE | grep -o -P "(?<=implements |[^\?] extends )[^\{ <>]+" | sort | uniq -c | sort -n
18 common.PatternLazy
19 common.Terminal
23 common.Node
33 common.FunctionNode
99 common.NodeFactory
302 common.Lazy
450 common.Thunk$Evaluable
So a random sample suggest 45% thunks, 30% lazys. That's more lazys than I expected, so putting common cases in the runtime is possibly a better plan than I anticipated.
That 10% are NodeFactory
was surprising to me, but after some napkin math, it seems right. There's one for every production and function. This is a potential candidate for using Java 8 Lambdas, actually... Unlike the nested thunk situation, we're unlikely to make javac
crash unable to deal with the nesting. These are always top-level. WELP, except that I went looking and now we have getType
on them in addition to invoke
, so they're not eligible. Oh well.
Some more data. First: a java heap profile:
java -Xss8M -Xmx3000M -Xrunhprof:heap=sites,depth=2 -jar ../jars/silver.composed.Default.jar --clean silver:composed:Default
Top 50:
SITES BEGIN (ordered by live bytes) Fri Dec 14 23:04:29 2018
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 13.81% 13.81% 189687720 2985093 549203608 8263711 317189 java.lang.Object[]
2 10.44% 24.25% 143434080 2988210 397011888 8271081 317187 common.DecoratedNode
3 9.57% 33.82% 131414784 2737808 848969712 17686869 316605 common.DecoratedNode
4 5.42% 39.24% 74475472 2804573 205202808 7723797 317188 java.lang.Object[]
5 4.95% 44.19% 68060816 2737808 443454016 17686869 316606 java.lang.Object[]
6 2.96% 47.15% 40712456 269429 46784240 318827 300193 char[]
7 2.82% 49.98% 38757664 1154855 100658072 3284503 317238 java.lang.Object[]
8 2.30% 52.28% 31607472 1316978 180117000 7504875 316604 silver.util.Pcontains
9 2.11% 54.39% 29027640 1209485 66991176 2791299 317145 common.AppendCell
10 1.71% 56.10% 23521128 980047 57372648 2390527 317140 common.ConsCell
11 1.35% 57.45% 18532272 1158267 110647792 6915487 318297 silver.util.Pcontains$1
12 1.35% 58.80% 18532272 1158267 110647792 6915487 318298 common.Thunk
13 1.04% 59.84% 14289720 357243 14289720 357243 317092 java.util.TreeMap$Entry
14 1.03% 60.87% 14169528 590397 157010928 6542122 318836 java.lang.Object[]
15 0.88% 61.75% 12061480 470998 26464904 1071136 317529 java.lang.Object[]
16 0.86% 62.61% 11849320 296233 15173600 379340 318112 core.Ploc
17 0.77% 63.38% 10536528 439022 10536528 439022 321211 silver.definition.flow.ast.PconsFlow
18 0.72% 64.10% 9866280 411095 9866280 411095 318834 common.ConsCell
19 0.69% 64.78% 9413600 574 23222400 1416 324900 char[]
20 0.62% 65.40% 8477200 529825 12109424 756839 300005 java.lang.Integer
21 0.60% 66.00% 8283232 150894 19025416 264383 300012 char[]
22 0.53% 66.54% 7333392 305558 53573280 2232220 317185 common.AppendCell$$Lambda$5.389464548
23 0.49% 67.03% 6726240 121066 12029752 161986 317344 java.lang.Object[]
24 0.47% 67.50% 6466296 269429 7651848 318827 300194 java.lang.String
25 0.47% 67.96% 6417912 55480 21748768 170932 300010 char[]
26 0.45% 68.42% 6213528 258897 59545488 2481062 317580 common.DecoratedNode$$Lambda$10.818609427
27 0.42% 68.84% 5811168 121066 7775328 161986 317342 common.DecoratedNode
28 0.40% 69.24% 5523480 120295 6554104 153473 318540 java.lang.Object[]
29 0.39% 69.63% 5344512 222688 12045624 501901 316604 core.monad.PbindEither
30 0.39% 70.02% 5340624 222526 12041736 501739 324983 core.monad.PbindEither$1$3
31 0.39% 70.41% 5340600 222525 12041712 501738 324985 core.monad.PbindEither$1$4
32 0.39% 70.80% 5340600 222525 12041712 501738 324989 java.lang.Object[]
33 0.39% 71.18% 5330400 222100 5330400 222100 325037 common.StringCatter
34 0.39% 71.57% 5315472 221478 10621248 442552 317184 core.Ppair
35 0.37% 71.94% 5075056 177365 7569104 261458 300055 char[]
36 0.36% 72.30% 4888928 305558 35715520 2232220 317186 common.Thunk
37 0.35% 72.65% 4863856 303991 39696992 2481062 317581 common.Thunk
38 0.35% 73.00% 4784040 199335 40579872 1690828 316604 core.Pmap
39 0.35% 73.35% 4753848 198077 5261712 219238 321123 common.StringCatter
40 0.35% 73.69% 4743456 98822 13571328 282736 300009 char[]
41 0.34% 74.03% 4711392 574 11622528 1416 324898 byte[]
42 0.31% 74.34% 4256760 177365 6273768 261407 318106 java.lang.String
43 0.31% 74.65% 4256760 177365 6251328 260472 318110 common.StringCatter
44 0.31% 74.96% 4256760 177365 6251328 260472 318111 common.StringCatter
45 0.27% 75.24% 3739392 155808 86947080 3622795 317453 common.DecoratedNode$$Lambda$8.176320504
46 0.27% 75.50% 3660728 150725 22912536 948092 316613 java.lang.Object[]
47 0.26% 75.77% 3617872 226117 15922288 995143 317149 common.Thunk
48 0.26% 76.03% 3614176 225886 57964720 3622795 317454 common.Thunk
49 0.26% 76.29% 3563008 222688 8030416 501901 324979 core.monad.PbindEither$1
50 0.26% 76.55% 3560560 222535 8029568 501848 317359 core.Pright
Here are some thoughts on the above:
- Silver allocates (and specifically allocates decorated nodes) like there's no tomorrow. So object arrays and DecoratedNodes are to be expected...
- I'm not entirely clear on why
char[]
should be so prevalent. ~~Or HeapCharBuffer, (WTF: HeapCharBuffer is allocated 1.5M times! That ain't right!)~~ Fixed that, and updated the data above. -
silver:util:contains
has 7.5M calls. Granted, it's recursive, but that's a LOT, especially since Silver calls are allocation heavy.map
is up at 1.7M.foldr
is 1.5M.flatMap
300K.implode
200K.lookupBy
380K. These may all be good candidates for special-casing? - Another interesting number from data lower down than is present above:
goodSubst
is allocated 200K times.searchEnvTree
has 700K.performSubstitution
250K.tyVarEqual
350K. - I'm not clear on why
bindEither
is there, much less called 500K times. I thought this stuff wasn't used in the Silver compiler. - There's 127K calls to
skipNulls
which is some weird thing in the driver. That's probably evidence that bit of code is quite suboptimal.
Not sure about most of the other stuff, but bindEither
is used in the reflection library to allow monadic error handling during serialization. Since reflection is now used by Silver for interface files, this I suppose would end up being called quite a lot during interface file generation.
This time, a crummy CPU profile. Please note that these numbers are essentially "time spent with this function at top of stack" so if a function takes a long time, but because it calls other functions, then it could easily not be listed. So this is places where the instruction pointer was "caught" inside that function, not having called others.
java -Xss8M -Xmx3000M -Xrunhprof:cpu=samples,depth=1,interval=1 -jar ../jars/silver.composed.Default.jar --clean silver:composed:Default
CPU SAMPLES BEGIN (total = 9715) Fri Dec 14 21:14:09 2018
rank self accum count trace method
1 9.03% 9.03% 877 300755 common.DecoratedNode.childAsIsLazy
2 8.11% 17.14% 788 300102 java.lang.ClassLoader.defineClass1
3 4.24% 21.38% 412 300942 common.DecoratedNode.evalSyn
4 2.90% 24.28% 282 300994 common.Thunk.eval
5 2.48% 26.76% 241 300693 common.DecoratedNode.childAsIs
6 1.56% 28.33% 152 300122 java.util.zip.Inflater.inflateBytes
7 1.45% 29.78% 141 300941 common.DecoratedNode.createDecoratedChild
8 1.39% 31.17% 135 301125 common.AppendCell.append
9 1.34% 32.51% 130 300143 java.lang.Throwable.fillInStackTrace
10 1.28% 33.78% 124 300679 java.io.FileOutputStream.writeBytes
11 1.12% 34.90% 109 300982 silver.util.Pcontains.invoke
12 1.01% 35.91% 98 301061 common.Node.decorate
13 1.00% 36.91% 97 300149 java.util.zip.ZipFile.getEntry
14 0.93% 37.84% 90 300664 edu.umn.cs.melt.copper.runtime.engines.single.SingleDFAEngine.runEngine
15 0.82% 38.66% 80 300954 core.Pnull.invoke
16 0.78% 39.44% 76 301431 common.DecoratedNode.evalSyn
17 0.74% 40.19% 72 300717 edu.umn.cs.melt.copper.runtime.engines.single.SingleDFAEngine.simpleScan
18 0.71% 40.90% 69 300916 common.DecoratedNode.evalLocalAsIs
19 0.67% 41.56% 65 301192 java.util.TreeMap.getEntryUsingComparator
20 0.64% 42.20% 62 300929 common.AppendCell.head
21 0.63% 42.83% 61 302857 sun.nio.cs.UTF_8$Encoder.encodeLoop
22 0.62% 43.45% 60 301388 common.DecoratedNode.createDecoratedChild
23 0.60% 44.05% 58 300115 java.lang.Object.<init>
24 0.58% 44.62% 56 300938 common.Thunk.transformUndecorate
25 0.57% 45.19% 55 300109 java.lang.ClassLoader.findBootstrapClass
26 0.55% 45.73% 53 300921 common.FunctionNode.decorate
27 0.51% 46.25% 50 300113 java.util.zip.ZipFile.read
28 0.51% 46.76% 50 301046 common.DecoratedNode.lambda$childDecoratedSynthesizedLazy$3
29 0.49% 47.26% 48 301075 common.Node.decorate
30 0.45% 47.71% 44 301235 core.PcompareString.invoke
31 0.43% 48.14% 42 301238 core.PcompareString.invoke
32 0.42% 48.56% 41 300162 java.lang.ClassLoader.loadClass
33 0.42% 48.99% 41 301191 java.util.TreeMap.put
34 0.42% 49.41% 41 302879 java.io.FileOutputStream.open0
35 0.38% 49.79% 37 300774 common.Util.demand
36 0.37% 50.16% 36 300384 java.lang.Object.clone
37 0.37% 50.53% 36 301186 common.Node.decorate
38 0.34% 50.87% 33 300053 java.security.AccessController.doPrivileged
39 0.30% 51.17% 29 300382 java.lang.Class.getField
40 0.28% 51.45% 27 300142 java.lang.ClassLoader.loadClass
41 0.28% 51.72% 27 300144 java.net.URLClassLoader.findClass
42 0.27% 51.99% 26 301651 common.DecoratedNode.inheritedForwarded
43 0.27% 52.26% 26 302822 java.io.UnixFileSystem.delete0
44 0.26% 52.52% 25 301600 common.DecoratedNode.childAsIsSynthesizedLazy
45 0.25% 52.76% 24 300230 java.lang.ClassLoader.findLoadedClass0
46 0.25% 53.01% 24 301340 common.Node.decorate
47 0.23% 53.24% 22 301220 silver.definition.env.PfreshenNamedSignature$1.eval
48 0.17% 53.41% 17 303048 core.monad.PbindEither.invoke
49 0.16% 53.58% 16 301271 common.Node.decorate
50 0.16% 53.74% 16 301346 common.DecoratedNode.lambda$childDecoratedLazy$0
Thoughts:
- We pay a pretty high runtime price for having a lot of class files: that's all the ClassLoader, and zip inflator and so on.
- REALLY not sure why
fillInStackTrace
is showing up there. That seems to suggest we spend a lot of time throwing and catching exceptions? That shouldn't be happening? - I think
FileOutputStream.writeByets
is a phantom: probably that blocks while writing, and so garnered more CPU profile samples than it really deserved. - On the one hand,
null
shows up a lot, so you might think "lol null is slow," but that's because it gets called uniquely a lot, and... it amounts to 0.8% of runtime, so, not really an optimization opportunity, there. - This data is pretty useless. Hah.
Ok. A much more useful kind of CPU profile. CPU profiles run fast, while heap are slow, but the aggregate data we get isn't helpful at the end. But the profiling SITES... with deep stack traces, when we're trying to track things down:
java -Xss8M -Xmx3000M -Xrunhprof:cpu=samples,depth=12 -jar ../jars/silver.composed.Default.jar --clean silver:composed:Default
Get us some interesting information about when certain things are getting called:
TRACE 301014:
java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown line)
java.lang.Throwable.fillInStackTrace(Throwable.java:783)
java.lang.Throwable.<init>(Throwable.java:265)
java.lang.Exception.<init>(Exception.java:66)
java.lang.RuntimeException.<init>(RuntimeException.java:62)
common.exceptions.SilverException.<init>(SilverException.java:29)
common.exceptions.SilverError.<init>(SilverError.java:12)
common.Reflection.reify(Reflection.java:254)
common.Reflection.reifyChecked(Reflection.java:122)
silver.extension.silverconstruction.Init$16$1$2$1.invoke(Init.java:265)
silver.extension.silverconstruction.Init$16$1$2.eval(Init.java:273)
silver.extension.silverconstruction.Init$16$1.eval(Init.java:273)
TRACE 300211:
java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown line)
java.lang.Throwable.fillInStackTrace(Throwable.java:783)
java.lang.Throwable.<init>(Throwable.java:265)
java.lang.Exception.<init>(Exception.java:66)
java.lang.ReflectiveOperationException.<init>(ReflectiveOperationException.java:56)
java.lang.NoSuchFieldException.<init>(NoSuchFieldException.java:50)
java.lang.Class.getField(Class.java:1703)
common.Decorator.decorateAutoCopy(Decorator.java:60)
silver.definition.env.DcompiledGrammars.decorate(DcompiledGrammars.java:10)
common.Decorator.applyDecorators(Decorator.java:22)
silver.modification.collection.Init.postInit(Init.java:87)
silver.analysis.warnings.flow.Init.postInit(Init.java:77)
TRACE 300575:
java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown line)
java.lang.Throwable.fillInStackTrace(Throwable.java:783)
java.lang.Throwable.<init>(Throwable.java:287)
java.lang.Exception.<init>(Exception.java:84)
java.lang.ReflectiveOperationException.<init>(ReflectiveOperationException.java:75)
java.lang.ClassNotFoundException.<init>(ClassNotFoundException.java:82)
java.net.URLClassLoader.findClass(URLClassLoader.java:382)
java.lang.ClassLoader.loadClass(ClassLoader.java:424)
java.lang.ClassLoader.loadClass(ClassLoader.java:411)
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
java.lang.ClassLoader.loadClass(ClassLoader.java:357)
silver.definition.env.PtoEnv$1.eval(PtoEnv.java:92) (A WHOLE BUNCH WHERE THIS LAST LINE VARIES)
So that's three sources:
- Autocopy decoration setup, where long ago I made a bad decision that we'd just try something, catch the exception, and move on.
- Reification? Not sure why that's happening? Shouldn't it be the case that anything we reify does so successfully? That line number is this code:
if (!TypeRep.unify(resultType, givenType)) { throw new SilverError("reify is constructing " + resultType.toString() + ", but found " + givenType.toString() + " AST."); }
Actually, this is a clean run of the silver compiler. Shouldn't we be reifying nothing at all? - Apparently java internally catches exceptions as part of it's class loading process. This seems like a very bad no-good terrible idea, but also we have no control over it. Another way we pay a price for having a lot of class files, though! Yikes!
Also I've tracked down the HeapCharBuffer problem to an obvious bug caused (presumably) by misleading documentation. Pursuing this one immediately...
Interesting, I'll take a look at what's going on with reification.
OK, I remember what is going on here. There is also the silver construction extension that I added, that is basically the same as silver-ableC but for Silver itself, for use by other Silver extensions. I think it's only used in a couple of places, at the moment. Part of the translation involves being able to extract the location at any point in an AST
. In ableC there are a few productions on nonterminals lacking a location annotation that have Location
as the rightmost child, so I made this into a general check that works by attempting to reify the last child and failing if it isn't actually a Location
. This got copied as part of the silver construction extension, but seeing as there aren't any such productions in Silver's abstract syntax, this check can be removed.