Profiling of LatticeAccess
I ran profiling for the TCLB. Special interest is LatticeAccess.cpp.Rt parsing time. Full report for d2q9 and d3q27 models: https://gist.github.com/shkodm/607d98a0dfe3c244a1197016f4b125d7 Top list of compilation-time eaters for d3q27. Top of d2q9 is pretty much similar.
| function name | total.time | total.pct | self.time | self.pct |
|---|---|---|---|---|
| "<Anonymous>" | 145.46 | 100.00 | 2.42 | 1.66 |
| "eval" | 145.46 | 100.00 | 2.00 | 1.37 |
| "lapply" | 145.20 | 99.82 | 1.60 | 1.10 |
| "FUN" | 145.18 | 99.81 | 1.42 | 0.98 |
| "txtlapply" | 142.20 | 97.76 | 0.00 | 0.00 |
| "field.access" | 140.80 | 96.80 | 0.00 | 0.00 |
| "Ops.gvector.apply" | 114.84 | 78.95 | 0.12 | 0.08 |
| "fun" | 110.98 | 76.30 | 0.60 | 0.41 |
| "offset.p" | 95.22 | 65.46 | 0.06 | 0.04 |
| "load.field" | 91.98 | 63.23 | 0.00 | 0.00 |
| "aggregate" | 70.38 | 48.38 | 0.56 | 0.38 |
| "aggregate.pAlg" | 70.06 | 48.16 | 1.20 | 0.82 |
| "*" | 67.22 | 46.21 | 0.04 | 0.03 |
| "*.pAlg" | 65.74 | 45.19 | 1.80 | 1.24 |
| "save.field" | 48.82 | 33.56 | 0.00 | 0.00 |
| "aggregate.data.frame" | 48.46 | 33.32 | 1.90 | 1.31 |
| "+" | 40.68 | 27.97 | 0.16 | 0.11 |
| "[" | 33.28 | 22.88 | 1.22 | 0.84 |
| "+.pAlg" | 32.32 | 22.22 | 0.10 | 0.07 |
| "C" | 30.54 | 21.00 | 0.02 | 0.01 |
| "ToC" | 30.42 | 20.91 | 0.04 | 0.03 |
Any ideas about what can be done with it?
@shkodm when you change to the new polyAlgebra package, does the "ToC" time decreases?
@llaniewski , yes, as far as I remember, it becomes at least 5 times smaller. I will report exact results of profiling with the new package in the evening.
With new polyAlgebra package, as requested:
| function name | total.time | total.pct | self.time | self.pct |
|---|---|---|---|---|
| "Anonymous" | 127.28 | 100.00 | 1.90 | 1.49 |
| "eval" | 127.28 | 100.00 | 1.40 | 1.10 |
| "FUN" | 126.88 | 99.69 | 1.22 | 0.96 |
| "lapply" | 126.86 | 99.67 | 1.64 | 1.29 |
| "txtlapply" | 123.92 | 97.36 | 0.00 | 0.00 |
| "field.access" | 122.92 | 96.57 | 0.02 | 0.02 |
| "Ops.gvector.apply" | 118.40 | 93.02 | 0.10 | 0.08 |
| "fun" | 114.12 | 89.66 | 0.50 | 0.39 |
| "offset.p" | 98.66 | 77.51 | 0.00 | 0.00 |
| "load.field" | 79.40 | 62.38 | 0.00 | 0.00 |
| "aggregate" | 72.04 | 56.60 | 0.42 | 0.33 |
| "aggregate.pAlg" | 71.82 | 56.43 | 1.42 | 1.12 |
| "*" | 68.00 | 53.43 | 0.06 | 0.05 |
| "*.pAlg" | 66.38 | 52.15 | 1.62 | 1.27 |
| "aggregate.data.frame" | 49.14 | 38.61 | 1.72 | 1.35 |
| "save.field" | 43.52 | 34.19 | 0.00 | 0.00 |
| "+" | 43.14 | 33.89 | 0.16 | 0.13 |
| "+.pAlg" | 34.06 | 26.76 | 0.18 | 0.14 |
| "[" | 28.72 | 22.56 | 0.92 | 0.72 |
| "[.data.frame" | 24.94 | 19.59 | 7.64 | 6.00 |
| "-" | 21.82 | 17.14 | 0.14 | 0.11 |
| "-.pAlg" | 19.66 | 15.45 | 0.08 | 0.06 |
| "data.frame" | 16.76 | 13.17 | 2.78 | 2.18 |
| "match" | 16.36 | 12.85 | 9.66 | 7.59 |
| "as.factor" | 15.14 | 11.90 | 0.26 | 0.20 |
| "%in%" | 14.96 | 11.75 | 3.10 | 2.44 |
| "rbind" | 14.04 | 11.03 | 3.68 | 2.89 |
| "factor" | 13.68 | 10.75 | 1.58 | 1.24 |
| "pAlg" | 11.74 | 9.22 | 0.52 | 0.41 |
| "deparse" | 11.48 | 9.02 | 2.80 | 2.20 |
| "as.data.frame" | 10.42 | 8.19 | 0.94 | 0.74 |
| "match.arg" | 10.02 | 7.87 | 1.04 | 0.82 |
| "[[" | 9.48 | 7.45 | 1.20 | 0.94 |
| "new" | 9.02 | 7.09 | 0.06 | 0.05 |
| "initialize" | 8.84 | 6.95 | 0.90 | 0.71 |
| "standardGeneric" | 8.78 | 6.90 | 0.26 | 0.20 |
| "C" | 8.60 | 6.76 | 0.00 | 0.00 |
| "gsub" | 8.56 | 6.73 | 1.66 | 1.30 |
| "ToC" | 8.48 | 6.66 | 0.04 | 0.03 |
@shkodm Can you post profiling "script"? Something like profile.sh d2q9 or make profile d2q9 would be nice ;)
I think you should rather look at self time: it should point to biggest eater, total time includes sub-calls, so main has always all of it You may want to look at getting rid of or reducing number of call to top few of those. This list represents more than half of all script time.
| function name | total.time | total.pct | self.time | self.pct | time so far |
|---|---|---|---|---|---|
| match | 18,2 | 12,51 | 10,28 | 7,07 | 10,28 |
| [.data.frame | 29,58 | 20,34 | 8,56 | 5,88 | 18,84 |
| data.frame | 25,12 | 17,27 | 3,94 | 2,71 | 22,78 |
| names | 3,86 | 2,65 | 3,86 | 2,65 | 26,64 |
| length | 3,76 | 2,58 | 3,76 | 2,58 | 30,4 |
| %in% | 17,4 | 11,96 | 3,7 | 2,54 | 34,1 |
| rbind | 18,6 | 12,79 | 3,54 | 2,43 | 37,64 |
| deparse | 14 | 9,62 | 3,5 | 2,41 | 41,14 |
| anyDuplicated | 4,56 | 3,13 | 3,46 | 2,38 | 44,6 |
| [[.data.frame | 10,9 | 7,49 | 3,28 | 2,25 | 47,88 |
| c | 3,14 | 2,16 | 2,86 | 1,97 | 50,74 |
| [<-.data.frame | 7,56 | 5,2 | 2,54 | 1,75 | 53,28 |
| Anonymous | 145,46 | 100 | 2,42 | 1,66 | 55,7 |
| .deparseOpts | 7,16 | 4,92 | 2,3 | 1,58 | 58 |
| unique | 4,64 | 3,19 | 2,22 | 1,53 | 60,22 |
| eval | 145,46 | 100 | 2 | 1,37 | 62,22 |
| gsub | 8,98 | 6,17 | 2 | 1,37 | 64,22 |
| pmatch | 2,92 | 2,01 | 1,94 | 1,33 | 66,16 |
| aggregate.data.frame | 48,46 | 33,32 | 1,9 | 1,31 | 68,06 |
| dim | 5,22 | 3,59 | 1,9 | 1,31 | 69,96 |
| paste | 9,7 | 6,67 | 1,88 | 1,29 | 71,84 |
| *.pAlg | 65,74 | 45,19 | 1,8 | 1,24 | 73,64 |
| factor | 13,58 | 9,34 | 1,76 | 1,21 | 75,4 |
| [[ | 12,6 | 8,66 | 1,7 | 1,17 | 77,1 |
| unlist | 2,48 | 1,7 | 1,68 | 1,15 | 78,78 |
| is.factor | 1,68 | 1,15 | 1,68 | 1,15 | 80,46 |
| lapply | 145,2 | 99,82 | 1,6 | 1,1 | 82,06 |
| unique.default | 2,08 | 1,43 | 1,6 | 1,1 | 83,66 |
| prettyNum | 1,78 | 1,22 | 1,56 | 1,07 | 85,22 |
| finish.pAlg | 2,82 | 1,94 | 1,54 | 1,06 | 86,76 |
| as.character | 1,44 | 0,99 | 1,44 | 0,99 | 88,2 |
| sys.call | 1,44 | 0,99 | 1,44 | 0,99 | 89,64 |
| FUN | 145,18 | 99,81 | 1,42 | 0,98 | 91,06 |
| as.data.frame | 14,88 | 10,23 | 1,42 | 0,98 | 92,48 |
| as.list | 2,38 | 1,64 | 1,38 | 0,95 | 93,86 |
| [ | 33,28 | 22,88 | 1,22 | 0,84 | 95,08 |
| .row_names_info | 1,22 | 0,84 | 1,22 | 0,84 | 96,3 |
@llaniewski Can we divide parts of LatticeAcces/Dynamics.R to get little bit more fine grained dependencies and avoid recompilation of LatticeAccess?
@mdzik , there is no script really, I just enabled profiling in RTemplate in the makefile.main.Rt & compiled.
Can you introduce make flag? or configure option?