opt: sig anyval

Open coreyoconnor opened this issue 2 months ago • 9 comments

Adopted from comment and PR

  • https://github.com/scala-native/scala-native/pull/1549/files#r277257527

Using VM version: JDK 21.0.8, OpenJDK 64-Bit Server VM, 21.0.8+9-nixos

Based purely on the JMH bench results this does not have a measurable impact. Leaving as draft to answer conversation.

tools-benchmark/jmh:run -i 20 scala.scalanative.benchmarks.LinkerBench

Prior

[info] Result "scala.scalanative.benchmarks.LinkerBench.link":
[info]   1126.626 ±(99.9%) 55.782 ms/op [Average]
[info]   (min, avg, max) = (1068.876, 1126.626, 1376.822), stdev = 64.238
[info]   CI (99.9%): [1070.844, 1182.408] (assumes normal distribution)
[info] # Run complete. Total time: 00:01:03

Post

[info] Result "scala.scalanative.benchmarks.LinkerBench.link":
[info]   1166.456 ±(99.9%) 142.858 ms/op [Average]
[info]   (min, avg, max) = (1049.574, 1166.456, 1521.616), stdev = 164.516
[info]   CI (99.9%): [1023.598, 1309.314] (assumes normal distribution)
[info] # Run complete. Total time: 00:01:03

[info] Result "scala.scalanative.benchmarks.LinkerBench.link":
[info]   1113.939 ±(99.9%) 107.734 ms/op [Average]
[info]   (min, avg, max) = (1026.294, 1113.939, 1468.061), stdev = 124.066
[info]   CI (99.9%): [1006.206, 1221.673] (assumes normal distribution)
[info] # Run complete. Total time: 00:01:00
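
A note on reading these numbers: JMH's `ms/op [Average]` is the mean time per benchmark invocation, so lower is better. One rough way to check the "no measurable impact" reading is to compare the runs from their summary statistics alone. The sketch below computes Welch's t statistic and checks whether the reported 99.9% intervals overlap. It assumes n = 20 samples per run (taken from `-i 20` with a single fork, which is consistent with the reported error margins). `BenchCompare`, `Summary`, `welchT`, and `overlaps` are illustrative names, not part of the scala-native code base.

```scala
object BenchCompare {
  // Summary statistics as printed by JMH (mean and stdev in ms/op).
  // n = 20 is an assumption derived from `-i 20` with one fork.
  final case class Summary(mean: Double, stdev: Double, n: Int)

  // Welch's t statistic for two samples with unequal variances.
  // Positive values mean `b` is slower than `a`.
  def welchT(a: Summary, b: Summary): Double = {
    val se = math.sqrt(a.stdev * a.stdev / a.n + b.stdev * b.stdev / b.n)
    (b.mean - a.mean) / se
  }

  // True when the closed intervals (lo, hi) share at least one point.
  def overlaps(a: (Double, Double), b: (Double, Double)): Boolean =
    a._1 <= b._2 && b._1 <= a._2

  // Values copied from the Prior and Post results above.
  val prior = Summary(1126.626, 64.238, 20)
  val post1 = Summary(1166.456, 164.516, 20)
  val post2 = Summary(1113.939, 124.066, 20)

  val priorCI = (1070.844, 1182.408)
  val post1CI = (1023.598, 1309.314)
  val post2CI = (1006.206, 1221.673)

  def main(args: Array[String]): Unit = {
    println(f"prior vs post run 1: t = ${welchT(prior, post1)}%.2f, CIs overlap = ${overlaps(priorCI, post1CI)}")
    println(f"prior vs post run 2: t = ${welchT(prior, post2)}%.2f, CIs overlap = ${overlaps(priorCI, post2CI)}")
  }
}
```

With these inputs both |t| values come out well below 2 and each Post interval overlaps the Prior one, which agrees with the conclusion that the benchmark shows no measurable difference.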

coreyoconnor avatar Nov 01 '25 05:11 coreyoconnor

coreyoconnor.

Could you take a moment to explain a few things to an interested reader who is not a subject expert? I appreciate 3 minutes of your time. Thank you.

It appears that you are trying to speed up the SN link phase by implementing a suggestion by Densh. Such a speedup would certainly be a good thing. It would save me time, since I use that phase many times a day.

These are probably perfectly obvious to you.

  1. Exactly what is being measured?

  2. The answer is probably in the supplied "scala.scalanative.benchmarks.LinkerBench", but that is not a link and tracking it down would take more time than I have available.

  3. What is the direction of goodness? Are larger numbers better?

From what I see, the bottom set has a greater dispersion and a mean that is greater in one case and less in another. Hard to see a win in either direction of goodness.

LeeTibbert avatar Nov 01 '25 12:11 LeeTibbert

Definitely makes sense but also breaks binary compatibility. Would need to wait for the next minor release.

WojciechMazur avatar Nov 02 '25 12:11 WojciechMazur

I'm attempting to gather more info on allocations and actual profiling info.

Unfortunately my computer hard locks when I try to profile anything! Not what I wanted to spend my time on haha.

Anyways, I'll be updating this with further analysis once my computer can handle it.

coreyoconnor avatar Nov 02 '25 18:11 coreyoconnor

@LeeTibbert I'll be sure to answer your other questions eventually. :) I'm still building a suitable test and gathering evidence for this and other "optimizations".

But first:

2. The answer is probably in the supplied "scala.scalanative.benchmarks.LinkerBench",
   but that is not a link and I tracking it down would take more time than available.

Not sure what you mean here: "not a link"?

The file is

  • https://github.com/scala-native/scala-native/blob/c9a79dec2882ada1eb1e3bffb86018f3d93c2b84/tools-benchmarks/src/main/scala/scala/scalanative/benchmarks/testinterface/LinkerBench.scala#L21

As I understand it, this performs the link phase for the TestMain executable. It does not seem very expensive, though: most of the time is spent doing IO. So I wonder if there is a better "link stress test" I can build.

coreyoconnor avatar Nov 06 '25 22:11 coreyoconnor

| I'm attempting to gather more info on allocations and actual profiling info.
|
| Unfortunately my computer hard locks when I try to profile anything! Not what I wanted to spend my time on haha.
|
| Anyways, I'll be updating this with further analysis once my computer can handle it.

For the search daemons: the Linux kernel NMI watchdog was to blame. I disabled it with:

echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog

coreyoconnor avatar Nov 06 '25 22:11 coreyoconnor

| I'll be sure to answer your other questions eventually. :) I'm still building a suitable test and evidence of this and other "optimizations"

Understood. "not soup yet" No rush here. I have recent personal experience with the peanut butter of available time being spread pretty thin.

| Not sure what you mean here: "not a link"?

When I mouse clicked on the blue apparent link in the base topic, nothing happened. When I click on the blue text in your last reply it does indeed take me to a page with code. Thank you.

If I make progress on the task currently at the front of my priority queue, I'll take a look tomorrowish.

I wish you good progress.

LeeTibbert avatar Nov 06 '25 23:11 LeeTibbert

One challenge I'm having here is splitting IO from the link compute perf measurements. I'd like to use the in-memory class loader to narrow down to only the linking computation (plus output). Otherwise the perf measurements are dominated by IO.

The IO could probably be improved as well, but I'm trying to focus on one bit at a time.
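
As a rough illustration of the split described above, the sketch below times a "load" phase and a "compute" phase separately, so that the compute figure excludes input loading. Everything here is hypothetical: `SplitTiming` and `timed` are made-up names, and the two phases are stand-ins rather than the real class loader or linker. A single wall-clock measurement like this has no warm-up, so a real comparison would still belong in a JMH benchmark where the loading happens in setup.

```scala
object SplitTiming {
  // Runs `body` once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for loading inputs into memory up front.
    val (inputs, loadMs) = timed {
      Vector.tabulate(1000)(i => Array.fill(256)(i.toByte))
    }

    // Hypothetical stand-in for the computation over already-loaded
    // inputs. No disk access happens inside this block.
    val (checksum, computeMs) = timed {
      inputs.foldLeft(0L)((acc, bytes) => acc + bytes.map(_.toLong).sum)
    }

    println(f"load: $loadMs%.3f ms, compute: $computeMs%.3f ms, checksum: $checksum")
  }
}
```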

coreyoconnor avatar Nov 08 '25 22:11 coreyoconnor

Based on my benchmarking, this is a small portion of total time (< 10%). I'm going to focus on other parts of the linking process. Something else is dominating the performance.
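
The reasoning can be made concrete with Amdahl's law. The sketch below reads "< 10%" as the share of total link time spent in the code this change touches (an assumption about what was measured) and computes the best overall speedup that optimizing only that share could give. `Amdahl`, `speedup`, and `maxSpeedup` are illustrative names.

```scala
object Amdahl {
  // Overall speedup when a fraction `p` of the run time is made
  // `s` times faster and the rest is unchanged.
  def speedup(p: Double, s: Double): Double = {
    require(p >= 0.0 && p <= 1.0 && s > 0.0, "need 0 <= p <= 1 and s > 0")
    1.0 / ((1.0 - p) + p / s)
  }

  // Upper bound on the overall speedup: the limit as s grows without
  // bound, i.e. the optimized part takes no time at all.
  def maxSpeedup(p: Double): Double = {
    require(p >= 0.0 && p < 1.0, "need 0 <= p < 1")
    1.0 / (1.0 - p)
  }

  def main(args: Array[String]): Unit = {
    val p = 0.10 // assumed share of total time, taken from "< 10%"
    println(f"2x faster on that share: ${speedup(p, 2.0)}%.3fx overall")
    println(f"upper bound:             ${maxSpeedup(p)}%.3fx overall")
  }
}
```

For p = 0.10 the bound is about 1.11x, so even making this part free would cut total time by at most roughly 10%, which is why looking elsewhere first makes sense.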

coreyoconnor avatar Nov 29 '25 17:11 coreyoconnor

As an individual member of the community, thank you for chasing this.

LeeTibbert avatar Nov 29 '25 22:11 LeeTibbert