# Reduce Allocations in Text Rendering

## What does the pull request do?
This PR addresses some performance regressions and reduces overall allocations when rendering a lot of (complex) text runs.
## What is the current behavior?

When I upgraded AvaloniaHex to the latest version of Avalonia and .NET 10, I noticed a pretty significant bump in memory allocations and a performance degradation when rendering many (complex) text runs. Upon profiling, I found a couple of related hotspots:
- In `Avalonia.Media.TextFormatting` there are various precompiled tries stored as raw data. This data is recreated on every access of the trie. E.g., here is `UnicodeData.trie`: https://github.com/AvaloniaUI/Avalonia/blob/0b5a82884e2cfa24499f5f05fc84e0dd52c734dd/src/Avalonia.Base/Media/TextFormatting/Unicode/UnicodeData.trie.cs#L15-L22

  These computed tries are used quite extensively throughout the library. This results in a significant number of unnecessary allocations of very large data blobs (literally millions of instances), significantly slowing down controls that do a lot of (complex) text rendering. Below is an example of how AvaloniaHex is affected when scrolling down and up once in the example project:

- `FontFamily.Parse` eventually calls `FontFamily.GetFontSourceIdentifier`, which always allocates extra `string` and `string[]` instances when parsing a font by name, even if the font name is a simple font without any fallback fonts:

- `FontShaperImpl.ShapeText` in `Avalonia.Skia` uses a language cache with a `GetOrAdd` construction that uses a non-static / closure-capturing lambda for its factory: https://github.com/AvaloniaUI/Avalonia/blob/0b5a82884e2cfa24499f5f05fc84e0dd52c734dd/src/Skia/Avalonia.Skia/TextShaperImpl.cs#L45-L47

  This results in a closure being created for every text run, which goes unused most of the time because the culture rarely changes:

- `LineBreakEnumerator` always creates a `LineBreakState` heap allocation (even if a text run does not contain line breaks). This is wasteful, especially considering `LineBreakEnumerator` is already a `ref struct` and will never appear on the heap:
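To make the closure issue concrete, here is a minimal sketch (with illustrative names, not the actual Avalonia types) of why a capturing lambda allocates on every call, and the allocation-free alternative using the `GetOrAdd` overload that threads the state through as a factory argument:

```csharp
using System.Collections.Concurrent;

class LanguageCache
{
    private readonly ConcurrentDictionary<string, object> _cache = new();

    // Capturing lambda: 'culture' is closed over, so a closure object and a
    // delegate are allocated on every call, even when the key already exists.
    public object GetCapturing(string key, object culture)
        => _cache.GetOrAdd(key, k => CreateLanguage(k, culture));

    // Static lambda + factory argument: no closure; the state travels through
    // the TArg parameter of the GetOrAdd<TArg> overload instead.
    public object GetStatic(string key, object culture)
        => _cache.GetOrAdd(key, static (k, c) => CreateLanguage(k, c), culture);

    private static object CreateLanguage(string key, object culture) => new();
}
```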
## What is the updated/expected behavior with this PR?
This PR removes the vast majority of these allocations. Running the example project of AvaloniaHex with these changes applied to Avalonia makes scrolling much smoother.
## How was the solution implemented (if it's not obvious)?

In the same order as the issues described above:
- `UnicodeDataGenerator` was updated to generate code with a readonly `Data` field as opposed to a computed property. This removes all `RuntimeFieldInfoStub` allocations.
- `FontFamily.GetFontSourceIdentifier` was rewritten to avoid any array allocations and most string reallocations using `ReadOnlySpan<char>`s and slicing.
- `FontShaperImpl.ShapeText` now uses the overload of `GetOrAdd` that passes an argument to the factory lambda.
- `LineBreakEnumerator` + `LineBreakState` was turned into a `ref struct` and is now passed along as a `ref` parameter to all Unicode rule methods.
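The last change can be sketched roughly like this (member names are illustrative, not the actual Avalonia internals): since both types are `ref struct`s, the state lives inline on the stack and is mutated through a `ref` parameter instead of being heap-allocated:

```csharp
// Stack-only state: as a ref struct it can never be boxed or heap-allocated.
ref struct LineBreakState
{
    public int Position;
}

ref struct LineBreakEnumerator
{
    private LineBreakState _state; // lives inline in the enumerator, on the stack

    public bool MoveNext(ReadOnlySpan<char> text)
        => ApplyRules(text, ref _state); // rule methods mutate the state via ref

    private static bool ApplyRules(ReadOnlySpan<char> text, ref LineBreakState state)
    {
        // Real Unicode line-breaking rules would update 'state' here;
        // this placeholder just advances through the text.
        if (state.Position >= text.Length)
            return false;
        state.Position++;
        return true;
    }
}
```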
## Checklist
- [ ] Added unit tests (if possible)?
- [ ] Added XML documentation to any related classes?
- [ ] Consider submitting a PR to https://github.com/AvaloniaUI/avalonia-docs with user documentation
## Breaking changes
None. All changes are made on internal or private APIs.
## Obsoletions / Deprecations

None.
## Fixed issues
Related to #16390
## Additional Questions
Maybe out of scope for this PR, but I am still seeing a lot of instances of `SKFont` (scaling linearly with the number of text runs I create) in AvaloniaHex, even though I reuse the same `Typeface` instances as much as possible. Are there any possibilities/plans for caching `SKFont` instances?
You can test this PR using the following package version: `12.0.999-cibuild0060453-alpha` (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json). [PRBUILDID]
We can try to reuse one SKFont instance and change its properties before we call some API that needs it. I'm not sure how costly mutating it is. If that doesn't improve anything, we can cache SKFont instances per font size.
Thank you for your contribution
> We can try to reuse one SKFont instance and change its properties before we call some API that needs it.

I am not sure this would help much, because every `GlyphRunImpl` at the moment creates a new `SKFont`. So even if we share typefaces or only slightly change properties of some public Avalonia text/font-related types, it would still be recreating an `SKFont`.

> we can cache SKFont instances per font size.

This is what I was thinking as well, though we would also need to cache it by `edging` type. Most of the instances seem to come from this method:

https://github.com/AvaloniaUI/Avalonia/blob/0b5a82884e2cfa24499f5f05fc84e0dd52c734dd/src/Skia/Avalonia.Skia/GlyphRunImpl.cs#L135-L144

I am not entirely sure what the best approach would be; do you think we should have `GlyphTypefaceImpl` cache them?
Thank you for your contribution!
I'll review and test in depth when I have a bit more time, but the first point seems very strange to me. Which OS, architecture and exact runtime are you using?
For a few versions of the C# compiler now, `ReadOnlySpan<T>`s of primitive types with constant values have actually been embedded directly inside the assembly. You get a simple pointer to the static data at runtime, without having to allocate heap memory at all. Said another way, `new[]` doesn't allocate in this case (nowadays, collection expressions should be used to make that more obvious). #15074 implemented this.

A quick check with https://godbolt.org/z/KP1xnE5Yj shows that this hasn't changed at all and still works as expected in .NET 10. I'll look at Avalonia in detail as soon as I can :)
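To make the pattern concrete, here is a minimal sketch (illustrative names) of the two spellings being discussed; in a Release build the compiler emits the constant data into the assembly's data section and the getter returns a span over it, with no array allocation:

```csharp
public static class UnicodeTables
{
    // In Release builds, the C# compiler recognizes this pattern and returns
    // a span over static (RVA) data in the assembly; no uint[] is allocated.
    public static ReadOnlySpan<uint> Data => new uint[] { 1, 2, 3, 4 };

    // Equivalent modern spelling using a collection expression.
    public static ReadOnlySpan<uint> Data2 => [1, 2, 3, 4];
}
```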
> I am not entirely sure what the best approach would be, do you think we should have `GlyphTypefaceImpl` cache them?
Yes, `GlyphTypefaceImpl` would cache them.
> I'll review and test in depth when I have a bit more time, but the first point seems very strange to me. Which OS, architecture and exact runtime are you using?

Yes, I also found it quite strange, and something that probably would've been caught by you guys already.
- Arch: x64
- OS: NixOS 25.11/unstable (running in an FHS devshell with x11 libraries in PATH), Kernel 6.12.58
- WM: Hyprland 0.52.1
- Editor: JetBrains Rider 2025.3.
- dotnet: 10.0.100 (but I also have other versions installed)
After posting the PR I double-checked my build configs, and it seems I ran my tests under the DEBUG config (Sharplab seems to confirm this too). My bad, I should've run all tests in RELEASE mode. Nonetheless, this change may still be worth it for speeding up debug builds :), and the other issues are still present even in release mode.
> We can try to reuse one SKFont instance and change its properties before we call some API that needs it.

Please avoid native objects with mutable state in `IGlyphRun` and friends. Those can be used from multiple threads, and it's really easy to introduce hard-to-track native memory corruption.
> Please, avoid native objects with mutable state in IGlyphRun and friends. Those can be used from multiple threads and it's really easy to introduce hard to track native memory corruption.

So are you suggesting we can't cache the `SKFont` in the `GlyphTypefaceImpl` that already holds an `SKTypeface`?
`IGlyphRunImpl` is immutable.
> My bad, I should've run all tests in RELEASE mode. Nonetheless, this change may still be worth it for speeding up debug builds :), also the other issues are still present even in release mode.

Even without any performance boost, it's worth doing this solely to prevent other developers from seeing the same alarming number of allocations that you did and wasting their time trying to investigate the source.

Can we not get the best of both worlds like this?

```csharp
private static ReadOnlySpan<uint> Data { get; } = new uint[] ...
```

I assume that this would still trigger the compiler optimisation, and it definitely avoids those million+ array allocations at runtime.
> Can we not get the best of both worlds like this?
>
> `private static ReadOnlySpan<uint> Data { get; } = new uint[] ...`

Sadly, this is not possible, because fields (and by extension, property backing fields) cannot be of type `ReadOnlySpan<T>` unless they are instance fields of a `ref struct`. This is why I changed it to a `uint[]`. Happy to hear other options that could get rid of the single array allocation, though.
We need to keep `ReadOnlySpan<uint> Data => new uint[] ...` to get the optimization. Since a getter-only expression-bodied property (`=>`) doesn't have any state (that could hypothetically be mutated with reflection), it's easier for the compiler to assume optimizations. But I don't know if .NET 10 is better at optimizing `{ get; }`.
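To illustrate the distinction (illustrative names; the commented-out line shows why the auto-property variant doesn't work):

```csharp
public static class SpanProperties
{
    // OK: expression-bodied getter with no backing state; in Release builds
    // the compiler lowers it to a span over static RVA data in the assembly.
    public static ReadOnlySpan<uint> Works => new uint[] { 1, 2, 3 };

    // Does not compile: a { get; } auto-property needs a backing field, and
    // ReadOnlySpan<uint> fields are only legal as instance fields of a ref struct.
    // public static ReadOnlySpan<uint> Broken { get; } = new uint[] { 1, 2, 3 };
}
```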
Note regarding the `ReadOnlySpan<T>`: even in debug mode, I don't see those allocations at all (I was very surprised at the original claim since I remember running dotMemory on debug builds several times). Tried on Windows x64 and macOS ARM64 with the latest `master` branch. This is a JIT intrinsic, so I'm not sure why that happens on your machine.
> Note regarding the `ReadOnlySpan<T>`: even in debug mode, I don't see those allocations at all (I was very surprised at the original claim since I remember running dotMemory on debug builds several times). Tried on Windows x64 and macOS ARM64 with the latest `master` branch. This is a JIT intrinsic so I'm not sure why that happens on your machine.
I have done some additional validation on DEBUG builds, this time on a fresh Ubuntu 25.04 VM as well as a Windows 10 x64 VM, both using a fresh .NET 10 installation (obtained through the preview PPA on Ubuntu and winget on Windows). I am still seeing the same allocations of `System.RuntimeFieldInfoStub` across all systems when using computed `ReadOnlySpan` properties.
How to reproduce:

1. Compile this code for .NET 10:

   **Program.cs**

   ```csharp
   using System;
   using System.Threading;
   using System.Runtime.CompilerServices;

   internal class Program
   {
       public static void Main(string[] args)
       {
           Thread.Sleep(5000); // Added to give some time to enable full memory allocation tracking

           var random = new Random();
           for (int i = 0; i < 1000000; i++)
           {
               DoSomething(Foo[random.Next(Foo.Length)]);
           }

           Console.WriteLine("Done");
           Thread.Sleep(10000); // Added to give some time to create a snapshot
       }

       [MethodImpl(MethodImplOptions.NoInlining)]
       private static void DoSomething(uint u)
       {
       }

       public static ReadOnlySpan<uint> Foo => new uint[] { 1, 2, 3, 4 };
       public static ReadOnlySpan<uint> Bar => [1, 2, 3, 4];
   }
   ```

   **Program.csproj**

   ```xml
   <Project Sdk="Microsoft.NET.Sdk">
     <PropertyGroup>
       <OutputType>Exe</OutputType>
       <TargetFramework>net10.0</TargetFramework>
       <ImplicitUsings>enable</ImplicitUsings>
       <Nullable>enable</Nullable>
     </PropertyGroup>
   </Project>
   ```

2. Start the dotMemory CLI tool:

   ```shell
   $ dotMemory start path/to/Program
   Z:\> dotMemory.exe start path\to\Program.exe
   ```

3. Enable full memory tracking during the first `Thread.Sleep` call:

   ```
   ##dotMemory["collect-allocations-on", {pid: xxxx}]
   ```

4. Create a snapshot during the second `Thread.Sleep` call:

   ```
   ##dotMemory["get-snapshot", {pid: xxxx}]
   ```

5. Open the snapshot in dotMemory. Observe that the allocations are dominated by `System.RuntimeFieldInfoStub`:

   ```
   Allocated type : System.RuntimeFieldInfoStub
   Objects        : n/a
   Bytes          : 144000000

   Allocated by
   100% FromPtr • 137.33 MB / 137.33 MB • System.RuntimeFieldInfoStub.FromPtr(IntPtr)
     100% get_Foo • 137.33 MB / - • global::Program.get_Foo()
       100% Main • 137.33 MB / - • global::Program.Main(String[])
         ► 100% [AllThreadsRoot] • 137.33 MB / - • [AllThreadsRoot]
   ```
I am happy to revert the changes on `ReadOnlySpan<T>` for the tries in this PR, but this seems to be a reliably reproducible hotspot. Arguably, this may not necessarily be related to Avalonia, and we may want to move this specific issue to the dotnet/runtime repo to see what they have to say about it. Let me know what you think :).
EDIT: Dumping the generated x64 code using the `DOTNET_JitDisasm` environment variable also confirms that the `get_Foo` property does a whole lot more in DEBUG builds than simply returning a static handle to the RVA data.
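For reference, one way to dump that disassembly (assuming the repro project above; exact output varies by runtime version and configuration):

```shell
# DOTNET_JitDisasm filters JIT disassembly output by method name.
# Run the repro in Debug mode and inspect what get_Foo compiles to.
DOTNET_JitDisasm="get_Foo" dotnet run -c Debug
```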
Yes you're right, not sure how I missed that last time, or if I wasn't looking at the right thing, sorry about that.
While I still think that keeping things as they are for simplicity is fine (the runtime has tons of `ReadOnlySpan<...> => []` usages) and that profiling should only be done in release mode, let's make a change to keep memory allocations low in debug.
Let's generate something like this instead:
```csharp
#if DEBUG
public static ReadOnlySpan<uint> Bar => s_bar;
private static uint[] s_bar =
#else
public static ReadOnlySpan<uint> Bar =>
#endif
    [1, 2, 3, 4];
```
You can test this PR using the following package version: `12.0.999-cibuild0060733-alpha` (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json). [PRBUILDID]