[DataFormats.cs] Replace ArrayList with List<DataFormat> and refactor the class
Description
DataFormats is one of those classes that still use ArrayList with merely a single data type — DataFormat. Is a small static class so while at it I refactored the whole class to become more readable and future-friendly if there's such a thing with WPF.
There are no functional changes besides removing the Ordinal -> OrdinalIgnoreCase compare and just using case-insensitive one by default as in .NET 8 this path is vectorized and I believe decreasing the code-size outweights the now immeasurable benefit of two subsequent compares (not to mention, should such a case occur, this solution will come on top) as this isn't a hot path regularly.
While refactoring, I've also removed obsolete CAS remarks and fixed some grammar mistakes.
As a last step, I removed the unnecessary double lock on InternalGetDataFormat/GetDataFormat methods by creating an internal static class, running the initialization of predefined formats in the static constructor, saving all the unnecessary overhead.
Customer Impact
This will increase performance by about 1/5 and additional (up to 4/5) increase can be observed should you query data formats with wrong casing input (which shall not be the case of WPF itself but if used externally, it might).
Regression
Almost none, as noted, the overall performance has been greatly increased and the code-size has decreased. The only "regression" is removing the case-sensitive compare first but overall it is still way faster.
Risk
I believe there are none as the static collection is used internally as a storage and access is done via defined GetDataFormat overloads.
Microsoft Reviewers: Open in CodeFlow
@singhashish-wpf there is a lot of code style changes departing from the current codebase. Perhaps we should have some rules about what is expected going forward.
@h3xds1nz Can you provide any numbers demonstrating the performance/size impact? How did you test the changes?
The last change with static constructor would penalize anyone just using the static properties which I would think are the most common usage of this class.
@miloush I believe that WPF should move forward in the future while not introducing any breaking-changes. I'm actually testing waters to be entirely honest. As for code style, I'm happy to adapt if we get proper guidelines.
As for the static constructor, I do agree with you it will "penalize", but I view it as very cheap. It is a point for discussion though, I do not feel strongly about it, I'm happy to remove it if we feel it should be a blocker for the PR itself.
Running a very dumb benchmark on .NET 9 preview, generating some work so that it is not optimized out:
| Method | Mean | Error | StdDev |
|---|---|---|---|
| GetDataFormat | 60.35 ms | 0.176 ms | 0.156 ms |
| GetDataFormat_Int | 17.95 ms | 0.114 ms | 0.107 ms |
The same, with the PR improvements (without static constructor) - PresentationCore swapped:
| Method | Mean | Error | StdDev |
|---|---|---|---|
| GetDataFormat | 20.10 ms | 0.079 ms | 0.070 ms |
| GetDataFormat_Int | 15.34 ms | 0.124 ms | 0.110 ms |
The same, with the PR improvements (including static constructor) - PresentationCore swapped:
| Method | Mean | Error | StdDev | Median |
|---|---|---|---|---|
| GetDataFormat | 13.86 ms | 0.080 ms | 0.066 ms | 13.875 ms |
| GetDataFormat_Int | 10.58 ms | 0.536 ms | 1.580 ms | 9.542 ms |
Do note that this targets the case-insensitive compare hence the big difference between string and int compare. The regression between CS -> CI and going straight for CI compare only is about 1ms for 1mil passes.
public partial class BenchmarkDemo
{
public StringBuilder sb = new StringBuilder(10_000_000);
[Benchmark]
public StringBuilder GetDataFormat()
{
sb.Clear();
for (int i = 0; i < 1_000_000; i++)
{
DataFormat format = System.Windows.DataFormats.GetDataFormat("bitmap");
sb.Append(format.Id);
}
return sb;
}
[Benchmark]
public StringBuilder GetDataFormat_Int()
{
sb.Clear();
for (int i = 0; i < 1_000_000; i++)
{
DataFormat format = System.Windows.DataFormats.GetDataFormat(2);
sb.Append(format.Id);
}
return sb;
}
}
If the change doesn't involve a behavioral change, that's a particularly good thing.
@miloush @lindexi As per your very good points, I've changed the implementation to behave the same way as it did before with the last commit while keeping the benefits of all the changes. The internal list/class won't be initialized unless it is actually meant to be used (with one of the Get*) methods.
The call methods are likely to get inlined by the JIT and with tiered compilation this can all go smooth like butter.
Also the lock now belongs to the internal class.
I'm personally not a big fan of moving code to static constructor just for performance, especially when the static constructor runs arbitrary code (Like UnsafeNativeMethods.RegisterClipboardFormat that could have side effects depending on when a static constructor ran).
You can probably improve the performance of the method EnsurePredefined by also checking if _formatList is null before locking (It would be: If _formatList is null -> Lock -> If _formatList is still null -> Initialize). This is a very common pattern and a cheap way to avoid locking for no reason.
Other than that, the other changes LGTM.
I'm actually testing waters to be entirely honest. As for code style, I'm happy to adapt if we get proper guidelines.
I am not saying these cannot or should not change, but indeed guidelines should be in place.
I have to admit I am a bit puzzled by the original code, I am having difficult time imagining a scenario where case sensitive search through data formats would provide a performance improvement worth the complexity but not a hardcoded switch. The only situation I can think of this needs to be done quickly is if it is executed on every mouse move during dragging.
I will also note that in general, this change would cause different format to match if they differ by case only, however RegisterClipboardFormat is documented as case-insensitive, so this seems like a fair change.
Earlier in the days the CI search essentially meant two different compares where CI was way more expensive as you couldn't just load a register and OR the two masks. However I think the main intention was to optimize for the internal library use where the formats are pre-defined. Now I find this irrelevant and as you've perfectly noted, formats are CI on their own.
As per the static constructor, I understand that the PInvoke call might be of concern (was originally mine as well) however we do not depend on any state of the call there, either it registers or it doesn't. Even though 'popular', I've always found the double checking a bit of an anti-pattern, where static constructor is clear and expressive. It also removes _formatList from being readonly, which means that currently the null check itself can never be optimized out unless I've missed something.