wpf icon indicating copy to clipboard operation
wpf copied to clipboard

[DataFormats.cs] Replace ArrayList with List<DataFormat> and refactor the class

Open h3xds1nz opened this issue 1 year ago • 8 comments

Description

DataFormats is one of those classes that still use ArrayList with merely a single data type — DataFormat. Is a small static class so while at it I refactored the whole class to become more readable and future-friendly if there's such a thing with WPF.

There are no functional changes besides removing the Ordinal -> OrdinalIgnoreCase compare and just using case-insensitive one by default as in .NET 8 this path is vectorized and I believe decreasing the code-size outweights the now immeasurable benefit of two subsequent compares (not to mention, should such a case occur, this solution will come on top) as this isn't a hot path regularly.

While refactoring, I've also removed obsolete CAS remarks and fixed some grammar mistakes.

As a last step, I removed the unnecessary double lock on InternalGetDataFormat/GetDataFormat methods by creating an internal static class, running the initialization of predefined formats in the static constructor, saving all the unnecessary overhead.

Customer Impact

This will increase performance by about 1/5 and additional (up to 4/5) increase can be observed should you query data formats with wrong casing input (which shall not be the case of WPF itself but if used externally, it might).

Regression

Almost none, as noted, the overall performance has been greatly increased and the code-size has decreased. The only "regression" is removing the case-sensitive compare first but overall it is still way faster.

Risk

I believe there are none as the static collection is used internally as a storage and access is done via defined GetDataFormat overloads.

Microsoft Reviewers: Open in CodeFlow

h3xds1nz avatar Jun 04 '24 15:06 h3xds1nz

@singhashish-wpf there is a lot of code style changes departing from the current codebase. Perhaps we should have some rules about what is expected going forward.

@h3xds1nz Can you provide any numbers demonstrating the performance/size impact? How did you test the changes?

miloush avatar Jun 04 '24 16:06 miloush

The last change with static constructor would penalize anyone just using the static properties which I would think are the most common usage of this class.

miloush avatar Jun 04 '24 16:06 miloush

@miloush I believe that WPF should move forward in the future while not introducing any breaking-changes. I'm actually testing waters to be entirely honest. As for code style, I'm happy to adapt if we get proper guidelines.

As for the static constructor, I do agree with you it will "penalize", but I view it as very cheap. It is a point for discussion though, I do not feel strongly about it, I'm happy to remove it if we feel it should be a blocker for the PR itself.

Running a very dumb benchmark on .NET 9 preview, generating some work so that it is not optimized out:

Method Mean Error StdDev
GetDataFormat 60.35 ms 0.176 ms 0.156 ms
GetDataFormat_Int 17.95 ms 0.114 ms 0.107 ms

The same, with the PR improvements (without static constructor) - PresentationCore swapped:

Method Mean Error StdDev
GetDataFormat 20.10 ms 0.079 ms 0.070 ms
GetDataFormat_Int 15.34 ms 0.124 ms 0.110 ms

The same, with the PR improvements (including static constructor) - PresentationCore swapped:

Method Mean Error StdDev Median
GetDataFormat 13.86 ms 0.080 ms 0.066 ms 13.875 ms
GetDataFormat_Int 10.58 ms 0.536 ms 1.580 ms 9.542 ms

Do note that this targets the case-insensitive compare hence the big difference between string and int compare. The regression between CS -> CI and going straight for CI compare only is about 1ms for 1mil passes.


   public partial class BenchmarkDemo
   {
       public StringBuilder sb = new StringBuilder(10_000_000);

       [Benchmark]
       public StringBuilder GetDataFormat()
       {
           sb.Clear();
           for (int i = 0; i < 1_000_000; i++)
           {
               DataFormat format = System.Windows.DataFormats.GetDataFormat("bitmap");
               sb.Append(format.Id);
           }

           return sb;
       }

       [Benchmark]
       public StringBuilder GetDataFormat_Int()
       {
           sb.Clear();
           for (int i = 0; i < 1_000_000; i++)
           {
               DataFormat format = System.Windows.DataFormats.GetDataFormat(2);
               sb.Append(format.Id);
           }

           return sb;
       }
   }

h3xds1nz avatar Jun 04 '24 16:06 h3xds1nz

If the change doesn't involve a behavioral change, that's a particularly good thing.

lindexi avatar Jun 05 '24 01:06 lindexi

@miloush @lindexi As per your very good points, I've changed the implementation to behave the same way as it did before with the last commit while keeping the benefits of all the changes. The internal list/class won't be initialized unless it is actually meant to be used (with one of the Get*) methods.

The call methods are likely to get inlined by the JIT and with tiered compilation this can all go smooth like butter.

Also the lock now belongs to the internal class.

h3xds1nz avatar Jun 05 '24 20:06 h3xds1nz

I'm personally not a big fan of moving code to static constructor just for performance, especially when the static constructor runs arbitrary code (Like UnsafeNativeMethods.RegisterClipboardFormat that could have side effects depending on when a static constructor ran).

You can probably improve the performance of the method EnsurePredefined by also checking if _formatList is null before locking (It would be: If _formatList is null -> Lock -> If _formatList is still null -> Initialize). This is a very common pattern and a cheap way to avoid locking for no reason.

Other than that, the other changes LGTM.

ThomasGoulet73 avatar Jun 06 '24 02:06 ThomasGoulet73

I'm actually testing waters to be entirely honest. As for code style, I'm happy to adapt if we get proper guidelines.

I am not saying these cannot or should not change, but indeed guidelines should be in place.

I have to admit I am a bit puzzled by the original code, I am having difficult time imagining a scenario where case sensitive search through data formats would provide a performance improvement worth the complexity but not a hardcoded switch. The only situation I can think of this needs to be done quickly is if it is executed on every mouse move during dragging.

I will also note that in general, this change would cause different format to match if they differ by case only, however RegisterClipboardFormat is documented as case-insensitive, so this seems like a fair change.

miloush avatar Jun 06 '24 09:06 miloush

Earlier in the days the CI search essentially meant two different compares where CI was way more expensive as you couldn't just load a register and OR the two masks. However I think the main intention was to optimize for the internal library use where the formats are pre-defined. Now I find this irrelevant and as you've perfectly noted, formats are CI on their own.

As per the static constructor, I understand that the PInvoke call might be of concern (was originally mine as well) however we do not depend on any state of the call there, either it registers or it doesn't. Even though 'popular', I've always found the double checking a bit of an anti-pattern, where static constructor is clear and expressive. It also removes _formatList from being readonly, which means that currently the null check itself can never be optimized out unless I've missed something.

h3xds1nz avatar Jun 06 '24 09:06 h3xds1nz