Slow to get RGB
I am creating a color picker with Unicolour and WriteableBitmap in WPF. Here is what I want to implement:
When creating the left plane image, I found that getting the RGB colors is too slow.
int width = writeableBitmap.PixelWidth, height = writeableBitmap.PixelHeight;
unsafe {
byte* pixels = (byte*)writeableBitmap.BackBuffer.ToPointer();
writeableBitmap.Lock();
Stopwatch sw = new();
for (int row = 0; row < height; row++) {
for (int col = 0; col < width; col++) {
int i = row * writeableBitmap.BackBufferStride + col * 3;
// get the values of the color...
Unicolour color = new(colorSpace, tuple);
sw.Start();
Rgb255 rgb = color.Rgb.Byte255;
sw.Stop();
pixels[i] = (byte)rgb.R;
pixels[i + 1] = (byte)rgb.G;
pixels[i + 2] = (byte)rgb.B;
}
}
Debug.WriteLine(sw.Elapsed.TotalMilliseconds);
writeableBitmap.AddDirtyRect(new Int32Rect(0, 0, width, height));
writeableBitmap.Unlock();
}
I used a 128×128 image and the stopwatch showed that it took about 1143 milliseconds to convert the colors to RGB. This causes a lot of stuttering for users of the color picker. Is there a way to fix this?
If the current color space is RGB/RGB255, the time drops to about 500 milliseconds, but that is still slow.
The right slider image is 1×128, and it doesn't stutter noticeably.
Hmm, although I make a point in the readme of saying that "performance is not a priority", the performance you're describing is much worse than I'd expect and doesn't make sense to me.
In particular, it sounds like you're suggesting that it takes 500 ms to get Rgb values after constructing a Unicolour with Rgb values, which is bizarre - at the most there are 6 simple calculations (mapping R, G, B to and from the 255-range).
Are you able to isolate the performance issue any further? For example, do you see the same slowness when not using WPF, or when using a different runtime? Release mode instead of debug mode? The only other thing I can think of is that, if the operations are running on the UI thread, the UI thread might be using some of those 500ms to render the UI, handle user input, etc?
For my own curiosity and reassurance I've added some benchmarking code to a separate branch using BenchmarkDotNet - maybe you could reproduce some conversions you're doing based on this example and see if it's still so slow outside of your WPF context?
It's not a controlled experiment but on my laptop I ran benchmarks for converting from Rgb to every other colour space on both .NET 8.0 and .NET Framework 4.7.2 environments and I don't see anything surprising:
- It is quick to return Rgb (1.5 - 4.5 μs) because there is no transformation to do from Rgb
- It is slower to return spaces that require multiple complex transformations, such as Cam02 (6.7 - 14.4 μs), Cam16 (5.3 - 12.5 μs), Hct (5.6 - 13.4 μs)
- It is very slow to calculate Wxy (108.2 - 135.4 μs) because that involves an intensive search algorithm
- .NET Framework 4.7.2 is slower than .NET 8.0
Note that these metrics include the Unicolour construction, not just retrieving the converted value as your stopwatch example used.
In your particular case, for a 128 x 128 image, on my machine using the slower .NET Framework, I would expect the 16,384 conversions to take ~74,000 μs, or 74 ms - a lot less concerning than 500 ms!
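To make that estimate explicit, here is the arithmetic, assuming roughly 4.5 μs per Rgb conversion (the slower .NET Framework figure from the list above; other machines will differ):

$$16{,}384 \times 4.5\ \mu\text{s} = 73{,}728\ \mu\text{s} \approx 74\ \text{ms}$$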
For reference, here are the full results on my machine
BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4169/23H2/2023Update/SunValley3)
13th Gen Intel Core i7-13700H, 1 CPU, 20 logical and 14 physical cores
[Host] : .NET Framework 4.8.1 (4.8.9261.0), X64 RyuJIT VectorSize=256
.NET 8.0 : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX2
.NET Framework 4.7.2 : .NET Framework 4.8.1 (4.8.9261.0), X64 RyuJIT VectorSize=256
| Method | Job | Runtime | TargetColourSpace | Mean | Error | StdDev |
|---|---|---|---|---|---|---|
| Convert | .NET 8.0 | .NET 8.0 | Rgb | 1.503 μs | 0.0145 μs | 0.0129 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Rgb | 4.480 μs | 0.0363 μs | 0.0339 μs |
| Convert | .NET 8.0 | .NET 8.0 | Rgb255 | 1.581 μs | 0.0099 μs | 0.0088 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Rgb255 | 4.565 μs | 0.0459 μs | 0.0430 μs |
| Convert | .NET 8.0 | .NET 8.0 | RgbLinear | 1.967 μs | 0.0242 μs | 0.0202 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | RgbLinear | 5.166 μs | 0.0736 μs | 0.0615 μs |
| Convert | .NET 8.0 | .NET 8.0 | Hsb | 1.687 μs | 0.0218 μs | 0.0182 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Hsb | 4.980 μs | 0.0470 μs | 0.0367 μs |
| Convert | .NET 8.0 | .NET 8.0 | Hsl | 1.772 μs | 0.0219 μs | 0.0171 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Hsl | 5.166 μs | 0.0390 μs | 0.0365 μs |
| Convert | .NET 8.0 | .NET 8.0 | Hwb | 1.771 μs | 0.0139 μs | 0.0130 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Hwb | 5.103 μs | 0.0481 μs | 0.0450 μs |
| Convert | .NET 8.0 | .NET 8.0 | Hsi | 1.677 μs | 0.0223 μs | 0.0209 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Hsi | 4.880 μs | 0.0483 μs | 0.0452 μs |
| Convert | .NET 8.0 | .NET 8.0 | Xyz | 2.192 μs | 0.0418 μs | 0.0481 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Xyz | 5.673 μs | 0.0672 μs | 0.0628 μs |
| Convert | .NET 8.0 | .NET 8.0 | Xyy | 2.324 μs | 0.0379 μs | 0.0355 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Xyy | 5.998 μs | 0.0350 μs | 0.0327 μs |
| Convert | .NET 8.0 | .NET 8.0 | Wxy | 108.223 μs | 1.5660 μs | 1.3077 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Wxy | 135.388 μs | 1.5441 μs | 1.3688 μs |
| Convert | .NET 8.0 | .NET 8.0 | Lab | 2.535 μs | 0.0486 μs | 0.0632 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Lab | 6.320 μs | 0.1140 μs | 0.1066 μs |
| Convert | .NET 8.0 | .NET 8.0 | Lchab | 2.762 μs | 0.0305 μs | 0.0270 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Lchab | 6.745 μs | 0.0749 μs | 0.0701 μs |
| Convert | .NET 8.0 | .NET 8.0 | Luv | 2.384 μs | 0.0211 μs | 0.0176 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Luv | 6.143 μs | 0.0943 μs | 0.0836 μs |
| Convert | .NET 8.0 | .NET 8.0 | Lchuv | 2.698 μs | 0.0487 μs | 0.0432 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Lchuv | 6.523 μs | 0.0771 μs | 0.0721 μs |
| Convert | .NET 8.0 | .NET 8.0 | Hsluv | 3.395 μs | 0.0559 μs | 0.0523 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Hsluv | 8.472 μs | 0.0347 μs | 0.0290 μs |
| Convert | .NET 8.0 | .NET 8.0 | Hpluv | 3.530 μs | 0.0421 μs | 0.0373 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Hpluv | 8.027 μs | 0.0710 μs | 0.0629 μs |
| Convert | .NET 8.0 | .NET 8.0 | Ypbpr | 1.628 μs | 0.0207 μs | 0.0194 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Ypbpr | 4.661 μs | 0.0758 μs | 0.0709 μs |
| Convert | .NET 8.0 | .NET 8.0 | Ycbcr | 1.728 μs | 0.0149 μs | 0.0139 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Ycbcr | 4.866 μs | 0.0257 μs | 0.0240 μs |
| Convert | .NET 8.0 | .NET 8.0 | Ycgco | 1.637 μs | 0.0126 μs | 0.0118 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Ycgco | 4.711 μs | 0.0715 μs | 0.0669 μs |
| Convert | .NET 8.0 | .NET 8.0 | Yuv | 1.692 μs | 0.0333 μs | 0.0547 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Yuv | 4.654 μs | 0.0503 μs | 0.0470 μs |
| Convert | .NET 8.0 | .NET 8.0 | Yiq | 1.901 μs | 0.0195 μs | 0.0182 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Yiq | 5.177 μs | 0.0477 μs | 0.0446 μs |
| Convert | .NET 8.0 | .NET 8.0 | Ydbdr | 1.720 μs | 0.0186 μs | 0.0174 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Ydbdr | 5.020 μs | 0.0500 μs | 0.0468 μs |
| Convert | .NET 8.0 | .NET 8.0 | Tsl | 1.825 μs | 0.0351 μs | 0.0390 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Tsl | 4.814 μs | 0.0458 μs | 0.0429 μs |
| Convert | .NET 8.0 | .NET 8.0 | Xyb | 2.426 μs | 0.0314 μs | 0.0293 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Xyb | 6.344 μs | 0.0319 μs | 0.0299 μs |
| Convert | .NET 8.0 | .NET 8.0 | Ipt | 2.703 μs | 0.0401 μs | 0.0356 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Ipt | 6.815 μs | 0.0670 μs | 0.0594 μs |
| Convert | .NET 8.0 | .NET 8.0 | Ictcp | 3.093 μs | 0.0499 μs | 0.0466 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Ictcp | 7.059 μs | 0.0517 μs | 0.0483 μs |
| Convert | .NET 8.0 | .NET 8.0 | Jzazbz | 3.148 μs | 0.0266 μs | 0.0249 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Jzazbz | 7.448 μs | 0.0416 μs | 0.0368 μs |
| Convert | .NET 8.0 | .NET 8.0 | Jzczhz | 3.346 μs | 0.0265 μs | 0.0235 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Jzczhz | 8.020 μs | 0.1405 μs | 0.1314 μs |
| Convert | .NET 8.0 | .NET 8.0 | Oklab | 3.082 μs | 0.0290 μs | 0.0271 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Oklab | 7.177 μs | 0.0522 μs | 0.0488 μs |
| Convert | .NET 8.0 | .NET 8.0 | Oklch | 3.274 μs | 0.0278 μs | 0.0260 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Oklch | 7.764 μs | 0.1350 μs | 0.1263 μs |
| Convert | .NET 8.0 | .NET 8.0 | Okhsv | 6.717 μs | 0.1184 μs | 0.1108 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Okhsv | 13.080 μs | 0.1087 μs | 0.1016 μs |
| Convert | .NET 8.0 | .NET 8.0 | Okhsl | 5.525 μs | 0.1100 μs | 0.1578 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Okhsl | 10.661 μs | 0.0625 μs | 0.0584 μs |
| Convert | .NET 8.0 | .NET 8.0 | Okhwb | 6.670 μs | 0.0391 μs | 0.0366 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Okhwb | 13.323 μs | 0.1095 μs | 0.1024 μs |
| Convert | .NET 8.0 | .NET 8.0 | Cam02 | 6.682 μs | 0.0636 μs | 0.0531 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Cam02 | 14.399 μs | 0.0815 μs | 0.0762 μs |
| Convert | .NET 8.0 | .NET 8.0 | Cam16 | 5.275 μs | 0.0656 μs | 0.0613 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Cam16 | 12.523 μs | 0.1659 μs | 0.1552 μs |
| Convert | .NET 8.0 | .NET 8.0 | Hct | 5.627 μs | 0.0464 μs | 0.0362 μs |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | Hct | 13.380 μs | 0.2327 μs | 0.2177 μs |
I am using .NET Framework 4.8.
I now run the following code directly in the Main() function of the WPF/WinForms Program.cs, with all UI code removed:
internal static class Program {
public static void Main() {
Stopwatch sw = new Stopwatch();
byte[] bytes = new byte[128 * 128 * 3];
for (int i = 0; i < 128; i++) {
for (int j = 0; j < 128; j++) {
int k = i * 128 * 3 + j * 3;
Unicolour color = new Unicolour(ColourSpace.Rgb255, 255, i / 128 * 255, j / 128 * 255);
sw.Start();
Rgb255 rgb = color.Rgb.Byte255;
sw.Stop();
bytes[k] = (byte)rgb.R;
bytes[k + 1] = (byte)rgb.G;
bytes[k + 2] = (byte)rgb.B;
}
}
Debug.WriteLine(sw.ElapsedMilliseconds);
}
}
As you can see, this will create the Unicolour object and convert it to RGB $$128 \times 128 = 16384$$ times.
The tests are as follows:
| Config | Time |
|---|---|
| Debug x64 | 289ms |
| Release x64 | 200ms |
I then created a new .NET Framework 4.8 Console Application project with the same code (only replacing Debug.WriteLine with Console.WriteLine). The results are as follows:
| Config | Time |
|---|---|
| Debug Any CPU | 330ms |
| Release Any CPU | 313ms |
| Debug x64 | 213ms |
| Release x64 | 214ms |
It still seems to take much longer than the ~74 ms estimate.
Windows 11 (10.0.22631.4169)
12th Gen Intel Core i5-12600KF, 1 CPU, 16 logical and 10 physical cores
I've been investigating with some profiling tools and I can't find anything concerning around reading the Rgb255 values in this way, but I do see significant slowdown when I attach a debugger. My results aren't in the same range as yours but it consistently shows that:
- the code executes quickly when just "run"
- the code executes much more slowly when debugging instead of running
And this seems to be regardless of debug mode vs release mode.
Using .NET Framework 4.8, debug mode, Any CPU, reading 16,384 Rgb255 values using your exact code, on my laptop:
| Project Type | Execution Mode | Duration |
|---|---|---|
| WinForms | Run | 8 ms |
| WinForms | Debugging | 1,534 ms |
| WPF | Run | 11 ms |
| WPF | Debugging | 1,659 ms |
| Console | Run | 11 ms |
| Console | Debugging | 1,478 ms |
I'm hoping then that it's just a case of a debugger being attached but I'm not familiar with how much overhead a debugger actually comes with.
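Because the gap between running and debugging is so large, it may help to record whether a debugger is attached next to every timing. The following is a minimal sketch and not part of Unicolour: TimingCheck, Measure and the placeholder workload are hypothetical names chosen for illustration, and the only library calls used are Stopwatch and Debugger.IsAttached from System.Diagnostics. The placeholder workload stands in for the colour conversion loop.

```csharp
using System;
using System.Diagnostics;

internal static class TimingCheck
{
    // Times `iterations` calls of an arbitrary action (hypothetical helper).
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation and first-use caching are not timed

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        // Placeholder workload; swap in the Unicolour construction and Byte255 read.
        double sink = 0;
        var elapsed = Measure(() => sink += Math.Sqrt(sink + 1), 16384);

        // Timings taken with a debugger attached are not representative.
        Console.WriteLine($"Debugger attached: {Debugger.IsAttached}");
        Console.WriteLine($"Elapsed: {elapsed.TotalMilliseconds} ms");
    }
}
```

If the first line prints True, the measurement was taken under a debugger and should be repeated with a plain run (for example "Start Without Debugging" in Visual Studio) before drawing conclusions.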
However!
While testing and profiling, I found that the Unicolour constructor was taking a fair bit longer relative to other functions, and I think there's an easy change that could halve the time it takes to construct. For my laptop that means ~0.002 ms per construction but it adds up.
For example, running (not debugging!) this code...
var stopwatch = Stopwatch.StartNew();
for (var i = 0; i < count; i++)
{
var colour = new Unicolour(ColourSpace.Rgb255, 255, i / 128.0 * 255, i / 128.0 * 255);
var rgb = colour.Rgb.Byte255;
}
Console.WriteLine(stopwatch.ElapsedMilliseconds);
... on my laptop results in:
| Count | Changed? | Duration |
|---|---|---|
| 16,384 | No | 146 ms |
| 16,384 | Yes | 103 ms |
| 1,000,000 | No | 4,188 ms |
| 1,000,000 | Yes | 1,731 ms |
Again, it's not a controlled experiment, but it's quite the difference.
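As a rough per-iteration breakdown of the 1,000,000-count rows (each iteration includes both the construction and the Byte255 read, so this is only an approximation of the constructor cost):

$$\frac{4{,}188\ \text{ms}}{1{,}000{,}000} \approx 4.2\ \mu\text{s}, \qquad \frac{1{,}731\ \text{ms}}{1{,}000{,}000} \approx 1.7\ \mu\text{s}, \qquad 4.2\ \mu\text{s} - 1.7\ \mu\text{s} \approx 2.5\ \mu\text{s} \approx 0.0025\ \text{ms}$$

That saving per iteration is in line with the ~0.002 ms figure mentioned earlier.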
In summary
- I can't reproduce any performance issue in the reading of Rgb255, either with benchmarking or profiling, except when using a debugger - at which point everything slows down a lot (I'm using JetBrains Rider if that's of any interest)
- There's definitely an opportunity to speed up the Unicolour constructor

I want to reiterate that my primary focus with Unicolour is correctness, not speed, but I'm happy to make performance improvements if they don't require huge changes! It'll take me a bit of time to make the update (I'm a little busier than usual at the moment!) though I don't think the change will make a big difference in your case.
Thank you for testing; I look forward to your performance improvements.
For now, I am working around it by reducing the image size to 32×32 and then stretching the image. This produces some artifacts in the color gradient transitions, but at least it speeds things up.
@otomad I'd be interested to know if running your WPF app without a debugger still has the same performance issues, because everything I've seen indicates it should be OK. I'm happy to take a look at the app itself if you're willing to invite me to a repo.
One extra thing that I'd forgotten to mention: the first Unicolour you construct will take quite a bit longer than the rest (perhaps ~50 ms). Behind the scenes it's caching a lot of data needed for some of the more complex calculations. As long as you continue to use the same Configuration instance (which you will if you don't override it), subsequent Unicolour constructions will be quicker. If you trigger the instantiation of Configuration.Default before you try to use your first Unicolour, you should also see some improvement.
For example:
var stopwatch = Stopwatch.StartNew();
for (var i = 0; i < 16384; i++)
{
var colour = new Unicolour(ColourSpace.Rgb255, 255, 255, 255);
var rgb = colour.Rgb.Byte255;
}
Console.WriteLine(stopwatch.ElapsedMilliseconds);
| Code | Duration |
|---|---|
| Current | 143 ms |
| With constructor change | 106 ms |
_ = Configuration.Default; // only needed once, makes first Unicolour initialise faster
var stopwatch = Stopwatch.StartNew();
for (var i = 0; i < 16384; i++)
{
var colour = new Unicolour(ColourSpace.Rgb255, 255, 255, 255);
var rgb = colour.Rgb.Byte255;
}
Console.WriteLine(stopwatch.ElapsedMilliseconds);
| Code | Duration |
|---|---|
| Current | 75 ms |
| With constructor change | 34 ms |
(This is because new Unicolour(ColourSpace.Rgb255, r, g, b) is equivalent to new Unicolour(Configuration.Default, ColourSpace.Rgb255, r, g, b).)
Hopefully that'll help a bit too.
Constructor performance improvements have been merged in https://github.com/waacton/Unicolour/commit/dc70ed58f69c0910fab9a79f3591b204658ea1b7 and are available in version 4.7.0.
With this change the constructor won't unintentionally trigger any calculations. And for further improvements you can do the things I mentioned above:
- Initialise Configuration early, before creating Unicolour objects
- Use run instead of debug when looking at performance.
As a follow-up to this, I've reworked how a Unicolour is constructed in 6.0.0, and some naive benchmarking suggests it's 95% faster.
Results for reference
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.3775)
13th Gen Intel Core i7-13700H, 1 CPU, 20 logical and 14 physical cores
[Host] : .NET Framework 4.8.1 (4.8.9300.0), X64 RyuJIT VectorSize=256
.NET 8.0 : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
.NET Framework 4.7.2 : .NET Framework 4.8.1 (4.8.9300.0), X64 RyuJIT VectorSize=256
| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---|---|---|
| Construct | .NET 8.0 | .NET 8.0 | 37.79 ns | 0.334 ns | 0.296 ns |
| Convert | .NET 8.0 | .NET 8.0 | 52.92 ns | 1.059 ns | 0.991 ns |
| Construct | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 45.27 ns | 0.656 ns | 0.614 ns |
| Convert | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 61.74 ns | 0.489 ns | 0.457 ns |