
Don't wrap round

dpsanders opened this issue 4 years ago • 38 comments

This seems like a gotcha (indeed it caught us out when manipulating images):

julia> x = N0f8(0.5)
0.502N0f8

julia> x + x
0.0N0f8

dpsanders avatar Sep 09 '20 17:09 dpsanders

See #41. FWIW, this happens in most image-processing frameworks, including Matlab (which uses saturating arithmetic) and Python (which uses overflow arithmetic, like us). That said, I am tempted to blaze a new trail to make this better.
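To spell out what happens in the example above: N0f8 stores i/255 in a UInt8, and the raw addition simply wraps (quick check; the one-argument reinterpret returns the raw integer):

julia> reinterpret(N0f8(0.5))   # stored as 128/255 ≈ 0.502
0x80

julia> reinterpret(N0f8(0.5) + N0f8(0.5))   # 128 + 128 ≡ 0 (mod 256)
0x00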

There was an epic discussion in #143. We seem close to re-addressing this shortly, with the following proposals:

  • No supported arithmetic, convert immediately to float
  • Overflow (wrap-around) arithmetic like it is now
  • Saturating arithmetic like Matlab
  • Checked arithmetic (you'll get an error from that operation)
  • Widening arithmetic like in #143 (aside from supporting signed arithmetic, the key element of that proposal was a 2-fold widening, but no further widening from that point forward---the types are marked as "fully widened" and henceforth use overflow arithmetic)
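To make the proposals concrete, here's a rough sketch of the four arithmetic flavors at the raw UInt8 level (the helper names are illustrative, not package API):

wrapping_add(x::UInt8, y::UInt8)   = x + y                           # 0xff + 0x01 -> 0x00
saturating_add(x::UInt8, y::UInt8) = x > 0xff - y ? 0xff : x + y     # clamps at 0xff
checked_add(x::UInt8, y::UInt8)    = Base.Checked.checked_add(x, y)  # throws OverflowError
widening_add(x::UInt8, y::UInt8)   = UInt16(x) + UInt16(y)           # one-time widening (#143)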

Each proposal has its strengths and weaknesses. I'd be curious to hear your thoughts. I know @kimikage is planning to solicit input on discourse, but since you posted it...

timholy avatar Sep 09 '20 18:09 timholy

Thanks. I guess we were expecting it to saturate, but I see that it's a much harder (or insoluble) problem than it initially seems.

One option would be to implement +(:saturate, x, y) or +(Saturate(), x, y) etc., but of course this is pretty cumbersome. (And I guess have plain + throw an error and inform you about these options.)

dpsanders avatar Sep 09 '20 18:09 dpsanders

It's not impossible; it's just not obvious that it's a good idea. Stefan has posted on the evils of saturating arithmetic. (E.g., x + x - x != x for many x.)
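A concrete instance with saturating UInt8 arithmetic (just a sketch):

satadd(x::UInt8, y::UInt8) = x > 0xff - y ? 0xff : x + y
x = 0xc0              # 192
satadd(x, x) - x      # 0xff - 0xc0 == 0x3f (63), not 0xc0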

timholy avatar Sep 09 '20 19:09 timholy

Also much of it is already implemented (by @kimikage), see https://github.com/JuliaMath/FixedPointNumbers.jl/blob/595f7a7c8dd27fb40c96519c4878af5948d7deb4/src/FixedPointNumbers.jl#L218-L287

timholy avatar Sep 09 '20 19:09 timholy

We should eventually decide on a final course of action on the overflow issue, but we haven't made any significant progress in the last six months at least. :sob:

Apart from the overflow issues, I was unhappy with the implementation of arithmetic (e.g. ::Normed * ::Normed). So I decided to support checked, wrapping and saturating arithmetic as part of the improvement of arithmetic implementations.

After I complete the implementation of checked arithmetic (cf. https://github.com/JuliaMath/FixedPointNumbers.jl/issues/41#issuecomment-674682747), I'm going to open a thread in discourse to decide which is the default arithmetic (on a per-method basis) in v0.9.0.

That decision in v0.9.0 will only be the first step and should not be the final goal. However, I believe it is useful to have real, working code in order to determine the goal. Benchmarking is extremely important because overflow solutions are closely tied to speed and memory performance.

kimikage avatar Sep 10 '20 03:09 kimikage

Incidentally, I "personally" disagree with making saturating arithmetic the default. However, I think there is a rationale for using saturating arithmetic for Gray{N0f8} and RGB{N0f8}, though not for N0f8 itself.

I'm also preparing to experiment with the three types of arithmetic for ColorVectorSpace. https://github.com/kimikage/CheckedColorArithmetic.jl

kimikage avatar Sep 10 '20 03:09 kimikage

@timholy, I would appreciate it if you could reply to https://github.com/JuliaMath/CheckedArithmetic.jl/issues/6 when you have time.

kimikage avatar Sep 10 '20 03:09 kimikage

As you think about the options, let me make a pitch again for the type of solution in #143, since you haven't mentioned it as something you're actively considering and because I do think it's one of the more attractive options.

The naive motivation is the following: if UInt8 + UInt8 might overflow, why not just return a UInt64? Well, that's a lot bigger and eats up a lot more memory, a serious negative if you're processing large images. OK, so how about a UInt16? The problem there is that when you then go to add a UInt16 to anything else, do you return a UInt32? That quickly gets out of hand---it makes arithmetic essentially non-inferrable.

So the core idea in #143 is to split into two worlds: "representation types" and "computed types." "Representation types" are what might be created, say, by reading an image from a file---they are used for your typical "starting image." Computed types are produced as the result of any math operation (well, any operation that tries to stay in the world of Fixed* numbers). Continuing the UInt analogy, instead of UInt8 + UInt8 -> UInt16, we have UInt8 + UInt8 -> WUInt16, where W marks it as a computed type. The key proposal is that representation types widen to computed types, but that computed types are "terminal" and use wrap-around (overflow) arithmetic. What that means is that if you're playing around at the command line and add two images and then send the result to a viewer, you should get exactly what you expect; if we make the computed type signed, then the same is true of the difference between two images. The latter is especially important: in image processing, we look at the difference image all the time, and making that both easy and accurate seems like an important goal.
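Here's a minimal sketch of the mechanics on raw integers (WInt16 is a hypothetical stand-in for the marked type; the real proposal would of course operate on FixedPoint types):

struct WInt16       # hypothetical "computed" (terminally widened) type
    i::Int16
end
add(x::UInt8, y::UInt8)   = WInt16(Int16(x) + Int16(y))   # representation types widen once
add(x::WInt16, y::WInt16) = WInt16(x.i + y.i)             # computed types wrap, never widen again
sub(x::UInt8, y::UInt8)   = WInt16(Int16(x) - Int16(y))   # signed, so difference images "just work"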

The biggest strength of the proposal is that naive operations "just work," which is true of only one of the other proposals, having all math operations immediately convert to float. Unlike the latter, it also keeps resulting values relatively compact. The weaknesses of the proposal are that it effectively doubles the number of types we have to support, and it could lull people into false complacency: my proposed strategy is great for interactive exploration where you compute the sum or difference of a pair of images, but if you sum 1000 UInt8-represented images you need to allocate the result using a safer type.

Why don't other languages/frameworks do it this way? Because they don't have the types---they are limited to machine numbers, and so they can't distinguish a terminally-widened WUInt16 from a representational UInt16. But we don't have such limitations.

On balance I think having things "just work" for interactive exploration, while not compromising performance and accuracy, is probably the most important goal. Hence my two favorite options (in the abstract, without any practical experience) are "all math immediately converts to float" and "limited doubling of size" (aka, #143). I'm less fond of the saturating, "simple" wrap-around, or checked arithmetic options, which are the main ones you have been promoting. Hence I wanted to remind you of these options.

timholy avatar Sep 10 '20 08:09 timholy

I'm unsure how much effort is needed to achieve this, but from the user's side I like the idea of splitting things into representation types and computation types. If I understand the idea correctly, the representation types would be mostly exclusive to IO package developers and transparent to normal users.

I haven't taken the time to experiment with the new checked-math functions developed by @kimikage, though.

johnnychen94 avatar Sep 10 '20 08:09 johnnychen94

Yes, the idea is that we want to be able to represent any "input" image without conversion; reinterpret is fine, as I'm talking about the bit-level description here. This is particularly important if your image is a 1TB file that you import via memory-mapping: you must be able to represent the meaning of that file using its native bit layout. But once you're working with it in memory, you're free to choose different types for the results of math operations.
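For example (a sketch; the filename and dimensions are made up):

using Mmap, FixedPointNumbers
io = open("huge_image.raw")                      # hypothetical raw 8-bit grayscale file
raw = Mmap.mmap(io, Vector{UInt8}, 512 * 512)    # no copy, just a view of the file's bytes
img = reshape(reinterpret(N0f8, raw), 512, 512)  # same bits, now with N0f8 meaning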

timholy avatar Sep 10 '20 08:09 timholy

The fundamental problem is that what is a "representation type" and what is a "computed type" depend on the use case. For me, N0f8 is sometimes a representation type and sometimes a computed type.

If you think it's an exception, please consider the case of ::N0f8 * ::N0f8. This can cause a loss of accuracy, but not an overflow. If you convert N0f8 to a wider type, you lose the optimization opportunity described above.

In fact, there must be both cases where you want to widen and cases where you don't want to widen. The two requirements are completely antithetical and there is no universal solution. The "neutral" approach is to let the user specify the way. We can add widening_* methods, but users can still convert the types manually.
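For instance (a sketch; the point is only that the choice becomes explicit at the call site):

using FixedPointNumbers
x, y = N0f8(0.8), N0f8(0.9)
x * y                     # stays N0f8: may round, but cannot overflow
Float32(x) * Float32(y)   # user-chosen widening when accuracy matters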

kimikage avatar Sep 10 '20 12:09 kimikage

Sure, but in cases where widening isn't necessary then you don't have to do it. Maybe instead of calling them "representation" and "computed" we could call them "plain" and "marked." "plain" types can widen, if the operation requires it to avoid overflow; "marked" types cannot.

timholy avatar Sep 10 '20 12:09 timholy

You seem to be addressing only a single operation. I'm illustrating a case like (a::N0f8 - b::N0f8) * c::N0f8. How do I clear the "marked" status?

Edit: Of course, this is about regressions, so I'm assuming that a - b is working correctly (even if it's wrapped-around). (1 - b) * c is actually used in interpolation etc.

Edit2:

Edit: you can always use % N0f8 for interpolation, or better just write that as complement(b).

Yes, I know about complement. However, it doesn't seem to me that intentionally calling additional functions is much different from leaving it up to the user to write float(x) or N24f8(x). In some cases, just widening 8-bit types into 16-bit types doesn't "just work".

kimikage avatar Sep 10 '20 12:09 kimikage

You don't. The answer might be negative. But in any case, once marked, always marked.

Edit: you can always use % N0f8 for interpolation, or better just write that as complement(b).
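Spelled out (a sketch; mycomplement is an illustrative stand-in): since 1 - b stays in [0, 1], it can be computed exactly on the raw UInt8 without widening:

using FixedPointNumbers
mycomplement(b::N0f8) = reinterpret(N0f8, 0xff - reinterpret(b))   # == 1 - b exactly
b, c = N0f8(0.25), N0f8(0.8)
mycomplement(b) * c   # (1 - b) * c entirely within N0f8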

timholy avatar Sep 10 '20 13:09 timholy

The other major problem is that the 16-bit types are too rich for the 8-bit types. There is little benefit in using FixedPoint if it is allowed to be extended to 16-bit types. I've demonstrated before that Float16 is also never too slow if we give up full compliance with IEEE 754. If you are still concerned about speed (that is already a contradiction when using the 16-bit type, though), then the JuliaImages community should think more seriously about supporting BFloat16.

Also, in image processing, there are 8 bits that are not used in RGB24. Even if you add 2 bits to each channel, there are still 2 bits left. Both NaN and Inf can be represented if they are not channel-wise. Of course, such a type would be overly application-specific, but I think the types you're trying to add are overly application-specific, too.

kimikage avatar Sep 10 '20 13:09 kimikage

I'm fine with a fast Float16; I basically consider #143 and #147 different approaches to the same thing. BFloat16s is pretty interesting too. I guess there we can do some manipulations in Float32 and then strip bits?
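Something like this, perhaps (a sketch of pure truncation; a real implementation would round to nearest):

strip_to_bf16(x::Float32) = reinterpret(Float32, reinterpret(UInt32, x) & 0xffff0000)
strip_to_bf16(1.2345f0)   # ≈ 1.2344, i.e. Float32 work with a BFloat16-sized significand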

Not all images are RGB24. My "real" work is mostly grayscale.

timholy avatar Sep 10 '20 13:09 timholy

Note: we don't necessarily have to choose, after all we have 4 (way more, actually) untapped binary addition operators :stuck_out_tongue_closed_eyes:

timholy avatar Sep 10 '20 13:09 timholy

BTW, I suggested in #167 to use f as a marker, rather than introducing new marked types. In any case, though, we will need to add signed Normed. (I don't think signed Normed is application-specific.)

kimikage avatar Sep 10 '20 14:09 kimikage

I was considering opening a new issue, but then I saw this one, which seems related and fresh.

I'm working with N0f8 images, and I have a windowing function which happens to be N1f7. Ideally, I was expecting the result of the operation to be N0f8 * N1f7 = N1f15. What we get instead is N8f8. I imagine the rationale is just to keep the larger of the two precisions, which is what makes N0f8 * N0f8 return N0f8?

I'm myself keen on preserving accuracy, with the best speed. Expanding from 8 bits to 16 bits is, as far as I know, the natural result at the low level, and in my example above it becomes a simple multiplication, while ending up in N8f8 should actually involve a shift-right, if I'm not mistaken.

What is the rationale for these conversions right now, and is there any plans to support a type that preserves accuracy?

nlw0 avatar Sep 11 '20 11:09 nlw0

The simplest rationale is that this is what Base does:

julia> x = 0x02
0x02

julia> x*x
0x04

Regarding your proposal to return N1f15, see the "getting out of hand" remark in https://github.com/JuliaMath/FixedPointNumbers.jl/issues/227#issuecomment-690066640. I'd say you should do float(x) * float(y) if you're keen to preserve accuracy.

timholy avatar Sep 11 '20 12:09 timholy

Another important point is that converting between Normed types with different f is generally lossy. Of course, the current promotion rules are not perfect, but they are not so bad either.

BTW, the Normed-->Normed conversions have room for improvement. (cf. #140) (Edit: As #140 shows, N2f14 is much better than N1f15. :smile:)

Edit: I am also prepared to support widemul if needed. However, it is something like widemul(::N0f8, ::N0f8)::N8f8. I don't want to implement widemul for types with different fs. :sweat_smile:

kimikage avatar Sep 11 '20 12:09 kimikage

I understand that for most applications preserving a constant precision is the way to go, especially if you have long expressions and it would "get out of hand". This isn't that uncommon, by the way; it would essentially be a sort of arbitrary-precision numeric type.

The idea is not to go on forever retaining all the accuracy after every operation. It's just that my original types have limited accuracy, and I want to control when to lose it. In practice the operations would return the highest accuracy, and I would control when to throw away bits. While this may not be practical when prototyping, it might be useful for coding a processing pipeline that makes extensive use of 8-bit and 16-bit SIMD operations, for instance. My N1f7 example is from a real application, and right now, if I want to code an optimized fixed-precision pipeline, I'll have to do the low-level computations myself.

I'm curious to experiment with this accuracy-preserving type, although I'm not sure it's really of interest to many people. Is this something you imagine could ever be part of this package, or is it better if I develop something separate?

nlw0 avatar Sep 11 '20 12:09 nlw0

I cannot determine if it could be part of this package without a more concrete example.

There are two things to note. Most users only use N0f8, some use N0f16, and the other types are rarely used. Also, the 2^f - 1 denominator is very awkward when considering accuracy (not precision).
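For example, the denominators don't compose in the N0f8 * N1f7 case above (my arithmetic):

255 * 127    # 32385: denominator of the raw product of an N0f8 and an N1f7
2^15 - 1     # 32767: denominator of N1f15
# so the raw UInt8 * UInt8 product is not an exact N1f15, unlike Fixed,
# where the power-of-two denominators compose exactly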

kimikage avatar Sep 11 '20 13:09 kimikage

This isn't that uncommon, by the way, it would essentially be a sort of arbitrary-precision numeric type.

which are painfully slow:

julia> @btime x*y setup=(x = rand(Int); y = rand(Int))
  1.239 ns (0 allocations: 0 bytes)
-5710601054559440257

julia> @btime x*y setup=(x = big(rand(Int)); y = big(rand(Int)))
  59.760 ns (3 allocations: 48 bytes)
-28182581540266793182579202048252613136

We need to use hardware-supported arithmetic wherever possible.

I have to control when to throw away bits

Yep. If you need a workaround now...often you can do that by manually converting one value at the outset to the final precision you want to achieve. That's why I recommended float, but you could also do this:

julia> x = rand(N0f8)
0.514N0f8

julia> w = rand(N1f7)
0.339N1f7

julia> N1f15(x)*w
0.17392N1f15

and the other types are rarely used

I agree with that overall. In the past my lab used N2f14 a lot because we had a 14-bit scientific camera, but all of our cameras in current use are 16-bit. That said, there are some attractive cameras that are 10-bit, so some of those other types could come back into usage.

timholy avatar Sep 11 '20 14:09 timholy

It would be good to have some specific and practical model cases or benchmark tests. In my use cases, I rarely run into "unintended" overflow problems. I may be in too low a place to realize that I am falling into a pit. :sweat_smile:

We need to think a little more about the workflow of program development, not just the code itself. I think the workflow will also change depending on the nature of the images and the purpose (which is more important, the results or the methods?).

kimikage avatar Sep 16 '20 03:09 kimikage

reduce(+, A)/length(A), mapreduce(abs2, +, A), and almost all the distances in Distances.jl if you try to implement them from scratch 😂

johnnychen94 avatar Sep 16 '20 03:09 johnnychen94

Is this really a practical case for the first two? I think it's clear that they don't work well "as is" in many cases, even if we automatically widen the 8-bit type to a 16-bit type. (Of course, as you know, there is a difference between reduce and sum. https://docs.julialang.org/en/v1/base/collections/#Base.sum)
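To illustrate the difference (if I'm reading the current implementation right, sum widens the accumulator to Float64 for FixedPoint, while reduce does not):

using FixedPointNumbers
A = fill(N0f8(0.5), 100)
reduce(+, A)   # 0.0N0f8: the raw 100 * 128 wraps to 0 (mod 256)
sum(A)         # ≈ 50.196 as a Float64, thanks to the widened accumulator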

I am not saying that they are trivial. I believe that the solutions will depend on whether they are practical or not. One of the reasons I mentioned workflows is to separate cases of coding errors (requiring code fixes outside of FixedPointNumbers) from the rest. Since Julia's static analysis tools are still underdeveloped, an effective countermeasure against coding errors is checked arithmetic. If one-time widening arithmetic is the default, the coding-error cases become even worse. :scream: This is why I don't like "implicit" widening arithmetic.

kimikage avatar Sep 16 '20 04:09 kimikage

julia> using TestImages, ImageView, ImageCore

julia> img = testimage("cameraman");

julia> scaling = LinRange(0.9, 1.1, size(img, 1));   # let's simulate a change in lighting

julia> imgs = Gray{N0f8}.(clamp.(scaling .* img, 0, 1));   # in the real world this would be a second image, so it would also have Gray{N0f8} eltype

julia> imshow(img - imgs)
Dict{String, Any} with 4 entries:
  "gui"         => Dict{String, Any}("window"=>GtkWindowLeaf(name="", parent, width-request=-1, height-request=-1, visible=TRUE, sensitive=TRUE, app-paintable=FALSE, can-focus=FALSE, has-focus=FALSE, is-…
  "roi"         => Dict{String, Any}("redraw"=>122: "map(clim-mapped image, input-32)" = nothing Nothing , "zoomregion"=>86: "input-31" = ZoomRegion{RInt64}(XY(1..512, 1..512), XY(1..512, 1..512)) ZoomRe…
  "annotations" => 88: "input-32" = Dict{UInt64, Any}() Dict{UInt64, Any} 
  "clim"        => 87: "CLim" = CLim{N0f8}(0.0, 1.0) CLim{N0f8} 

julia> imshow(float.(img) .- float.(imgs))
Dict{String, Any} with 4 entries:
  "gui"         => Dict{String, Any}("window"=>GtkWindowLeaf(name="", parent, width-request=-1, height-request=-1, visible=TRUE, sensitive=TRUE, app-paintable=FALSE, can-focus=FALSE, has-focus=FALSE, is-…
  "roi"         => Dict{String, Any}("redraw"=>130: "map(clim-mapped image, input-44)" = nothing Nothing , "zoomregion"=>94: "input-43" = ZoomRegion{RInt64}(XY(1..512, 1..512), XY(1..512, 1..512)) ZoomRe…
  "annotations" => 96: "input-44" = Dict{UInt64, Any}() Dict{UInt64, Any} 
  "clim"        => 95: "CLim" = CLim{Float32}(-0.0901961, 0.0745098) CLim{Float32} 

First image (imshow(img - imgs)): [screenshot]

Second image (imshow(float.(img) .- float.(imgs))): [screenshot]

The first is completely useless.

This is a day-in, day-out operation in scientific image processing, where you often want to know what changed over time or how your image registration algorithm changed the alignment of two images.

timholy avatar Sep 16 '20 08:09 timholy

I'm skeptical that overflow can really be a practical issue, even though that is a day-in, day-out operation. :confused: My daily task is as follows, though. :sob:

(@v1.5) pkg> git checkout master
ERROR: Could not determine command

I think it's a good benchmark, but I think the latter is scientifically "bogus". We should take the extra color mapping into account; it must cost more than the subtraction itself. (cf. https://github.com/JuliaMath/FixedPointNumbers.jl/pull/143#issuecomment-559960221)

kimikage avatar Sep 16 '20 09:09 kimikage

I don't understand your meaning in your previous post; can you clarify? (Neither the git command example nor the extra color mapping. Sorry!)

timholy avatar Sep 16 '20 11:09 timholy