Dynamic shapes

Open botev opened this issue 7 years ago • 44 comments

Reading the current spec, it seems that NNEF supports only fixed (constant) shape sizes. Is that understood correctly, and if so, do you intend to support dynamic/variable shapes?

botev avatar Dec 21 '17 13:12 botev

NNEF supports dynamic shapes for inputs. See https://www.khronos.org/registry/NNEF/specs/1.0/nnef-1.0.html#external-op

rgiduthuri avatar Dec 21 '17 14:12 rgiduthuri

Yes, however this is true only for "external". For parameters (e.g. variable), one could also have variable shapes in order to produce a more "generic" graph expression, whose concrete shapes would then be specified for the graph later. Verification of the correctness of shapes can be done using polynomial integer expressions. I would guess this is not really on your radar atm though.

botev avatar Dec 21 '17 15:12 botev

Since the variables in NNEF come out of training, like weights, and need to be stored in the container using the Tensor File Format, they can't be dynamic in NNEF 1.0.

Do you have a use case that requires dynamic shape support for variables coming out of any training frameworks?

rgiduthuri avatar Dec 21 '17 15:12 rgiduthuri

Not for post-training, no. The only use case would be for actual compilers that optimize graphs and cache the result, since in that case changing the shape would (most likely) not invalidate the optimized code.

botev avatar Dec 21 '17 15:12 botev

Interesting idea. Such compilers could use a "0" value in the shape to indicate a dynamic shape for variables. Since such behavior falls outside the scope of NNEF 1.0, it could be a vendor extension for now.

rgiduthuri avatar Dec 21 '17 16:12 rgiduthuri

@botev just to clarify that I get it right: are you saying that the structure description could be generic, not exactly specifying the shape of variables, and the shape would get fixed when it is bound to some actual data for the weights? For example, I could have the same general network structure, but with one set of data I have 3x3 convolutions, with another I have 5x5, or the channel counts may be different. As @rgiduthuri says, this could be expressed syntactically (with 0s in the shape, just as in the case of externals), but we'd have to see clearly its implications on semantics, such as shape propagation (verification of correctness, as you mention). Can you give some more info (maybe links) on how polynomial integer expressions are used for this in practice?

gyenesvi avatar Dec 21 '17 16:12 gyenesvi

@gyenesvi Well, the main issue is that nobody uses polynomial integer expressions for now. I have argued a few times for people to adopt them, as they can make graph optimization significantly more generic and easy. This is something I think would be a nice feature. To clarify, imagine you have the following network (pseudocode):

n = symbolic_int("n")
x = input(2*n, 784)
x_half = x[:(n // 2)]
h1 = dense(x, num_units="a")
h2 = dense(h1, num_units="b")
h3 = dense(h2, nonlinearity=softmax, num_units=10)
loss = (x - h3)**2 / 2.0
// Raises an error since first axis are different - n and n/2
loss_err = (x_half - h3) **2 / 2.0 

Now when you initialize/construct the model you will need to bind a and b (as they affect parameters). Also, note that if you use polynomials rather than just a generic 0/None, you can detect faults such as the last line in the above (as polynomials, n is not equal to n/2). This additionally provides shape verification at compile/construction time rather than at runtime (e.g. lots of Theano/TF errors would not even exist). An example of how this could be used is here. Note that the package there is not maintained, however it uses polynomials for inferring shapes correctly at construction time. It is a bit like providing better static inference in compilers (e.g. template-like).

botev avatar Dec 21 '17 17:12 botev

Okay, I understand what you mean now. I played around a bit with a similar idea, but thought it would be too complicated to implement (I did not think it through that you end up with polynomial expressions). It would be easy to express in NNEF syntax, since it can have compile-time symbols, something like the following along your example (this is not valid now, because graphs cannot have compile-time parameters, only fragments can; it is just for the sake of the example, but otherwise the syntax is legal):

graph net( input: tensor, n: extent, a: extent, b: extent )
{
    x = external(shape = [2 * n,  784])
    [y, z] = split(x, axis = 0, ratios = [1,1])
    h1 = linear_layer(x, channels = a)
    h2 = linear_layer(h1, channels = b)
    h3 = linear_layer(h2, channels = 10)
    loss = (x - h3) ^ 2.0 / 2.0
    loss_err = (y - h3) ^ 2.0 / 2.0
}

So theoretically, it would be possible to deduce from the syntax that the last line is not valid, and raise an error, however, it would require somewhat complicated evaluation and comparison of the shape expressions (essentially creating the polynomial expressions in some canonical form). And as you say, it would require a binding mechanism for the compile-time parameters n, a, b. So I'd say that this is definitely not a basic feature, but could be an interesting extension for the future if needed. As always, the question is whether its utility would outweigh the complexity required to implement it.
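
Just to illustrate the kind of comparison I mean, here is a rough sketch (plain Python, not NNEF tooling, names are only illustrative): the first extents of the tensors above, written as degree-1 polynomials over the symbol n in a canonical {symbol: coefficient} form, immediately reveal that the last line can never match:

x_extent  = {"n": 2}          # x has shape [2n, 784]
y_extent  = {"n": 1}          # y is the first half of the split, so [n, 784]
h3_extent = dict(x_extent)    # the linear layers keep the first (batch) extent

assert h3_extent == x_extent  # the loss line is consistent in its first axis
assert h3_extent != y_extent  # the loss_err line can never be valid -> error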

gyenesvi avatar Dec 21 '17 23:12 gyenesvi

Yes, this is exactly it. In terms of the complexity, I have already implemented an integer polynomials module for this, which supports operations like floor, ceil (for convolutions when we have bigger strides) and max, min. The implementation is in Rust, however it should not be too difficult to port it to C++ - here.

I do agree, however, that it is up to you guys to weigh whether that is something you want. Tbh I'm happy to help on this if you do think it is an interesting feature to add in the future. What I personally like about it is that in very complicated models, having something to verify your shapes at construction time, rather than at run time, is really amazing for research debugging. Similar benefits to having statically-typed variables compared to dynamically-typed ones.

botev avatar Dec 21 '17 23:12 botev

Thanks, I'll let you know if we consider this in the future.

gyenesvi avatar Dec 22 '17 08:12 gyenesvi

@botev, some updates on this one. I believe that, from the point of view of NNEF, enough information is present to implement what you discuss above. Previously, I thought you would need some symbolic integers to declare parametric shapes (like 'n' above), but in fact I have realized that in order to perform the checks, it does not matter how those variables are named. One only needs to be able to express that it is not a fixed value, and that is possible by setting some components of the shape to 0 in the external definition, which is allowed. So the below is valid, and declares that the batch size is unknown:

graph net( input ) -> ( output )
{
    x = external(shape = [0,  784])
    [y, z] = split(x, axis = 0, ratios = [1,1])
    h1 = linear_layer(x, channels = 20)
    h2 = linear_layer(h1, channels = 30)
    h3 = linear_layer(h2, channels = 10)
    loss = (x - h3) ^ 2.0 / 2.0
    loss_err = (y - h3) ^ 2.0 / 2.0
}

So the real question is whether the NNEF specification should mandate the checks that you mention. I'd say no, it should not mandate them, to allow for simpler parser implementations; however, since all the information is present, an NNEF parser/graph compiler may further check whatever it wants to. So I'd say the above NNEF document would be declared valid by the spec, but parser/validator tools are free to implement further checking. For example, a tool that realizes/proves that the graph can never be valid, no matter what you substitute for the batch dimension, could be useful.

How does that sound to you?

gyenesvi avatar May 22 '18 14:05 gyenesvi

I think that is a valid point and it makes things a lot easier. However, note that the inverse problem (i.e. given all of the graph operations, deciding which values of the 0 dimensions are valid) is a lot harder than the forward one, where you specify any divisibility constraints the original shapes must satisfy and verify that they work (e.g. verifying that 2*n is divisible by two is a lot easier than inferring that the minimal irreducible statement is that shapes divisible by 2 satisfy the constraints of the graph).

botev avatar May 22 '18 23:05 botev

Sure, I can imagine that the problem may become easier if you express the unknown (here batch) dimension with a formula that guides the validation. However, what if in the above example, instead of writing 2*n, the user only puts n? You'd still want to validate that, and the result would be the same: the last line will always be invalid, no matter what you substitute for n. So I guess you need to prepare your algorithm for the general case anyway (no guidance, just a single symbolic integer for the extent in the shape).

(Actually, I believe that your original example above should contain n instead of 2*n, or the definition of x_half should be x[:n] instead of x[:(n // 2)], right?)

gyenesvi avatar May 23 '18 08:05 gyenesvi

You are right, my example is actually quite bad at showing the issue I described. So let me remake the example to make this clear:

n = symbolic_int("n")
x = input(2*n, 784)
y = input(2*n, 10)
x_half = x[:n]
h1 = dense(x_half, num_units="a")
h2 = dense(h1, num_units="b")
h3 = dense(h2, nonlinearity=softmax, num_units=10)
// Easy to check inconsistent shapes since second dimensions are constant - 10 vs 784
loss_err = (x_half - h3) **2 / 2.0
// Hard error to check since the first dimensions are inconsistent - n vs 2n
loss = (y - h3)**2 / 2.0

I think what you are referring to is the first error, which I think is easy to check as long as you have constant shapes. However, ones like the second are more general and difficult to catch. Hope that makes it a bit clearer. I personally prefer to be able to solve both of them, not only the first. However, that does require the user to specify some constraints on the unknown shapes.

botev avatar May 23 '18 18:05 botev

I was actually referring to the second error (sure, the first is easy). My point was that the example is the same if you write the following (n instead of 2*n and n//2 instead of n). Are you saying that the following case would not be detectable because the user did not specify enough constraints on the unknown shapes?

n = symbolic_int("n")
x = input(n, 784)
y = input(n, 10)
x_half = x[:(n//2)]
h1 = dense(x_half, num_units="a")
h2 = dense(h1, num_units="b")
h3 = dense(h2, nonlinearity=softmax, num_units=10)
// Easy to check inconsistent shapes since second dimensions are constant - 10 vs 784
loss_err = (x_half - h3) **2 / 2.0
// Hard error to check since the first dimensions are inconsistent - (n//2) vs n
loss = (y - h3)**2 / 2.0

I guess if I was the user, I would expect these two formulations to be equivalent, coming to the same conclusions about validity, which means that you have to prepare your implementation for cases where you do not have the luxury of constraints on the unknown shapes. And if you have to prepare for the hard cases, you don't win too much by having some cases simplified. On the other hand, not preparing for the hard cases feels like a half solution.

I think if your intention is exactly to catch errors made by the user, you cannot rely on the user providing you some hints to ease validation, that sounds kind of counter-intuitive to me. Or am I still missing something here?

gyenesvi avatar May 23 '18 19:05 gyenesvi

So if there is a symbolic expression n//2, I guess you can assume that this must be a valid division and then internally redefine n -> 2n, so maybe this should be ok. However, having this be inferred may lead to hard cases where you have something like a + 2n = c + d, and then it is very difficult to reason. As for allowing the user to specify it, given how much 3rd party libraries are used, I think the actual end users would potentially have an easier time doing this, as these constraints are usually very natural and easy to specify when you have variable shapes, yet they can prevent you from making mistakes.

botev avatar May 23 '18 20:05 botev

So how would the a + 2n = c + d case be solved by the approach you mention? If I understand correctly, here a, n, c, d may come from different unknown shapes, and the equality is what you want to test. By proper hinting, these could actually become related, for example the explicit symbolic variables could state that in fact a and c are derived from the same symbolic integer and that d is actually equal to 2n. Is that what you are proposing?

To see this in a real example:

graph G( input1, input2 ) -> ( output )
{
    input1 = external(shape = [0,10]);
    input2 = external(shape = [0,10]);
    output = add(input1, input2);
}

Here you can't reason about the addition being valid, because the two input shapes could be unrelated, however, if we could write:

graph G( input1: tensor, input2: tensor, batch: extent ) -> ( output: tensor )
{
    input1 = external(shape = [batch,10]);
    input2 = external(shape = [batch,10]);
    output = add(input1, input2);
}

Then the user explicitly states that the two are the same (although unknown), so you can reason about it and perform more checks.

Is this the essential difference that you'd need to let you perform all checks that you'd like?

gyenesvi avatar May 24 '18 08:05 gyenesvi

Yes, that is indeed what I mean. And I think that these constraints are quite reasonable to specify as well as very simple. E.g. things like both tensors coming from the same batch, or my convolution kernel dimension being equal to that of the input, etc. Libraries on top can also use the shapes to create new tensors with matching but unknown shapes. This is a bit like compile-time checks vs runtime checks in programming languages.

botev avatar May 24 '18 08:05 botev

Okay, we finally got to the essence of it; I agree that such a description carries more information and is something that users or high-level libraries could naturally express.

For now, the graph can only have tensor parameters (that's why it does not need to be marked as such), so this would require allowing non-tensor parameters as well. We'd have to see what that entails, but at least now it is clear in which direction the syntax would require extension.

gyenesvi avatar May 24 '18 09:05 gyenesvi

I have recently been experimenting with describing and validating output shapes in NNEF, and not surprisingly, arrived at integer polynomial expressions, so it might be interesting to understand this in more detail.

The first question we have to answer is how exactly this helps compilation. The reason I have been looking at this is validation of a parameterized description (such as a single, possibly compound operation, or alternatively a whole graph) for structural correctness. Also, you mentioned above that you know of some use cases for optimization; could you elaborate on that, just to have some motivating example?

Second, in order to estimate the complexity of implementation, it would be good to clarify what properties/operations are imaginable/required on integer polynomials (for example in mathematical, implementation independent terms). From my experiments, I have gathered the following:

  • a basic polynomial may be a single identifier (a, b, ...) or a single integer constant (1, 2, ...)
  • we can add/subtract two polynomials (a + b == a+b, a - 2 == a-2)
  • we can multiply a polynomial by a constant (2 * (a+1) == 2a + 2)
  • we can divide a polynomial by a constant, however care must be taken with rounding ((a + 1) / 2 is not necessarily equal to a/2 + 1/2 in case of integer division). How to define this properly? For example, by a separate constant divisor?

For the examples I have encountered so far, multiplication of two polynomials was not required, so each monomial could be of degree 1. Can you give an example where multiplication of polynomials is required? Any further operations that you found necessary? For example, looking at the symbolic polynomials library you linked, I don't quite get the meaning/utility of the floor/ceil and min/max operations, if they would be required here. Could you explain these?

Also, maybe some implementation tricks if any. A canonical form of polynomials with maximal degree 1 can be easily implemented as a map from strings to signed integers. These maps can then be simply compared for equality. How much more complicated would it get if higher degree/polynomial multiplication/division is needed?
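
To make the degree-1 case concrete, here is a minimal sketch of what I have in mind (plain Python, the names are just illustrative, not an NNEF API): a polynomial is a map from symbol name to integer coefficient, with the constant term under an empty key, and equality of canonical forms is just map equality.

def normalize(p):
    return {k: v for k, v in p.items() if v != 0}

def poly(const=0, **coeffs):
    # e.g. poly(2, n=1) represents n + 2
    p = dict(coeffs)
    p[""] = const
    return normalize(p)

def add(p, q):
    r = dict(p)
    for k, v in q.items():
        r[k] = r.get(k, 0) + v
    return normalize(r)

def scale(p, c):
    return normalize({k: c * v for k, v in p.items()})

n = poly(n=1)
assert add(n, n) == scale(n, 2)   # n + n == 2n
assert scale(n, 2) != n           # 2n != n, the kind of mismatch to report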

gyenesvi avatar Dec 18 '18 14:12 gyenesvi

It turns out that for the fully general case, we do need multiplication and division of polynomials, which leads us to multivariate rational polynomial expressions, i.e. the ratio of two multivariate (integer) polynomials. So in the end we would need to test equality of two such expressions.

Building those polynomials from arithmetic expressions seems not too complicated. Furthermore, a seemingly simple way to perform comparison is taking the difference of the two expressions, transforming them to common denominator form, and seeing if the numerator becomes zero (after terms cancel each other out). This seems doable.
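
As a rough sketch of that comparison (plain Python, just to convince myself it is doable; names are illustrative): store a multivariate integer polynomial as a map from monomial to coefficient, where a monomial is a sorted tuple of (variable, power) pairs; then two ratios p1/q1 and p2/q2 are equal exactly when the polynomial p1*q2 - p2*q1 is zero.

def _norm(p):
    return {m: c for m, c in p.items() if c != 0}

def var(name):
    return {((name, 1),): 1}      # a single variable of degree 1

def const(c):
    return _norm({(): c})         # a constant term

def add(p, q):
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return _norm(r)

def mul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            powers = {}
            for v, e in m1 + m2:
                powers[v] = powers.get(v, 0) + e
            m = tuple(sorted(powers.items()))
            r[m] = r.get(m, 0) + c1 * c2
    return _norm(r)

def ratios_equal(p1, q1, p2, q2):
    # p1/q1 == p2/q2  iff  p1*q2 - p2*q1 == 0 as a polynomial
    return add(mul(p1, q2), mul(const(-1), mul(p2, q1))) == {}

x = var("x")
assert ratios_equal(x, const(2), mul(const(2), x), const(4))   # x/2 == 2x/4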

Any experience on this?

gyenesvi avatar Dec 19 '18 14:12 gyenesvi

Indeed, as you have arrived at, the main purpose is validation that parameter (and graph) descriptors conform to correct shapes. In terms of optimization, this can be used for an optimizing memory allocator: it potentially allows you to have an explicit polynomial expression for the storage of all tensors in the computation, and together with an arithmetic scheduler you can build an explicit division of memory ahead of execution; even if that is dynamic (e.g. input shapes can change), the division of the memory can still be done. This is quite involved and most likely you don't need to be bothered with it.

So multiplication and division are usually used during reshaping. I'm not sure that you need rational polynomial expressions though; what would be the case where you have something that is not an integer? Regarding the division example, that (a+1)/2 is not equal to a/2 + 1/2, my personal view is that none of this should be allowed. If some variable at some point requires the division (a+1)/2, it should be the user's responsibility to define that in fact a = 2k + 1, hence the result of the division is k + 1 where k is an integer. This would essentially mean just specifying explicitly any assumptions about the variables which the user knows. Note that this is pretty much how people debug this - we know that certain variables have certain properties of their shape, so we are allowed to perform a specific operation.

As for the library I linked, the reason for the floor and ceil functions is that some standard operations actually contain such expressions (e.g. convolutions with a size that is not exactly divisible by the filter size). This leads to a bit of corner casing, but essentially avoids the need for any rational polynomial expressions and stays in the integer realm. In that library a polynomial is just an ordered list of monomials. Generally these structures are really tiny in terms of memory footprint (although I have not measured it). Similarly, for shapes that are polynomials under division, one would require that the division gives an actual integer polynomial, e.g. (a^2 + 2a + 1)/(a+1) = a + 1, while (a^2 + a + 1)/(a + 1) is not allowed (it throws an error).

botev avatar Dec 20 '18 04:12 botev

Just to clarify: in a rational polynomial expression (or function) everything is still integer. It is not the same as a rational polynomial, where the coefficients are rational; instead it is the ratio of two integer polynomials.

It is required exactly for things like stride in convolution, for example a simple formula for the output size of a convolution is:

output_size = (input_size + padding - filter_size + stride) / stride

which is the ratio of two integer polynomials. Here / means integer division. It is required to make the formulas work for all cases, not just when exact integer division is possible, because in most networks out there this flexibility is allowed.
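
For instance, with Python's floor division (just a sanity check of the formula; the concrete numbers assume a 3-wide 'same' kernel with a total padding of 2):

def conv_output_size(input_size, padding, filter_size, stride):
    return (input_size + padding - filter_size + stride) // stride

assert conv_output_size(10, 2, 3, 2) == 5   # equals ceil(10 / 2)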

Your approach of formulating divisibility assumptions seems interesting, although I find it a bit unnatural on the user side, and it may lead to parsing difficulties as well, as it would require the declaration of a shape to be an expression, like a = 2k+1, where the value of k must be deduced; that can become very complicated depending on what expressions are permitted (maybe it's okay if only simple expressions, such as a single variable with multiplicative and additive constants, are allowed).

Divisibility assumptions may also be given in the form a % b == 0. Furthermore, an integer division can be rewritten using the following identity: (a + c) / b = a / b + (a % b + c) / b. Then, substituting the above assumption can be useful for proving equivalence of two formulas. For example, let's say we have a formula that describes the output size of a network as x / 4 where x is the input size, and we have the assumption that x % 4 == 0. Let's say we have a 'same' convolution with a stride of 4. Then the convolution formula says that the output size should be (x + 3) / 4 (the + 3 achieving the ceil operation usual in this case). Then using the divisibility assumption we have

(x + 3) / 4 == x / 4 + (x % 4 + 3) / 4 == x / 4 + 3 / 4 == x / 4

and so we can see that the convolution output shape is as we expected. That was my idea for approaching this.
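
Both the rewriting identity and the worked example are easy to check numerically (Python's // is floor division; this is just arithmetic, not the symbolic machinery itself):

for b in range(1, 10):
    for a in range(0, 100):
        for c in range(0, 10):
            assert (a + c) // b == a // b + (a % b + c) // b

for k in range(100):
    x = 4 * k                          # the assumption x % 4 == 0
    assert (x + 3) // 4 == x // 4 == k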

gyenesvi avatar Dec 20 '18 14:12 gyenesvi

It is required to make the formulas work for all cases, not just when exact integer division is possible, because in most networks out there this flexibility is allowed.

This is the main reason why that library has ceil and floor operations, which allow representing expressions like the one you showed in a "valid form". Whether you decide to call this data structure a rational polynomial or a ceil expression is, I think, equivalent; it's just a way of keeping the correct information around.

like a = 2k+1, where the value of k must be deduced

Deducing integer polynomial constraints is (I think) generally NP-hard, which is why I think for any real purposes this has to be something that the user provides. However, as a researcher and user of libraries, I'm very comfortable providing (and even prefer to provide) all the assumptions about the data that a model needs for correctly inferring shapes (e.g. that some size needs to be of the form 3k+1). Rather than being unnatural, I would say it is similar to what a strongly-typed language provides compared to a weakly-typed one - being explicit allows for much better static checks.

Divisibility assumptions may also be given in the form a % b == 0

I think this is equivalent to a = b * k where k is an integer, hence the two things are fundamentally the same. The benefit of keeping everything in polynomial form is that it allows all operations (divisions, multiplications, etc...) to be implemented only once for polynomials and there is nothing else. Of course, it is very subjective if this is "a better" choice, but I've implemented those libraries in about a week or two using this perspective, without any extra "deduction" engine.

E.g. using your example, the user says that the shape of the input to the convolution is 4k, then the result of the convolution is ceil(4k, 4) = k. More or less the main motivation for keeping things in polynomials is that it makes implementation and understanding of the concepts easier (for me at least), since everything is just an operation on a polynomial.

botev avatar Jan 05 '19 10:01 botev

I agree that it's okay if the user provides the assumptions about the model; what I don't see clearly is the actual means to do so with your approach. Can you give me more concrete examples of how this would work out in practice? Here are the bits/cases I want to be able to describe. For each (primitive) operation, there is a formula that describes the output shape given the input shape and the attributes of the operation. Given that, let's suppose we have a higher level, either a compound operation that uses those primitives, or the actual top level graph that has some shape parameters, and we want to reason about the validity of shapes in that higher level. For that we need to represent intermediate shapes as polynomials of the parameters, and compare those polynomials for equality. In general, this seems to be okay, the problematic bit being when division is required and when there may be additional knowledge in the form of divisibility constraints. How to introduce/represent these in the system? All other operations are clear, because addition/multiplication is simply defined on polynomials, but division is not.

Let's take a concrete operation, like conv. Let's assume it's a strided 'same' convolution, with a parameter called stride. In my approach, the output size is then defined as

output_size = (input_size + stride - 1) / stride

How would that look in your approach, given that it must be defined for all input shapes, not just the cases when there is a divisibility constraint? Would it be something like

output_size = ceil(input_size / stride)

where / means real division? Or alternatively, with an expression ceil_div(input_size, stride)? Is that what you mean by a ceil operation in your approach?

The next bit is when I have a graph, and I want to define its input size with some divisibility constraint, and I want to express the output size with a formula. Say I have a single strided conv op in the graph, I substitute 2 for the value of stride, and I want to express that the input is divisible by 2. I can write that the input size is 2 * k and the output size is k. Now the system should be able to check, when I substitute 2 * k for the input size of conv, whether the output size based on the formula of conv matches my expectation of k. It should compare if the polynomial k == ceil(2 * k / 2). Is that right?

How are the floor and ceil represented in your approach? Is it actually the ratio of two polynomials, with the additional info of knowing whether the division is floor or ceil? Or is the key here that in the end when you have to do equality comparisons, the division always cancels out, and you are left with comparing simple polynomials?

gyenesvi avatar Jan 06 '19 13:01 gyenesvi

where / means real division? Or alternatively, with an expression ceil_div(input_size, stride)? Is that what you mean by a ceil operation in your approach?

How are the floor and ceil represented in your approach? Is it actually the ratio of two polynomials, with the additional info of knowing whether the division is floor or ceil?

Indeed, in the approach taken in the library there are several "primitives" - a monomial, a constant and a ceil/floor operation. Essentially, you create a special "monomial" primitive that contains information about the dividend and the divisor and whether it's a floor or ceil (I think they are actually two different types).

It should compare if the polynomial k == ceil(2 * k / 2). Is that right?

Even as it is atm, whenever you call ceil/floor with two polynomials, the system always first checks whether they are divisible, in which case it returns the exact polynomial division, and if not it constructs the new primitive. E.g.:

ceil(2 * k, 2) -> k
ceil(2 * k + 1, 2) -> ceil_primitive(ref 2*k + 1, ref 2)

This basically allows for such a comparison to be done correctly. I even think that it simplifies the expressions, e.g. ceil(6 * k + 3, 6) -> ceil_primitive(2 * k + 1, 2) (it can find common polynomial divisors, not only constants).
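
For the restricted case of a degree-1 numerator and a constant divisor, the idea looks roughly like this (illustrative Python, not the library's actual API; the library itself handles general polynomial divisors):

from math import gcd
from functools import reduce

def make_ceil(poly, divisor):
    # poly: {symbol: coeff} with the constant term under "", divisor: positive int
    if all(c % divisor == 0 for c in poly.values()):
        # exact division, no primitive needed: ceil(2k, 2) -> k
        return ("poly", {s: c // divisor for s, c in poly.items()})
    # otherwise reduce by the common integer content and keep an irreducible primitive
    g = reduce(gcd, list(poly.values()) + [divisor])
    return ("ceil", {s: c // g for s, c in poly.items()}, divisor // g)

assert make_ceil({"k": 2}, 2) == ("poly", {"k": 1})                   # ceil(2k, 2) -> k
assert make_ceil({"k": 6, "": 3}, 6) == ("ceil", {"k": 2, "": 1}, 2)  # ceil(6k+3, 6)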

botev avatar Jan 06 '19 18:01 botev

As an example I've added to the README in https://github.com/Metadiff/symbolic_polynomials:

// (5b + 2)
let poly1 = 5 * b + 2;
// (5b + 2)^2
let poly11 = &poly1 * &poly1;
// floor((5b + 2)^2, 5b + 2) = 5b + 2
let poly12 = floor(&poly11, &poly1);
// ceil((5b + 2)^2, 5b + 2) = 5b + 2
let poly13 = ceil(&poly11, &poly1);
println!("(5b + 2)^2 = {}", poly11);
println!("floor((5b + 2)^2, 5b + 2) = {}", poly12);
println!("ceil((5b + 2)^2, 5b + 2) = {}", poly13);

You get:

(5b + 2)^2 = 25b^2 + 20b + 4
floor((5b + 2)^2, 5b + 2) = 5b + 2
ceil((5b + 2)^2, 5b + 2) = 5b + 2

There also exist max and min operations, which are sometimes needed for some operations.

botev avatar Jan 06 '19 20:01 botev

Thanks, that clarifies some things about floor/ceil operations. However, I have some further bits that are not clear. If floor and ceil are separate things, how do you define operations on floor/ceil objects? Say, addition/multiplication of floor/ceil with a regular poly, or even addition/multiplication of a floor object with a ceil object, like floor(a,b) + ceil(c,d) or floor(a,b) * ceil(c,d)? I haven't thought this through, it may be possible to do, but I feel that there might be nontrivial details here.

What I like about a simple ratio of polynomials (essentially a floor(a,b) in your approach) is that every polynomial can be represented as the same kind of ratio object (trivial ones with a denominator of 1), and so you have to define all operations on these ratio objects only, which is trivial. The ceil operation seems to complicate that scheme. Apart from the above questions about operations on floor/ceil, the ceil operation is redundant, because it can also be expressed as floor(a + b - 1, b), and it should be possible to detect the equivalence of the two. Can this be done?
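
The arithmetic identity behind this is easy to confirm numerically for positive integers (Python's // is floor division); the open question is whether the symbolic representation can recognize it:

for a in range(1, 60):
    for b in range(1, 20):
        assert -((-a) // b) == (a + b - 1) // b   # ceil(a / b) via floor division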

Furthermore, ratio polynomials can be compared for equality without the need to do actual polynomial division and simplification, which would require algorithms, like polynomial gcd, that are quite complicated, especially in the case of multivariate polynomials. I understand that it is possible to implement it, but it may be too much of a burden for an NNEF consumer.

To make this work in NNEF, it should be possible to clearly define how expressions can be converted to polynomials, how operations are defined on those polynomials, and prove that it is always possible to decide equality of two expressions via polynomials. With ratio polynomials, this seems to be true in my view. Have you done some analysis in this direction using the floor/ceil formulation?

gyenesvi avatar Jan 07 '19 10:01 gyenesvi

Thanks, that clarifies some things about floor/ceil operations. However, I have some further bits that are not clear. If floor and ceil are separate things, how do you define operations on floor/ceil objects? Say, addition/multiplication of floor/ceil with a regular poly, or even addition/multiplication of a floor object with a ceil object, like floor(a,b) + ceil(c,d) or floor(a,b) * ceil(c,d)? I haven't thought this through, it may be possible to do, but I feel that there might be nontrivial details here.

You can imagine that the Floor and Ceil entities are treated as separate irreducible variables. Mathematically it is like saying: if we have variables a, b, then we let c = floor(a,b) and now we have three totally separate variables a, b, c. Implementation-wise there is a tagged union type that looks like this (simplified):

pub enum Composite {
    Variable(I),                    // an ordinary symbolic variable (I is the identifier type)
    Floor(Polynomial, Polynomial),  // floor(dividend, divisor), kept as an irreducible
    Ceil(Polynomial, Polynomial),   // ceil(dividend, divisor), kept as an irreducible
}

Then a monomial is defined as

pub struct Monomial {
    pub coefficient: f32,
    pub powers: Vec<(Composite, i32)>,  // product of composites raised to integer powers
}

And a polynomial is just a vector of monomials, representing their sum. Hence, you can build polynomials of these composite expressions in arbitrary form.

The ceil operation is redundant, because it can also be expressed as floor(a + b - 1, b), and it should be possible to detect the equivalence of the two. Can this be done?

That's a very good point which had not crossed my mind before. Hence, it is probably enough to define only one of these as a Composite, with the other constructed in the manner you described.

Furthermore, ratio polynomials can be compared for equality without the need to do actual polynomial division and simplification, which would require algorithms, like polynomial gcd, that are quite complicated, especially in the case of multivariate polynomials.

This is a good point, however you will need to represent standard polynomials somehow separately from irreducible ratios (similar to how the floor is separate), since b * floor(a, b) is not a, and hence if floor(a,b) is represented just as a ratio you cannot multiply by it without precautions. Could you elaborate on how this would be done when we represent things as pure ratios?

botev avatar Jan 08 '19 14:01 botev

Okay, now I understand that you treat those sub-expressions as irreducibles, and that means that you cannot further simplify the expression. I agree that it is probably necessary to have such an irreducible, so you need two kinds of ratios: one reducible (a / b, which can participate in multiplication because we know that a % b == 0) and one irreducible (floor(a,b)). So pure ratios alone are not enough, that's true.

However, I still want to establish some relation between the two, because it is needed in some cases to prove equalities. For example, imagine two operations, one downsamples the input by a factor, and the other upsamples it. Supposing the input size is x, output size of the downsampling is defined as floor(x,factor), and the output size of upsampling is x * factor. Suppose we apply the two after each other (downsample, then upsample), and we want to create a compound operation from this sequence. However, we want the compound to have identical output shape as the input shape (x). For this, the definition of the compound provides the assumption that x % factor == 0.

When checking the validity of that compound definition, combining the formulas of the two ops, we get floor(x,factor) * factor, and we want to prove that given x % factor == 0, this actually equals (x / factor) * factor, which can be reduced to x.
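
A quick numeric sanity check of the equality we want to prove, under the assumption x % factor == 0 (this is just arithmetic, not the symbolic reduction itself):

for factor in (2, 3, 4):
    for x in range(0, 60, factor):            # x % factor == 0 by construction
        assert (x // factor) * factor == x    # floor(x, factor) * factor == x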

We could use the identity floor(a,b) == a / b - (a % b) / b to rewrite the floor primitive, maybe having a % b as an irreducible primitive, and then the divisibility assumption in the form a % b == 0 could easily be used during the reduction, simply by substituting the value of such an irreducible, resulting in floor(a,b) simplified to a / b.

Or alternatively, as you proposed, we can state in our compound operation definition that the input and output size is in the form x = factor * y, and then floor(factor * y, factor) * factor can be simplified to y * factor.

I find the first approach (a % b as an irreducible) slightly simpler, because you only need to do substitution, whereas understanding when an expression floor(expr1, expr2) can be reduced seems to be more involved algorithmically (you need to simplify polynomials by gcd).

gyenesvi avatar Jan 09 '19 15:01 gyenesvi