ManifoldDiffEq.jl

Compute the number of dimensions N in `build_solution`

Open olivierverdier opened this issue 1 year ago • 12 comments

My code was broken by (probably) a recent version of SciML, possibly related to commit https://github.com/SciML/SciMLBase.jl/commit/55d81af15d274d094a4c50f3e6ae82e6321d5d13. This is my quick fix for the problem, devoid of testing or understanding.
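For context, here is a hedged sketch (the helper name `state_ndims` is hypothetical, and this is not the actual patch in this PR) of the kind of computation the PR title refers to: deriving the dimension parameter `N` from the state's element type instead of hard-coding it.

```julia
# Hypothetical sketch (NOT the actual patch in this PR) of the kind of fix
# under discussion: derive the ODESolution dimension parameter N from the
# state's element type instead of hard-coding it, falling back to 1 for
# states whose elements are not arrays.
state_ndims(u::AbstractVector{<:AbstractArray}) = ndims(eltype(u)) + 1
state_ndims(u::AbstractVector) = 1

state_ndims([rand(3) for _ in 1:4])  # 2: a time series of vectors
state_ndims([0.0, 1.0, 2.0])         # 1: a time series of scalars
```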

olivierverdier avatar Jun 30 '24 15:06 olivierverdier

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.13%. Comparing base (4f283a0) to head (87d9053).

❗ Current head 87d9053 differs from pull request most recent head e442535

Please upload reports for the commit e442535 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #28      +/-   ##
==========================================
+ Coverage   95.12%   95.13%   +0.01%     
==========================================
  Files           8        8              
  Lines         328      329       +1     
==========================================
+ Hits          312      313       +1     
  Misses         16       16              

☂️ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

codecov[bot] avatar Jun 30 '24 15:06 codecov[bot]

Hm, indeed SciML makes (in my view) a bit too many breaking changes in non-breaking releases. Should we maybe also bump the compat entry then (raise the version in [compat])? If that is possible, that is (i.e. if we have that entry).
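For context, bumping the compat entry means tightening the bound in `Project.toml`; a fragment might look like this (package name and version purely illustrative):

```toml
# Illustrative Project.toml fragment: restrict SciMLBase to releases
# compatible with 2.39 (the exact bound to use would need checking).
[compat]
SciMLBase = "2.39"
```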

kellertuer avatar Jun 30 '24 20:06 kellertuer

Hm, I will try to fix this soon. Apart from SciML sometimes making breaking changes in non-breaking releases, ManifoldDiffEq.jl hooks into some of their internals, so they can justifiably feel free to change those as they see fit.

mateuszbaran avatar Jul 01 '24 07:07 mateuszbaran

> Hm indeed SciML does (for me) a bit too many breaking changes in non-breaking releases.

This dimension was added in 2016. What's the right time interval for such changes?

ChrisRackauckas avatar Jul 01 '24 08:07 ChrisRackauckas

> Apart from SciML sometimes doing breaking changes in non-breaking releases

What breaking change? Please help me understand what you are asking for here and I can help out. But this dimension parameter has not changed in 8 years, so I do not understand what's being asked for.

ChrisRackauckas avatar Jul 01 '24 08:07 ChrisRackauckas

>> Hm indeed SciML does (for me) a bit too many breaking changes in non-breaking releases.

> This dimension was added in 2016. What's the right time interval for such changes?

I am not sure; I did not write this code. So far, my main experience and interaction with SciML has been sudden failures on CI, maybe because we used internals, I am not sure. This looked like the same area, and I am not sure how we ended up with the parameter being 1, maybe because we adapted some example code? I personally did not get much into that, since I was usually missing documentation every now and then.

kellertuer avatar Jul 01 '24 08:07 kellertuer

>> Apart from SciML sometimes doing breaking changes in non-breaking releases

> What breaking change? Please help me understand what you are asking for here and I can help out. But this dimension parameter has not changed in 8 years so I do not understand what's being asked for.

No, the parameter itself did not change, but how you handle it and how it is documented did. I have not worked with SciML for the last year or so, but when I did, e.g. implementing this function

https://github.com/JuliaManifolds/ManifoldDiffEq.jl/blob/4f283a0361488d57c36ba4498237934f3dde7cd8/src/frozen_solvers.jl#L30-L46

I have no clue (today) what any of the parameters are; I just copied that from an example. Back then I could maybe figure out 20% of what the parameters mean, because none of them are documented anywhere (and I am not much of a differential equations guy). So when we do such guesswork, we maybe also guessed that the parameter 1 is the right thing to do, based on an example (where that was the case). Now in 2.40 you surely did not change the parameter itself, but how you interpret and work with it. Before, you were maybe more relaxed about what the value was? Now you are not, and since we guessed what it might mean, we were wrong but never noticed. What this ends up doing is that the commit linked above breaks other people's code but is marked as non-breaking. This hits us fairly often (up to 5 times so far, I remember?), which I find annoying. But maybe the main issue is documentation (and not the breaking part, which one could also consider bug fixing).

kellertuer avatar Jul 01 '24 08:07 kellertuer

> What breaking change? Please help me understand what you are asking for here and I can help out.

Thank you for offering help, but everything except this issue has already been resolved. SciML folks help resolve any issues quite quickly, but still, our CI failures caused by upstream changes or bugs mostly originate in SciML. At the same time, SciML most likely sees more development than our other dependencies, so per commit you may even be more reliable than other libraries, but it's hard to tell.

mateuszbaran avatar Jul 01 '24 09:07 mateuszbaran

> I have no clue (today) what any of the parameters are, I just copied that from an example. Back then I could maybe figure out 20% of what the parameters mean because none of them are documented anywhere (and I am not a differential equations guy much). So when we do such guesswork, we maybe also guessed the parameter 1 is the right thing to do from an example (where that was the case). Now in 2.40, you surely did not change the parameter itself, but how you interpret and work with it. Before you maybe were more relaxed about what the value was?

That code is a little scary 😅.

alg_cache, perform_step!, initialize!, etc.: it looks like you're dispatching on internals of OrdinaryDiffEq? These are described in the devdocs, though I do agree that we need to update the internals docs since it has been a while.

  • https://docs.sciml.ai/DiffEqDevDocs/stable/contributing/adding_algorithms/
  • https://docs.sciml.ai/DiffEqDevDocs/stable/contributing/diffeq_internals/

I would highly recommend sticking to public APIs if you want stability. There are no guarantees to the outside that internal functions will not change any arguments or behavior. If that was the case then every release would have to be a breaking release! For information on Julia's public interfaces, I would recommend reading:

https://discourse.julialang.org/t/what-does-the-new-public-keyword-actually-do/108975

As a matter of style, we always define our public APIs in the top level file. You can see the public APIs of OrdinaryDiffEq here:

https://github.com/SciML/OrdinaryDiffEq.jl/blob/v6.85.0/src/OrdinaryDiffEq.jl#L236-L469

The public APIs of a solver package are simply the algorithms. The functions used to define the stepping behavior of the algorithms are generally not meant to be documented public API.

> I am not sure, whether maybe the solution in this PR is enough, but I also could not find what N actually refers to by looking at either https://docs.sciml.ai/DiffEqDocs/stable/types/ode_types/#SciMLBase.ODESolution or (had to look for that manually since it was not linked) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractODESolution or (same manual search) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractTimeseriesSolution ... (and then I got a bit tired).

It corresponds to the dimension of the solution object through its indexing definition. The indexing interface is defined here:

  • https://docs.julialang.org/en/v1/manual/interfaces/

> I am not sure, whether maybe the solution in this PR is enough, but I also could not find what N actually refers to by looking at either https://docs.sciml.ai/DiffEqDocs/stable/types/ode_types/#SciMLBase.ODESolution or (had to look for that manually since it was not linked) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractODESolution or (same manual search) https://docs.sciml.ai/SciMLBase/stable/interfaces/Solutions/#SciMLBase.AbstractTimeseriesSolution ... (and then I got a bit tired).

Yes, sorry about that. We have been solidifying our interfaces over the last year but it seems like you found a place that's not well-defined in the docs. The solution types boil down to acting as an AbstractVectorOfArray, and we are missing a full definition of that interface.

https://docs.sciml.ai/RecursiveArrayTools/stable/array_types/#RecursiveArrayTools.VectorOfArray

I'll see if I can get around to it in the JuliaCon time frame. But basically, AbstractVectorOfArray is almost Vector{AbstractArray{T,_N}}, and thus N = _N + 1, i.e. a vector of vectors has a matrix structure on it. However, it's only truly an AbstractArray if the internal vector is always of constant size, which isn't always the case. If the structure is ragged, indexing, conversions, broadcast, etc. are still definable, but operations that assume a fully rectangular structure (such as conversion to Array) cannot exist. Thus AbstractVectorOfArray is not an AbstractArray, and this is a change we made last year as, again, we're cleaning up all of our interfaces in order to make sure they are well-defined and strict in their behavior. And we have set up a lot of error checking to ensure interfaces aren't violated.
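The N = _N + 1 convention described above can be illustrated with a toy sketch (ToyVectorOfArray is a hypothetical stand-in, not the RecursiveArrayTools implementation):

```julia
# Toy sketch (hypothetical type, not the RecursiveArrayTools implementation)
# of the "N = inner dims + 1" convention: a vector of M-dimensional arrays
# is treated as an (M + 1)-dimensional object.
struct ToyVectorOfArray{T,N,A<:AbstractArray}
    u::Vector{A}
end
ToyVectorOfArray(u::Vector{A}) where {T,M,A<:AbstractArray{T,M}} =
    ToyVectorOfArray{T,M + 1,A}(u)

Base.ndims(::ToyVectorOfArray{T,N}) where {T,N} = N
# The last index selects the "time step"; the first indexes into it.
Base.getindex(v::ToyVectorOfArray, i::Int, j::Int) = v.u[j][i]

va = ToyVectorOfArray([rand(3) for _ in 1:5])  # 5 state vectors of length 3
ndims(va)  # 2: a vector of vectors carries matrix-like indexing
```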

The issue of course is that strict enforcement of interfaces which were previously loosely enforced causes a few hiccups in the edge cases, and that is exactly what we're seeing here.

> If ndims is required to be defined to use ODESolution then we can't use it here.

Is there a more general interface function that should be considered here? I don't know of one in ArrayInterface.jl, but I'd be interested to hear what the correct generalization is if it helps. I would think that the tensor dimensionality of a manifold object would be well-defined, as that's more a definition of what kind of operator/vector an object is. The tensor dimensionality ("are you a vector or a matrix?") is not the dimensionality of the manifold, i.e. not the length of a state representation of an object in the vector space. This would also allow, for example, different charts/representations, because it's not the length of the data, and it would then use the ragged AbstractVectorOfArray structure as above.

If I'm misunderstanding though, I'd be happy to learn how this needs to be generalized for manifold cases.

ChrisRackauckas avatar Jul 01 '24 12:07 ChrisRackauckas

Thanks for this very detailed answer. Concerning

> I would highly recommend sticking to public APIs if you want stability.

I tried my best to do that ;)

For the ndims argument: the problem is that our tangent vectors might be more general than (just) an array, see for example https://github.com/JuliaManifolds/Manifolds.jl/blob/676b0f5d0751f4899bec67a0a81b7f313e1f6db7/src/manifolds/FixedRankMatrices.jl#L88-L106 (or any other <: TVector type). They still have to follow "Vectorian ideas" (addition, scaling, etc.), but ndims does not make sense for the type linked. That might have been the idea behind setting it to 1, though that is a bit of cheating.
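To make this concrete, here is a hedged toy sketch of such a non-array tangent vector, loosely modeled on the UMVTVector linked above (field names and operations are assumptions for illustration, not the Manifolds.jl implementation):

```julia
# Hedged toy sketch of a tangent-vector type that is not an array, loosely
# modeled on UMVTVector from Manifolds.jl (field names are assumptions).
struct ToyUMVTVector{TU,TM,TVt}
    U::TU    # component along the left factor
    M::TM    # core component
    Vt::TVt  # component along the right factor
end

# "Vectorian" operations (addition, scaling) are perfectly well-defined...
Base.:+(a::ToyUMVTVector, b::ToyUMVTVector) =
    ToyUMVTVector(a.U + b.U, a.M + b.M, a.Vt + b.Vt)
Base.:*(s::Number, a::ToyUMVTVector) =
    ToyUMVTVector(s * a.U, s * a.M, s * a.Vt)

# ...but ndims is not: the object is neither a vector nor a matrix, so
# there is no natural tensor dimensionality to report for it.
```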

kellertuer avatar Jul 01 '24 13:07 kellertuer

One additional comment on the tutorial you mentioned

https://docs.sciml.ai/DiffEqDevDocs/stable/contributing/adding_algorithms/

There I have exactly this problem, for example with

function alg_cache(alg::RK_ALG, u, rate_prototype, uEltypeNoUnits,
        uBottomEltypeNoUnits, tTypeNoUnits, uprev, uprev2, f, t, dt,
        reltol, p, calck, ::Val{true})
[...]

I am missing a bit what all the parameters of the function are. I think for the above-mentioned manifold explicit Euler method, I (or Mateusz) took exactly that tutorial and tried to make sense of the parameters and what they mean for the algorithm, in order to use the DiffEq framework, but I probably still understand far too little about the rest of the framework. Yes, the public API is basically all algorithms, but the whole idea here is to implement new algorithms (on manifolds, that is). So for that, not everything we have to use (or think we have to use) seems to be public. How can we then best keep this package from failing regularly? Because that has happened a few times already.

kellertuer avatar Jul 01 '24 13:07 kellertuer

> I would highly recommend sticking to public APIs if you want stability. There are no guarantees to the outside that internal functions will not change any arguments or behavior. If that was the case then every release would have to be a breaking release! For information on Julia's public interfaces, I would recommend reading:

IIRC I needed to overload some internals (specifically, build_solution) to get the interpolation method I wanted. Or at least one close to what I wanted, because it really should be on-manifold Hermite interpolation, but that is fairly difficult to implement.

> Is there a more general interface function that should be considered here? I don't know of one in ArrayInterface.jl but I'd be interested to hear what the correct generalization is if it helps. I would think that the tensor dimensionality of a manifold object would be well-defined as that's more a definition of what kind of operator/vector an object is. The tensor dimensionality "are you a vector or matrix" is not the dimensionality of the manifold, i.e. the length of a state representation of an object in the vector space. And this would allow for example different charts/representations, because it's not the length of the data, and this would then use the ragged AbstractVectorOfArray structure as above.

The thing is, there are (to my knowledge) no generic and useful operations for arbitrary manifolds that generalize ndims. Either we work with a specific representation of a specific manifold, and then ndims is well-defined and used as necessary; or we linearize the representation using get_coordinates or get_parameters, and we get plain vectors of numbers that have ndims equal to 1. We just didn't see the need to linearize to anything other than a vector, but it could be added. The thing about manifold linearizations is that you have to be explicit about how you linearize, and that is by design.
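The linearization route can be sketched like this, assuming Manifolds.jl is available (the particular manifold, point, and basis choice here are illustrative):

```julia
using Manifolds  # assumed available in the environment

M = Sphere(2)                 # unit sphere in ℝ³, manifold dimension 2
p = [1.0, 0.0, 0.0]           # a point on the sphere
X = [0.0, 0.5, 0.5]           # a tangent vector at p (orthogonal to p)

# Linearize: coordinates of X in an explicitly chosen basis.
c = get_coordinates(M, p, X, DefaultOrthonormalBasis())
ndims(c)   # 1 — the linearized representation is a plain vector
length(c)  # 2 — matches the manifold dimension, not the embedding's
```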

We have charts and representations figured out here, and it works perfectly fine without ndims. "Give me the number of indices" requires an assumption about indexing, and indexing is meaningless without a linearization or an embedding. In JuliaManifolds, a linearization needs to be explicitly specified by a certain additional object, while the embedding can be a tuple of vectors and matrices. ArrayPartition sort of gives up here and pretends to be a vector, but if that is enough, why can't we pretend that everything is a vector? If an operation can meaningfully distinguish between vectors and matrices, it will ignore the inner structure of ArrayPartition unless it's explicitly special-cased, which is error-prone when we have about 10 or more other structs like that.

mateuszbaran avatar Jul 01 '24 13:07 mateuszbaran

The issue was fixed by #33.

mateuszbaran avatar Sep 03 '24 17:09 mateuszbaran