simbody [WIP] Add infrastructure for state serialization

This is the first crack at State serialization, which is not yet complete. This contains the necessary infrastructure to provide toXmlElement() and fromXmlElement() support for general types and (most important) for AbstractValue and Value<T>.

See TestXml.cpp for test cases forAbstractValue and Value<T>. State.toXmlElement() and State.fromXmlElement() exist and will work if all the contained AbstractValue types have those methods, or if they have stream extraction/insertion operators from which workable defaults an be created.

No State serialization tests are committed here yet. Those need to get written and debugged. @chrisdembia, if you want to work on this have a look and then we can discuss.

This change is

May 02 '16 04:05 sherm1

Fantastic! I'll start looking at this soon.

May 02 '16 05:05 chrisdembia

Unfortunately, std::is_null_pointer<T> is C++14, not C++11.

May 02 '16 05:05 chrisdembia

std::is_null_pointer<T> is C++14, not C++11

Removed check for now. Should pass on Linux and OSX now; still one mysterious test failure on Windows.

May 02 '16 15:05 sherm1

I investigated the Windows failure. It appears to be a problem with the Visual Studio linker not correctly generating a single copy of each Value<T> class so that its static members are shared by all. Hence it tries to re-register the deserialization method for Value<double> and triggers an error message. It would be easy to remove the check and ignore subsequent registration (in which case all the tests will pass), but I am worried about what will happen on deserialization because dynamic_cast may not recognize the two Value<double> objects as being the same type. I'll leave it for now as a reminder that further investigation is needed; there may be a workaround that will ensure the linker works properly.

May 02 '16 15:05 sherm1

Ah I know very little about that stuff, but would extern help?

May 02 '16 16:05 chrisdembia

Ah I know very little about that stuff, but would extern help?

Can't be extern because Value<T> is templatized and thus has to be defined in headers. The linker then has a nasty job to perform to ensure that every compilation unit that has instantiated Value<SomeType> for the same template argument uses the same static data members -- I'm depending on that to ensure that each type is registered only once for deserialization.

May 02 '16 17:05 sherm1

Maybe it is desirable to prefer a free function (that is, template specialization of toXmlElement()) over a member function) if both exist.

No, as explained in the Unified Call Syntax doc you linked to, the member function is to be preferred because it was provided by the class designer.

May 06 '16 23:05 sherm1

Drawing a parallel to std::begin() and std::end() ...

Coincidentally that is also discussed in the Unified Call Syntax doc, where it was mostly described as a hack for range-for; it is definitely not a general principle and it didn't seem that they were particularly happy with it. (That's Stroustrup and Sutter so I'm inclined to take them seriously!)

May 06 '16 23:05 sherm1

One potential issue with preferring the member function over a free function: say today there is a type X that I want to serialize but I don't have control over it, so I write a free function. Then next year, the author of X decides to add a toXmlElement() member function to that class. Now, the functionality has silently changed.

May 07 '16 00:05 chrisdembia

Now, the functionality has silently changed.

Hmmm ... any time someone changes the code of a class you are using the behavior can change! Hopefully the author took seriously the need to serialize the class and now you won't have to. Otherwise I recommend sending a nasty email!

May 07 '16 00:05 sherm1

I suppose that's true.

@klshrinidhi seemed to be able to get rid of the third argument!

May 07 '16 02:05 chrisdembia

One way to get rid of the name toXmlElementHelper is to rename toXmlElement to toXmlElementImpl, to signify this is the implementation of a function that does the conversion, freeing the use of the name toXmlElement to the interface.

May 08 '16 22:05 chrisdembia

I have started working on a TestStateXml test for Simbody, and will open a PR for that eventually.

Separately, I gave my first shot at writing out an OpenSim Model's State to XML. I got the following error:

26: Test failed due to exception: SimTK Exception thrown at String.h:353:
26:   Error detected by Simbody method String::String(T): Type T=SBModelVars has no stream insertion operator<<(T) and there is no specialized String(T) constructor.
26:   (Required condition '!"no stream insertion operator"' was not met.)

I think this means (1) SBModelVars does not have a toXmlElement() yet, and (2) the error for when toXmlElement() is not implemented for T does not indicate that the issue was a missing toXmlElement(). Actually, I don't understand how this was a run-time error instead of a compile-time one.

May 08 '16 22:05 chrisdembia

It's complaining about the stream insertion operator because that is used by default to serialize when there is no toXmlElement() available. Actually it is more contorted than that because it first falls back to writeUnformatted() which is how we get inf and nan to come out right among other things. It is writeUnformatted() that falls back to stream insertion. (Links are to the relevant code.)
Not sure whether the above contortions are a good idea, but they do save writing a lot of redundant toXmlElement() methods. Better ideas?
The reason it is a run time error is that there are many uses of Value<T> that don't require serializing and I didn't want to make people provide serialization methods if they don't intend to serialize. The original Value<T> did have this requirement (everyone had to provide operator<<()) and it turned out to be a huge inconvenience and resulted in many indecipherable compilation errors.

May 09 '16 00:05 sherm1

One way to get rid of the name toXmlElementHelper is to rename toXmlElement to toXmlElementImpl, to signify this is the implementation of a function that does the conversion, freeing the use of the name toXmlElement to the interface.

I don't understand. toXmlElement() is the name in the interface at the moment. Please clarify.

May 09 '16 00:05 sherm1

Not sure whether the above contortions are a good idea, but they do save writing a lot of redundant toXmlElement() methods. Better ideas?

Thanks for explaining. I like idea of piggybacking off writeUnformatted() and the stream operator. I'm also fine with the run-time message (over a compile-time message). What about just catching such exceptions in the toXmlElement() function that calls writeUnformatted() to provide a more informative message?

I don't understand. toXmlElement() is the name in the interface at the moment. Please clarify.

I mean the interface for the implementor of toXmlElement() methods. I'm thinking of the following: The interface for converting an object to an XML would be SimTK::toXmlElement(obj) (a free method). It may or may not work, depending if obj supports serialization. For some objects that users are more likely to want to serialize, we provide a convenience member function, e.g. State::toXmlElement(). To provide the ability to serialize an object, you either implement toXmlElementImpl(const T&) or T::toXmlElementImpl().

auto xmlDoc = SimTK::toXmlElement(SimTK::State()); // this would work and call SimTK::toXmlElementImpl
Xml::Element SimTK::State::toXmlElement() {
   return this->toXmlElementImpl();
}
Xml::Element SimTK::State::toXmlElementImpl() {
    ...
    elem.appendNode(toXmlElement(...));
    elem.appendNode(toXmlElement(...));
}

Let me know if I'm still not being clear. The gist is that if you want to obtain an Xml::Element, you call toXmlElement(). If you want to provide the ability to convert to Xml, you implement T::toXmlElementImpl(). The fuzzy area is member functions that you call to obtain an Xml::Element, since it's weird for a user to call a method toXmlElementImpl().

May 09 '16 04:05 chrisdembia

Alternatively, we could just rename toXmlElementHelper() to something that more clearly conveys the meaning findSomeWayToConvertThisThingToXmlForMe(). Like convertToXmlElement(), or createXmlElement() or formXmlElement().

May 09 '16 04:05 chrisdembia

I still like the idea of handling the potentially error-prone version and element name code in some central way. Here's a sketch of what I'm thinking:

class XmlSerialization {
public:
    XmlSerialization(std::string type, int version, std::string attrName="");
    template <typename T>
    void appendThing(std::string name, T value);
    Xml::Element getXmlElement() const;
};
...
Xml::Element Measure_Differentiate_Result::toXmlElement(const std::string& name) const {
        static const int version = 1;
        XmlSerialization xml("Measure_Differentiate_Result", version, name);
        xml.appendThing("operand", operand);
        xml.appendThing("operandDot", operandDot);
        xml.appendThing("derivIsGood", derivIsGood);
        return xml.getXmlElement();

I think there are a bunch of flaws with this proposal. For example, how would we require that people use the XmlSerialization class instead of writing their own serialization? Also, I haven't thought through how we would actually handle different versions, or how to combine this with the corresponding fromXmlElement() method. Also, perhaps the decentralized nature of the current serialization scheme is fine (where each class is in charge of doing it correctly). Anyway, it gets rid of some of the boilerplate.

It just sorta feels like the valuable information is "type name", "version", and a list of the members to serialize. It would be good if that were all one needed to specify.

May 09 '16 04:05 chrisdembia

Well, I'm more confused now! What I want to achieve is Xml::Element = state.toXmlElement() and state.fromXmlElement(element). Those are problematic because States contain AbstractValue objects, whose underlying Value<T> objects can be arbitrary existing types or user-defined types. That requires a user-written method for serializing type T that can be embedded in the serialization of Value<T>, and a corresponding user-written method that can be invoked when deserializing Value<T> objects. These should be member functions when possible (for general object-oriented code reasons), free functions otherwise.

I want those functions to be easy enough to write that there is a high probability of it being done successfully. Since user types will have members of arbitrary type (including other user types) the best way to do this would be recursively. So I'd like to be able to write a T.toXmlElement() method that invokes the TT.toXmlElement() methods of the data members of type TT. But that can't work exactly because we don't know if TT has a member, so I invoked the Helper there to figure out which method to invoke. @klshrinidhi's fix makes that more appealing since it gets rid of the mysterious third argument. It would be perfect I think if it could be named toXmlElement() so that a user writes the T.toXmlElement() method by calling a series of toXmlElement(TT) helper functions which map either to TT's member or a free function toXmlElement(TT). But I don't think that's possible because the helper and the free function would have the same name (is it?).

Anyway I could live with writing the member function or free function toXmlElement() in terms of toXmlElementHelper() with the final argument gone. I guess that's not so bad. Another alternative would to name the member function and helper function toXmlElement() but name the free functions toXmlElementFree() or something.

May 09 '16 04:05 sherm1

But I don't think that's possible because the helper and the free function would have the same name (is it?).

Even if it were possible, I think it might be confusing to use the same method name for so many things.

Anyway I could live with writing the member function or free function toXmlElement() in terms of toXmlElementHelper() with the final argument gone. I guess that's not so bad. Another alternative would to name the member function and helper function toXmlElement() but name the free functions toXmlElementFree() or something.

I think either of these are fine.

May 09 '16 05:05 chrisdembia

@sherm1 I am starting to add in the necessary toXmlElement() methods on other types. I was going to start working on SBInstanceVars but I'm not sure I actually know how to serialize this class. While some of the members seem like part of the state (mobilizerLockLevel, constraintIsDisabled), my gut says other members aren't actually part of the state, like bodyMassProperties.

class SBInstanceVars {
public:
    Array_<MassProperties,MobilizedBodyIndex>   bodyMassProperties;
    Array_<Transform,     MobilizedBodyIndex>   outboardMobilizerFrames;
    Array_<Transform,     MobilizedBodyIndex>   inboardMobilizerFrames;

    Array_<Motion::Level, MobilizedBodyIndex>   mobilizerLockLevel;
    Vector                                      lockedQs;
    Vector                                      lockedUs; // also used for udot

    Array_<bool,          MobilizedBodyIndex>   prescribedMotionIsDisabled;

    Vector                                      particleMasses;

    Array_<bool,ConstraintIndex>                constraintIsDisabled;

Could you tell me which of these members should be serialized?

May 14 '16 04:05 chrisdembia

(SBInstanceVars) Could you tell me which of these members should be serialized?

It's a variable, so everything in it should be serialized. Body mass properties are instance variables, but there aren't currently any methods for setting them so they just keep their default values.

May 16 '16 06:05 sherm1

there aren't currently any methods for setting them so they just keep their default values.

They are all public members though.

May 16 '16 16:05 chrisdembia

They are all public members though

But in an internal struct -- can't be seen from the API. Serialization is agnostic about public and private since all data has to get written out in order to get it back later.

May 16 '16 17:05 sherm1

Okay thanks for clarifying.

May 16 '16 17:05 chrisdembia

For deserializing, the fromXmlElementHelper() calls a member function that takes an Xml element and fills in the this object from that Xml element. However, the member function for the PerSubsystemInfo class has an (unimplemented) fromXmlElement() function that is static and takes a PerSubsystemInfo& as an argument. Which do you think makes more sense for fromXmlElement() member functions? To fill in this, or to fill in a provided-by-ref object?

May 23 '16 05:05 chrisdembia

I think I debated that with myself and ended up deciding it was (slightly) better to fill in the this object. So probably the unimplemented method represents some earlier point in my self-debate but should be converted now. I'm not sure I had any great arguments either way, but I like the symmetry of being able to say thing.toXmlElement() and thing.fromXmlElement().

May 23 '16 05:05 sherm1

Okay; sounds good to me.

May 23 '16 06:05 chrisdembia

There is a small issue with the error message one gets when trying to deserialize a type that doesn't have a fromXmlElement() implementation. Since readUnformatted() and (tryConvertTo()) will be available for any T, then the fromXmlElement() that calls readUnformatted() will be used if fromXmlElement() doesn't exist for T. This is fine in many cases, since you check readOK and give an error.

template <class T> inline void
fromXmlElement(T& thing, Xml::Element& elt, 
               const std::string& requiredName="") {
    if (!requiredName.empty()) {
        const String& name = elt.getElementTag();
        SimTK_ERRCHK2_ALWAYS(name==requiredName, "fromXmlElement<T>",
        "Expected element name '%s' but got '%s'.", requiredName.c_str(), 
                                                    name.c_str());
    }
    std::istringstream is(elt.getValue());
    const bool readOK = readUnformatted(is, thing);
    ...

There is one case, though, in which this leads to a confusing error message, and it stems from requiredName having two different uses. Sometimes, the "requiredName" applies to the tag name (as above), and in other cases, it applies to an attributed named "name". So the issue arises when trying to deserialize a type whose "requiredName" is an attributed named "name", but fromXmlElement() is unimplemented and so the above generic fromXmlElement() gets invoked, and thinks that the requiredName should be the tag name. This situation results in the following error message (for PerSubsystemInfo:

Error detected by Simbody method fromXmlElement<T>: Expected element name '0' but got 'Subsystem'

The simplest solution to me sounds like having only one interpretation for requiredName; that is, as an attribute named "name." I think this is possible by simply making the generic toXmlElement() wrap another element around the output of writeUnformatted().

May 23 '16 06:05 chrisdembia

Yeah, that dual-use name thing is odd. A better alternative would be nice.

May 23 '16 14:05 sherm1