go-xml
go-xml copied to clipboard
Types with restrictions should have Marshal/Unmarshal methods
In xsdgen/xsdgen.go, any restrictions on types aren't given special treatment, only a simple comment is added to the resulting type.
This should be changed to proper Mashal/Unmarshal methods so full XML validation is possible.
My concern with this is creating an explosion in the amount of generated code and even the number of types. One feature of the xsd package is that it tries to use the builtin Go types if introducing a new type adds no value (for some definition of "value"). I didn't intend for this package to be used for validation, as that is far more work. But let's give this a try.
For some of the restrictions, there are many ways to do this.
Enumerations
Consider the following type (taken from w3schools):
<xs:element name="car">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Audi"/>
<xs:enumeration value="Golf"/>
<xs:enumeration value="BMW"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
Here's what xsdgen does today:
// Must be one of Audi, Golf, BMW
type Car string
Here's one possible solution:
// Must be one of Audi, Golf, BMW
type Car string
var validCar = map[Car]struct{} {
"Audi": struct{},
"Golf": struct{},
"BMW": struct{},
}
func (c Car) MarshalText() ([]byte, error) {
if _, ok := validCar[c]; !ok {
return nil, fmt.Errorf("%q is not a valid %T", c, c)
}
return []byte(c), nil
}
func (c *Car) UnmarshalText(text []byte) error {
if _, ok := validCar[Car(text)]; !ok {
return fmt.Errof("%q is not a valid %T", text, c)
}
*c = Car(text)
return nil
}
- Pro: validation is simple and "generic" for all comparable types
- Pro: it's obvious that Car is some kind of string
- Con: have to write []byte -> int, []byte -> float, []byte -> byte, and any other base type conversions (we already do this for handling list simpleTypes).
Another option mapping to an int:
type Car int
const (
Audi Car = iota
Golf
BMW
)
var validCars = map[string]Car{
"Audi": Audi,
"Golf": Golf,
"BMW": BMW,
}
func (c *Car) UnmarshalText(text []byte) error {
if x, ok := validCars[string(text)]; ok {
*c = x
return nil
}
return fmt.Errorf("%q is an invalid %T", text, c)
}
func (c Car) MarshalText() ([]byte, error) {
return []byte(c.String()), error
}
- Pro: Arguably easier to work with int-based enum, rather than string.
- Pro: Set of valid Car's exposed at the package level
- Pro: Can leverage stringer to make the mapping for
MarshalText - Con: Have to make sure enum vals are valid Go identifiers. Will interfere with stringer.
- Con: High likelihood of name collisions may have to use hungarian notation
Patterns
The regular expressions defined by XSD seem pretty close to RE2, used by the regexp package. I would be satisfied with opportunistically generating Marshal/Unmarshal methods iff the regexp package can compile the pattern.
Assertion
Nope. Not worth it. I don't want to link in an xpath implementation just to validate an xml type.
Other restrictions:
Min/max inclusive/exclusive could be useful, and should be easy enough to implement.
The whitespace restriction is necessary, I think, as it changes how a value should be parsed.
explicitTimezone? pass :D
I actually went ahead and did already implement some of these.
But I'm still struggling with a few other things related to the MPEG-DASH XSD I mentioned in #16. In particular, some types may contain both an element and an attribute called "title". xsdgen is currently not capable of generating separate fields for this case.
I opened #23 for that
A possible compromise would be to add constants of all possible values for enumerations (without validation during unmarshalling). That will help with type safety when using these values later on.