[repository schema] Increase maximum length of MsgType attribute
The current definition of message types is based on the needs of FIX, i.e. one or two characters long:
<xs:simpleType name="MsgType_t">
  <xs:restriction base="xs:string">
    <xs:minLength value="1"/>
    <xs:maxLength value="2"/>
  </xs:restriction>
</xs:simpleType>
They are intended as human-readable and well-known shortcuts for message names, e.g. "D" for FIX NewOrderSingle(35=D) or "8" for FIX ExecutionReport(35=8). Non-FIX protocols may have longer acronyms. SEC-CAT, for instance, uses up to 5 characters to identify events and related messages, e.g. "MENO" for "New Order" and "MENOS" for "New Order Supplement".
The proposal is to still set the maximum length (to maintain the original intention of the attribute) but to change it from 2 to 6.
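As a rough illustration of what the new bounds would accept, here is a minimal Python sketch of the length rule. The function name `is_valid_msg_type` is hypothetical and not part of Orchestra; it only mirrors the minLength/maxLength restriction, using the example codes from this issue.

```python
def is_valid_msg_type(value: str, max_length: int = 6) -> bool:
    """Mirror the MsgType_t length restriction: 1..max_length characters."""
    return 1 <= len(value) <= max_length


# Example codes taken from this issue: FIX codes and SEC-CAT event codes.
for code in ("D", "8", "MENO", "MENOS"):
    # Second column: proposed limit (6); third column: current limit (2).
    print(code, is_valid_msg_type(code), is_valid_msg_type(code, max_length=2))
```

With the current limit of 2, the SEC-CAT codes are rejected; with the proposed limit of 6, all four codes pass.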
Given the attribute's original purpose, as implied by its usage in FIX, I think increasing the max length of MsgType_t to 6 makes sense.
After making that change, however, I would like to start another discussion about further codifying the concept of "message type determination" independently from FIX.
"Message type"/"MsgType" is not a fully-defined concept in Orchestra, and in fact it is fully implicit how an implementation could use the msgType attribute on messages to actually determine the type of a message. For example, in FIX:
- There is a code set, MsgTypeCodeSet; the "value" of each code within the code set corresponds to the msgType attribute on message definitions
- There is a field MsgType, of type MsgTypeCodeSet, in the StandardHeader component, included in every message, which an implementation can use (by matching its string value to the msgType of defined messages) to identify the type of a message
However, this behavior is not actually codified in Orchestra (nothing "marks" the MsgType field in every message as indicating the message type, apart from its name).
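To make the implicit behavior concrete, here is a minimal Python sketch of how a tag=value implementation might do this lookup today. It is only an illustration under stated assumptions: the `MESSAGES_BY_MSG_TYPE` table and the `message_name` function are hypothetical, and the table stands in for the msgType attributes that would be read from an Orchestra file.

```python
from typing import Optional

SOH = "\x01"  # standard FIX tag=value field delimiter

# Hypothetical excerpt of msgType -> message name, standing in for the
# msgType attributes of the message definitions in an Orchestra file.
MESSAGES_BY_MSG_TYPE = {
    "D": "NewOrderSingle",
    "8": "ExecutionReport",
}


def message_name(raw: str) -> Optional[str]:
    """Return the message name for a tag=value message, or None if unknown."""
    for field in raw.split(SOH):
        tag, sep, value = field.partition("=")
        if sep and tag == "35":  # MsgType(35)
            return MESSAGES_BY_MSG_TYPE.get(value)
    return None


# BodyLength(9) and CheckSum(10) values below are placeholders, not computed.
sample = SOH.join(["8=FIX.4.4", "9=12", "35=D", "11=ORD1", "10=000"]) + SOH
print(message_name(sample))
```

Note that the knowledge "tag 35 carries the message type" is hard-coded in the sketch, which is exactly the part that Orchestra does not currently express.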
My claim is that "MsgType" is a FIX concept rather than an Orchestra concept, and one that even comes with encoding-specific handling requirements. Orchestra needs a general mechanism for expressing the set of message types and how to determine the message type for any given encoding.
I suggest that for the moment, we apply the change to increase the max length of the msgType attribute, as it is a very lightweight change, and then open a separate discussion about the future of message type determination in Orchestra.
@patricklucas I agree that this is a separate discussion. The topic is related to #166, although that issue concerns a situation with multiple scenarios. The rule of using MsgType(35) in the message to find the matching msgType attribute in the Orchestra XML file is specific to FIX. To make it generic for Orchestra, FIX Latest could add a <fixr:when> element to each message definition, for example:
<fixr:message msgType="D" name="NewOrderSingle" category="SingleGeneralOrderHandling" added="FIX.2.7" id="14" abbrName="Order">
  <fixr:structure>
  </fixr:structure>
  <fixr:when>MsgType == "D"</fixr:when>
</fixr:message>
Additional attributes can be added to the <fixr:when> element in case of multiple scenarios for the NewOrderSingle message.
Following up on yesterday's call, I checked several different binary protocols (incl. Nasdaq protocols and NYSE Pillar). All of them use message types that are at most 5 characters long and consist of digits only, letters only, or a combination of both.
ISO 20022, which Lisa mentioned, seems to be the most complex so far with the abcd.001.001.01 message identifier pattern. However, those message identifier strings are actually made up of 4 pieces of information: [Business Domain/Area].[Message Type/Number].[Variant Number].[Version]. I'm not sure if this would be mapped to Orchestra as a single message identifier (msgType), or if it would make more sense to separate the 4 parts into individual attributes so that only the 3-digit message type/number would go into msgType.
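For illustration, splitting such an identifier into its four parts could look like the following Python sketch. The class and function names are hypothetical, and the field names simply follow the four-part pattern described above; this is not an Orchestra or ISO 20022 API.

```python
from typing import NamedTuple


class Iso20022MessageId(NamedTuple):
    """The four dot-separated parts of an ISO 20022 message identifier."""
    business_area: str
    message_number: str
    variant: str
    version: str


def parse_message_id(identifier: str) -> Iso20022MessageId:
    """Split an identifier such as 'auth.029.001.04' into its four parts."""
    parts = identifier.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated parts, got {len(parts)}")
    return Iso20022MessageId(*parts)


msg_id = parse_message_id("auth.029.001.04")
# Under the "separate attributes" option, only this part would go into msgType.
print(msg_id.message_number)
```

Under the separate-attributes option, only the 3-digit `message_number` would need to fit into msgType, which stays well within a maximum length of 6.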
> Following up on yesterday's call, I checked several different binary protocols (incl. Nasdaq protocols and NYSE Pillar) and all of them are using message types, which are at most 5 characters long and comprise either just numbers, letters, or a combination of both.
Thanks Emil, that's very useful. I also checked non-FIX specs for LSEG, NASDAQ, SFC, JSE, FINRA, CDM etc, and cannot find any use of message types longer than 6 characters.
> I'm not sure if this would be mapped to Orchestra as a single message identifier (msgType), or if it would make more sense to separate the 4 parts into individual attributes and only the 3-digit message type/number would go into msgType.
My strong preference is to model these attributes individually so they can be used in different contexts:
- Business Area: Part of message categorisation, to be modelled alongside Section/Category.
- Message Type: This is the core type field, consistent with ISO 15022.
- Variant: ISO 20022 describes variants here. Variants are clearly related to message scenarios. Note that ISO 20022 also introduces a related concept, messageFunctionality, which is distinct from variants.
- Version: In ISO 20022, this refers to versioning of a specific message (or variant). Orchestra, however, does not version messages but rather captures the audit trail of changes, such as FIX EPs.
As you can see, there are significant questions we need to address to fully support ISO 20022, particularly in modelling variants and message versioning. As Patrick mentioned above, we would also need "a general mechanism for expressing the set of message types and how to determine the message type for any given encoding". For ISO 20022, this would clearly be a composite of multiple attributes.
In my opinion, we should decouple this tactical fix from ISO 20022 requirements, as the latter is much broader and dependent on other "DeFIXify" initiatives, such as removing tag=value specific restrictions from Orchestra.
I propose we merge the existing PR, as we have no firm requirement for longer type values.
I concur with @erakadjiev that the ISO message "identifier" carries different data elements that should be modeled separately. ISO has both business domains (payments, securities, trade finance, cards, FX) and business areas (there are many, e.g. "auth" that belongs to the securities domain, i.e. is maintained by the ISO 20022 Securities SEG). These seem to fit well with the Orchestra concepts of a section and a category within a section.
I do see versioning as an additional concept to the pedigree attributes that currently exist in Orchestra. The ISO concept is also to have version information built into the name of a message or component, i.e. ISO works with naming conventions. For example:
- DerivativesTradeReportQueryV04 (auth.029.001.04) is the 4th version of this message. The 5th version is called DerivativesTradeReportQueryV05 (auth.029.001.05) and uses a newer version for one of its elements.
- TradePartyIdentificationQuery8 is the 8th version of this component. The 9th version added a new data element (CountryCode).
I will open a new issue for the bigger picture so that we can implement this minor change for RC2.
Just as a side note on naming conventions: ISO uses the keyword "Choice" as part of the name, e.g. PartyIdentification121Choice to represent what Orchestra does with the attribute oneOf. The concept is the same, i.e. to have mutually exclusive component members.