Unexpected argument `is_multilingual` (from `IMF_DATA3` source)
What I did
import sdmx
client = sdmx.Client(source='IMF_DATA3')
client.dataflow(resource_id='BOP', agency_id='IMF.STA', references='all')
What happened
sdmx.exceptions.XMLParseError: TypeError: FacetType.__init__() got an unexpected keyword argument 'is_multilingual'
The traceback points to the following snippet of code:
https://github.com/khaeru/sdmx/blob/71cdf849b90fb6f6fbc71786b0c1dc6e24c44a61/sdmx/reader/xml/v21.py#L596-L603
Thoughts
It looks like is_multilingual slips through the cracks, which causes an error because FacetType doesn't expect it. I'm not sure whether the issue is with the IMF_DATA3 source or with this library, but maybe both spellings should be popped?
Thanks for a clear report. If you can, it would help to have (a) the exact URL of the query that is performed, and (b) the whole XML response, or perhaps just the offending part.
Reflecting a little bit on the code:
- Here are some search results in the test specimen collection. For example, the file v3/xml/ECB_EXR_CA_DSD.xml (one of the official "samples" published with the SDMX-ML 3.0 standard) contains:
<str:TextFormat textType="String" isMultiLingual="true" maxLength="200" />
- sdmx.reader.xml uses this utility function to convert lowerCamelCase (XML) to snake_case (Pythonic).
- The code reacts to the intermediate capital letters, so the 'M' and 'L' in 'isMultiLingual' are replaced with '_m' and '_l', giving 'is_multi_lingual'. This is the key that is discarded in the existing code.
- You can see that this function will produce different results if the input is differently capitalized:
>>> from sdmx.reader.xml.common import to_snake
>>> to_snake("isMultiLingual")  # As in the schemas
'is_multi_lingual'
>>> to_snake("isMultilingual")  # Second 'l' lower case
'is_multilingual'
- Thus my guess is that the offending XML contains something like:
<str:TextFormat textType="String" isMultilingual="true" maxLength="200" />
…with a lower-case 'l' in 'lingual'.
- Since the XML schemas specify the lowerCamelCase name, and XML is case sensitive, the latter would not be valid against the schema.
- I would guess that if you tried to validate the SDMX-ML, it would fail validation.
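To make the capitalization point concrete, here is a minimal standalone sketch of a lowerCamelCase-to-snake_case conversion. The name `camel_to_snake` and the regular expression are assumptions for illustration only; this is not the package's `to_snake`, just a function that behaves the same way on the two spellings discussed above.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert '_' before every capital letter except a leading one, then lower-case."""
    # Hypothetical stand-in for illustration; not the implementation used in sdmx.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for attr in ("isMultiLingual", "isMultilingual"):
    print(attr, "->", camel_to_snake(attr))
```

Because only capital letters trigger an underscore, the two XML spellings map to two different Python keys, and code that discards only one of them lets the other through.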
If those guesses are correct, then this would be a quirk of the particular source, rather than an error in this package. You are right that the package can (and does) work around non-standard content that appears in upstream sources. One thing that helps decide whether to do that or not is whether it's easy to communicate with data providers.
In this case, we have had several valuable contributions from @aboddie, who I think is involved with running the IMF sources. Perhaps he can give some info about whether this is a known issue that can and will be fixed. If so, then I'd prefer to wait for that to happen and let you work around in the meantime by inserting a line like:
args.pop("is_multilingual", None) # For khaeru/sdmx#250
If it can't/won't be fixed, then I'd be open to a PR that adds this to the package.
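As a rough sketch of what tolerating both spellings could look like, the snippet below discards either key from a dictionary of keyword arguments. The helper name `drop_multilingual_keys` and the sample dictionary are hypothetical; where exactly such a step would sit in the reader is not shown here.

```python
def drop_multilingual_keys(args: dict) -> dict:
    """Return a copy of args without either spelling of the multilingual flag."""
    # Hypothetical helper for illustration; not part of the sdmx package.
    cleaned = dict(args)
    for key in ("is_multi_lingual", "is_multilingual"):
        cleaned.pop(key, None)  # A default of None avoids KeyError if the key is absent
    return cleaned


# Sample keyword arguments, as they might look after snake_case conversion
args = {"text_type": "String", "is_multilingual": "true", "max_length": "200"}
print(drop_multilingual_keys(args))
```

Popping with a default keeps the step harmless for sources that already use the standard spelling.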
The URL for the query is https://api.imf.org/external/sdmx/3.0/structure/dataflow/IMF.STA/BOP/+?references=all, and here's an offending snippet from the XML response:
<structure:Concept id="SHORT_SOURCE_CITATION"
urn="urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=IMF:CS_MASTER(2.2.0).SHORT_SOURCE_CITATION">
<common:Name xml:lang="en">Short Source Citation</common:Name>
<common:Description xml:lang="en">A brief reference to the source of data or
information used in a resource, typically including the author, title, and
year of publication.</common:Description>
<structure:CoreRepresentation>
<structure:TextFormat textType="String" isMultilingual="Optional[true]" />
</structure:CoreRepresentation>
</structure:Concept>
So this does seem to be an issue with this source (although grammatically the spec itself is wrong, because "multilingual" is a single word, but that's neither here nor there).
I was able to monkey patch the second pop in for now, so no pressure to rush a workaround from my end 😄
Thank you for flagging this. Interestingly, we return isMultilingual for XML and isMultiLingual for JSON. This will be fixed in the IMF API at the end of November, after which all formats will return the "correct" capitalization.
OK, thanks for the confirmation!
I would suggest we leave the issue open for visibility, and then @benfrankel you can close it when the fix appears to be live and the monkeypatch no longer needed.
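For anyone checking later whether the fix is live, one possible approach is to scan a saved XML response for the non-standard attribute. This is only a sketch: the function name `has_nonstandard_spelling` is made up, and the namespace URI in the sample is a placeholder rather than the real SDMX one, so elements are matched by local name only.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample based on the snippet above; the namespace URI is a placeholder.
SAMPLE = """<structure:Concept xmlns:structure="urn:example:structure" id="SHORT_SOURCE_CITATION">
  <structure:CoreRepresentation>
    <structure:TextFormat textType="String" isMultilingual="Optional[true]" />
  </structure:CoreRepresentation>
</structure:Concept>"""


def has_nonstandard_spelling(xml_text: str) -> bool:
    """Return True if any TextFormat element carries the 'isMultilingual' attribute."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # Strip the '{namespace}' prefix
        if local_name == "TextFormat" and "isMultilingual" in element.attrib:
            return True
    return False


print(has_nonstandard_spelling(SAMPLE))
```

Once the response no longer triggers this check, the monkeypatch should be safe to remove.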