Unexpected argument `is_multilingual` (from `IMF_DATA3` source)

Open benfrankel opened this issue 2 months ago • 4 comments

What I did

import sdmx
client = sdmx.Client(source='IMF_DATA3')
client.dataflow(resource_id='BOP', agency_id='IMF.STA', references='all')

What happened

sdmx.exceptions.XMLParseError: TypeError: FacetType.__init__() got an unexpected keyword argument 'is_multilingual'

The traceback points to the following snippet of code:

https://github.com/khaeru/sdmx/blob/71cdf849b90fb6f6fbc71786b0c1dc6e24c44a61/sdmx/reader/xml/v21.py#L596-L603

Thoughts

It looks like is_multilingual slips through the cracks, which causes an error because FacetType doesn't expect it. I'm not sure whether the issue is with the IMF_DATA3 source or with this library, but maybe both spellings should be popped?

Oct 21 '25 15:10 benfrankel

Thanks for a clear report. It would help, if you can, to have (a) the exact URL for the query that is performed, and (b) the whole XML response, or perhaps just the offending part.

Reflecting a little bit on the code:

  • The attribute appears in the test specimen collection. For example, the file v3/xml/ECB_EXR_CA_DSD.xml (one of the official "samples" published with the SDMX-ML 3.0 standard) contains:
    <str:TextFormat textType="String" isMultiLingual="true" maxLength="200" />
    
  • sdmx.reader.xml uses the utility function to_snake (in sdmx.reader.xml.common) to convert lowerCamelCase (XML) to snake_case (Pythonic).
    • The code reacts to the interior capital letters, so the 'M' and 'L' in 'isMultiLingual' are replaced with '_m' and '_l', giving 'is_multi_lingual'. This is the key that the existing code discards.
    • You can see that this function will produce different results if the input is differently capitalized:
      >>> from sdmx.reader.xml.common import to_snake
      >>> to_snake("isMultiLingual")  # As in the schemas
      'is_multi_lingual'
      >>> to_snake("isMultilingual")  # Second 'l' lower case
      'is_multilingual'
      
  • Thus my guess is that the offending XML contains something like:
    <str:TextFormat textType="String" isMultilingual="true" maxLength="200" />
    
    …with a lower-case 'l' in 'lingual'.
  • Since the XML schemas specify the name as 'isMultiLingual', and XML is case sensitive, the latter does not match the schema.
  • I would guess that if you tried to validate the SDMX-ML, it would fail validation.
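
The capitalization effect described in the list above can be reproduced without installing the package. This is a minimal sketch, assuming a simple regex-based conversion; `camel_to_snake` is a hypothetical stand-in and not the actual `to_snake` implementation in `sdmx.reader.xml.common`.

```python
import re


def camel_to_snake(name: str) -> str:
    """Hypothetical stand-in for to_snake (not the library's code).

    Prefix every capital letter except a leading one with '_', then lower-case.
    """
    return re.sub(r"(?<!^)([A-Z])", r"_\1", name).lower()


# Spelling used in the SDMX-ML schemas: two interior capitals, two underscores.
print(camel_to_snake("isMultiLingual"))

# Spelling guessed for the offending response: one interior capital, one underscore.
print(camel_to_snake("isMultilingual"))
```

The two inputs differ by a single letter's case, yet they yield different dictionary keys, which is why a `pop` of only one spelling lets the other through.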

If those guesses are correct, then this would be a quirk of the particular source, rather than an error in this package. You are right that the package can (and does) work around non-standard content that appears in upstream sources. One thing that helps decide whether to do that or not is whether it's easy to communicate with data providers.

In this case, we have had several valuable contributions from @aboddie, who I think is involved with running the IMF sources. Perhaps he can give some info about whether this is a known issue that can and will be fixed. If so, then I'd prefer to wait for that to happen and let you work around it in the meantime by inserting a line like:

args.pop("is_multilingual", None)  # For khaeru/sdmx#250

If it can't/won't be fixed, then I'd be open to a PR that adds this to the package.
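
As an illustration of discarding both spellings at once, here is a minimal sketch that runs outside the package. The helper name `drop_multilingual_flag` and the sample `args` dictionary are hypothetical; only the two key spellings come from this thread.

```python
def drop_multilingual_flag(args: dict) -> dict:
    """Remove both snake_case spellings of the multilingual flag, in place."""
    for key in ("is_multi_lingual", "is_multilingual"):
        # pop() with a default does nothing if the key is absent.
        args.pop(key, None)
    return args


# Hypothetical keyword arguments, as they might look after snake_case conversion.
args = {"text_type": "String", "is_multilingual": "true", "max_length": "200"}
print(drop_multilingual_flag(args))
```

Because `pop` is given a default, the helper is safe to call whether the source sends the standard spelling, the non-standard one, or neither.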

Oct 22 '25 08:10 khaeru

The URL for the query is https://api.imf.org/external/sdmx/3.0/structure/dataflow/IMF.STA/BOP/+?references=all, and here's an offending snippet from the XML response:

<structure:Concept id="SHORT_SOURCE_CITATION"
    urn="urn:sdmx:org.sdmx.infomodel.conceptscheme.Concept=IMF:CS_MASTER(2.2.0).SHORT_SOURCE_CITATION">
    <common:Name xml:lang="en">Short Source Citation</common:Name>
    <common:Description xml:lang="en">A brief reference to the source of data or
        information used in a resource, typically including the author, title, and
        year of publication.</common:Description>
    <structure:CoreRepresentation>
        <structure:TextFormat textType="String" isMultilingual="Optional[true]" />
    </structure:CoreRepresentation>
</structure:Concept>

So this does seem to be an issue with this source (although grammatically the spec itself is wrong, because "multilingual" is a single word, but that's neither here nor there).
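
For anyone who wants to check a saved response for this kind of mismatch, the following is a rough sketch using only the standard library. The namespace URI is an assumption added so that the fragment parses on its own (the real response declares its own namespaces), and `find_miscased` is a hypothetical helper, not part of sdmx.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI, declared here only so the fragment is well-formed.
NS = "http://www.sdmx.org/resources/sdmxml/schemas/v3_0/structure"

# The TextFormat element from the snippet above, wrapped with a namespace declaration.
FRAGMENT = f"""
<structure:CoreRepresentation xmlns:structure="{NS}">
    <structure:TextFormat textType="String" isMultilingual="Optional[true]" />
</structure:CoreRepresentation>
""".strip()

EXPECTED = "isMultiLingual"  # Spelling used in the SDMX-ML schemas


def find_miscased(root: ET.Element, expected: str) -> list[str]:
    """Return attribute names that equal `expected` only when case is ignored."""
    return [
        name
        for elem in root.iter()
        for name in elem.attrib
        if name != expected and name.lower() == expected.lower()
    ]


root = ET.fromstring(FRAGMENT)
print(find_miscased(root, EXPECTED))
```

This only flags the attribute name; it does not validate the document against the SDMX-ML schemas.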

I was able to monkey patch the second pop in for now, so no pressure to rush a workaround from my end 😄

Oct 22 '25 13:10 benfrankel

Thank you for flagging this. Interestingly, we return isMultilingual for XML and isMultiLingual for JSON. This will be fixed in the IMF API at the end of November, and all formats will return the "correct" capitalization.

Oct 24 '25 11:10 aboddie

OK, thanks for the confirmation!

I would suggest we leave the issue open for visibility, and then @benfrankel you can close it when the fix appears to be live and the monkeypatch no longer needed.

Oct 24 '25 13:10 khaeru