mavlink icon indicating copy to clipboard operation
mavlink copied to clipboard

Problems caused by encoding

Open qq854051086 opened this issue 4 years ago • 5 comments

mavlink\pymavlink\generator\mavgen.py +200

with open(xmlfile, 'r') as f:

change

with open(xmlfile, 'r', encoding="utf-8") as f:

qq854051086 avatar Mar 04 '22 05:03 qq854051086

@peterbarker What do you make of this suggestion? The existing code does not break the standard messages, so I suspect someone wants to have non-ascii characters in their dialect.

In theory it is nice to allow unicode values, and Python runs smoother if you enforce unicode encoding. Are there any risks to this? I.e. will our generators or generated languages break because they assume ascii?

hamishwillee avatar Mar 09 '22 01:03 hamishwillee

@peterbarker What do you make of this suggestion? The existing code does not break the standard messages, so I suspect someone wants to have non-ascii characters in their dialect.

In theory it is nice to allow unicode values, and Python runs smoother if you enforce unicode encoding. Are there any risks to this? I.e. will our generators or generated languages break because they assume ascii?

I think it's entirely reasonable to allow non-ascii in things like xml comments and field descriptions.

I'd be very leery of allowing utf-8 in things like field names and the like, however - I could see a lot of things breaking there!

I'm not sure we could allow the above patch unless we were to add some constraints around the more technical aspects of the XML document - or if they're there that we know they work :-) My reasoning is that we might find the above patch doesn't cause issues for our current CI but someone does use a unicode character in a field name and it works for a subset of mavlink stacks.

It would be nice to know exactly what problem is being solves with the patch - we're just guessing that it's i18n support...

peterbarker avatar Mar 11 '22 00:03 peterbarker

@peterbarker my suggestion that it might be related to windows, because nowadays most of linuxes use utf-8. Maybe it'll be better to open file in binary mode (rb) and then let xml library to decide charset header?

vooon avatar Mar 11 '22 10:03 vooon

@vooon Thanks, you might be right. @peterbarker is correct - we need to ask why they need this.

It might be that we should validate against this anyway. We could match on a pattern like <xsd:pattern value="{IsBasicLatin}*"/>. Thing is we'd need to be pretty careful to make sure we got all the cases we needed to and didn't block any we wanted. What we'd really like to do is say "globally ascii" except for these exceptions - but you'd need to be better at XSD than I am.

hamishwillee avatar Mar 16 '22 04:03 hamishwillee

@qq854051086 Can you please outline what you were trying to do when you ran into the problem for which this was the response?

hamishwillee avatar Mar 16 '22 04:03 hamishwillee

Suggest close - should be against pymavlink

auturgy avatar May 24 '23 04:05 auturgy

Closing. @qq854051086 if you want this, please repost in https://github.com/ArduPilot/pymavlink with an explanation of the specific problem you are trying to solve.

hamishwillee avatar May 25 '23 05:05 hamishwillee