ofxparse

XMLParsedAsHTMLWarning

Open · kantskernel opened this issue 3 years ago · 4 comments

I see the following warning when parsing an XML OFX file:

XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
  warnings.warn(

Looking through previous decisions and the code's behavior, I think it is intentional that the HTML parser is used for XML (e.g. here).

I think the warning shouldn't appear at all, rather than me going in and specifying XML in the constructor, but I might be misguided. Here is one more issue I saw related to this: https://github.com/EnergieID/entsoe-py/issues/180

My case is not the same as that one; I am actually using ofxparse through beancount-reds-importers (FWIW).

Here's an example of what my input file starts with:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?OFX OFXHEADER="200" VERSION="220"
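The header above (an `<?xml ...?>` declaration followed by an `<?OFX OFXHEADER="200" VERSION="220"` processing instruction) marks an OFX 2.x file, which is XML, whereas OFX 1.x files begin with a plain-text `OFXHEADER:100` block followed by an SGML body. Below is a minimal sketch of telling the two apart by header alone; `detect_ofx_flavor` is a hypothetical helper, not part of ofxparse, and it assumes the body of the file matches what its header declares.

```python
import re

# OFX 2.x (XML): an <?OFX OFXHEADER="200" ...?> processing instruction near the top.
_OFX2_HEADER = re.compile(r'<\?OFX\b[^>]*\bOFXHEADER\s*=\s*"200"', re.IGNORECASE)
# OFX 1.x (SGML): the file starts with a plain-text "OFXHEADER:100" line.
_OFX1_HEADER = re.compile(r"OFXHEADER\s*:\s*100")


def detect_ofx_flavor(text):
    """Hypothetical helper: return "xml", "sgml", or "unknown" based on the header."""
    # Skip a UTF-8 BOM and leading blank lines; the headers sit at the very top.
    head = text.lstrip("\ufeff \t\r\n")[:1024]
    if _OFX2_HEADER.search(head):
        return "xml"
    if _OFX1_HEADER.match(head):
        return "sgml"
    return "unknown"


# The (truncated) start of the file from this issue.
sample = '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<?OFX OFXHEADER="200" VERSION="220"'
print(detect_ofx_flavor(sample))  # xml
```

A caller could, for instance, pass only "xml" files to BeautifulSoup with `features="xml"` (which, per the warning text, needs lxml) and keep the lenient HTML parser for "sgml" files. That split is an assumption for illustration, not something ofxparse does according to this thread.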

kantskernel · Jun 09 '22 17:06

Ditto. Are other users really not hitting this?

redstreet · Jun 16 '23 03:06

I see this too.

thehilll · Jun 16 '23 07:06

So yes, it was intentional to parse the XML this way. I don't recall this warning message appearing in the past, so one of the dependencies (BeautifulSoup?) must have added it. I can take a look at silencing the warning, or if someone else happens to look at it first, I'd be happy to review the change.

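When I wrote this library, parsing as XML would have been too strict and parsing would fail, because SGML is a superset of XML (OFX 1.x files are SGML and may, for example, omit closing tags, which an XML parser rejects). The HTML parser is more forgiving and simply ignores the bits it doesn't understand.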
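To make that point concrete, here is a small stdlib-only sketch. The OFX 1.x fragment is hypothetical, and Python's `html.parser` stands in for "the HTML parser" (BeautifulSoup's `"html.parser"` backend is built on it); the thread does not say exactly which backend ofxparse selects.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical OFX 1.x (SGML) fragment: leaf elements such as <TRNAMT>
# have no closing tags, which SGML allows but XML forbids.
SGML_OFX = """<OFX>
<STMTTRN>
<TRNTYPE>DEBIT
<TRNAMT>-10.00
<NAME>Coffee
</STMTTRN>
</OFX>"""


class LeafCollector(HTMLParser):
    """Collect tag -> text pairs, tolerating unclosed elements."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.values = {}

    def handle_starttag(self, tag, attrs):
        self.current = tag  # html.parser reports tag names in lowercase

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        text = data.strip()
        if self.current and text:
            self.values[self.current] = text


def parses_as_xml(doc):
    """True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


collector = LeafCollector()
collector.feed(SGML_OFX)
collector.close()
print(parses_as_xml(SGML_OFX))  # False: </STMTTRN> does not match the open <NAME>
print(collector.values)  # {'trntype': 'DEBIT', 'trnamt': '-10.00', 'name': 'Coffee'}
```

The strict parser stops at the first mismatched tag, while the lenient one still recovers every value. That tolerance is the trade-off behind using an HTML parser, and it is also why BeautifulSoup now warns when the input happens to be well-formed XML.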

jseutter · Jun 16 '23 23:06

Thank you, @jseutter! I haven't looked at ofxparse, but this commit does exactly what is needed. I imagine you simply need to put it in the right file in ofxparse.

redstreet · Jun 17 '23 10:06
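For readers who want to silence the warning on their own side in the meantime, here is a minimal sketch using only the standard `warnings` module. It matches on the start of the message text quoted in the issue, so it does not need to import bs4; recent bs4 versions also export an `XMLParsedAsHTMLWarning` class that could be filtered by `category=` instead. The name `suppress_xml_as_html_warning` is hypothetical and not part of ofxparse or bs4.

```python
import re
import warnings
from contextlib import contextmanager

# Start of the message BeautifulSoup emits (copied from the warning quoted above).
_XML_AS_HTML_MESSAGE = "It looks like you're parsing an XML document using an HTML parser"


@contextmanager
def suppress_xml_as_html_warning():
    """Hypothetical helper: ignore the XML-parsed-as-HTML warning inside the block only."""
    with warnings.catch_warnings():
        # filterwarnings() matches the regex against the start of the message.
        warnings.filterwarnings("ignore", message=re.escape(_XML_AS_HTML_MESSAGE))
        yield
    # Leaving catch_warnings() restores the previous filter list.
```

A caller would wrap the parse call, e.g. `with suppress_xml_as_html_warning(): ofx = OfxParser.parse(fileobj)`. Note that `warnings.catch_warnings()` modifies process-wide state and is not thread-safe, so a fix inside ofxparse itself may prefer a different placement.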