edgarWebR

Excessive depth in document

Open tangxuning opened this issue 5 years ago • 2 comments

Hi,

When I use parse_filing on the URLs below, I get the following error:

Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, : Excessive depth in document: 256 use XML_PARSE_HUGE option [1]

Here are a few sample URLs:

https://www.sec.gov/Archives/edgar/data/1065648/000106564809000009/form_10k.htm
https://www.sec.gov/Archives/edgar/data/1010247/000101024709000005/form10k.htm
https://www.sec.gov/Archives/edgar/data/861459/000086145909000013/form10-q.htm

Again, thanks very much for contributing this package! It's fantastic.

Best regards

tangxuning, Jan 20 '19 02:01

Interesting - thanks for the bug report - I'm starting to take a look at this now.

mwaldstein, Feb 01 '19 14:02

I made an initial fix - they will all at least parse now.

These files have a particularly complicated structure, so there are likely to be other "hidden" parsing problems where parts of the document are missed. Let me know if you see any issues. I did some work to cover those edge cases, but there is a good chance I didn't catch them all...
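For anyone who hits the same error outside the package, the option named in the error message can be tried directly with xml2. The sketch below is an illustration only and not necessarily the change that went into edgarWebR. It assumes xml2's read_html and its "HUGE" parser option, and it uses a synthetic, deeply nested document (deep_html and depth are made up for the example) rather than the SEC filings above.

```r
library(xml2)

# Hypothetical test input: HTML nested deeper than the 256 levels
# mentioned in the error message.
depth <- 300
deep_html <- paste0(
  "<html><body>",
  strrep("<div>", depth),
  "text",
  strrep("</div>", depth),
  "</body></html>"
)

# Parse with read_html's default options. On libxml2 builds that enforce
# the 256-level limit this is expected to raise the "Excessive depth"
# error; other builds may behave differently, so the outcome is only
# captured here, not assumed.
default_result <- tryCatch(
  read_html(deep_html),
  error = function(e) conditionMessage(e)
)

# Parse again with the default options plus "HUGE", which maps to
# libxml2's XML_PARSE_HUGE and relaxes the parser's hard-coded limits.
huge_doc <- read_html(
  deep_html,
  options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE")
)

# The text inside the innermost <div> should now be reachable.
print(xml_text(huge_doc))
```

Note that getting a document to parse this way says nothing about whether the rest of parse_filing handles its structure correctly, so the caveat about missed sections still applies.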

mwaldstein, Feb 02 '19 15:02