sec-edgar
XMLParsedAsHTMLWarning: It looks like you're using an HTML parser to parse an XML document
Hello
Not sure why I'm getting the warning below. Can someone help me, please?
# python3 test.py
/root/secedgar/lib/python3.12/site-packages/secedgar/client.py:218: XMLParsedAsHTMLWarning: It looks like you're using an HTML parser to parse an XML document.
Assuming this really is an XML document, what you're doing might work, but you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the Python package 'lxml' installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
If you want or need to use an HTML parser on this document, you can make this warning go away by filtering it. To do that, run this code before calling the BeautifulSoup constructor:
from bs4 import XMLParsedAsHTMLWarning
import warnings
warnings.filterwarnings("ignore", category=XMLParsedAsHTMLWarning)
return BeautifulSoup(self.get_response(path, params, **kwargs).text,
# cat /etc/issue
Ubuntu 24.04.2 LTS \n \l
# python3 -V
Python 3.12.3
# cat test.py
from secedgar import CompanyFilings, FilingType
my_filings = CompanyFilings(cik_lookup=['aapl'],
                            filing_type=FilingType.FILING_4,
                            user_agent='Name (email@gmail)')
my_filings.save('/root/tempdir')
I tried both pip install git+https://github.com/sec-edgar/sec-edgar.git and pip install sec-edgar, and I get the same warning either way.
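The filter quoted in the warning text can also be scoped to a single call instead of being applied globally. Below is a minimal sketch of that idea, not something taken from secedgar itself. The helper name quiet is hypothetical, and the fallback class only exists so the snippet still runs where bs4 is not installed.

```python
import warnings
from contextlib import contextmanager

try:
    # Real warning category, available when beautifulsoup4 is installed.
    from bs4 import XMLParsedAsHTMLWarning
except ImportError:
    # Stand-in for illustration only, so the sketch runs without bs4.
    class XMLParsedAsHTMLWarning(UserWarning):
        pass


@contextmanager
def quiet(category=XMLParsedAsHTMLWarning):
    """Ignore one warning category inside the block, then restore the filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=category)
        yield


# Usage (hypothetical): wrap only the call that triggers the warning.
# with quiet():
#     my_filings.save('/root/tempdir')
```

This only hides the message. It does not change which parser is used.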
I have the same issues.
me too
To parse this document as XML, make sure you have the Python package 'lxml' installed,
Did you pip install lxml?
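A quick way to confirm whether lxml is visible to the same interpreter that runs test.py is sketched below. It only uses the standard library; the helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# Report on the two packages the warning message mentions.
for name in ("lxml", "bs4"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If lxml shows as missing inside the virtualenv, installing it there (rather than system-wide) is probably the first thing to try.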
You will probably need to parse all the files as XML manually, like I had to. The library has probably used an HTML parser, which can leave your downloaded files unintelligible. Try this:
import os
from bs4 import BeautifulSoup  # features="xml" also needs the lxml package

save_directory = '/your/directory/'

# Re-parse every saved .txt filing in the directory with the XML parser.
for filename in os.listdir(save_directory):
    if filename.endswith('.txt'):
        file_path = os.path.join(save_directory, filename)
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                file_content = f.read()
            soup = BeautifulSoup(file_content, features="xml")
            print(f"Successfully parsed {filename} as XML.")
        except Exception as e:
            print(f"Failed to parse {filename} as XML. Error: {e}")
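One caveat: os.listdir only looks at the top level of the directory. If the filings end up in subfolders (an assumption, I have not checked the exact layout that save produces), a recursive search is needed. A rough sketch, with find_filings being a hypothetical helper name:

```python
from pathlib import Path


def find_filings(root, suffix=".txt"):
    """Return every file under root, at any depth, whose name ends with suffix."""
    return sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


# Usage (path is a placeholder): list what would be re-parsed.
# for path in find_filings('/your/directory/'):
#     print(path)
```

Each returned path can then be opened and passed to BeautifulSoup with features="xml" in the same way as in the loop above.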