
Fix "error: Document is empty." when empty files are present in /docs

Open Muxiner opened this issue 1 year ago • 3 comments

Symptom

When mkdocs-video is enabled, the build fails with an ERROR if any empty .md file is present under the /docs path.

ERROR   -  Error reading page 'EMPTY.md': Document is empty
Traceback (most recent call last):
  ...
  File "...\Python310\site-packages\mkdocs_video\plugin.py", line 28, in on_page_content
    content = lxml.html.fromstring(html)
  File "...\Python\Python310\site-packages\lxml\html\__init__.py", line 873, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "...\Python\Python310\site-packages\lxml\html\__init__.py", line 761, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserError: Document is empty

https://github.com/soulless-viewer/mkdocs-video/blob/4c9b1fba49919ffc19bcd2c20ef868c26576e384/mkdocs_video/plugin.py#L27-L34

Analysis

Following the traceback, I looked at the code around line 761 of lxml\html\__init__.py:

def document_fromstring(html, parser=None, ensure_head_body=False, **kw):
    if parser is None:
        parser = html_parser
    value = etree.fromstring(html, parser, **kw)
    if value is None: # << this is where the error is raised
        raise etree.ParserError(
            "Document is empty")
    if ensure_head_body and value.find('head') is None:
        value.insert(0, Element('head'))
    if ensure_head_body and value.find('body') is None:
        value.append(Element('body'))
    return value

(source: https://github.com/lxml/lxml/blob/762f62c5a1ab62ce37397aeeab2c27fdcc14ca66/src/lxml/html/__init__.py#L756-L767)

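My understanding is that mkdocs-video parses each page's rendered HTML with lxml.html.fromstring (line 28 of plugin.py in the traceback). lxml never sees the Markdown itself; it sees the HTML that MkDocs renders from it. An empty Markdown file renders to an empty HTML string, so value is None and document_fromstring raises ParserError("Document is empty").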
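The failure can be reproduced without MkDocs at all. The sketch below (assuming only that lxml is installed; `parse_error_for` is a hypothetical helper, not part of either project) feeds lxml.html.fromstring the empty string an empty page renders to, and a non-empty page for comparison:

```python
import lxml.etree
import lxml.html


def parse_error_for(html):
    """Return the ParserError message lxml raises for `html`, or None if it parses.

    Hypothetical helper for illustration only.
    """
    try:
        lxml.html.fromstring(html)
    except lxml.etree.ParserError as exc:
        return str(exc)
    return None


# An empty Markdown file renders to an empty HTML string, which lxml rejects.
print(parse_error_for(""))           # Document is empty
# A page with real content parses without error.
print(parse_error_for("<p>ok</p>"))  # None
```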

The ERROR is reported whenever an empty file exists in /docs, even if the nav setting in mkdocs.yml does not include that file as a page.
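Because the file does not need to be in nav to break the build, it can help to locate the offending files directly. The following is a minimal sketch, not part of mkdocs-video; `find_empty_markdown` is a hypothetical helper, and treating whitespace-only files as "empty" is an assumption (they are presumed to render to an empty string as well):

```python
from pathlib import Path


def find_empty_markdown(docs_dir):
    """Return paths (relative to docs_dir, POSIX-style) of .md files with no content.

    Hypothetical helper: a file counts as empty if it has zero bytes or only
    whitespace, on the assumption that both render to an empty HTML string.
    """
    root = Path(docs_dir)
    empty = []
    for path in sorted(root.rglob("*.md")):
        if not path.read_text(encoding="utf-8").strip():
            empty.append(path.relative_to(root).as_posix())
    return empty
```

For a default MkDocs layout, `find_empty_markdown("docs")` would list candidates such as `EMPTY.md` from the error message.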

Fix

As a simple fix, this patch adds a check to the on_page_content method: the method does its work only when the html passed in is not empty, effectively skipping empty files (there is nothing in them to process anyway).
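A guard of this kind might look roughly like the sketch below. It is not the actual patch: `VideoPluginSketch` is a hypothetical stand-in (the real plugin subclasses mkdocs.plugins.BasePlugin, omitted here so the sketch runs without MkDocs), returning the empty string unchanged is an assumption about the desired behaviour, and the video-embedding step is left as a placeholder.

```python
import lxml.html


class VideoPluginSketch:
    """Hypothetical stand-in for the mkdocs-video plugin class."""

    def on_page_content(self, html, page=None, config=None, files=None):
        # Empty Markdown files render to an empty (or whitespace-only) string.
        # lxml.html.fromstring raises ParserError("Document is empty") on the
        # empty case, so hand the HTML back untouched instead of parsing it.
        if not html or not html.strip():
            return html

        content = lxml.html.fromstring(html)
        # ... the plugin's video-embedding logic would modify `content` here ...
        # Serialization in the real plugin may differ from this placeholder.
        return lxml.html.tostring(content, encoding="unicode")
```

Returning the input rather than None keeps the page content explicit; either way the empty page is simply passed through, which matches how MkDocs behaves without the plugin.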


As far as I can tell, MkDocs itself allows empty files in /docs: site builds proceed without problems when they are present. With mkdocs-video enabled, the build fails with the symptom described above; with this patch it succeeds again, just as it does without the plugin. The fix may not be perfect and may need further changes to meet project standards.

Muxiner, Sep 18 '23 08:09

Any chance we can get this merged?

mteichtahl, Sep 27 '23 06:09

I am facing the same problem

pabloFuente, Oct 05 '23 13:10

Same problem here.

gorger3, Oct 12 '23 15:10