python-webdav

unicode problem of HTML parser

Open martinhoefling opened this issue 12 years ago • 4 comments

I get the following traceback:

ValueError                                Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    173             else:
    174                 filename = fname
--> 175             __builtin__.execfile(filename, *where)

/home/martin/local/lib/python2.6/site-packages/python_webdav/bunch.py in <module>()
      3 cl = python_webdav.client.Client("myserver")
      4 cl.set_connection(username="martin",password="mypass")
----> 5 cl.ls()

/home/martin/local/lib/python2.6/site-packages/python_webdav/client.pyc in ls(self, path, list_format, separator, display)
    148         if not path:
    149             path  = self.connection.path
--> 150         props = self.client.get_properties(self.connection, path)
    151         property_lists = []
    152         for prop in props:

/home/martin/local/lib/python2.6/site-packages/python_webdav/connection.pyc in get_properties(self, connection, resource_uri, properties)
    327             #parser = python_webdav.parse.Parser()
    328             parser = python_webdav.parse.LxmlParser()
--> 329             parser.parse(prop_xml)
    330             properties = parser.response_objects
    331             return properties

/home/martin/local/lib/python2.6/site-packages/python_webdav/parse.py in parse(self, data)
     63 
     64         """
---> 65         data_elements = HTML(data)
     66         xml_etree = ElementTree(data_elements)
     67         all_response_elements = xml_etree.findall("//response")

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree.HTML (src/lxml/lxml.etree.c:54134)()

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659)()

ValueError: Unicode strings with encoding declaration are not supported.

Any ideas?

martinhoefling, Dec 04 '12 21:12

Hi,

this might be caused by the returned XML containing an invalid encoding. Do you know what the WebDAV server is (Apache, pywebdav, lighttpd)? Do you have access to the list of files and directories that you are trying to run ls() against? It would be useful if I could debug against real data.

scaryclam, Dec 04 '12 23:12

Well, the server is the Plone/Zope WebDAV implementation. I have the tinydav implementation running, and there I get responses like this:

<?xml version="1.0" encoding="utf-8"?>
<d:multistatus xmlns:d="DAV:">
<d:response>
<d:href>/Plone/foo/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>Just a Title Foo</n:title>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>Just a Title Foo</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Tue, 04 Dec 2012 23:16:12 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
</d:multistatus>

Does this help?

martinhoefling, Dec 04 '12 23:12

Yes and no. That response looks well formed and is parsed properly by lxml.

I don't think that the root of this issue is in the python_webdav library itself. It seems to be in the lxml code. If I can find the cause of the issue, then I can try to work around it from the python_webdav side.

That said, I've been playing around with different parsers. The develop branch is using BeautifulSoup as a parser, though that branch is still unstable due to a problem with how requests handles files.

If possible, could you get me the exact directory listing that is causing this issue so that I can try to find a way around it? Otherwise, I will try to expedite the next release to offer the BeautifulSoup parser as an option for anyone who runs into this problem.

scaryclam, Dec 05 '12 00:12

So this was actually not the same directory. Here is the content of data at the beginning of the parse function, followed by the traceback.

XML:

<?xml version="1.0" encoding="utf-8"?>
<d:multistatus xmlns:d="DAV:">
<d:response>
<d:href>/Plone/photos/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>Photos</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>Photos</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Sat, 12 May 2012 17:38:38 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2001/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2001</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2001</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Mon, 07 May 2012 10:48:44 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2002/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2002</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2002</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:47:22 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2003/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2003</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2003</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:47:13 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2004/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2004</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2004</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Fri, 23 Nov 2012 15:53:27 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2005/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2005</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2005</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:47:02 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2006/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2006</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2006</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:46:57 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2007/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2007</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2007</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:46:52 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2008/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2008</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2008</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:46:48 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2009/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2009</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2009</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:46:41 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2010/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2010</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2010</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:46:36 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2011/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2011</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2011</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:46:28 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2012/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
  <d:prop>
  <n:title>2012</n:title>
  <n:layout>atct_album_view</n:layout>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
  <d:prop>
  <n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
  <n:displayname>2012</n:displayname>
  <n:resourcetype><n:collection/></n:resourcetype>
  <n:getcontenttype>text/plain</n:getcontenttype>
  <n:getcontentlength>1</n:getcontentlength>
  <n:source></n:source>
  <n:supportedlock>
  <n:lockentry>
  <d:lockscope><d:exclusive/></d:lockscope>
  <d:locktype><d:write/></d:locktype>
  </n:lockentry>
  </n:supportedlock>
  <n:lockdiscovery>

</n:lockdiscovery>
  <n:getlastmodified>Thu, 03 May 2012 16:46:22 GMT</n:getlastmodified>
  </d:prop>
  <d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
</d:multistatus>

Traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    173             else:
    174                 filename = fname
--> 175             __builtin__.execfile(filename, *where)

/home/martin/dump.py in <module>()
      3 cl = python_webdav.client.Client("mywebdav")
      4 cl.set_connection(username="martin",password="mypass")
----> 5 cl.ls()

/home/martin/local/lib/python2.6/site-packages/python_webdav/client.pyc in ls(self, path, list_format, separator, display)
    148         if not path:
    149             path  = self.connection.path
--> 150         props = self.client.get_properties(self.connection, path)
    151         property_lists = []
    152         for prop in props:

/home/martin/local/lib/python2.6/site-packages/python_webdav/connection.pyc in get_properties(self, connection, resource_uri, properties)
    327             #parser = python_webdav.parse.Parser()
    328             parser = python_webdav.parse.LxmlParser()
--> 329             parser.parse(prop_xml)
    330             properties = parser.response_objects
    331             return properties

/home/martin/local/lib/python2.6/site-packages/python_webdav/parse.py in parse(self, data)
     64         """
     65         print data
---> 66         data_elements = HTML(data)
     67         xml_etree = ElementTree(data_elements)
     68         all_response_elements = xml_etree.findall("//response")

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree.HTML (src/lxml/lxml.etree.c:54134)()

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659)()

ValueError: Unicode strings with encoding declaration are not supported.

martinhoefling, Dec 05 '12 20:12
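
For readers hitting the same ValueError: lxml raises it when it is handed a text (unicode) string that starts with an XML declaration naming an encoding, as both responses posted in this thread do. It accepts the same document as a byte string. Below is a minimal sketch of one possible workaround, not necessarily the fix adopted by python_webdav. The helper name `ensure_bytes` is hypothetical, the sample is a shortened copy of the response above, and UTF-8 is assumed because that is what the declaration states.

```python
# -*- coding: utf-8 -*-
# Sketch only: `ensure_bytes` is a hypothetical helper, not part of python_webdav.


def ensure_bytes(data, encoding="utf-8"):
    """Return `data` as a byte string, encoding text input if necessary."""
    if isinstance(data, bytes):
        return data
    return data.encode(encoding)


# Shortened copy of the PROPFIND response from this thread, held as text.
sample = (
    u'<?xml version="1.0" encoding="utf-8"?>\n'
    u'<d:multistatus xmlns:d="DAV:">\n'
    u'<d:response><d:href>/Plone/photos/</d:href></d:response>\n'
    u'</d:multistatus>\n'
)

try:
    from lxml.etree import HTML
except ImportError:
    HTML = None  # lxml is not installed; the helper above still works

if HTML is not None:
    # Passing bytes avoids the "Unicode strings with encoding declaration"
    # check, because the parser can then apply the declared encoding itself.
    root = HTML(ensure_bytes(sample))
```

In parse.py this would amount to calling `HTML(ensure_bytes(data))` in place of `HTML(data)`. Another option, under the same assumptions, is to keep the raw response body as bytes and never decode it before parsing.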