python-webdav
Unicode problem in HTML parser
I get the following traceback:
ValueError Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
173 else:
174 filename = fname
--> 175 __builtin__.execfile(filename, *where)
/home/martin/local/lib/python2.6/site-packages/python_webdav/bunch.py in <module>()
3 cl = python_webdav.client.Client("myserver")
4 cl.set_connection(username="martin",password="mypass")
----> 5 cl.ls()
/home/martin/local/lib/python2.6/site-packages/python_webdav/client.pyc in ls(self, path, list_format, separator, display)
148 if not path:
149 path = self.connection.path
--> 150 props = self.client.get_properties(self.connection, path)
151 property_lists = []
152 for prop in props:
/home/martin/local/lib/python2.6/site-packages/python_webdav/connection.pyc in get_properties(self, connection, resource_uri, properties)
327 #parser = python_webdav.parse.Parser()
328 parser = python_webdav.parse.LxmlParser()
--> 329 parser.parse(prop_xml)
330 properties = parser.response_objects
331 return properties
/home/martin/local/lib/python2.6/site-packages/python_webdav/parse.py in parse(self, data)
63
64 """
---> 65 data_elements = HTML(data)
66 xml_etree = ElementTree(data_elements)
67 all_response_elements = xml_etree.findall("//response")
/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree.HTML (src/lxml/lxml.etree.c:54134)()
/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659)()
ValueError: Unicode strings with encoding declaration are not supported.
Any ideas?
Hi,
This might be caused by the returned XML containing an invalid encoding. Do you know which WebDAV server this is (Apache, pywebdav, lighttpd)? Do you have access to the list of files and directories that you are running ls() against? It would be useful if I could debug against real data.
Well, the server is the Plone/Zope WebDAV implementation. I also have the tinydav implementation running, and there I get responses like this:
<?xml version="1.0" encoding="utf-8"?>
<d:multistatus xmlns:d="DAV:">
<d:response>
<d:href>/Plone/foo/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>Just a Title Foo</n:title>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>Just a Title Foo</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Tue, 04 Dec 2012 23:16:12 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
</d:multistatus>
Does this help?
Yes and no. That response looks well formed and is parsed properly by lxml.
I don't think the root of this issue is in the python_webdav library itself; it seems to be in the lxml code. If I can find the cause, I can try to work around it from the python_webdav side.
That said, I've been experimenting with different parsers. The develop branch uses BeautifulSoup as its parser, though that branch is still unstable because of a problem with how requests handles files.
If possible, could you send me the exact directory listing that triggers this error so that I can look for a workaround? Otherwise, I will try to expedite the next release so that the BeautifulSoup parser is available as an option for anyone who runs into this problem.
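A note on the error message itself: lxml raises "Unicode strings with encoding declaration are not supported" when it is handed a unicode (text) string that starts with an XML declaration naming an encoding, which is the case for the responses posted in this thread. A common workaround is to hand the parser bytes instead. The following is only a minimal sketch of that idea, not the library's actual fix: to_parser_bytes is a hypothetical helper name, the sample document is a shortened version of the response above, and the standard library's ElementTree stands in for lxml so that the snippet runs on Python 3 without extra dependencies.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding named in a leading XML declaration, if there is one.
_DECL_ENCODING = re.compile(
    r'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def to_parser_bytes(data, default_encoding="utf-8"):
    """Return `data` as bytes (hypothetical helper, not part of python_webdav).

    Bytes are passed through untouched. Text is encoded with the encoding
    named in its XML declaration, or with `default_encoding` if none is named.
    """
    if isinstance(data, bytes):
        return data
    match = _DECL_ENCODING.match(data)
    encoding = match.group(1) if match else default_encoding
    return data.encode(encoding)


# Shortened version of the multistatus response quoted in this thread.
sample = (
    u'<?xml version="1.0" encoding="utf-8"?>\n'
    u'<d:multistatus xmlns:d="DAV:">\n'
    u'<d:response>\n'
    u'<d:href>/Plone/foo/</d:href>\n'
    u'</d:response>\n'
    u'</d:multistatus>\n'
)

payload = to_parser_bytes(sample)

# Parse the bytes and collect every href in the DAV: namespace.
root = ET.fromstring(payload)
hrefs = [element.text for element in root.iter("{DAV:}href")]
print(hrefs)
```

In parse.py the same idea would amount to calling HTML() on the encoded bytes rather than on the unicode string. Whether that fits the project is for the maintainer to judge.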
So this was actually not the same directory. Here is the content of data at the beginning of the parse function, followed by the traceback.
XML:
<?xml version="1.0" encoding="utf-8"?>
<d:multistatus xmlns:d="DAV:">
<d:response>
<d:href>/Plone/photos/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>Photos</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>Photos</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Sat, 12 May 2012 17:38:38 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2001/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2001</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2001</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Mon, 07 May 2012 10:48:44 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2002/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2002</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2002</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:47:22 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2003/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2003</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2003</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:47:13 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2004/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2004</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2004</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Fri, 23 Nov 2012 15:53:27 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2005/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2005</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2005</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:47:02 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2006/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2006</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2006</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:46:57 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2007/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2007</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2007</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:46:52 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2008/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2008</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2008</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:46:48 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2009/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2009</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2009</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:46:41 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2010/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2010</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2010</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:46:36 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2011/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2011</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2011</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:46:28 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
<d:response>
<d:href>/Plone/photos/2012/</d:href>
<d:propstat xmlns:n="http://www.zope.org/propsets/default">
<d:prop>
<n:title>2012</n:title>
<n:layout>atct_album_view</n:layout>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
<d:propstat xmlns:n="DAV:">
<d:prop>
<n:creationdate>1970-01-01T12:00:00Z</n:creationdate>
<n:displayname>2012</n:displayname>
<n:resourcetype><n:collection/></n:resourcetype>
<n:getcontenttype>text/plain</n:getcontenttype>
<n:getcontentlength>1</n:getcontentlength>
<n:source></n:source>
<n:supportedlock>
<n:lockentry>
<d:lockscope><d:exclusive/></d:lockscope>
<d:locktype><d:write/></d:locktype>
</n:lockentry>
</n:supportedlock>
<n:lockdiscovery>
</n:lockdiscovery>
<n:getlastmodified>Thu, 03 May 2012 16:46:22 GMT</n:getlastmodified>
</d:prop>
<d:status>HTTP/1.1 200 OK</d:status>
</d:propstat>
</d:response>
</d:multistatus>
Traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
173 else:
174 filename = fname
--> 175 __builtin__.execfile(filename, *where)
/home/martin/dump.py in <module>()
3 cl = python_webdav.client.Client("mywebdav")
4 cl.set_connection(username="martin",password="mypass")
----> 5 cl.ls()
/home/martin/local/lib/python2.6/site-packages/python_webdav/client.pyc in ls(self, path, list_format, separator, display)
148 if not path:
149 path = self.connection.path
--> 150 props = self.client.get_properties(self.connection, path)
151 property_lists = []
152 for prop in props:
/home/martin/local/lib/python2.6/site-packages/python_webdav/connection.pyc in get_properties(self, connection, resource_uri, properties)
327 #parser = python_webdav.parse.Parser()
328 parser = python_webdav.parse.LxmlParser()
--> 329 parser.parse(prop_xml)
330 properties = parser.response_objects
331 return properties
/home/martin/local/lib/python2.6/site-packages/python_webdav/parse.py in parse(self, data)
64 """
65 print data
---> 66 data_elements = HTML(data)
67 xml_etree = ElementTree(data_elements)
68 all_response_elements = xml_etree.findall("//response")
/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree.HTML (src/lxml/lxml.etree.c:54134)()
/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659)()
ValueError: Unicode strings with encoding declaration are not supported.