webdav-client icon indicating copy to clipboard operation
webdav-client copied to clipboard

Entities are not decoded

Open PVince81 opened this issue 2 years ago • 6 comments

See https://github.com/perry-mitchell/webdav-client/blob/v4.7.0/source/tools/dav.ts#L80-L81

it seems they used to and the comment says it will be decoded "manually later" but I didn't find any location in the code where it's done

PVince81 avatar Sep 30 '21 19:09 PVince81

Hi @PVince81 - I'm looking into this now. From a first glance it looks quite easy to solve - adding such a function to support safe decoding of all values early on would help:

export function decodeURLComponentSafely(value: string): string {
    if (!value) return value;
    try {
        return decodeURIComponent(value.replace(/%(?![0-9a-fA-F]+)/g, "%25"));
    } catch (err) {
        throw new Layerr(err, `Failed decoding value: ${value}`);
    }
}

And then I could add it to the parseXML call:

export function parseXML(xml: string): Promise<DAVResult> {
    return new Promise(resolve => {
        const result = xmlParser.parse(xml, {
            arrayMode: false,
            ignoreNameSpace: true,
            tagValueProcessor: val => decodeHTMLEntities(decodeURLComponentSafely(val))
        });
        resolve(normaliseResult(result));
    });
}

The problem is, what if the target file is literally named test%20file, which is a valid Linux filename. This is now decoded automatically returning an incorrect result set. I have everything working except this one facet.. Once I figure it out I'll push a fix.

In the meantime a sample response payload (XML) with the encoded entities would be helpful.

perry-mitchell avatar Oct 10 '21 11:10 perry-mitchell

@perry-mitchell thanks a lot for looking into this

I think we should not mix up concepts. This ticket is about decoding HTML/XML entities, not URL-decoding.

Just to answer your question about legit files called test%20file, I've attached an example Webdav response from a Nextcloud PROPFIND, see "d:href" tags:

<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns" xmlns:nc="http://nextcloud.org/ns">
  <d:response>
    <d:href>/remote.php/dav/files/admin/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:57 GMT</d:getlastmodified>
        <d:getetag>"6165285928a69"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>2</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>164</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%20regular%20space/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:14:25 GMT</d:getlastmodified>
        <d:getetag>"616527c16531d"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>113</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%22doublequote/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:57 GMT</d:getlastmodified>
        <d:getetag>"6165285926070"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>117</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%2520percent%2520twenty/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:14:00 GMT</d:getlastmodified>
        <d:getetag>"616527a88cdb6"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>112</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%27apostrophe/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:32 GMT</d:getlastmodified>
        <d:getetag>"6165284098733"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>114</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%3clowerthan/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:43 GMT</d:getlastmodified>
        <d:getetag>"6165284b826b5"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>115</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/test%3egreaterthan/</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:16:50 GMT</d:getlastmodified>
        <d:getetag>"6165285200f71"</d:getetag>
        <d:resourcetype>
          <d:collection/>
        </d:resourcetype>
        <oc:fileid>116</oc:fileid>
        <oc:permissions>RGDNVCK</oc:permissions>
        <oc:size>0</oc:size>
        <d:quota-available-bytes>-3</d:quota-available-bytes>
        <nc:has-preview>false</nc:has-preview>
        <nc:mount-type/>
        <nc:is-encrypted>0</nc:is-encrypted>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">31</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:getcontenttype/>
        <d:getcontentlength/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote.php/dav/files/admin/welcome.txt</d:href>
    <d:propstat>
      <d:prop>
        <d:getlastmodified>Tue, 12 Oct 2021 06:12:12 GMT</d:getlastmodified>
        <d:getetag>"cd3b89f9796ffb477bcf847e147cd154"</d:getetag>
        <d:getcontenttype>text/plain</d:getcontenttype>
        <d:resourcetype/>
        <oc:fileid>3</oc:fileid>
        <oc:permissions>RGDNVW</oc:permissions>
        <oc:size>164</oc:size>
        <d:getcontentlength>164</d:getcontentlength>
        <nc:has-preview>true</nc:has-preview>
        <nc:mount-type/>
        <x1:share-permissions xmlns:x1="http://open-collaboration-services.org/ns">19</x1:share-permissions>
        <oc:tags/>
        <oc:favorite>0</oc:favorite>
        <oc:comments-unread>0</oc:comments-unread>
        <oc:owner-id>admin</oc:owner-id>
        <oc:owner-display-name>admin</oc:owner-display-name>
        <oc:share-types/>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
    <d:propstat>
      <d:prop>
        <d:quota-available-bytes/>
        <nc:is-encrypted/>
      </d:prop>
      <d:status>HTTP/1.1 404 Not Found</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>

However I think the library should leave these as is.

And here's a Webdav response from retrieving comments which contain special characters which SabreDAV converted to entities while serializing to XML. See oc:message tags:

<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns" xmlns:nc="http://nextcloud.org/ns">
 <d:response>
  <d:status>HTTP/1.1 200 OK</d:status>
  <d:href>/remote.php/dav/comments/files/113/4</d:href>
  <d:propstat>
   <d:prop>
    <oc:id>4</oc:id>
    <oc:parentId>0</oc:parentId>
    <oc:topmostParentId>0</oc:topmostParentId>
    <oc:childrenCount>0</oc:childrenCount>
    <oc:verb>comment</oc:verb>
    <oc:actorType>users</oc:actorType>
    <oc:actorId>admin</oc:actorId>
    <oc:creationDateTime>Tue, 12 Oct 2021 06:24:12 GMT</oc:creationDateTime>
    <oc:latestChildDateTime/>
    <oc:objectType>files</oc:objectType>
    <oc:objectId>113</oc:objectId>
    <oc:message>higher than &amp;gt;</oc:message>
    <oc:actorDisplayName>admin</oc:actorDisplayName>
    <oc:isUnread>false</oc:isUnread>
    <oc:mentions/>
   </d:prop>
   <d:status>HTTP/1.1 200 OK</d:status>
  </d:propstat>
 </d:response>
 <d:response>
  <d:status>HTTP/1.1 200 OK</d:status>
  <d:href>/remote.php/dav/comments/files/113/3</d:href>
  <d:propstat>
   <d:prop>
    <oc:id>3</oc:id>
    <oc:parentId>0</oc:parentId>
    <oc:topmostParentId>0</oc:topmostParentId>
    <oc:childrenCount>0</oc:childrenCount>
    <oc:verb>comment</oc:verb>
    <oc:actorType>users</oc:actorType>
    <oc:actorId>admin</oc:actorId>
    <oc:creationDateTime>Tue, 12 Oct 2021 06:24:07 GMT</oc:creationDateTime>
    <oc:latestChildDateTime/>
    <oc:objectType>files</oc:objectType>
    <oc:objectId>113</oc:objectId>
    <oc:message>lower than &amp;lt;</oc:message>
    <oc:actorDisplayName>admin</oc:actorDisplayName>
    <oc:isUnread>false</oc:isUnread>
    <oc:mentions/>
   </d:prop>
   <d:status>HTTP/1.1 200 OK</d:status>
  </d:propstat>
 </d:response>
 <d:response>
  <d:status>HTTP/1.1 200 OK</d:status>
  <d:href>/remote.php/dav/comments/files/113/2</d:href>
  <d:propstat>
   <d:prop>
    <oc:id>2</oc:id>
    <oc:parentId>0</oc:parentId>
    <oc:topmostParentId>0</oc:topmostParentId>
    <oc:childrenCount>0</oc:childrenCount>
    <oc:verb>comment</oc:verb>
    <oc:actorType>users</oc:actorType>
    <oc:actorId>admin</oc:actorId>
    <oc:creationDateTime>Tue, 12 Oct 2021 06:23:56 GMT</oc:creationDateTime>
    <oc:latestChildDateTime/>
    <oc:objectType>files</oc:objectType>
    <oc:objectId>113</oc:objectId>
    <oc:message>a single quote '</oc:message>
    <oc:actorDisplayName>admin</oc:actorDisplayName>
    <oc:isUnread>false</oc:isUnread>
    <oc:mentions/>
   </d:prop>
   <d:status>HTTP/1.1 200 OK</d:status>
  </d:propstat>
 </d:response>
 <d:response>
  <d:status>HTTP/1.1 200 OK</d:status>
  <d:href>/remote.php/dav/comments/files/113/1</d:href>
  <d:propstat>
   <d:prop>
    <oc:id>1</oc:id>
    <oc:parentId>0</oc:parentId>
    <oc:topmostParentId>0</oc:topmostParentId>
    <oc:childrenCount>0</oc:childrenCount>
    <oc:verb>comment</oc:verb>
    <oc:actorType>users</oc:actorType>
    <oc:actorId>admin</oc:actorId>
    <oc:creationDateTime>Tue, 12 Oct 2021 06:23:49 GMT</oc:creationDateTime>
    <oc:latestChildDateTime/>
    <oc:objectType>files</oc:objectType>
    <oc:objectId>113</oc:objectId>
    <oc:message>a double quote &quot;</oc:message>
    <oc:actorDisplayName>admin</oc:actorDisplayName>
    <oc:isUnread>false</oc:isUnread>
    <oc:mentions/>
   </d:prop>
   <d:status>HTTP/1.1 200 OK</d:status>
  </d:propstat>
 </d:response>
</d:multistatus>

PVince81 avatar Oct 12 '21 06:10 PVince81

hmm weird, for &quot; I understand, but why would the lower character be encoded as &amp;lt instead of &lt;.

@pytal I remember you spoke of double encoding, maybe it's a bug ?

PVince81 avatar Oct 12 '21 06:10 PVince81

hmm weird, for &quot; I understand, but why would the lower character be encoded as &amp;lt instead of &lt;.

@Pytal I remember you spoke of double encoding, maybe it's a bug ?

Yeah, discovered the strange occurrence while investigating https://github.com/nextcloud/server/issues/28243, seems to be a bug 🐞

Pytal avatar Oct 12 '21 22:10 Pytal

Thanks @PVince81 - I'd missed your comment on the Nextcloud issue. So it seems that the issue stems from the XML parser I'm using. What would you recommend?

Changing parser is a big deal.. and finding one as small and efficient as this one would be tricky I suspect (though I haven't looked recently). Happy to consider alternatives.

EDIT: Thanks for the XML too, I'll have a play with the parser options to see if a combination of he or another tool can help.

perry-mitchell avatar Oct 20 '21 20:10 perry-mitchell

I had a play and couldn't find any alternate configuration for the parser that seemed to work in every test case I have. I'd be happy if someone wanted to add a test case or MR that outlines a fix for this.

perry-mitchell avatar Jan 22 '22 10:01 perry-mitchell