WebDAV upload to redirecting door with mismatched Content-MD5 leaves corrupt file behind
Dear dCache devs,
When I upload a file with curl under these conditions:
- To a redirecting WebDAV door
- With a Content-MD5 checksum to be verified
- Where the Content-MD5 checksum doesn't match the file
The operation leaves a corrupt file (whose checksum didn't match) behind.
When I do this on a non-redirecting WebDAV door, there is no garbage left behind; the uploaded file with the incorrect checksum is cleaned up.
Here's the command I use to reproduce it. I upload the file dcache-10.2.10-1.noarch.rpm, but with the MD5 checksum of /bin/bash. This should fail, and dCache does return "Checksum mismatch", but the uploaded file remains.
% file=~/Downloads/dcache-10.2.10-1.noarch.rpm
% curl -H "Authorization: Bearer $BEARER_TOKEN" -H "Content-MD5: $(md5sum /bin/bash | cut -d' ' -f1 | xxd -r -p | base64)" -X PUT https://webdav.grid.surfsara.nl:2882/pnfs/grid.sara.nl/data/users/onno/disk/testfile-md5-incorrect-2882 -L --post302 -T $file -v
Note: Unnecessary use of -X or --request, PUT is already inferred.
* Host webdav.grid.surfsara.nl:2882 was resolved.
* IPv6: 2001:610:108:203a::2:78, 2001:610:108:203a::2:79, 2001:610:108:203a::2:52, 2001:610:108:203a::2:53, 2001:610:108:203a::2:65, 2001:610:108:203a::2:66
* IPv4: 145.100.34.78, 145.100.34.79, 145.100.34.66, 145.100.34.53, 145.100.34.65, 145.100.34.52
* Trying [2001:610:108:203a::2:78]:2882...
* Connected to webdav.grid.surfsara.nl (2001:610:108:203a::2:78) port 2882
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* CAfile: /etc/ssl/cert.pem
* CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Request CERT (13):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Certificate (11):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384 / [blank] / UNDEF
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
* subject: DC=org; DC=terena; DC=tcs; C=NL; ST=Utrecht; O=SURF B.V.; CN=marten12.grid.surfsara.nl
* start date: Dec 11 00:00:00 2024 GMT
* expire date: Jan 10 23:59:59 2026 GMT
* subjectAltName: host "webdav.grid.surfsara.nl" matched cert's "webdav.grid.surfsara.nl"
* issuer: C=NL; O=GEANT Vereniging; CN=GEANT eScience SSL CA 4
* SSL certificate verify ok.
* using HTTP/1.x
> PUT /pnfs/grid.sara.nl/data/users/onno/disk/testfile-md5-incorrect-2882 HTTP/1.1
> Host: webdav.grid.surfsara.nl:2882
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer ****************************************
> Content-MD5: 7iGzTcQLQDWQVxqKWwsvtA==
> Content-Length: 182145541
> Expect: 100-continue
>
< HTTP/1.1 307 Temporary Redirect
< Date: Mon, 18 Aug 2025 15:39:41 GMT
< Server: dCache/10.2.10
* close instead of sending 182145541 more bytes
< Location: http://[2001:610:108:203a:0:0:2:50]:23281/pnfs/grid.sara.nl/data/users/onno/disk/testfile-md5-incorrect-2882?dcache-http-uuid=f7e703e2-69f2-46c0-98c9-c6c97ee50536&dcache-http-ref=https%3A%2F%2Fwebdav.grid.surfsara.nl%3A2882
< Connection: close
<
* Closing connection
* Clear auth, redirects to port from 2882 to 23281
* Issue another request to this URL: 'http://[2001:610:108:203a::2:50]:23281/pnfs/grid.sara.nl/data/users/onno/disk/testfile-md5-incorrect-2882?dcache-http-uuid=f7e703e2-69f2-46c0-98c9-c6c97ee50536&dcache-http-ref=https%3A%2F%2Fwebdav.grid.surfsara.nl%3A2882'
* Trying [2001:610:108:203a::2:50]:23281...
* Connected to 2001:610:108:203a::2:50 (2001:610:108:203a::2:50) port 23281
> PUT /pnfs/grid.sara.nl/data/users/onno/disk/testfile-md5-incorrect-2882?dcache-http-uuid=f7e703e2-69f2-46c0-98c9-c6c97ee50536&dcache-http-ref=https%3A%2F%2Fwebdav.grid.surfsara.nl%3A2882 HTTP/1.1
> Host: [2001:610:108:203a::2:50]:23281
> User-Agent: curl/8.7.1
> Accept: */*
> Content-MD5: 7iGzTcQLQDWQVxqKWwsvtA==
> Content-Length: 182145541
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
< Server: dCache/10.2.10
<
* upload completely sent off: 182145541 bytes
< HTTP/1.1 400 Checksum mismatch (expected=[2:ee21b34dc40b403590571a8a5b0b2fb4], actual=[2:deefbefaeb6c64bc40f0521e73e57aea, 1:42a9157c])
< Content-Length: 0
< Server: dCache/10.2.10
<
* Connection #1 to host 2001:610:108:203a::2:50 left intact
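The transcript uses two encodings for the same kind of value. curl sends Content-MD5 in base64 (7iGzTcQLQDWQVxqKWwsvtA==). The 400 reason phrase lists hex digests with a numeric prefix. The small Python sketch below decodes both. It confirms that the "expected" MD5 is exactly the header value curl sent, so the server compared against the right value and the mismatch is genuine. The prefix mapping (1 = ADLER32, 2 = MD5) is an assumption inferred from the value lengths. The function names are illustrative and not part of any dCache tool.

```python
import base64
import hashlib
import re

# Assumption: dCache prints checksums as "<type>:<hex value>", with type 1 = ADLER32
# and type 2 = MD5 (inferred from the 8- and 32-hex-digit values in the response).
CHECKSUM_TYPES = {"1": "ADLER32", "2": "MD5"}


def content_md5(data: bytes) -> str:
    """Content-MD5 header value (RFC 1864): base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def hex_to_content_md5(md5_hex: str) -> str:
    # Same conversion as the shell pipeline: md5sum FILE | cut -d' ' -f1 | xxd -r -p | base64
    return base64.b64encode(bytes.fromhex(md5_hex)).decode("ascii")


def content_md5_to_hex(header_value: str) -> str:
    # Reverse direction: recover the lowercase hex digest that md5sum prints.
    return base64.b64decode(header_value).hex()


def parse_mismatch(reason: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split 'expected=[...], actual=[...]' from the reason phrase into two dicts."""
    m = re.search(r"expected=\[([^\]]*)\], actual=\[([^\]]*)\]", reason)
    if m is None:
        raise ValueError("not a checksum-mismatch reason phrase")

    def to_dict(inner: str) -> dict[str, str]:
        pairs = (item.strip().split(":", 1) for item in inner.split(","))
        return {CHECKSUM_TYPES.get(t, t): v for t, v in pairs}

    return to_dict(m.group(1)), to_dict(m.group(2))


reason = ("Checksum mismatch (expected=[2:ee21b34dc40b403590571a8a5b0b2fb4], "
          "actual=[2:deefbefaeb6c64bc40f0521e73e57aea, 1:42a9157c])")
expected, actual = parse_mismatch(reason)
# The "expected" MD5 is exactly the Content-MD5 header that curl sent.
print(content_md5_to_hex("7iGzTcQLQDWQVxqKWwsvtA==") == expected["MD5"])  # True
print(actual)  # {'MD5': 'deefbefaeb6c64bc40f0521e73e57aea', 'ADLER32': '42a9157c'}
```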
Documentation: https://dcache.org/manuals/UserGuide-10.2/webdav.shtml#checksums (under "Failing upload if data is corrupt")
Is my assumption correct that dCache should remove the file whose checksum is incorrect? Is this a bug?
Cheers, Onno
Some comments:
- I believe corrupted data most often comes from an incomplete upload, but there are other causes (e.g., inconsistency between file storage and a catalogue, silent corruption, damaged hardware).
- In the case of corrupt data, the file may still have value; e.g., if only the final byte is missing, it may be possible to correct for this with limited (or no) impact on the goal.
- By design, the dCache pool never deletes an uploaded file. This allows flexibility in how dCache reacts to failed uploads.
- Correspondingly, the door is responsible for deleting a failed upload, if deletion is the correct behaviour.
- Inconsistency between redirected and non-redirected WebDAV corrupted uploads is (to my mind) certainly a bug.
- From memory, there is no well-documented "correct" behaviour for WebDAV when uploading a file that is corrupt.
- The same holds for incomplete uploads, although an incomplete upload should at least be detectable when the HTTP PUT request uses either a Content-Length header or chunked encoding.
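The remark about incomplete uploads can be made concrete. Below is a minimal Python sketch of the two checks a receiver can make. It is not dCache's implementation.

- With a Content-Length header, the receiver compares the declared and received byte counts.
- With HTTP/1.1 chunked transfer coding, a complete body ends with a zero-size chunk, so a stream that stops before that chunk is incomplete.

The names are hypothetical. Chunk extensions are skipped and trailers are not parsed.

```python
class IncompleteUpload(Exception):
    """Hypothetical error: the request body ended before the client said it would."""


def check_content_length(declared: int, received: bytes) -> bytes:
    # With Content-Length, the receiver knows exactly how many bytes to expect.
    if len(received) != declared:
        raise IncompleteUpload(f"expected {declared} bytes, received {len(received)}")
    return received


def decode_chunked(body: bytes) -> bytes:
    # With chunked encoding, each chunk is "<hex size>[;ext]\r\n<data>\r\n" and a
    # complete body ends with a zero-size chunk. Extensions after ';' are ignored;
    # any trailer fields after the final chunk are not parsed in this sketch.
    out = bytearray()
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol < 0:
            raise IncompleteUpload("stream ended inside a chunk-size line")
        size = int(body[pos:eol].split(b";", 1)[0], 16)
        pos = eol + 2
        if size == 0:
            return bytes(out)  # terminating chunk seen: the body is complete
        if len(body) < pos + size + 2:
            raise IncompleteUpload("stream ended inside chunk data")
        out += body[pos:pos + size]
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("malformed chunk: missing CRLF after data")
        pos += size + 2
```

An incomplete body would normally also fail a Content-MD5 comparison. These checks let the receiver report why the upload failed, not just that it failed.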
Hi Paul,
Thanks a lot for your comments.
Indeed it looks like a bug that redirecting and non-redirecting doors behave differently.
I'm a bit confused by your remark that "In the case of corrupt data, the file may still have value". I thought the idea of Content-MD5 was to make sure you never end up with corrupt data. If I want to keep partial uploads, I can simply omit Content-MD5 and check the checksum after the upload.
My expectations are partly based on the documentation at https://dcache.org/manuals/UserGuide-10.2/webdav.shtml#checksums , in the "Failing upload if data is corrupt" section:
However, if the file is discovered to be corrupt, the client is then responsible for either removing the corrupt file or attempting another upload. Until either is done, the file exists in dCache with corrupt data.
Placing this responsibility on the client may be problematic: the client could halt (or be interrupted) before the recovery procedure completes, or may be authorised only to upload data and not overwrite existing data nor delete existing data.
We actually have users who, at their own request, don't have delete permissions, because they are afraid of losing data accidentally. As a result, they sometimes ask us to clean up failed uploads. That's one reason this option seems attractive.
OK, let me try to clarify ...
The "In the case of corrupt data, the file may still have value" comment was meant as a general statement. One might transfer a 100 TiB file over a slow link, only for the transfer to fail after 99% of the file has been copied. Although the file is incomplete, dCache may already hold enough of it for the planned data analysis, in which case deleting the data and starting again would be sub-optimal. This observation is (I would say) independent of whether the PUT request contains a Content-MD5 header. It is also independent of the behaviour dCache supports (or might support). In any case, I don't think this is terribly important, as there are other ways to support such scenarios, such as TUS (a protocol for resumable uploads).
The user manual text you quoted describes the situation where the user first uploads the data and then verifies the uploaded file's integrity. The concern it raises matches (I believe) your use-case. The client uploading the data isn't authorised to delete it, so some out-of-band way is needed to clean up any failed transfers.
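In the upload-then-verify workflow, the client computes checksums locally and compares them with the values dCache stores. Below is a minimal Python sketch of that comparison step. It assumes the server's values have already been fetched into a dict keyed by algorithm name. How to fetch them is deliberately left out; one option is an RFC 3230 Want-Digest request, if the door supports it. The function names are hypothetical.

```python
import hashlib
import zlib
from typing import Dict, Iterable


def checksums(chunks: Iterable[bytes]) -> Dict[str, str]:
    """MD5 and ADLER32 of a byte stream, as lowercase hex (ADLER32 padded to 8 digits)."""
    md5 = hashlib.md5()
    adler = 1  # Adler-32 initial value
    for chunk in chunks:
        md5.update(chunk)
        adler = zlib.adler32(chunk, adler)  # running checksum across chunks
    return {"MD5": md5.hexdigest(), "ADLER32": f"{adler:08x}"}


def file_checksums(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    # Read in 1 MiB chunks so a large file never has to fit in memory at once.
    with open(path, "rb") as f:
        return checksums(iter(lambda: f.read(chunk_size), b""))


def verify(local: Dict[str, str], remote: Dict[str, str]) -> bool:
    # Compare only algorithms both sides report, and require at least one in
    # common: an empty intersection proves nothing about the data.
    common = local.keys() & remote.keys()
    return bool(common) and all(local[k] == remote[k].lower() for k in common)
```

If verify returns False, the client faces exactly the cleanup problem described above. It must delete or overwrite the file, which a client with upload-only permissions cannot do.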
I would say the more relevant part of the documentation is:
An alternative approach is to supply a known checksum value when uploading the data. dCache then verifies this known checksum value matches that of the data it receives. If the two checksums do not match then the upload fails.
Taking this as an accurate description of the desired behaviour, the current behaviour is just broken. The code path that redirects the client to the pool handles data corruption incorrectly. The WebDAV door should simply delete the file if the transfer fails, but on that path it currently doesn't.
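The behaviour described here can be modelled in a few lines of Python. This is a hypothetical sketch, not dCache code. In dCache the door and pool are separate components that exchange messages; here they are plain function calls, and all names are invented. The pool stores what it receives and reports a mismatch without deleting anything. Both the proxied and the redirected PUT paths call the same door-side cleanup hook, so a failed upload leaves nothing behind in either case.

```python
import hashlib
from typing import Optional


class Namespace:
    """Hypothetical stand-in for the namespace: maps a path to the stored bytes."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def delete(self, path: str) -> None:
        self.files.pop(path, None)


def pool_receive(ns: Namespace, path: str, data: bytes, expected_md5_hex: str) -> Optional[str]:
    # The pool stores what it received and reports a mismatch as an error string,
    # but never deletes the file itself (the design decision described in the thread).
    ns.files[path] = data
    if hashlib.md5(data).hexdigest() != expected_md5_hex:
        return "Checksum mismatch"
    return None


def door_transfer_finished(ns: Namespace, path: str, error: Optional[str]) -> None:
    # One cleanup hook shared by both code paths: a failed upload is removed.
    if error is not None:
        ns.delete(path)


def proxied_put(ns: Namespace, path: str, data: bytes, md5_hex: str) -> int:
    # Non-redirecting door: the data passes through the door on its way to the pool.
    error = pool_receive(ns, path, data, md5_hex)
    door_transfer_finished(ns, path, error)
    return 400 if error else 201


def redirected_put(ns: Namespace, path: str, data: bytes, md5_hex: str) -> int:
    # Redirecting door: the client gets a 307 and sends the data straight to the pool,
    # which answers the client. The door must still run the same cleanup hook once it
    # learns the transfer failed; per this thread, that is the step currently missing.
    error = pool_receive(ns, path, data, md5_hex)
    status = 400 if error else 201
    door_transfer_finished(ns, path, error)
    return status
```

The design choice worth noting is the single hook. If cleanup is tied only to the code path that proxies the data, the redirect path can skip it without any error. That would produce exactly the inconsistency reported in this issue.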