Webdav failed uploads leave 0 length files in metadata

Open ageorget opened this issue 11 months ago • 9 comments

Hi,

I found many 0-length files in the namespace that correspond to failed uploads (3500 files for Atlas and CMS, 500 for LHCb). The files are correctly deleted from the pools but not from the namespace:

select * from t_inodes where itype=32768 and isize=0 and icrtime < CURRENT_TIMESTAMP-INTERVAL '24 hours' order by icrtime desc;
               ipnfsid                | itype | imode | inlink | iuid | igid | isize | iio |           ictime           |           iatime           |           imtime           |          icrtime           | igeneration | iaccess_latency | iretention_policy |  inumber   | iqos_policy | iqos_state
--------------------------------------+-------+-------+--------+------+------+-------+-----+----------------------------+----------------------------+----------------------------+----------------------------+-------------+-----------------+-------------------+------------+-------------+------------
 0000179591B66B7245B5A7737413750F4F47 | 32768 |   420 |      1 | 3033 |  119 |     0 |   2 | 2024-12-19 06:51:27.989+01 | 2024-12-19 06:51:27.989+01 | 2024-12-19 06:51:27.989+01 | 2024-12-19 06:51:27.989+01 |           0 |                 |                   | 1288503194 |             |
 00005DA951F022A3419CAE8F2BE86FFC31B8 | 32768 |   420 |      1 | 3033 |  119 |     0 |   2 | 2024-12-19 06:46:03.511+01 | 2024-12-19 06:46:03.511+01 | 2024-12-19 06:46:03.511+01 | 2024-12-19 06:46:03.511+01 |           0 |                 |                   | 1288499582 |             |
 0000843FC0B40E324342889C8B334B9C53F8 | 32768 |   420 |      1 | 3033 |  119 |     0 |   2 | 2024-12-19 06:45:47.725+01 | 2024-12-19 06:45:47.725+01 | 2024-12-19 06:45:47.725+01 | 2024-12-19 06:45:47.725+01 |           0 |                 |                   | 1288499339 |             |
 0000C648F112A32E4C9EB0F24E695181872D | 32768 |   420 |      1 | 3327 |  124 |     0 |   2 | 2024-12-18 14:21:26.529+01 | 2024-12-18 14:21:26.529+01 | 2024-12-18 14:21:26.529+01 | 2024-12-18 14:21:26.529+01 |           0 |                 |                   | 1288099252 |             |

Nothing is left in t_location_trash, so the cleaner is working correctly.

Failed transfers from the pool logs:

Dec 19 06:51:28 ccdcatli344 dcache@ccdcatli344-pool-cms-hpssdata-li344a-Domain[65368]: 19 Dec 2024 06:51:28 (pool-cms-hpssdata-li344a) [door:webdav-ccdcatli345@webdav-ccdcatli345Domain:AAYpmR8NFLA webdav-ccdcatli345 PoolAcceptFile 0000179591B66B7245B5A7737413750F4F47] Transfer failed: Connection lost before end of file.
Dec 19 06:51:28 ccdcatli344 dcache@ccdcatli344-pool-cms-hpssdata-li344a-Domain[65368]: 19 Dec 2024 06:51:28 (pool-cms-hpssdata-li344a) [door:webdav-ccdcatli345@webdav-ccdcatli345Domain:AAYpmR8NFLA webdav-ccdcatli345 PoolAcceptFile 0000179591B66B7245B5A7737413750F4F47] Transfer failed in post-processing: File size mismatch (expected=497992534, actual=0)
Dec 19 06:51:28 ccdcatli344 dcache@ccdcatli344-pool-cms-hpssdata-li344a-Domain[65368]: 19 Dec 2024 06:51:28 (pool-cms-hpssdata-li344a) [door:webdav-ccdcatli345@webdav-ccdcatli345Domain:AAYpmR8NFLA webdav-ccdcatli345 PoolAcceptFile 0000179591B66B7245B5A7737413750F4F47] Failed to read file size: java.nio.file.NoSuchFileException: /data/pool-cms-hpssdata-li344a/pool/data/0000179591B66B7245B5A7737413750F4F47

Dec 19 06:46:06 ccdcatli415 dcache@ccdcatli415-pool-cms-hpssdata-li415a-Domain[104743]: 19 Dec 2024 06:46:06 (pool-cms-hpssdata-li415a) [door:webdav-ccdcatli345@webdav-ccdcatli345Domain:AAYpmQu1/Dg webdav-ccdcatli345 PoolAcceptFile 00005DA951F022A3419CAE8F2BE86FFC31B8] Transfer failed: Connection lost before end of file.
Dec 19 06:46:06 ccdcatli415 dcache@ccdcatli415-pool-cms-hpssdata-li415a-Domain[104743]: 19 Dec 2024 06:46:06 (pool-cms-hpssdata-li415a) [door:webdav-ccdcatli345@webdav-ccdcatli345Domain:AAYpmQu1/Dg webdav-ccdcatli345 PoolAcceptFile 00005DA951F022A3419CAE8F2BE86FFC31B8] Transfer failed in post-processing: File size mismatch (expected=424527680, actual=0)
Dec 19 06:46:06 ccdcatli415 dcache@ccdcatli415-pool-cms-hpssdata-li415a-Domain[104743]: 19 Dec 2024 06:46:06 (pool-cms-hpssdata-li415a) [door:webdav-ccdcatli345@webdav-ccdcatli345Domain:AAYpmQu1/Dg webdav-ccdcatli345 PoolAcceptFile 00005DA951F022A3419CAE8F2BE86FFC31B8] Failed to read file size: java.nio.file.NoSuchFileException: /data/pool-cms-hpssdata-li415a/pool/data/00005DA951F022A3419CAE8F2BE86FFC31B8

Dec 18 14:30:47 ccdcatli416 dcache@ccdcatli416-pool-atlas-dq2-li416a-Domain[100801]: 18 Dec 2024 14:30:47 (pool-atlas-dq2-li416a) [door:webdav-ccdcatli367@webdav-ccdcatli367Domain:AAYpi0py7lg webdav-ccdcatli367 PoolAcceptFile 0000C648F112A32E4C9EB0F24E695181872D] Transfer failed: No connection from client after 300 seconds. Giving up.
Dec 18 14:30:47 ccdcatli416 dcache@ccdcatli416-pool-atlas-dq2-li416a-Domain[100801]: 18 Dec 2024 14:30:47 (pool-atlas-dq2-li416a) [door:webdav-ccdcatli367@webdav-ccdcatli367Domain:AAYpi0py7lg webdav-ccdcatli367 PoolAcceptFile 0000C648F112A32E4C9EB0F24E695181872D] Transfer failed in post-processing: File size mismatch (expected=692252, actual=0)
Dec 18 14:30:47 ccdcatli416 dcache@ccdcatli416-pool-atlas-dq2-li416a-Domain[100801]: 18 Dec 2024 14:30:47 (pool-atlas-dq2-li416a) [door:webdav-ccdcatli367@webdav-ccdcatli367Domain:AAYpi0py7lg webdav-ccdcatli367 PoolAcceptFile 0000C648F112A32E4C9EB0F24E695181872D] Failed to read file size: java.nio.file.NoSuchFileException: /data/pool-atlas-dq2-li416a/pool/data/0000C648F112A32E4C9EB0F24E695181872D
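
For anyone who wants to cross-check their own pools, here is a rough sketch of how the affected PNFS IDs could be pulled out of log lines like the ones above and compared with the `t_inodes` query result. The regular expression and the `failed_uploads` helper are my own assumptions based only on these sample lines, not anything provided by dCache, so the pattern may need adjusting for other log formats.

```python
import re

# Matches the "File size mismatch" post-processing lines shown above.
# The pattern is an assumption derived from these samples only.
MISMATCH = re.compile(
    r"PoolAcceptFile (?P<pnfsid>[0-9A-F]+)\] "
    r"Transfer failed in post-processing: "
    r"File size mismatch \(expected=(?P<expected>\d+), actual=(?P<actual>\d+)\)"
)


def failed_uploads(lines):
    """Return {pnfsid: expected_size} for uploads that ended with 0 bytes."""
    result = {}
    for line in lines:
        match = MISMATCH.search(line)
        # Keep only transfers where nothing was written to the pool.
        if match and int(match.group("actual")) == 0:
            result[match.group("pnfsid")] = int(match.group("expected"))
    return result
```

Feeding it the pool log lines (for example, every line of a saved log file) gives a set of PNFS IDs that can then be matched against the `ipnfsid` column of the 0-length entries.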

ageorget, Dec 20 '24 14:12