dcache
Truncated files
We are continuing to see issues with bad files.
User report:
Error in <TFile::Init>: file /pnfs/GM2/scratch/daq/2021-10-29-18-14-40/data/gm2preproduction_full_49855919_44135.00292.root is truncated at 700743060 bytes: should be 1242600121, trying to recover
Warning in <TFile::Init>: no keys recovered, file has been made a Zombie
Unable to open file '/pnfs/GM2/scratch/daq/2021-10-29-18-14-40/data/gm2preproduction_full_49855919_44135.00292.root' for reading.
Skipping file.
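To find other affected files in job logs, the ROOT message above can be parsed mechanically. This is a minimal sketch: the regular expression is derived only from the single message quoted in this report, and the function name `parse_truncation` is made up for illustration.

```python
import re

# Pattern derived from the one ROOT message quoted in this report (an assumption:
# other ROOT versions may word the message differently).
TRUNCATED_RE = re.compile(
    r"file (?P<path>\S+) is truncated at (?P<actual>\d+) bytes: "
    r"should be (?P<expected>\d+)"
)


def parse_truncation(line):
    """Return (path, actual_bytes, expected_bytes), or None if the line does not match."""
    m = TRUNCATED_RE.search(line)
    if m is None:
        return None
    return m.group("path"), int(m.group("actual")), int(m.group("expected"))


line = (
    "Error in <TFile::Init>: file /pnfs/GM2/scratch/daq/2021-10-29-18-14-40/data/"
    "gm2preproduction_full_49855919_44135.00292.root is truncated at 700743060 bytes: "
    "should be 1242600121, trying to recover"
)
path, actual, expected = parse_truncation(line)
missing = expected - actual  # bytes that never arrived
print(path, actual, expected, missing)
```

For the file in this report, 541857061 of the expected 1242600121 bytes are missing.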
The file is not in Error state:
[fndca3b] (PnfsManager@namespaceDomain) enstore > pnfsidof /pnfs/fs/usr/GM2/scratch/daq/2021-10-29-18-14-40/data/gm2preproduction_full_49855919_44135.00292.root
000093273C9C8B724B9CB3F12CB15F14D6B0
[fndca3b] (PnfsManager@namespaceDomain) enstore > \sl 000093273C9C8B724B9CB3F12CB15F14D6B0 rep ls 000093273C9C8B724B9CB3F12CB15F14D6B0
v-stkendca2003-2:
000093273C9C8B724B9CB3F12CB15F14D6B0 <C----------L(0)[0]> 700743060 si={GM2.scratch}
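Note that the replica size reported by the pool equals the offset at which ROOT says the file is truncated, so the pool stored exactly the bytes it received. A rough sketch of that comparison follows; the regular expression is inferred from the single `rep ls` line above, not from any format specification, and `parse_rep_ls` is a hypothetical helper.

```python
import re

# Layout inferred from the one "rep ls" line in this report (an assumption):
#   <pnfsid> <flags> <size in bytes> si={<storage info>}
REP_LS_RE = re.compile(
    r"^(?P<pnfsid>[0-9A-F]+) <(?P<flags>[^>]+)> (?P<size>\d+) si=\{(?P<si>[^}]*)\}$"
)


def parse_rep_ls(line):
    """Split a 'rep ls' line into its fields; return None if it does not match."""
    m = REP_LS_RE.match(line.strip())
    if m is None:
        return None
    return {
        "pnfsid": m.group("pnfsid"),
        "flags": m.group("flags"),
        "size": int(m.group("size")),
        "storage_info": m.group("si"),
    }


replica = parse_rep_ls(
    "000093273C9C8B724B9CB3F12CB15F14D6B0 <C----------L(0)[0]> 700743060 si={GM2.scratch}"
)
truncated_at = 700743060  # offset taken from the ROOT error message in the user report
print(replica["size"] == truncated_at)
```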
But I see an upload error in billing:
billing=# select datestamp, protocol, errorcode, errormessage, initiator from billinginfo where pnfsid = '000093273C9C8B724B9CB3F12CB15F14D6B0' and isnew is true;
datestamp | protocol | errorcode | errormessage | initiator
----------------------------+----------+-----------+---------------------------------------------------------------------------------------+--------------------------------------------------------------------------------
2021-10-29 20:51:08.827-05 | GFtp-2.0 | 666 | General problem: Problem while connected to 137.99.174.35:56498: Connection timed out | door:GFTP-stkendca2043-AAXPhlXymbg@gridftp-stkendca2043Domain:1635550758461000
(1 row)
*AND*, interestingly, I do not see a record associated with `door:GFTP-stkendca2043-AAXPhlXymbg@gridftp-stkendca2043Domain:1635550758461000` in doorinfo.
Houston, we have a problem.
I'd be interested to know what the client interactions were for this transfer.
Could you copy the corresponding lines from the access log file for this FTP session? (Something like `grep 1635550758461000 /var/log/dcache/gridftp-stkendca2043Domain.access`.)
My guess is that the client didn't provide any hint about the expected file size (or checksum).
It looks like the FTP mover knew there was a problem ("Connection timed out"), but the protocol-agnostic post-transfer handler doesn't know any better, so it considers the replica "good" and the transfer successful.
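To make that guess concrete, here is a toy illustration of why a missing size hint matters. It is not dCache code: `size_verdict` and its return values are hypothetical and only model the reasoning above.

```python
from typing import Optional


def size_verdict(received_bytes: int, expected_bytes: Optional[int]) -> str:
    """Classify an upload using only the byte counts (hypothetical helper)."""
    if expected_bytes is None:
        # No hint from the client: nothing to compare the replica against.
        return "unverifiable"
    if received_bytes != expected_bytes:
        return "truncated"
    return "ok"


# Sizes taken from the report: 700743060 bytes stored, 1242600121 bytes expected.
print(size_verdict(700743060, None))        # what a size-only check can say without a hint
print(size_verdict(700743060, 1242600121))  # what it could say if the client sent the size
```

Without the hint, a size-only check cannot tell this replica from a complete file, which would be consistent with the replica not being in an error state.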
The missing door billing entry is also interesting: is anything logged by the door at around the time the transfer finished?