
Truncated files

Open DmitryLitvintsev opened this issue 3 years ago • 1 comments

Continuing issues of bad files.

User report:

Error in <TFile::Init>: file /pnfs/GM2/scratch/daq/2021-10-29-18-14-40/data/gm2preproduction_full_49855919_44135.00292.root is truncated at 700743060 bytes: should be 1242600121, trying to recover
Warning in <TFile::Init>: no keys recovered, file has been made a Zombie
Unable to open file '/pnfs/GM2/scratch/daq/2021-10-29-18-14-40/data/gm2preproduction_full_49855919_44135.00292.root' for reading.
Skipping file.

The file is not in Error state:

[fndca3b] (PnfsManager@namespaceDomain) enstore > pnfsidof /pnfs/fs/usr/GM2/scratch/daq/2021-10-29-18-14-40/data/gm2preproduction_full_49855919_44135.00292.root
000093273C9C8B724B9CB3F12CB15F14D6B0
[fndca3b] (PnfsManager@namespaceDomain) enstore > \sl 000093273C9C8B724B9CB3F12CB15F14D6B0 rep ls 000093273C9C8B724B9CB3F12CB15F14D6B0
v-stkendca2003-2:
    000093273C9C8B724B9CB3F12CB15F14D6B0 <C----------L(0)[0]> 700743060 si={GM2.scratch}
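
A quick cross-check of the two numbers above, as a minimal sketch: it parses the `rep ls` line and compares the replica size with the size ROOT says the file should have. The regular expression and the helper name `parse_rep_ls` are my own assumptions based only on the single output line shown here, not on any documented format.

```python
import re

# Hypothetical pattern, inferred from the one `rep ls` line in this report:
#   <pnfsid> <flags> <size in bytes> si={<storage info>}
REP_LS = re.compile(
    r"^\s*(?P<pnfsid>[0-9A-F]+)\s+<(?P<flags>[^>]*)>\s+(?P<size>\d+)\s+si=\{(?P<si>[^}]*)\}"
)


def parse_rep_ls(line):
    """Extract pnfsid, flags, size and storage info from one `rep ls` line."""
    m = REP_LS.match(line)
    if m is None:
        raise ValueError("unrecognised rep ls line: %r" % line)
    return {
        "pnfsid": m["pnfsid"],
        "flags": m["flags"],
        "size": int(m["size"]),
        "storage_info": m["si"],
    }


line = "    000093273C9C8B724B9CB3F12CB15F14D6B0 <C----------L(0)[0]> 700743060 si={GM2.scratch}"
replica = parse_rep_ls(line)

# Size ROOT expected, taken from the TFile::Init error message above.
root_expected = 1242600121
missing = root_expected - replica["size"]
print("replica holds %d of %d bytes, %d missing" % (replica["size"], root_expected, missing))
```

The replica size equals the truncation point ROOT reports, so the pool is holding the truncated data as a normal cached replica.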

But I see an upload error in billing:

billing=# select datestamp, protocol, errorcode, errormessage, initiator from billinginfo where pnfsid = '000093273C9C8B724B9CB3F12CB15F14D6B0' and isnew is true;
         datestamp          | protocol | errorcode |                                     errormessage                                      |                                   initiator                                    
----------------------------+----------+-----------+---------------------------------------------------------------------------------------+--------------------------------------------------------------------------------
 2021-10-29 20:51:08.827-05 | GFtp-2.0 |       666 | General problem: Problem while connected to 137.99.174.35:56498: Connection timed out | door:GFTP-stkendca2043-AAXPhlXymbg@gridftp-stkendca2043Domain:1635550758461000
(1 row)

*AND*, interestingly, I do not see a record associated with `door:GFTP-stkendca2043-AAXPhlXymbg@gridftp-stkendca2043Domain:1635550758461000` in doorinfo.

Houston, we have a problem.

DmitryLitvintsev, Feb 08 '22 18:02

I'd be interested to know what the client interactions were for this transfer.

Could you copy the corresponding lines from the access log file for this FTP session? (something like grep 1635550758461000 /var/log/dcache/gridftp-stkendca2043Domain.access).

My guess is that the client didn't provide any hint about the expected file size (or checksum).

It looks like the FTP mover knew there was a problem ("Connection timed out"), but the protocol-agnostic post-transfer handler doesn't know any better, so it considers the replica "good" and the transfer successful.
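
To make that gap concrete, here is a rough sketch of the decision being described. It is not dCache code: `classify_transfer` and its parameters are hypothetical, and the 1242600121 figure is the size ROOT reported, used here only as a stand-in for a size hint the client might have supplied.

```python
from typing import Optional


def classify_transfer(
    received_bytes: int,
    expected_bytes: Optional[int] = None,
    mover_error: Optional[str] = None,
) -> str:
    """Hypothetical post-transfer check (illustration only, not dCache's logic)."""
    # An error reported by the mover should fail the transfer outright.
    if mover_error is not None:
        return "failed"
    # Without a size (or checksum) hint there is nothing to compare against,
    # so a short file cannot be told apart from a complete one.
    if expected_bytes is None:
        return "unverified"
    # With a hint, a size mismatch exposes the truncation.
    if received_bytes != expected_bytes:
        return "truncated"
    return "good"


# No hint and no error propagated: the truncated replica passes unnoticed.
print(classify_transfer(700743060))
# A size hint would have caught it.
print(classify_transfer(700743060, expected_bytes=1242600121))
# So would propagating the mover's error.
print(classify_transfer(700743060, mover_error="Connection timed out"))
```

If the mover's error never reaches this step and the client sent no size, the first case is the only one that applies, which would match what we see for this replica.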

The missing door billing entry is also interesting: is anything logged by the door at around the time the transfer finished?

paulmillar, Feb 08 '22 22:02