
Explicitly compare local/remote stage out file checksum in GFAL2

Open • amaltaro opened this issue 11 months ago • 1 comment

Impact of the bug: WMAgent

Describe the bug: As just discussed with Hasan, Panos and Stephan, there are hundreds of files that have checksum problems and cannot be transferred between storage endpoints.

Last year, or perhaps two years ago, we identified cases where gfal-copy returned exit status 0 even though the data transfer had run into problems and left corrupted files in the system, as tracked in https://github.com/dmwm/WMCore/issues/11556

We suspect that at least a fraction of these recent failures still come from that same scenario, especially because the SL7 GFAL2 library hasn't been updated in the apptainer images used by SI for production workloads.

How to reproduce it: Unknown.

Expected behavior: The initial solution we discussed in the meeting involves 3 steps:

  1. we execute the gfal-copy data transfer
  2. if it is successful, we trigger a gfal-sum (or whichever command correctly calculates the remote file checksum), regardless of whether gfal-copy was executed with the -K checksum option or not
  3. then, in WMAgent, we compare the remote checksum with the local checksum. If they match, the whole stage-out step returns exit code 0; otherwise, we try to remove any corrupted/broken files with gfal-rm and follow the stage-out retry logic that is already in place.
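The three steps above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not WMCore's actual stage-out code: the helper names (`stage_out_with_verification`, `run_command`, `normalize_adler32`) are hypothetical, it assumes Adler-32 is the algorithm being compared (still to be confirmed), and it assumes `gfal-sum` prints the file URL followed by the checksum. The command runner is injectable so the logic can be exercised without a real storage endpoint.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# A command runner takes an argv list and returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_command(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a CLI command and return its exit code and stdout."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout


def normalize_adler32(value: str) -> str:
    """Lowercase and zero-pad to 8 hex digits, so '1A2B' == '00001a2b'."""
    return value.strip().lower().zfill(8)


def stage_out_with_verification(local_pfn: str, remote_pfn: str,
                                local_adler32: str,
                                run: Runner = run_command) -> int:
    """Hypothetical helper: copy, verify the remote checksum, clean up on mismatch.

    Returns 0 only when the remote checksum matches the local one; any
    non-zero value means the caller should apply its existing retry logic.
    """
    # Step 1: transfer the file.
    code, _ = run(["gfal-copy", local_pfn, remote_pfn])
    if code != 0:
        return code

    # Step 2: always ask the storage for the remote checksum, whatever
    # options gfal-copy was called with. Assumed output: "<url> <checksum>".
    code, out = run(["gfal-sum", remote_pfn, "ADLER32"])
    if code != 0 or not out.split():
        # Assumption: an unverifiable replica is treated as broken.
        run(["gfal-rm", remote_pfn])
        return code or 1
    remote_adler32 = out.split()[-1]

    # Step 3: compare; on mismatch, remove the corrupted replica.
    if normalize_adler32(remote_adler32) == normalize_adler32(local_adler32):
        return 0
    run(["gfal-rm", remote_pfn])
    return 1
```

Zero-padding before the comparison is a defensive choice, since some tools drop leading zeros from Adler-32 values. Removing the replica when `gfal-sum` itself fails is also an assumption here, not agreed behavior.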

NOTE that we need to investigate which algorithm is used for the local checksum calculation (is it computed by CMSSW, or by WMAgent/WMRuntime?) and make sure that the same method is used for the remote calculation.
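If the investigation confirms that the local checksum is Adler-32 (commonly used for CMS files, but that is exactly what needs checking), a streaming local computation with Python's standard `zlib.adler32` could look like the sketch below. `file_adler32` is a hypothetical helper name; whatever CMSSW or WMAgent/WMRuntime actually uses should take precedence.

```python
import zlib


def file_adler32(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hypothetical helper: Adler-32 of a file as 8 lowercase hex digits.

    Assumes Adler-32 is the algorithm agreed for the local/remote comparison.
    Reads the file in chunks so large outputs are never loaded into memory.
    """
    value = 1  # Adler-32 starting value (same default zlib.adler32 uses)
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in to continue the checksum.
            value = zlib.adler32(chunk, value)
    return f"{value & 0xFFFFFFFF:08x}"
```

Chunked and one-shot computations give the same result, so the chunk size only affects memory use.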

In addition, it would be a bonus if we could make this check configurable, so that it can be enabled or disabled. However, I don't think that is simple to implement, so we might just skip it.

Additional context and error message: None

amaltaro, Apr 25 '25 16:04

@amaltaro should we close this as not needed?

anpicci, Jul 08 '25 08:07