Adherence to rule grouping dependent on order of ATTACH

Open ericvaandering opened this issue 4 years ago • 11 comments

Motivation

CMS is seeing that rules which use "CONTAINER" grouping are not always grouped at the same RSE. In fact, we have rules made at different times on the same data and the two classes behave differently.

The following is from @nsmith-:

If I look at the wmcore_output rules instead (using the rse expression (tier=2|tier=1)&cms_type=real&rse_type=DISK) I see only one out of 9000 rules is split: 12cc4f5e0dfb4809922498b474be16c3. I suspect a straggling file was added somehow after MSOutput made the rule, not sure if that's possible?

Since the US MiniAOD subscription-based rules are created as soon as the first file appears in the dataset, and the wmcore_output rules are only made once the dataset is complete (modulo the one exception), this means the culprit is probably in the daemon that updates rules when the DID is modified.

The relevant lines are https://github.com/rucio/rucio/blob/88984a4dbc9d8be4e254f61545c7066e6c67de56/lib/rucio/core/rule.py#L2617-L2629. Looking at this, it seems that if a dataset is attached to a container and a file is then attached to the dataset, the code will consider the previous files. But if the file is attached to the dataset first and the dataset is then attached to the container, this code will not run and preferred_rse_ids will be left empty. Will try to construct an example to test.
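
To make the suspected order dependence concrete, here is a minimal toy model. It is not Rucio code: the class ToyContainerRule, its methods, and the round-robin fallback are all hypothetical stand-ins, and the only thing it encodes is the hypothesis above, namely that the file-attach path consults existing locks while the dataset-attach path leaves the preferred RSEs empty.

```python
from itertools import cycle


class ToyContainerRule:
    """Toy stand-in for a CONTAINER-grouped rule (hypothetical, not Rucio's code)."""

    def __init__(self, rses):
        self.locks = {}               # file name -> RSE holding its lock
        self._fallback = cycle(rses)  # stand-in for RSE selection without a preference

    def _pick(self, preferred):
        # With a preference, stay there; otherwise take the selector's next offer.
        return sorted(preferred)[0] if preferred else next(self._fallback)

    def file_attached(self, name):
        # Assumed behaviour: this path looks at existing locks first.
        preferred = set(self.locks.values())
        self.locks[name] = self._pick(preferred)

    def dataset_attached(self, names):
        # Assumed behaviour: preferred RSEs are left empty on this path.
        if not names:
            return  # an empty dataset creates no locks
        rse = self._pick(set())
        for name in names:
            self.locks[name] = rse

    def rses_used(self):
        return set(self.locks.values())


def container_then_files():
    # Datasets are attached while empty; files arrive afterwards.
    rule = ToyContainerRule(["RSE_A", "RSE_B"])
    rule.dataset_attached([])
    rule.dataset_attached([])
    for name in ("f1", "f2", "f3"):
        rule.file_attached(name)
    return rule.rses_used()


def files_then_container():
    # Datasets already hold files when they are attached to the container.
    rule = ToyContainerRule(["RSE_A", "RSE_B"])
    rule.dataset_attached(["f1", "f2"])
    rule.dataset_attached(["f3"])
    return rule.rses_used()
```

In this model, container_then_files() keeps every lock at one RSE, while files_then_container() spreads the locks over two. Whether the real evaluator behaves this way is exactly what a test case would need to confirm.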

Modification

ericvaandering avatar Feb 17 '22 15:02 ericvaandering

This could be related to (or is the root cause of) the observed problems, reported recently by CMS users, of datasets not getting automatically transferred to RSEs, when being added to already existing/subscribed containers covered by active replication rules.

The problem is reproducibly manifested in the following scenario:

  1. User creates a container with rucio add-container
  2. User attaches a (number of) datasets with rucio attach
  3. User creates a replication rule with rucio add-rule --lifetime 2592000 --ask-approval ... RSE
  4. Request is approved.
  5. Container gets replicated at the RSE
  6. User attaches additional datasets to the container with rucio attach
  7. New datasets do NOT get replicated at the RSE.

The expectation in the last step is that the newly attached datasets automatically get replicated at the RSE.
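
For anyone who wants to script the scenario above, a rough sketch using the Rucio Python client follows. It assumes the client methods add_container, attach_dids and add_replication_rule accept the keyword arguments shown, which should be checked against the installed client version. The function names, the scope, and the DID names are placeholders, and the number of copies is left to the caller because the original command does not show it. Steps 4 and 5 (approval and replication) happen on the server side, so the script is split in two parts.

```python
def create_container_and_rule(client, scope, container, datasets, copies, rse,
                              lifetime=2592000):
    """Steps 1-3: create the container, attach DIDs, request a rule with approval."""
    client.add_container(scope=scope, name=container)
    client.attach_dids(
        scope=scope,
        name=container,
        dids=[{"scope": scope, "name": name} for name in datasets],
    )
    # Mirrors `rucio add-rule --lifetime 2592000 --ask-approval ... RSE`.
    return client.add_replication_rule(
        dids=[{"scope": scope, "name": container}],
        copies=copies,
        rse_expression=rse,
        lifetime=lifetime,
        ask_approval=True,
    )


def attach_more(client, scope, container, datasets):
    """Step 6: attach additional DIDs after the rule is approved and satisfied."""
    client.attach_dids(
        scope=scope,
        name=container,
        dids=[{"scope": scope, "name": name} for name in datasets],
    )
```

With a real deployment, client would be an instance of rucio.client.Client; run create_container_and_rule, wait until the rule is approved and the container is replicated, then call attach_more and check whether the new DIDs get locks at the RSE.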

piperov avatar Feb 17 '22 16:02 piperov

@piperov In point #2 I presume you mean Containers, not datasets, in the Rucio terminology.

ericvaandering avatar Feb 17 '22 20:02 ericvaandering

@ericvaandering Correct. I use 'Datasets' in the CMS sense here. Proper Rucio term would be 'Container'.

piperov avatar Feb 18 '22 14:02 piperov

@piperov can you create a test case for this, please? In the end, to me this just looks like https://github.com/rucio/rucio/blob/2930272ff394b28ed9f4d81e1b9f898d8f418951/lib/rucio/tests/test_judge_evaluator.py#L128. By reproducible, do you mean this happens every time or only some of the time?

bari12 avatar Feb 18 '22 14:02 bari12

After the CMS users reported the problem I repeated the procedure described above twice, with the exact same result. That's what I mean by reproducible. Of course all those tests were done within 2-3 days, so it's possible that there was some other, temporary reason for the failures.

piperov avatar Feb 18 '22 14:02 piperov

I don't think Stefan's problem is the same as the one I originally described, because it is a failure to update the rule to reflect new contents where my problem is that the rule is updated but the locations of the resulting locks are not as desired based on the rule grouping setting.

nsmith- avatar Feb 18 '22 14:02 nsmith-

Nick,

Have you verified that the rule gets updated? Because if it does, I think that would contradict Stefan's observations.

Igor

ivmfnal avatar Feb 18 '22 14:02 ivmfnal

I always encourage CMS people in the know to talk about blocks and containers and not mention datasets. :-)

ericvaandering avatar Feb 18 '22 15:02 ericvaandering

I am sure the rules are updated, not least because the updated timestamp is changed. But my rules are different: dataset DIDs are being attached to container DIDs, whereas in Stefan's case container DIDs are attached to container DIDs. Perhaps that's the difference.

nsmith- avatar Feb 18 '22 15:02 nsmith-

I have created a test case for this. See https://github.com/rucio/rucio/pull/5272

ivmfnal avatar Feb 22 '22 16:02 ivmfnal

#5272 was closed, what's the progress on this?

nsmith- avatar Apr 19 '22 20:04 nsmith-