Adherence to rule grouping dependent on order of ATTACH
Motivation
CMS is seeing that rules which use "CONTAINER" grouping are not always grouped at the same RSE. In fact, we have rules made at different times on the same data and the two classes behave differently.
This is from @nsmith- .
If I look at the wmcore_output rules instead (using the RSE expression (tier=2|tier=1)&cms_type=real&rse_type=DISK), I see that only one out of 9000 rules is split: 12cc4f5e0dfb4809922498b474be16c3. I suspect a straggling file was somehow added after MSOutput made the rule; I'm not sure if that's possible.
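For reference, "split" here can be checked mechanically once the locks of each rule are known. The following is a minimal sketch, assuming single-copy rules and assuming the locks have already been exported as (rule_id, rse) pairs; the function name and the sample identifiers are hypothetical and not part of Rucio.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def split_rules(locks: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return {rule_id: RSEs} for rules whose locks span more than one RSE.

    Assumes one copy per rule, so a correctly grouped rule has all of its
    locks on a single RSE.
    """
    rses_by_rule: Dict[str, Set[str]] = defaultdict(set)
    for rule_id, rse in locks:
        rses_by_rule[rule_id].add(rse)
    return {rid: rses for rid, rses in rses_by_rule.items() if len(rses) > 1}


if __name__ == "__main__":
    # Hypothetical sample data: rule-a is grouped, rule-b is split.
    sample = [
        ("rule-a", "T2_X"),
        ("rule-a", "T2_X"),
        ("rule-b", "T2_X"),
        ("rule-b", "T2_Y"),
    ]
    print(split_rules(sample))
```

For rules with more than one copy, the threshold would have to be the number of copies rather than 1.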
Since the US MiniAOD subscription-based rules are created as soon as the first file appears in the dataset, and the wmcore_output rules are only made once the dataset is complete (modulo the one exception), the culprit is probably in the daemon that updates rules when the DID is modified.
The relevant lines are https://github.com/rucio/rucio/blob/88984a4dbc9d8be4e254f61545c7066e6c67de56/lib/rucio/core/rule.py#L2617-L2629. Looking at this, it seems that if a dataset is attached to a container and then a file is attached to the dataset, the code will consider previous files; but if the file is attached to the dataset first and the dataset is then attached to the container, this code will not run and preferred_rse_ids will be left empty. Will try to construct an example to test.
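To make the suspected mechanism concrete, here is a toy model of it. This is not the code in rule.py: the function names, the RSE names, and the `fallback` selector are all hypothetical. It only illustrates why an empty preference list can send a new lock to a different RSE than the existing ones.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def preferred_rses(locks: Dict[str, str]) -> List[str]:
    """RSEs that already hold locks for the rule, most-used first."""
    return [rse for rse, _ in Counter(locks.values()).most_common()]


def place_file(
    locks: Dict[str, str],
    new_file: str,
    candidates: Sequence[str],
    fallback: Callable[[Sequence[str]], str],
    consult_existing_locks: bool,
) -> str:
    """Add a lock for new_file and return the chosen RSE (toy model only)."""
    # When the existing locks are not consulted, the preference list is empty,
    # which mimics preferred_rse_ids being left empty.
    preferred = preferred_rses(locks) if consult_existing_locks else []
    usable = [rse for rse in preferred if rse in candidates]
    rse = usable[0] if usable else fallback(candidates)
    locks[new_file] = rse
    return rse


if __name__ == "__main__":
    candidates = ["T2_A", "T2_B", "T2_C"]  # hypothetical RSE names

    def pick_last(rses: Sequence[str]) -> str:
        # Stand-in for whatever selection happens when there is no preference.
        return rses[-1]

    # Existing locks are consulted: the new file joins the others.
    locks = {"file1": "T2_A", "file2": "T2_A"}
    place_file(locks, "file3", candidates, pick_last, True)
    print(sorted(set(locks.values())))  # ['T2_A']

    # Preference left empty: the new file can land elsewhere.
    locks = {"file1": "T2_A", "file2": "T2_A"}
    place_file(locks, "file3", candidates, pick_last, False)
    print(sorted(set(locks.values())))  # ['T2_A', 'T2_C']
```

If the real code path behaves like the second case when the dataset is attached to the container after its files, that would be consistent with the split rules seen above.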
Modification
This could be related to (or be the root cause of) the problems recently reported by CMS users: datasets are not automatically transferred to RSEs when they are added to already existing/subscribed containers covered by active replication rules.
The problem is reproducibly manifested in the following scenario:
- User creates a container with
  rucio add-container
- User attaches a (number of) datasets with
  rucio attach
- User creates a replication rule with
  rucio add-rule --lifetime 2592000 --ask-approval ... RSE
- Request is approved.
- Container gets replicated at the RSE
- User attaches additional datasets to the container with
  rucio attach
- New datasets do NOT get replicated at the RSE.
Expectation in last step is that the newly attached datasets automatically get replicated at the RSE.
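The scenario above could be scripted roughly as follows. This is a sketch under assumptions: it assumes the Rucio Python client exposes add_container, attach_dids and add_replication_rule with the keyword arguments used here (please verify against the installed client version). The helper names and DryRunClient are hypothetical, and the number of copies is left to the caller because the original command elides it. Approval of the rule happens out of band between the two helpers.

```python
from typing import Any, Dict, List


def did(scope: str, name: str) -> Dict[str, str]:
    """Build the {'scope': ..., 'name': ...} dict that identifies a DID."""
    return {"scope": scope, "name": name}


def create_and_subscribe(client: Any, scope: str, container: str,
                         initial: List[str], rse_expression: str,
                         copies: int, lifetime: int = 2592000) -> List[str]:
    """Create the container, attach the initial DIDs, and request the rule."""
    client.add_container(scope=scope, name=container)
    client.attach_dids(scope=scope, name=container,
                       dids=[did(scope, n) for n in initial])
    return client.add_replication_rule(
        dids=[did(scope, container)],
        copies=copies,
        rse_expression=rse_expression,
        lifetime=lifetime,
        ask_approval=True,
    )


def attach_more(client: Any, scope: str, container: str,
                extra: List[str]) -> None:
    """Attach further DIDs after the rule is approved and replicating."""
    client.attach_dids(scope=scope, name=container,
                       dids=[did(scope, n) for n in extra])


class DryRunClient:
    """Hypothetical stand-in that records calls instead of contacting Rucio."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def add_container(self, **kwargs: Any) -> bool:
        self.calls.append("add_container")
        return True

    def attach_dids(self, **kwargs: Any) -> bool:
        self.calls.append("attach_dids")
        return True

    def add_replication_rule(self, **kwargs: Any) -> List[str]:
        self.calls.append("add_replication_rule")
        return ["dry-run-rule-id"]


if __name__ == "__main__":
    client = DryRunClient()
    # Scope, DID names, RSE expression and copies are placeholders.
    create_and_subscribe(client, "user.test", "cont", ["a", "b"],
                         "SOME_RSE", copies=1)
    attach_more(client, "user.test", "cont", ["c"])
    print(client.calls)
```

With a real client in place of DryRunClient, the last step would be followed by checking whether the newly attached DIDs receive locks at the RSE.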
@piperov In point #2 I presume you mean Containers, not datasets, in the Rucio terminology.
@ericvaandering Correct. I use 'Datasets' in the CMS sense here. Proper Rucio term would be 'Container'.
@piperov can you create a test case for this please? In the end, to me this just looks like https://github.com/rucio/rucio/blob/2930272ff394b28ed9f4d81e1b9f898d8f418951/lib/rucio/tests/test_judge_evaluator.py#L128. By reproducible, do you mean this happens every time or only some of the time?
After the CMS users reported the problem I repeated the procedure described above twice, with the exact same result. That's what I mean by reproducible. Of course all those tests were done within 2-3 days, so it's possible that there was some other, temporary reason for the failures.
I don't think Stefan's problem is the same as the one I originally described: his is a failure to update the rule to reflect new contents, whereas my problem is that the rule is updated but the locations of the resulting locks are not as desired given the rule's grouping setting.
Nick,
Have you verified that the rule gets updated? Because if it does, I think that would contradict Stefan’s observations.
Igor
On Friday, February 18, 2022 at 9:44 AM, Nicholas Smith wrote (Re: [rucio/rucio] Adherence to rule grouping dependent on order of ATTACH, Issue #5251):
> I don't think Stefan's problem is the same as the one I originally described, because it is a failure to update the rule to reflect new contents where my problem is that the rule is updated but the locations of the resulting locks are not as desired based on the rule grouping setting.
I always encourage CMS people in the know to talk about blocks and containers and not mention datasets. :-)
On Feb 18, 2022, at 8:02 AM, Stefan Piperov wrote:
> @ericvaandering Correct. I use 'Datasets' in the CMS sense here. Proper Rucio term would be 'Container'.
I am sure the rules are updated, not least because the updated timestamp is changed. But my rules are different: dataset DIDs are being attached to container DIDs, whereas in Stefan's case container DIDs are attached to container DIDs. Perhaps that's the difference.
I have created a test case for this. See https://github.com/rucio/rucio/pull/5272
#5272 was closed. What's the progress on this?