Add new data catalogs from Rucio storage description (RucioCatalog) and use it by default instead of trivial data catalogs (TrivialCatalog)
PR description:
New data access using catalogs (defined in
PR validation:
if this PR is a backport please specify the original PR and why you need to backport that PR:
Before submitting your pull requests, make sure you followed this checklist:
- verify that the PR is really intended for the chosen branch
- verify that changes follow CMS Naming, Coding, And Style Rules
- verify that the PR passes the basic test procedure suggested in the CMSSW PR instructions
-code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905
- This PR adds an extra 32KB to repository
Code check has found code style and quality issues which could be resolved by applying following patch(s)
-
code-checks: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905/code-checks.patch e.g.
curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905/code-checks.patch | patch -p1You can also runscram build code-checksto apply code checks directly -
code-format: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905/code-format.patch e.g.
curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905/code-format.patch | patch -p1You can also runscram build code-formatto apply code format directly
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28906
- This PR adds an extra 36KB to repository
A new Pull Request was created by @nhduongvn for master.
It involves the following packages:
- FWCore/Catalog (core)
- FWCore/Services (core)
@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks. @makortel, @wddgit this is something you requested to watch as well. @perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here
What is the motivation for this change?
Hi Chris, The changes are needed for transition to a new storage description at sites required by Rucio (and then we switched from XML to the more modern json format) too. So, this is effectively completing the Rucio transition. Best
Hallo Matti,
Is there any need to backport this to earlier release cycles?
yes, it would be desirable to port this to the latest versions of all CMSSW_X releases so storage.xml can be phased out (at some point).
How long does the reading of storage.xml need to be supported? E.g. do all the sites already provide storage.json?
All sites have a storage.json file. The site support team is adding the two new site-local-config.xml sections to the remaining sites this week.
Thanks,
- Stephan
Thanks Stephan. That sounds to me like there would be no need to keep the storage.xml reading support (or, maybe keep the code for a while but change the default to storage.json).
Hallo Matti,
We also came up with an additional question: is it intentional that the path to storage.json is not configurable within the site-local-config.xml file (like the path to storage.xml is currently)?
Yes, this is an intentional change: A site's storage.json and site-local-config.xml might be at different locations for different worker nodes/subsites. This way only the SITECONF directory structure is "fixed" with one entry point, SITECONFIG_PATH.
Thanks,
- Stephan
I have received your comments and will work them out one by one.
Hi @makortel @stlammel I will try to finish updating the codes today or tomorrow. I got "segmentation fault" when running tests last week and could not track down where. I have to do implementation step by step and run test along. Hopefully, I can resolve this, if not I will report back.
I have gone slowly to modify codes from Mati suggestions and done test along, so far no "segmentation fault" (I might did something wrong in the past when modifying all suggestions at once and did the test at the end). I will continue tomorrow.
Hi @makortel, I made a new branch "test_struct_dataCatalog" to test storing data catalog as a struct. The struct is defined here: https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/SiteLocalConfig.h#L35 I also made two constructors for "FileLocator" https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/FileLocator.h#L21 https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/FileLocator.h#L22 However, when I compiled there are two errors in [1]. I am not sure how to overcome this currently. This branch contains all modifications you have suggested so if these are solved I will merge it with the branch used in the current pull request and the review can be restarted again. As you remember, I said that I saw "crashes" when using struct or adding a new function to SiteLocalConfig.h (FWCore/Catalog/interface/SiteLocalConfig.h). I did not see them now. This could be that I implemented things in bulk and tested at the end and there were bugs somewhere that caused crashes (It could be that I used older CMSSW version). This time I carefully implement things step-by-step and doing test along. Anyway, these are not problems anymore and we can move on. Please help me to solve these compilation errors. I have other question: where the struct can be optimally located? in SiteLocalConfig.h? Thank you, Duong
====================================== [1] /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc:77:27: error: expected constructor, destructor, or type conversion before '(' token 77 | FileLocator::FileLocator(edm::SiteLocalConfig::CatalogAtrr const& catAttr, unsigned iCatalog) | ^ /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc: In member function 'void edm::FileLocator::init_da(const edm::SiteLocalConfig::CatalogAttr&, unsigned int)': /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc:242:33: error: passing 'const edm::SiteLocalConfig::CatalogAttr' as 'this' argument discards qualifiers [-fpermissive] 242 | if (input_dataCatalog.empty()) { | ^ In file included from /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/interface/FileLocator.h:4, from /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc:1: /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/interface/SiteLocalConfig.h:37:18: note: in call to 'const bool edm::SiteLocalConfig::CatalogAttr::empty()' 37 | bool const empty(){return site==""&&subSite==""&&storageSite==""&&volume==""&&protocol=="";}
Thanks @nhduongvn. The compilation error comes from edm::SiteLocalConfig::CatalogAttr::empty() not being const. Basically this line
https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/SiteLocalConfig.h#L37
should be changed to
bool empty() const {return site==""&&subSite==""&&storageSite==""&&volume==""&&protocol=="";}
(plus formatting). I'd actually prefer to use the std::string::empty(), i.e.
bool empty() const {return site.empty() and subSite.empty() and storageSite.empty() and volume.empty() and protocol.empty();}
The SiteLocalConfig.h as a place of the struct is ok. I would, however, move it outside of SiteLocalConfig class, i.e. directly to edm namespace, and spell out the name fully, i.e. CatalogAttributes. Note that the constructor on line
https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/SiteLocalConfig.h#L36
could be just CatalogAttr() = default; or even left out (I think).
I would also make the enum CatalogType as enum class CatalogType, and move directly to edm namespace (assuming the enumeration is still needed, I've forgotten already that level of detail).
Thank you. The errors were fixed.
-code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29278
- This PR adds an extra 216KB to repository
Code check has found code style and quality issues which could be resolved by applying following patch(s)
-
code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29278/code-format.patch
e.g.
curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29278/code-format.patch | patch -p1You can also runscram build code-formatto apply code format directly
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29279
- This PR adds an extra 92KB to repository
Pull request #37278 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.
There are still a few Tier-2 sites without the sections but those should hopefully be all done within the next days/week.
- Stephan
> Thanks @nhduongvn for the update. My comments on the code are mostly fine tuning.
>
> I still have one bigger question in my mind about deployment of the Rucio Catalog code. If I understood @stlammel's #37278 (comment) correctly, all sites would have storage.json by now. Should we just switch that as the default in this PR? (and later on remove the Trivial Catalog code when it is no longer deemed necessary)
Maybe this can be switched on in the next release (CMSSW_12_3_1?)
> Unless I missed something, I didn't see any tests for the Rucio Catalog. Could you add tests using storage.json, preferably to the same extent as we have for the Trivial Catalog? Thanks!
For trivial catalog, I do not see test for storage.xml. As far as I know there are only tests for SiteLocalConfigService in FWCore/Services/test/test_catch2_SiteLocalConfigService.cc, which check the "dummy" configuration in FWCore/Services/test/*.testfile. Actually, I am not very sure why we need to run these tests and when and where these tests are called/used. Please note that in the FWCore/Services/test/test_catch2_SiteLocalConfigService.cc, there are already equivalent checks for Rucio catalog, for example: CHECK(slc.dataCatalogs()[0] == "trivialcatalog_file:/dummy/storage.xml?protocol=dcap"); edm::CatalogAttributes tmp("DUMMY", "DUMMY_SUB_SITE", "DUMMY_CROSS_SITE", "CMSXrootdFederation", "XRootD"); CHECK(slc.dataCatalogs_da()[0] == tmp);
@nhduongvn please note that for CMSSW Integration build and PR tests we use a copy of T2_CH_CERN siteconf which is available under /cvmfs/cms-ib.cern.ch/SITECONF . Note the extra mapping [a] in /cvmfs/cms-ib.cern.ch/SITECONF/local/PhEDEx/storage.xml which allow us to read cached root files under /eos/cms/store/user/cmsbuild/store for IBs/PRs test.
Also we use cms-xrd-global global redirector [b] as fallback instead of default xrootd-cms.infn.it. Do we need to change something in new storage.json too?
[a]
<lfn-to-pfn protocol="ibeos"
path-match="/+store/(.*)"
result="root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/$1"/>
[b]
<!-- Xrootd fallback rules -->
<lfn-to-pfn protocol="xrootd" destination-match=".*"
path-match="/+store/(.*)"
result="root://cms-xrd-global.cern.ch//store/$1"/>
Hi @smuzaffar ,
I took a look at test settings you are mentioning (/cvmfs/cms-ib.cern.ch/SITECONF). It looks like the settings are OK. The site-local-config.xml and storage.json are there with consistent definitions for data access:
/cvmfs/cms-ib.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
<data-access>
<catalog volume="CERN_EOS" protocol="XRootD"/>
<catalog site="T1_IT_CNAF" volume="Eurasian_Federation" protocol="XRootD"/>
</data-access>
I can see matched storage description for the first data catalog (
@nhduongvn @smuzaffar
Hallo Duong, Shahzad,
so, as Duong points out, you would need a copy of the T1_IT_CNAF in /cvmfs/cms-ib.cern.ch/SITECONF or T0_CH_CERN since you want to use the global redirector and not Eurasion redirector. The
<catalog volume="CERN_EOS" protocol="XRootD"/>
<catalog site="T0_CH_CERN" volume="CMSXrootdFederation" protocol="XRootD"/>
Now, storage.json needs to be adjusted from the way you have it currently: The prefix needs to be changed from "/eos/cms" to "/eos/cms/store/user/cmsbuild" since files are in /eos/cms/store/user/cmsbuild/store/... and not in /eos/cms/store/... Since this is a "fake" SITECONF, you could add a subsite in there, i.e. make a /cvmfs/cms-ib.cern.ch/SITECONF/T2_CH_CERN/TEST_SUBSITE/JobConfig/site-local-config.xml file with a subsite tag (everything else as the site entry) and then point SITECONFIG_PATH to /cvmfs/cms-ib.cern.ch/SITECONF/T2_CH_CERN/TEST_SUBSITE instead of /cvmfs/cms-ib.cern.ch/SITECONF/T2_CH_CERN .
There is also the option of keeping this all within the standard setup, i.e. not making a separate SITECONF on /cvmfs/cms-ib.cern.ch, if you want. For this, you would add a new "volume" to the T2_CH_CERN storage.json, add a test subsite to the T2_CH_CERN SITECONF with the site-local-config.xml that refers to the new volume, and in the testing just point SITECONFIG_PATH to that new subsite. Thanks,
- Stephan
Dear @stlammel @nhduongvn,
I am not able to digest the two comments above :-) can you please suggest the required changes for https://github.com/cms-sw/siteconf/tree/master/local . Note that the SITECONF should work for all CMSSW release cycles (e.g. SITECONFIG_PATH is not supported in old cmssw releases). All we want is to use /eos/cms/store/user/cmsbuild/store/ store and global rediector as fallback.
Thanks,
Hallo @smuzaffar , the subsite test would apply to only the CMSSW versions that have Duong's changes. For those SITECONFIG_PATH is supported. I can make the files, no problem. I assume you want to stay with the current /cvmfs/cms-ib.cern.ch/SITECONF area then and the "local" link will be configured for T2_CH_CERN on the machines that mount the filesystem. I'll email you a tar file in a few minutes. Thanks,
- Stephan
Hallo @smuzaffar , the subsite test would apply to only the CMSSW versions that have Duong's changes. For those SITECONFIG_PATH is supported. I can make the files, no problem. I assume you want to stay with the current /cvmfs/cms-ib.cern.ch/SITECONF area then and the "local" link will be configured for T2_CH_CERN on the machines that mount the filesystem. I'll email you a tar file in a few minutes. Thanks,
* Stephan
thanks @stlammel , new SITECONF for IBs/CI tests is now available under /cvmfs/cms-ib.cern.ch/SITECONF
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29390
- This PR adds an extra 116KB to repository
Pull request #37278 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.
There are still a few Tier-2 sites without the sections but those should hopefully be all done within the next days/week.
@stlammel Have these sites been updated already?
@nhduongvn Thanks for the update (and apologies for the delay), my only remaining question is the deployment, i.e. when to change the default from TrivialCatalog to RucioCatalog. I'd like to do that in this PR, so we can test the RucioCatalog code path as well.
I'd suggest to squash the commits into one, backporting one commit can be much easier than many of them (especially if some changes get reverted in the commits). I'd also suggest to clarify the PR title, e.g. something along "Add RucioCatalog and use it by default instead of TrivialCatalog" to make it stand out better in the release notes (it is a significant change after all).
There are still a few Tier-2 sites without the sections but those should hopefully be all done within the next days/week.
@stlammel Have these sites been updated already?
@nhduongvn Thanks for the update (and apologies for the delay), my only remaining question is the deployment, i.e. when to change the default from
TrivialCatalogtoRucioCatalog. I'd like to do that in this PR, so we can test theRucioCatalogcode path as well.I'd suggest to squash the commits into one, backporting one commit can be much easier than many of them (especially if some changes get reverted in the commits). I'd also suggest to clarify the PR title, e.g. something along "Add RucioCatalog and use it by default instead of TrivialCatalog" to make it stand out better in the release notes (it is a significant change after all).
@makortel OK, I changed the title. I am not sure what do you mean "squash the commits into one"? As I said above, I will make a new commit later with extra "#include" removed and RucioCatalog ON when all sites are updated.
OK, I changed the title.
Thanks.
I am not sure what do you mean "squash the commits into one"? As I said above, I will make a new commit later with extra "#include" removed and RucioCatalog ON when all sites are updated.
Having all the changes in one commit, see e.g. http://cms-sw.github.io/faq.html#how-do-i-collapse-multiple-commits-into-one (search engines can give more material). To me it looks like the commits currently in this PR would not have historical significance (I could be wrong), and backporting single commits is much easier than backporting many when conflicts are expected (since this PR is going to be backported to many earlier release cycles).