cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

Add new data catalogs from Rucio storage description (RucioCatalog) and use it by default instead of trivial data catalogs (TrivialCatalog)

Open nhduongvn opened this issue 4 years ago • 110 comments

PR description:

New data access using catalogs (defined in ) and storage.json

PR validation:

if this PR is a backport please specify the original PR and why you need to backport that PR:

Before submitting your pull requests, make sure you followed this checklist:

nhduongvn avatar Mar 18 '22 22:03 nhduongvn

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905

  • This PR adds an extra 32KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

  • code-checks: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905/code-checks.patch e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905/code-checks.patch | patch -p1 You can also run scram build code-checks to apply code checks directly

  • code-format: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905/code-format.patch e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905/code-format.patch | patch -p1 You can also run scram build code-format to apply code format directly

cmsbuild avatar Mar 18 '22 22:03 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28906

  • This PR adds an extra 36KB to repository

cmsbuild avatar Mar 18 '22 22:03 cmsbuild

A new Pull Request was created by @nhduongvn for master.

It involves the following packages:

  • FWCore/Catalog (core)
  • FWCore/Services (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks. @makortel, @wddgit this is something you requested to watch as well. @perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar Mar 18 '22 22:03 cmsbuild

What is the motivation for this change?

Dr15Jones avatar Mar 19 '22 13:03 Dr15Jones

Hi Chris, The changes are needed for transition to a new storage description at sites required by Rucio (and then we switched from XML to the more modern json format) too. So, this is effectively completing the Rucio transition. Best

nhduongvn avatar Mar 19 '22 22:03 nhduongvn

Hallo Matti,

Is there any need to backport this to earlier release cycles?

yes, it would be desirable to port this to the latest versions of all CMSSW_X releases so storage.xml can be phased out (at some point).

How long does the reading of storage.xml need to be supported? E.g. do all the sites already provide storage.json?

All sites have a storage.json file. The site support team is adding the two new site-local-config.xml sections to the remaining sites this week.

Thanks,

  • Stephan

stlammel avatar Mar 21 '22 18:03 stlammel

Thanks Stephan. That sounds to me like there would be no need to keep the storage.xml reading support (or, maybe keep the code for a while but change the default to storage.json).

makortel avatar Mar 21 '22 18:03 makortel

Hallo Matti,

We also came up with an additional question: is it intentional that the path to storage.json is not configurable within the site-local-config.xml file (like the path to storage.xml is currently)?

Yes, this is an intentional change: A site's storage.json and site-local-config.xml might be at different locations for different worker nodes/subsites. This way only the SITECONF directory structure is "fixed" with one entry point, SITECONFIG_PATH.

Thanks,

  • Stephan

stlammel avatar Mar 21 '22 18:03 stlammel

I have received your comments and will work them out one by one.

nhduongvn avatar Mar 21 '22 21:03 nhduongvn

Hi @makortel @stlammel I will try to finish updating the codes today or tomorrow. I got "segmentation fault" when running tests last week and could not track down where. I have to do implementation step by step and run test along. Hopefully, I can resolve this, if not I will report back.

nhduongvn avatar Apr 04 '22 08:04 nhduongvn

I have gone slowly to modify codes from Mati suggestions and done test along, so far no "segmentation fault" (I might did something wrong in the past when modifying all suggestions at once and did the test at the end). I will continue tomorrow.

nhduongvn avatar Apr 06 '22 03:04 nhduongvn

Hi @makortel, I made a new branch "test_struct_dataCatalog" to test storing data catalog as a struct. The struct is defined here: https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/SiteLocalConfig.h#L35 I also made two constructors for "FileLocator" https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/FileLocator.h#L21 https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/FileLocator.h#L22 However, when I compiled there are two errors in [1]. I am not sure how to overcome this currently. This branch contains all modifications you have suggested so if these are solved I will merge it with the branch used in the current pull request and the review can be restarted again. As you remember, I said that I saw "crashes" when using struct or adding a new function to SiteLocalConfig.h (FWCore/Catalog/interface/SiteLocalConfig.h). I did not see them now. This could be that I implemented things in bulk and tested at the end and there were bugs somewhere that caused crashes (It could be that I used older CMSSW version). This time I carefully implement things step-by-step and doing test along. Anyway, these are not problems anymore and we can move on. Please help me to solve these compilation errors. I have other question: where the struct can be optimally located? in SiteLocalConfig.h? Thank you, Duong

====================================== [1] /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc:77:27: error: expected constructor, destructor, or type conversion before '(' token 77 | FileLocator::FileLocator(edm::SiteLocalConfig::CatalogAtrr const& catAttr, unsigned iCatalog) | ^ /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc: In member function 'void edm::FileLocator::init_da(const edm::SiteLocalConfig::CatalogAttr&, unsigned int)': /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc:242:33: error: passing 'const edm::SiteLocalConfig::CatalogAttr' as 'this' argument discards qualifiers [-fpermissive] 242 | if (input_dataCatalog.empty()) { | ^ In file included from /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/interface/FileLocator.h:4, from /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc:1: /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/interface/SiteLocalConfig.h:37:18: note: in call to 'const bool edm::SiteLocalConfig::CatalogAttr::empty()' 37 | bool const empty(){return site==""&&subSite==""&&storageSite==""&&volume==""&&protocol=="";}

nhduongvn avatar Apr 11 '22 19:04 nhduongvn

Thanks @nhduongvn. The compilation error comes from edm::SiteLocalConfig::CatalogAttr::empty() not being const. Basically this line https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/SiteLocalConfig.h#L37 should be changed to

      bool empty() const {return site==""&&subSite==""&&storageSite==""&&volume==""&&protocol=="";}

(plus formatting). I'd actually prefer to use the std::string::empty(), i.e.

      bool empty() const {return site.empty() and subSite.empty() and storageSite.empty() and volume.empty() and protocol.empty();}

The SiteLocalConfig.h as a place of the struct is ok. I would, however, move it outside of SiteLocalConfig class, i.e. directly to edm namespace, and spell out the name fully, i.e. CatalogAttributes. Note that the constructor on line https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/SiteLocalConfig.h#L36 could be just CatalogAttr() = default; or even left out (I think).

I would also make the enum CatalogType as enum class CatalogType, and move directly to edm namespace (assuming the enumeration is still needed, I've forgotten already that level of detail).

makortel avatar Apr 11 '22 20:04 makortel

Thank you. The errors were fixed.

nhduongvn avatar Apr 12 '22 15:04 nhduongvn

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29278

  • This PR adds an extra 216KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

  • code-format: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29278/code-format.patch e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29278/code-format.patch | patch -p1 You can also run scram build code-format to apply code format directly

cmsbuild avatar Apr 13 '22 11:04 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29279

  • This PR adds an extra 92KB to repository

cmsbuild avatar Apr 13 '22 11:04 cmsbuild

Pull request #37278 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.

cmsbuild avatar Apr 13 '22 11:04 cmsbuild

There are still a few Tier-2 sites without the sections but those should hopefully be all done within the next days/week.

  • Stephan

stlammel avatar Apr 13 '22 21:04 stlammel

> Thanks @nhduongvn for the update. My comments on the code are mostly fine tuning. > > I still have one bigger question in my mind about deployment of the Rucio Catalog code. If I understood @stlammel's #37278 (comment) correctly, all sites would have storage.json by now. Should we just switch that as the default in this PR? (and later on remove the Trivial Catalog code when it is no longer deemed necessary)

Maybe this can be switched on in the next release (CMSSW_12_3_1?)

> Unless I missed something, I didn't see any tests for the Rucio Catalog. Could you add tests using storage.json, preferably to the same extent as we have for the Trivial Catalog? Thanks!

For trivial catalog, I do not see test for storage.xml. As far as I know there are only tests for SiteLocalConfigService in FWCore/Services/test/test_catch2_SiteLocalConfigService.cc, which check the "dummy" configuration in FWCore/Services/test/*.testfile. Actually, I am not very sure why we need to run these tests and when and where these tests are called/used. Please note that in the FWCore/Services/test/test_catch2_SiteLocalConfigService.cc, there are already equivalent checks for Rucio catalog, for example: CHECK(slc.dataCatalogs()[0] == "trivialcatalog_file:/dummy/storage.xml?protocol=dcap"); edm::CatalogAttributes tmp("DUMMY", "DUMMY_SUB_SITE", "DUMMY_CROSS_SITE", "CMSXrootdFederation", "XRootD"); CHECK(slc.dataCatalogs_da()[0] == tmp);

nhduongvn avatar Apr 14 '22 01:04 nhduongvn

@nhduongvn please note that for CMSSW Integration build and PR tests we use a copy of T2_CH_CERN siteconf which is available under /cvmfs/cms-ib.cern.ch/SITECONF . Note the extra mapping [a] in /cvmfs/cms-ib.cern.ch/SITECONF/local/PhEDEx/storage.xml which allow us to read cached root files under /eos/cms/store/user/cmsbuild/store for IBs/PRs test. Also we use cms-xrd-global global redirector [b] as fallback instead of default xrootd-cms.infn.it. Do we need to change something in new storage.json too?

[a]

  <lfn-to-pfn protocol="ibeos"
    path-match="/+store/(.*)"
    result="root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/$1"/>

[b]

  <!-- Xrootd fallback rules -->
  <lfn-to-pfn protocol="xrootd" destination-match=".*"
    path-match="/+store/(.*)"
    result="root://cms-xrd-global.cern.ch//store/$1"/>

smuzaffar avatar Apr 14 '22 09:04 smuzaffar

Hi @smuzaffar , I took a look at test settings you are mentioning (/cvmfs/cms-ib.cern.ch/SITECONF). It looks like the settings are OK. The site-local-config.xml and storage.json are there with consistent definitions for data access: /cvmfs/cms-ib.cern.ch/SITECONF/local/JobConfig/site-local-config.xml <data-access> <catalog volume="CERN_EOS" protocol="XRootD"/> <catalog site="T1_IT_CNAF" volume="Eurasian_Federation" protocol="XRootD"/> </data-access> I can see matched storage description for the first data catalog () in storage.json so this PR should work fine for the tests. I think it is enough. However, in order to test subsite mechanism more configurations are needed. One thing I notice is that the second catalog () will not work if the code attempts to do that because there is no "T1_IT_CNAF" in /cvmfs/cms-ib.cern.ch/SITECONF/ For short, for testing the new Rucio data access mechanism, as far as I know, there might not need to modify or add codes to this PRs but more on the site configuration for testing. @stlammel , would it be possible to setup testing subsite and T1_IT_CNAF in /cvmfs/cms-ib.cern.ch/SITECONF/? And maybe you want to comment more here.

nhduongvn avatar Apr 14 '22 14:04 nhduongvn

@nhduongvn @smuzaffar Hallo Duong, Shahzad, so, as Duong points out, you would need a copy of the T1_IT_CNAF in /cvmfs/cms-ib.cern.ch/SITECONF or T0_CH_CERN since you want to use the global redirector and not Eurasion redirector. The section in site-local-config.xml would then become:

<catalog volume="CERN_EOS" protocol="XRootD"/>
<catalog site="T0_CH_CERN" volume="CMSXrootdFederation" protocol="XRootD"/>

Now, storage.json needs to be adjusted from the way you have it currently: The prefix needs to be changed from "/eos/cms" to "/eos/cms/store/user/cmsbuild" since files are in /eos/cms/store/user/cmsbuild/store/... and not in /eos/cms/store/... Since this is a "fake" SITECONF, you could add a subsite in there, i.e. make a /cvmfs/cms-ib.cern.ch/SITECONF/T2_CH_CERN/TEST_SUBSITE/JobConfig/site-local-config.xml file with a subsite tag (everything else as the site entry) and then point SITECONFIG_PATH to /cvmfs/cms-ib.cern.ch/SITECONF/T2_CH_CERN/TEST_SUBSITE instead of /cvmfs/cms-ib.cern.ch/SITECONF/T2_CH_CERN .

There is also the option of keeping this all within the standard setup, i.e. not making a separate SITECONF on /cvmfs/cms-ib.cern.ch, if you want. For this, you would add a new "volume" to the T2_CH_CERN storage.json, add a test subsite to the T2_CH_CERN SITECONF with the site-local-config.xml that refers to the new volume, and in the testing just point SITECONFIG_PATH to that new subsite. Thanks,

  • Stephan

stlammel avatar Apr 14 '22 15:04 stlammel

Dear @stlammel @nhduongvn, I am not able to digest the two comments above :-) can you please suggest the required changes for https://github.com/cms-sw/siteconf/tree/master/local . Note that the SITECONF should work for all CMSSW release cycles (e.g. SITECONFIG_PATH is not supported in old cmssw releases). All we want is to use /eos/cms/store/user/cmsbuild/store/ store and global rediector as fallback.

Thanks,

smuzaffar avatar Apr 18 '22 08:04 smuzaffar

Hallo @smuzaffar , the subsite test would apply to only the CMSSW versions that have Duong's changes. For those SITECONFIG_PATH is supported. I can make the files, no problem. I assume you want to stay with the current /cvmfs/cms-ib.cern.ch/SITECONF area then and the "local" link will be configured for T2_CH_CERN on the machines that mount the filesystem. I'll email you a tar file in a few minutes. Thanks,

  • Stephan

stlammel avatar Apr 18 '22 14:04 stlammel

Hallo @smuzaffar , the subsite test would apply to only the CMSSW versions that have Duong's changes. For those SITECONFIG_PATH is supported. I can make the files, no problem. I assume you want to stay with the current /cvmfs/cms-ib.cern.ch/SITECONF area then and the "local" link will be configured for T2_CH_CERN on the machines that mount the filesystem. I'll email you a tar file in a few minutes. Thanks,

* Stephan

thanks @stlammel , new SITECONF for IBs/CI tests is now available under /cvmfs/cms-ib.cern.ch/SITECONF

smuzaffar avatar Apr 19 '22 11:04 smuzaffar

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29390

  • This PR adds an extra 116KB to repository

cmsbuild avatar Apr 19 '22 12:04 cmsbuild

Pull request #37278 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.

cmsbuild avatar Apr 19 '22 12:04 cmsbuild

There are still a few Tier-2 sites without the sections but those should hopefully be all done within the next days/week.

@stlammel Have these sites been updated already?

@nhduongvn Thanks for the update (and apologies for the delay), my only remaining question is the deployment, i.e. when to change the default from TrivialCatalog to RucioCatalog. I'd like to do that in this PR, so we can test the RucioCatalog code path as well.

I'd suggest to squash the commits into one, backporting one commit can be much easier than many of them (especially if some changes get reverted in the commits). I'd also suggest to clarify the PR title, e.g. something along "Add RucioCatalog and use it by default instead of TrivialCatalog" to make it stand out better in the release notes (it is a significant change after all).

makortel avatar Apr 27 '22 21:04 makortel

There are still a few Tier-2 sites without the sections but those should hopefully be all done within the next days/week.

@stlammel Have these sites been updated already?

@nhduongvn Thanks for the update (and apologies for the delay), my only remaining question is the deployment, i.e. when to change the default from TrivialCatalog to RucioCatalog. I'd like to do that in this PR, so we can test the RucioCatalog code path as well.

I'd suggest to squash the commits into one, backporting one commit can be much easier than many of them (especially if some changes get reverted in the commits). I'd also suggest to clarify the PR title, e.g. something along "Add RucioCatalog and use it by default instead of TrivialCatalog" to make it stand out better in the release notes (it is a significant change after all).

@makortel OK, I changed the title. I am not sure what do you mean "squash the commits into one"? As I said above, I will make a new commit later with extra "#include" removed and RucioCatalog ON when all sites are updated.

nhduongvn avatar Apr 28 '22 00:04 nhduongvn

OK, I changed the title.

Thanks.

I am not sure what do you mean "squash the commits into one"? As I said above, I will make a new commit later with extra "#include" removed and RucioCatalog ON when all sites are updated.

Having all the changes in one commit, see e.g. http://cms-sw.github.io/faq.html#how-do-i-collapse-multiple-commits-into-one (search engines can give more material). To me it looks like the commits currently in this PR would not have historical significance (I could be wrong), and backporting single commits is much easier than backporting many when conflicts are expected (since this PR is going to be backported to many earlier release cycles).

makortel avatar Apr 28 '22 00:04 makortel