pecan Allow for comments in pecan.xml by stripping them in `read.settings()`

Currently, comments in pecan.xml (see the example below) are not stripped and can cause problems. In this example, the comment inside `<ed2in_tags>` ends up in the settings list as a length-0 element named `comment`, which causes `write.settings.ED2` to fail with `Error in vapply(custom_tags, function(x) grepl(numvec_rxp, x), logical(1)) : values must be length 1,`

    <ed2in_tags>
      <PFT_1ST_CHECK>0</PFT_1ST_CHECK>
      <IED_INIT_MODE>0</IED_INIT_MODE>
      <!-- turn off cohort and patch fusion. Not necessary for short runs -->
      <MAXCOHORT>0</MAXCOHORT>
      <MAXPATCH>0</MAXPATCH>
    </ed2in_tags>

Here's a possible pattern to strip comments using the xml2 package:

tt <-
  '<x>
     <a>text</a>
     <b foo="1"/>
     <!-- a comment -->
     <c bar="me">
        <d>a phrase</d>
        <!-- deeper comment -->
     </c>
  </x>'
xml <- xml2::read_xml(tt)
xml
#> {xml_document}
#> <x>
#> [1] <a>text</a>
#> [2] <b foo="1"/>
#> [3] <c bar="me">\n  <d>a phrase</d>\n  <!-- deeper comment -->\n</c>
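# Select every comment node, at any depth, with XPath
# (see https://stackoverflow.com/questions/784745/accessing-comments-in-xml-using-xpath)
comments <- xml2::xml_find_all(xml, "//comment()")
# xml_remove() modifies the document in place
xml2::xml_remove(comments)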
xml
#> {xml_document}
#> <x>
#> [1] <a>text</a>
#> [2] <b foo="1"/>
#> [3] <c bar="me">\n  <d>a phrase</d>\n</c>
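
The pattern above could be wrapped in a small helper that runs before the settings are parsed. This is only a sketch: `strip_xml_comments()` is a hypothetical name, not an existing PEcAn function, and it assumes xml2 is installed.

```r
# Hypothetical helper (not an existing PEcAn function): parse XML from a
# file path or a literal string, remove every comment node, and return the
# cleaned document as a single character string.
strip_xml_comments <- function(x) {
  doc <- xml2::read_xml(x)
  # "//comment()" selects comment nodes at any depth
  comments <- xml2::xml_find_all(doc, "//comment()")
  # xml_remove() modifies `doc` in place
  xml2::xml_remove(comments)
  as.character(doc)
}

ed2in <- '<ed2in_tags>
  <IED_INIT_MODE>0</IED_INIT_MODE>
  <!-- turn off cohort and patch fusion. Not necessary for short runs -->
  <MAXCOHORT>0</MAXCOHORT>
</ed2in_tags>'

cleaned <- strip_xml_comments(ed2in)

# Check that no comment markers remain and the real tags are untouched
grepl("<!--", cleaned, fixed = TRUE)
grepl("<MAXCOHORT>0</MAXCOHORT>", cleaned, fixed = TRUE)
```

Because the helper returns plain text, the result could be handed to whatever parser the caller already uses, for example `XML::xmlParse(cleaned, asText = TRUE)`.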

I know pecan currently uses XML, but it might be a good opportunity to start the switch to xml2 since XML is no longer maintained.

Full pecan.xml:


<?xml version="1.0"?>
<pecan>
  <info>
    <!--Edit this-->
    <notes>site name</notes>
  </info>
  <!-- Edit the outdir name to replace SN with the two-letter abbreviation for this site -->
  <outdir>/data/tests/ed2_transect_SN</outdir>
  <database>
    <bety>
      <driver>PostgreSQL</driver>
      <user>bety</user>
      <password>bety</password>
      <host>postgres</host>
      <dbname>bety</dbname>
      <write>FALSE</write>
    </bety>
    <dbfiles>/data/dbfiles</dbfiles>
  </database>
  <pfts>
    <pft>
      <name>SetariaWT</name>
      <ed2_pft_number>1</ed2_pft_number>
    </pft>
    <!-- Add additional PFTs here -->
    <pft>
      <name>ebifarm.c3grass</name>
      <ed2_pft_number>5</ed2_pft_number>
    </pft>
  </pfts>

  <meta.analysis>
    <iter>3000</iter>
    <random.effects>TRUE</random.effects>
    <threshold>1.2</threshold>
    <update>TRUE</update>
  </meta.analysis>
<!--
  <sensitivity.analysis>
      <quantiles>
          <sigma>-1</sigma>
          <sigma>1</sigma>
      </quantiles>
    <variable>NPP</variable>
    <start.year>2019</start.year>
    <end.year>2019</end.year>
  </sensitivity.analysis>
-->
  <ensemble>
    <size>50</size>
    <variable>NPP</variable>
    <samplingspace>
      <parameters>
        <method>lhc</method>
      </parameters>
  </samplingspace>
  </ensemble>

  <model>
    <type>ED2</type>
    <binary>/groups/dlebauer/ed2_results/global_inputs/ed2_2.2.0_singularity.sh</binary>
    <id>14</id>
    <edin>/pecan/models/ed/inst/ED2IN.r2.2.0</edin>
    <config.header>
      <radiation>
        <lai_min>0.01</lai_min>
      </radiation>
      <ed_misc>
        <output_month>12</output_month>
      </ed_misc>
    </config.header>
    <phenol.scheme>0</phenol.scheme>
    <ed2in_tags>
      <PFT_1ST_CHECK>0</PFT_1ST_CHECK>
      <IED_INIT_MODE>0</IED_INIT_MODE>
      <!-- turn off cohort and patch fusion. Not necessary for short runs -->
      <MAXCOHORT>0</MAXCOHORT>
      <MAXPATCH>0</MAXPATCH>
    </ed2in_tags>
  </model>

  <run>
    <site>
      <!-- edit the site ID to match the site on BETYdb -->
      <id>678</id>
      <met.start>2019-01-01 00:00:00</met.start>
      <met.end>2019-12-31 23:59:59</met.end>
    </site>

    <inputs>
      <!-- edit the site ID in this path also -->
      <met>/data/sites/MERRA_ED2_site_0-678/ED_MET_DRIVER_HEADER</met>
      <veg>/data/oge2OLD/OGE2_</veg>
      <soil>/data/faoOLD/FAO_</soil>
      <lu>/data/ed_inputs/glu/</lu>
      <thsum>/data/ed_inputs/</thsum>
    </inputs>

    <start.date>2019-01-01</start.date>
    <end.date>2019-12-31</end.date>
  </run>
  
  <host>
      <!-- edit to whatever is in your .ssh/config -->
      <name>puma</name>
      <!-- don't edit the rest unless you know what you're doing -->
      <folder>/groups/dlebauer/ed2_results/pecan_remote</folder>
      <qsub>sbatch --job-name=@NAME@ --account=dlebauer --ntasks=28 --nodes=3 --time=25:00:00 -o @STDOUT@ -e @STDERR@</qsub>
      <qsub.jobid>\\D*([0-9]+)\\D*</qsub.jobid>
      <qstat>'squeue --job @JOBID@ &amp;> /dev/null || echo DONE'</qstat>
      <prerun>module load openmpi3</prerun>
      <modellauncher>
        <binary>/groups/dlebauer/ed2_results/pecan/contrib/modellauncher/modellauncher</binary>
        <qsub.extra>--partition=standard</qsub.extra>
        <mpirun>module load openmpi3; mpirun</mpirun>
      </modellauncher>
  </host>

</pecan>

Jul 19 '22 16:07 Aariq

Other options:

- use this xml2 solution without abandoning XML (adds a dependency, but easiest)
- find a way of doing this with the XML package
- find a way of doing this on the settings object
- do a find and replace on the text prior to parsing as xml

(note that XML is once again maintained by CRAN and has many reverse depends and reverse imports)
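
Two of these options can be sketched briefly. Neither has been tested against a real PEcAn run, and both function names are hypothetical. The first removes comments from the raw text with a regular expression (it would also strip anything that looks like a comment inside a CDATA section). The second stays with the XML package and removes comment nodes before calling `XML::xmlToList()`.

```r
# Option: find and replace on the raw text before parsing.
# (?s) lets "." match newlines so multi-line comments are removed too;
# ".*?" is lazy so each comment is matched separately.
strip_comments_text <- function(xml_text) {
  gsub("(?s)<!--.*?-->", "", xml_text, perl = TRUE)
}

# Option: keep using the XML package and drop comment nodes after parsing.
parse_without_comments <- function(xml_text) {
  doc <- XML::xmlParse(xml_text, asText = TRUE)
  comments <- XML::getNodeSet(doc, "//comment()")
  if (length(comments) > 0) {
    XML::removeNodes(comments)
  }
  XML::xmlToList(doc)
}

xml_text <- paste(
  "<ed2in_tags>",
  "  <!-- a comment that spans",
  "       two lines -->",
  "  <MAXPATCH>0</MAXPATCH>",
  "</ed2in_tags>",
  sep = "\n"
)

stripped <- strip_comments_text(xml_text)
settings <- parse_without_comments(xml_text)
names(settings)
```

The text-based version needs no extra dependency but knows nothing about XML structure; the node-based version is safer at the cost of a few more lines.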

Jul 19 '22 17:07 dlebauer

Possibly helpful for doing this on the settings object: https://stackoverflow.com/questions/37853679/removing-elements-in-a-nested-r-list-by-name
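
A rough sketch of what that could look like on the settings object, in base R. `drop_comment_elements()` is a hypothetical name, and the sketch assumes that no real setting is an empty element called `comment`, since it only drops elements that are both named `comment` and have length 0 (the form described at the top of this issue).

```r
# Hypothetical helper: recursively drop list elements that are named
# "comment" and have length 0. Assumes no genuine setting has that shape.
drop_comment_elements <- function(x) {
  if (!is.list(x)) {
    return(x)
  }
  nms <- names(x)
  if (!is.null(nms)) {
    is_comment <- nms == "comment" & lengths(x) == 0
    x <- x[!is_comment]
  }
  # Assigning into x[] keeps the names and other attributes of the list
  x[] <- lapply(x, drop_comment_elements)
  x
}

# Toy stand-in for a parsed settings list with a stray comment element
settings <- list(
  model = list(
    ed2in_tags = list(
      IED_INIT_MODE = "0",
      comment = NULL,
      MAXCOHORT = "0"
    )
  )
)

cleaned <- drop_comment_elements(settings)
names(cleaned$model$ed2in_tags)
```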

Aug 04 '22 15:08 Aariq