cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

Inconsistent behavior of `VarParsing` default for lists

Open IzaakWN opened this issue 10 months ago • 5 comments

The argument parsing with FWCore.ParameterSet.VarParsing has some inconsistent behavior of defaults when using VarParsing.multiplicity.list, which seems to be cause by the following lines, which has a len (default) requirement, and which appends the default to the list: https://github.com/cms-sw/cmssw/blob/ac2d51b77bd6789940953d54d5d2c7cfbbf940a3/FWCore/ParameterSet/python/VarParsing.py#L410-L417

This works fine if the default is a single string object (VarParsing.varType.string), but this causes issues for some other use cases, for example:

  1. If a user wants a list of objects as default, it will be converted into a list of lists: e.g. ['foo','bar'] becomes [['foo','bar']], while arguments passed via the command line of the form myStrs=foo,bar will yield ['foo','bar']... (Work around by setting the default after parsing the command line arguments.)
  2. If a users wants to set a single integer/float/boolean as default for VarParsing.multiplicity.list, it throws a TypeError("object of type 'int' has no len()") because of the len (default) requirement, forcing the user to setting a list as default, yielding inconsistent results with arguments passed via the command line as explained in (1).

Here is a reproducible code snippet:

import sys
from FWCore.ParameterSet.VarParsing import VarParsing

def parse(argv):
  print(f">>> Parsing {argv!r}...")
  sys.argv = ['test.py','maxEvents=100']+argv
  opts = VarParsing('standard')
  print(">>> Register...")
  def add(*args,**kwargs):
    try:
      print(f">>>   {args!r}, {kwargs!r}...")
      opts.register(*args,**kwargs)
    except Exception as err:
      print(f">>>   => {err!r}")
  add('myInts0',  '',      VarParsing.multiplicity.list, VarParsing.varType.int) # works because len(str)
  add('myInts1',  0,       VarParsing.multiplicity.list, VarParsing.varType.int) # ERROR because len(int)
  add('myInts2',  [0],     VarParsing.multiplicity.list, VarParsing.varType.int) # works because len(list)
  add('myBools0', '',      VarParsing.multiplicity.list, VarParsing.varType.bool) # works because len(str)
  add('myBools1', True,    VarParsing.multiplicity.list, VarParsing.varType.bool) # ERROR because len(bool)
  add('myBools2', [True],  VarParsing.multiplicity.list, VarParsing.varType.bool) # works because len(list)
  add('myStrs0',  '',      VarParsing.multiplicity.list, VarParsing.varType.string) # works because len(str)
  add('myStrs1',  'foo',   VarParsing.multiplicity.list, VarParsing.varType.string) # works because len(str)
  add('myStrs2',  ['foo'], VarParsing.multiplicity.list, VarParsing.varType.string) # works because len(list)
  opts.parseArguments()
  print(f">>> Parsed:")
  for var in opts._lists:
    if var.startswith('my'):
      print(f">>>   {var:6s} = {getattr(opts,var)!r}")

parse([ ])
parse(['myInts1=0,1','myBools1=True,False','myStrs1=foo,bar','myStrs2=foo,bar'])

which yields

>>> Parsing []...
>>> Register...
>>>   ('myInts0', '', 'multiplicity_list', '_int'), {}...
>>>   ('myInts1', 0, 'multiplicity_list', '_int'), {}...
>>>   => TypeError("object of type 'int' has no len()")
>>>   ('myInts2', [0], 'multiplicity_list', '_int'), {}...
>>>   ('myBools0', '', 'multiplicity_list', '_bool'), {}...
>>>   ('myBools1', True, 'multiplicity_list', '_bool'), {}...
>>>   => TypeError("object of type 'bool' has no len()")
>>>   ('myBools2', [True], 'multiplicity_list', '_bool'), {}...
>>>   ('myStrs0', '', 'multiplicity_list', '_string'), {}...
>>>   ('myStrs1', 'foo', 'multiplicity_list', '_string'), {}...
>>>   ('myStrs2', ['foo'], 'multiplicity_list', '_string'), {}...
>>> Parsed:
>>>   myInts0 = []
>>>   myInts1 = []
>>>   myInts2 = [[0]]
>>>   myBools0 = []
>>>   myBools1 = []
>>>   myBools2 = [[True]]
>>>   myStrs0 = []
>>>   myStrs1 = ['foo']
>>>   myStrs2 = [['foo']]
>>> Parsing ['myInts1=0,1', 'myBools1=True,False', 'myStrs1=foo,bar', 'myStrs2=foo,bar']...
>>> Register...
>>>   ('myInts0', '', 'multiplicity_list', '_int'), {}...
>>>   ('myInts1', 0, 'multiplicity_list', '_int'), {}...
>>>   => TypeError("object of type 'int' has no len()")
>>>   ('myInts2', [0], 'multiplicity_list', '_int'), {}...
>>>   ('myBools0', '', 'multiplicity_list', '_bool'), {}...
>>>   ('myBools1', True, 'multiplicity_list', '_bool'), {}...
>>>   => TypeError("object of type 'bool' has no len()")
>>>   ('myBools2', [True], 'multiplicity_list', '_bool'), {}...
>>>   ('myStrs0', '', 'multiplicity_list', '_string'), {}...
>>>   ('myStrs1', 'foo', 'multiplicity_list', '_string'), {}...
>>>   ('myStrs2', ['foo'], 'multiplicity_list', '_string'), {}...
>>> Parsed:
>>>   myInts0 = []
>>>   myInts1 = [0, 1]
>>>   myInts2 = [[0]]
>>>   myBools0 = []
>>>   myBools1 = [True, False]
>>>   myBools2 = [[True]]
>>>   myStrs0 = []
>>>   myStrs1 = ['foo', 'bar']
>>>   myStrs2 = ['foo', 'bar']

I see this is already discussed in Issue https://github.com/cms-sw/cmssw/issues/34469 by @kpedro88, and one can use argparse as alternative (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideAboutPythonConfigFile#argparse), and I also realize there's a danger to break many existing configuration scripts if we change the behavior retroactively, but I wonder if we could at least patch the special use case (2) for integers/floats/booleans, which currently throws an TypeError error? For example, something like the following should solve this error without changing the other behavior:

     if (mytype == varType.string and len (default)) or (mytype != varType.string and default != ""):
         self._lists[name].append (default)

Or if we want to also fix use case (1) to prevent lists of lists (although this could break some scripts that relied on this behavior):

     if isinstance(default,list):
         self._lists[name] = default
     elif (mytype == varType.string and default == "") or (mytype != varType.string and default != ""):
         self._lists[name].append (default)

IzaakWN avatar Apr 05 '24 10:04 IzaakWN