Missing datafile definitions not caught
Description
When reading in a MathProg datafile, all parameter/set definitions in the datafile also need to be defined in the config.yaml file. If a definition is present in the datafile, but not in the config.yaml file, then an AmplyError is raised. This logic does not work in reverse.
If the config.yaml file has parameter definitions not present in the datafile, then I would expect a warning/error to be raised. Instead, the parameter is added to the internal datastore with the default value defined in the config.yaml file.
Other read strategies raise an OtooleNameMismatchError in these instances.
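The symmetric behaviour I would expect can be sketched as a two-way comparison of names. This is only an illustration: `NameMismatchError` and `check_names` are hypothetical names made up for this example, not otoole's actual classes or functions.

```python
class NameMismatchError(Exception):
    """Hypothetical stand-in for otoole's OtooleNameMismatchError."""


def check_names(config_names, datafile_names):
    """Raise if the config and the datafile do not define the same names."""
    config_names = set(config_names)
    datafile_names = set(datafile_names)

    # Names present on one side only, in either direction
    only_in_datafile = sorted(datafile_names - config_names)
    only_in_config = sorted(config_names - datafile_names)

    if only_in_datafile or only_in_config:
        raise NameMismatchError(
            f"Only in datafile: {only_in_datafile}; "
            f"only in config: {only_in_config}"
        )
```

With this check, a parameter that appears only in the config would be reported rather than silently filled with its default value.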
How to replicate
Remove the parameter AccumulatedAnnualDemand from a MathProg datafile, and ensure the config.yaml file has the definition:
AccumulatedAnnualDemand:
    indices: [REGION,FUEL,YEAR]
    type: param
    dtype: float
    default: 0
Thoughts on Solution
We use the config.yaml file to first determine what parameters to search for in the datafile, then pass that into the Amply object. Therefore, we either need to reformulate this logic, or change how amply deals with missing parameters.
https://github.com/OSeMOSYS/otoole/blob/3c6f04e03b5ad77d0f938ceba546d1079b82c377/src/otoole/read_strategies.py#L297-L325
Related issues/PR
This is an edge case of issue #151, with the rest of the issue addressed in PR #157.
One option is to use a regex to parse the datafile for parameter and set definitions and then check these against the config file prior to reading in the data with the amply parser.
Something like this script can be used to extract lists of sets, parameters and variables from a file. Performance is a concern, though: this is likely to be slow on a large datafile.
import re


def parse_gmpl_code(gmpl_code):
    # Initialize the variables to store the sets, parameters, and variables
    sets = {}
    parameters = {}
    variables = {}

    # Define regular expressions to match the different GMPL components.
    # Parameter names may contain digits and underscores, not only letters.
    set_regex = re.compile(r'set\s+(?P<set_name>[^\s;]+)\s*;')
    param_regex = re.compile(r'param\s+(?P<param_name>[A-Za-z_]\w*)\s*(?P<symbolic>symbolic)?(?P<indices>\s*\{[^\}]*\})?\s*(?P<default>default\s+[^;]+)?\s*(?P<binary>binary)?[;:=]')
    var_regex = re.compile(r'var\s+(?P<var_name>[^\s;,]+)(?P<indices>\s*\{[^\}]*\})?\s*(?P<bounds>>=\s*[^\s;]+)?\s*;')

    # Parse the sets
    for match in set_regex.finditer(gmpl_code):
        set_name = match.group('set_name')
        sets[set_name] = []

    # Parse the parameters
    for match in param_regex.finditer(gmpl_code):
        param_name = match.group('param_name')
        indices = match.group('indices')
        default = match.group('default')
        if indices:
            # Parse indices
            indices = re.findall(r'\{([^\}]*)\}', indices)[0]
            indices = [i.strip() for i in indices.split(',')]
            parameters[param_name] = {'indices': indices}
        else:
            parameters[param_name] = {}
        if default:
            # Parse default value
            default = default.strip().split()[-1]
            parameters[param_name]['default'] = default

    # Parse the variables
    for match in var_regex.finditer(gmpl_code):
        var_name = match.group('var_name')
        indices = match.group('indices')
        bounds = match.group('bounds')
        if indices:
            # Parse indices
            indices = re.findall(r'\{([^\}]*)\}', indices)[0]
            indices = [i.strip() for i in indices.split(',')]
            variables[var_name] = {'indices': indices}
        else:
            variables[var_name] = {}
        if bounds:
            # Parse variable bounds
            # bounds = bounds.strip().split()[-1]
            variables[var_name]['bounds'] = bounds

    # Return the parsed sets, parameters, and variables
    return sets, parameters, variables


# Read the whole file in one go; avoid shadowing the built-in vars()
with open('OSeMOSYS.txt', 'r') as textfile:
    osemosys = textfile.read()

sets, params, variables = parse_gmpl_code(osemosys)
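For the pre-read check itself, only the names defined in the datafile are needed, so a lighter pass may be enough. Below is a minimal sketch under some assumptions: the function names are hypothetical, `config` is the config.yaml content already loaded into a dict, every `set`/`param` statement starts on its own line, parameters are written either as `param Name default 0 :=` or `param default 0 : Name :=`, and entries with `type: result` are not expected in the datafile.

```python
import re

# Statements are assumed to start at the beginning of a line
SET_RE = re.compile(r"^\s*set\s+(\w+)", re.MULTILINE)
PARAM_RE = re.compile(
    r"^\s*param\s+(?:default\s+\S+\s*:\s*)?(\w+)", re.MULTILINE
)


def extract_datafile_names(datafile_text):
    """Return the names of all sets and parameters defined in the text."""
    names = set(SET_RE.findall(datafile_text))
    names.update(PARAM_RE.findall(datafile_text))
    return names


def find_missing_definitions(datafile_text, config):
    """Return config names (sets and params) absent from the datafile."""
    defined = extract_datafile_names(datafile_text)
    return sorted(
        name
        for name, details in config.items()
        if details.get("type") != "result" and name not in defined
    )


datafile_text = """\
set REGION := SIMPLICITY;
param default 0 : SpecifiedAnnualDemand :=
SIMPLICITY CSV 2014 1
;
param DiscountRate default 0.05 :=
;
end;
"""

config = {
    "REGION": {"type": "set"},
    "SpecifiedAnnualDemand": {"type": "param"},
    "DiscountRate": {"type": "param"},
    "AccumulatedAnnualDemand": {"type": "param"},
    "TotalCost": {"type": "result"},
}

missing = find_missing_definitions(datafile_text, config)
print(missing)
```

Because both patterns are anchored to the start of a line, data rows are rejected after a short prefix check, which should keep the scan roughly linear in the file size. Whether that is fast enough on a large datafile would still need to be measured.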