message_ix icon indicating copy to clipboard operation
message_ix copied to clipboard

Define and apply a GAMS code style

Open khaeru opened this issue 9 months ago • 2 comments

Currently we have only the following in our documentation: https://github.com/iiasa/message_ix/blob/ea06614afedba0740a6ae86f8408a84e4a187a11/doc/contributing.rst?plain=1#L279-L281

Working on #924, I found that the formatting of some of the existing equations made reading and understanding the code unnecessarily difficult. For example, in the following from ACTIVITY_CONSTRAINT_UP:

            )
* growth of 'capital stock' from previous period
        + SUM(year_all2$( seq_period(year_all2,year) ),
                CAP_NEW(node,inv_tec,year_all2)$( map_tec(node,inv_tec,year_all2) AND model_horizon(year_all2) )
                + historical_new_capacity(node,inv_tec,year_all2)
                # placeholder for spillover across nodes, technologies, periods (other than immediate predecessor)
            ) * POWER( 1 + growth_new_capacity_lo(node,inv_tec,year) , duration_period(year) )
* 'soft' relaxation of dynamic constraints

…where are the closing parentheses for each of the three calls of the GAMS SUM() function?

In that PR, I loosely experimented with a style like the following:

NEW_CAPACITY_CONSTRAINT_UP(n,t,y_)$(
    inv_tec(t) AND map_tec(n,t,y_) AND is_dynamic_new_capacity_up(n,t,y_)
)..
  CAP_NEW(n,t,y_)

  =L=

  # 'Hard' constraint value
  SUM(
    y_prev$seq_period(y_prev, y_),
    (
      # New capacity in previous model period
      CAP_NEW(n,t,y_prev)$(model_horizon(y_prev) AND map_tec(n,t,y_prev))

      # New capacity in previous historical period
      + historical_new_capacity(n,t,y_prev)

      # Otherwise, maximum initial value
      # FIXME Do not use this if any of the above are non-zero
      + initial_new_capacity_up(n,t,y_)
    )
    * k_gncu(n,t,y_prev,y_)
  )

  # 'Soft' relaxation of constraint
  + (
    CAP_NEW_UP(n,t,y_) * (POWER(1 + soft_new_capacity_up(n,t,y_), duration_period(y_)) - 1)
  )$soft_new_capacity_up(n,t,y_)

  # Additional relaxation for calibration and debugging
%SLACK_CAP_NEW_DYNAMIC_UP% + SLACK_CAP_NEW_DYNAMIC_UP(n,t,y_)
;

This feels much easier to read, especially when also looking at our Python code that uses the Black code style.

If we were to formalize such a style, it might include rules like the following:

  • Operators must appear either:
    • Inline, with spaces on either side, or,
    • At the start of a new line, followed by the operand or opening parenthesis.
  • Parentheses:
    • Parentheses must not appear around single identifiers, including in GAMS dollar conditionals (a$b(x,y,z), not (a)$(b(x,y,z)).
    • Closing parentheses must appear either:
      • On the same line as the corresponding opening parenthesis, if the parenthesized expression fits within the line length, or else
      • On a line with the same indentation as the line containing the corresponding opening parenthesis.
  • Comments:
    • Inline comments in expressions must use the # character and must match the local indentation.
  • Blank lines within expressions (parameter assignments, equations, etc.) should be used to clarify structure.
  • Short aliases for set names should be used.
  • Indent using 2 spaces.
  • Wrap lines at 120 characters.

Further thoughts:

  • As with 'black', the advantage would be "[f]ormatting [that] becomes transparent after a while [so that] you can focus on the content instead."
  • Like semantic line breaks in #927/#931, this should be applied incrementally as parts of the GAMS code are added/revised. I don't think we need a massive PR to rewrite all the code.
  • I am not aware of any auto-formatter for GAMS code similar to ruff or black. If one exists, we could consider to use it. If not, then I think it is probably excessive effort to create one, especially given #879.

khaeru avatar Apr 16 '25 11:04 khaeru

As an addendum model_core.gms is very large and strictly speaking its model core ++ at this point. I was wondering if it would make sense to have an issue to have it broken up into smaller semantic chunks with includes. Since gams compiles into LST files, we could verify that there are no issues by comparing the monolith against the chunkified version. This way it would also be easy to cordon off the unused model features and also have tests per equation. Connecting with #879, having tests per equation would make it feasible to achieve code translation away from gams.

Going a bit further into unused model features, this would be a good opportunity to clearly specify what is core message, i.e a minimal viable subset. And what are extensions to the model.

Wegatriespython avatar Sep 08 '25 07:09 Wegatriespython

Quickly searching for an auto formatter for GAMS code yields nothing for me. GAMS does have a page on "Good Coding Practices", including a section on formatting, but they don't mention any automatic tool, so at least there's no official tool, it seems.

glatterf42 avatar Sep 17 '25 08:09 glatterf42