Fix the bug regarding unquoted strings in collection types in the DSL

Open RobinPicard opened this issue 2 months ago • 0 comments

Explores a solution to #1630

As shown in the issue above, our regex DSL has a problem with the quoting of text elements in Python collection types. To respect Python grammar, everything that ends up being a string must be placed between quotation marks when it is used in a collection type. The difficulty is that when those same elements are not used in a collection type, the quotation marks are not needed (and we should not add them, as the user never asked for them).
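To make the grammar constraint concrete, here is a minimal check using only the standard library. It is independent of the DSL itself, and `parses_as_list` is a hypothetical helper introduced for illustration: it only shows that a list whose text items are unquoted is not a valid Python literal, while the quoted version is.

```python
import ast


def parses_as_list(text: str) -> bool:
    """Return True if `text` is a valid Python literal that evaluates to a list."""
    try:
        return isinstance(ast.literal_eval(text), list)
    except (ValueError, SyntaxError):
        # Unquoted words are parsed as names, which literal_eval rejects.
        return False


unquoted_ok = parses_as_list("[foo, bar]")
quoted_ok = parses_as_list("['foo', 'bar']")
```

Here `unquoted_ok` is `False` and `quoted_ok` is `True`, which is why text generated for a collection type needs the quotes while the same text generated on its own does not.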

We need a system that adds these quotes when handling collection types within our DSL. It must:

  • Automatically assess whether an element needs to be quoted (for instance basic Python types)
  • Let the user specify whether other elements should be quoted (for instance the Regex class)
  • Do both of the above for all terms and Python types, not only for basic elements containing a single element. This is required because a collection type could contain, for instance, a Sequence, a Literal, a QuantifyMinimum... and we want to quote the whole containing term, not the items it contains

The envisioned solution relies on 2 properties carried by every term. This also covers Python types, since all of them are eventually turned into terms (the naming of those properties will be improved):

  • requires_quoting: tells whether this term should be quoted if it were in a collection type. The value of the property is either set by the user or deduced from the type of the term or from the other terms it contains.
  • apply_quotation: tells whether the term is in a collection type, such that it should be quoted (if applicable). The value of the property defaults to False, and is then set to True in the python_types_to_terms function if the term is contained in a collection type.

In the to_regex function, we wrap the content of the term in repr if both properties above are True.
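The interplay of the two properties can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: the `Term`, `String`, `Regex` and `Sequence` classes are stripped-down stand-ins, `list_of` is a hypothetical helper playing the role of the collection handling in python_types_to_terms, the rule "a Sequence requires quoting if any of its items does" is an assumed deduction rule, and the repr wrapping is approximated by surrounding the pattern with single quotes.

```python
import re


class Term:
    def __init__(self, requires_quoting=False):
        # Should this term be quoted if it ends up in a collection type?
        self.requires_quoting = requires_quoting
        # Is this term actually an item of a collection type?
        self.apply_quotation = False

    def pattern(self):
        raise NotImplementedError

    def to_regex(self):
        body = self.pattern()
        # Quote only when both properties are True.
        if self.requires_quoting and self.apply_quotation:
            return "'" + body + "'"  # stand-in for the repr wrapping
        return body


class String(Term):
    def __init__(self, value):
        # Deduced automatically: text always needs quotes in a collection.
        super().__init__(requires_quoting=True)
        self.value = value

    def pattern(self):
        return re.escape(self.value)


class Regex(Term):
    def __init__(self, regex, requires_quoting=False):
        # Left to the user: a raw regex may or may not produce text.
        super().__init__(requires_quoting)
        self.regex = regex

    def pattern(self):
        return self.regex


class Sequence(Term):
    def __init__(self, *terms):
        # Assumed rule: deduced from the terms the sequence contains.
        super().__init__(any(t.requires_quoting for t in terms))
        self.terms = terms

    def pattern(self):
        # Inner terms keep apply_quotation=False, so only the whole
        # sequence gets quoted, never its individual items.
        return "".join(t.to_regex() for t in self.terms)


def list_of(*items):
    # Hypothetical helper: mark direct items of the collection, then
    # build the regex for a list literal.
    for item in items:
        item.apply_quotation = True
    return r"\[" + ", ".join(item.to_regex() for item in items) + r"\]"


standalone = String("foo").to_regex()
in_list = list_of(String("foo"), Regex("[0-9]+"))
nested = list_of(Sequence(String("a"), Regex("[0-9]")))
```

With these definitions, `standalone` carries no quotes, `in_list` quotes only the String item, and `nested` places a single pair of quotes around the whole Sequence rather than around the String it contains.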

RobinPicard, Sep 30 '25 09:09