schema_salad

Line numbers

Open acoleman2000 opened this issue 2 years ago • 3 comments

I made changes to python_codegen.py and python_codegen_support.py, and introduced a test file, test_line_numbers.py, that integrates with the test suite.

I identified several blockers within the current code preventing line numbers from being associated with keys during the saving process.

During loading, the CWL document is read in as a CommentedMap, which carries line/column (lc) information. However, the _document_load method in python_codegen_support.py replaced the CommentedMap with a plain dictionary:

        doc = {
            k: v
            for k, v in doc.items()
            if k not in ("$namespaces", "$schemas", "$base")
        }

I replaced this code with:

        if "$namespaces" in doc:
            doc.pop("$namespaces")
        if "$schemas" in doc:
            doc.pop("$schemas")
        if "$base" in doc:
            doc.pop("$base")

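This keeps doc a CommentedMap: the directives are removed in place instead of copying the remaining items into a new dict, so the lc info is preserved.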

Additionally, I noticed that in the fromDoc method, doc was being set to None or overridden with something else, so I saved the original passed-in doc as self._doc, following the existing naming conventions.

I wanted to use the lc info from the original YAML, so I modified the save method of each class to take a line_numbers argument (a CommentedMap). If line_numbers is not None, it replaces self._doc. This preserves the original CommentedMap and lets it be propagated down to nested objects.
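The pattern can be illustrated with hypothetical stand-in classes (InputParameter and Tool are illustrative names, not the generated code): fromDoc keeps a reference to the node it was given before rebinding doc, and save(line_numbers=...) swaps in a caller-supplied node and hands each child its matching sub-node. Plain dicts stand in for CommentedMaps in this sketch.

```python
from typing import Any, List, Optional


class InputParameter:
    """Hypothetical stand-in for a generated class."""

    def __init__(self, id_: str, _doc: Optional[Any] = None) -> None:
        self.id = id_
        self._doc = _doc  # original loaded node, kept for its lc info

    @classmethod
    def fromDoc(cls, doc: Any) -> "InputParameter":
        _doc = doc  # keep a reference before `doc` is rebound below
        doc = {k: v for k, v in doc.items() if not k.startswith("$")}
        return cls(doc["id"], _doc=_doc)

    def save(self, line_numbers: Optional[Any] = None) -> dict:
        if line_numbers is not None:
            self._doc = line_numbers  # caller passed down the original node
        return {"id": self.id}


class Tool:
    """Hypothetical parent that propagates original sub-nodes to children."""

    def __init__(self, inputs: List[InputParameter], _doc: Optional[Any] = None) -> None:
        self.inputs = inputs
        self._doc = _doc

    def save(self, line_numbers: Optional[Any] = None) -> dict:
        if line_numbers is not None:
            self._doc = line_numbers
        saved = []
        for i, inp in enumerate(self.inputs):
            # Hand each child the matching slice of the original document.
            sub = self._doc["inputs"][i] if self._doc is not None else None
            saved.append(inp.save(line_numbers=sub))
        return {"inputs": saved}
```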

python_codegen_support.py

I added several methods.

I added a method that returns the maximum line number in a CommentedMap, plus one. It repeatedly descends into the child with the highest line number until it reaches the end. The result is used as the line for new fields in the returned doc that have no existing lc info.

I added a method, add_kv, that adds a key/value pair's lc info to the returned doc. This is the core of the change. It takes the CommentedMap to insert into, the old CommentedMap, a dictionary of line numbers, a dictionary mapping each line number to the maximum column used on that line, and a max_len variable. It then checks, in order:

  • If the key is in the line numbers, it copies the old lc info directly into the new CommentedMap.
  • If the key isn't in the line numbers but the value is, it uses the value's line number with an adjusted column (based on the length of the key and the maximum column used on that line).
  • Otherwise, it checks whether the value is in old_doc and, if so, inserts the pair with that lc info.
  • If neither the key nor the value is in the line numbers, it inserts the pair at line max_len and increments max_len by 1.

It also has logic for DSL expansion:

        elif isinstance(val, str):  # Logic for DSL expansion with "?"
            if val + "?" in line_numbers:
                line = line_numbers[val + "?"]["line"] + shift
                if line in inserted_line_info:
                    line = max_line
                col = line_numbers[val + "?"]["col"]
                new_doc.lc.add_kv_line_col(key, [line, col, line, col + len(key) + 2])
                inserted_line_info[line] = col + len(key) + 2

I added a method that extracts the lc info for all key/value pairs in a CommentedMap. For example, for a CommentedMap equivalent to ordereddict([("key", "value")]) with lc info {"key": [1, 0, 1, 6]}, it returns {"key": {"line": 1, "col": 0}, "value": {"line": 1, "col": 6}}.

I also modified the save function. It now returns a CommentedSeq/CommentedMap instead of a list/dict, takes a doc argument, and, if a key/value pair is present in doc, adds its lc info to the returned object.

I added a method, iterate_through_doc, that takes a list of keys and walks the global doc down to the corresponding node. It is intentionally not type-checked, because the walk passes from CommentedMap to CommentedSeq and back before ending at a CommentedMap (or None).
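A minimal sketch of such a walk (iterate_through_doc_sketch is hypothetical), assuming string keys index mappings and integer keys index sequences; a missing step returns None. Because CommentedMap and CommentedSeq subclass dict and list, the same checks apply to loaded documents.

```python
from typing import Any, Optional, Sequence, Union


def iterate_through_doc_sketch(doc: Any, keys: Sequence[Union[str, int]]) -> Optional[Any]:
    """Hypothetical walk from the document root along a path of keys."""
    node = doc
    for key in keys:
        if isinstance(node, dict) and key in node:
            node = node[key]  # mapping step
        elif isinstance(node, list) and isinstance(key, int) and 0 <= key < len(node):
            node = node[key]  # sequence step
        else:
            return None  # path does not exist in this doc
    return node


doc = {"inputs": [{"id": "a", "type": "string"}]}
print(iterate_through_doc_sketch(doc, ["inputs", 0]))
```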

python_codegen.py

I modified several things in python_codegen.py

First, I modified the generated fromDoc method so that each class stores the original doc as self._doc.

I modified the generated save method. The returned object r is now a CommentedMap instead of a dict. I added code to override self._doc (when line_numbers is passed), compute max_len and line_numbers, and initialize an empty dictionary to hold column info. Each class attribute is inserted into r by calling add_kv, which also adds its lc info to r, and max_len is updated after each insertion.

To avoid problems when, for example, the outputs key comes before the inputs key in the source (which caused over-expansion and inconsistent line numbers), I first iterate through all keys in the line-number doc and add their lc info, and only then go through the remaining attributes as usual:

        for key in line_numbers:
            if isinstance(key, str):
                if hasattr(self, key):
                    if getattr(self, key) is not None:
                        # add lc info
                        ...
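The two-pass ordering can be sketched as follows, with a plain dict standing in for the object's attributes (save_order_sketch is hypothetical): keys are first taken in the order they appear in the source document, so outputs can precede inputs, and any remaining non-None attributes follow in field order.

```python
from typing import Any, Dict, Iterable, List


def save_order_sketch(source_keys: Iterable[Any], attrs: Dict[str, Any]) -> List[str]:
    """Hypothetical two-pass key ordering for a generated save()."""
    order: List[str] = []
    for key in source_keys:  # pass 1: keys as they appear in the source
        if isinstance(key, str) and attrs.get(key) is not None and key not in order:
            order.append(key)
    for key, value in attrs.items():  # pass 2: everything else, as usual
        if value is not None and key not in order:
            order.append(key)
    return order


attrs = {"class": "CommandLineTool", "id": "tool", "inputs": [], "outputs": [], "doc": None}
print(save_order_sketch(["outputs", "inputs", "class", "$namespaces"], attrs))
```

Walking the source keys first means the output preserves the author's ordering, so line numbers copied from the source stay monotonic instead of jumping back when a later-declared field is emitted first.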

Additionally, array expansion and DSL expansion sometimes push content down by one or more lines. To make sure each value still lands on the correct line, I added a shift counter that records how many lines to shift a value down.
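A minimal sketch of a cumulative shift counter, under the assumption that each expansion adds a known number of extra lines (for example, a type written as string[] being saved as a mapping with type: array and items: string); apply_shift_sketch is a hypothetical name:

```python
from typing import List, Tuple


def apply_shift_sketch(entries: List[Tuple[int, int]]) -> List[int]:
    """Hypothetical: each entry is (original_line, extra_lines_its_expansion_adds).

    Every entry moves down by the lines added by expansions above it.
    """
    shift = 0
    shifted: List[int] = []
    for line, extra in entries:
        shifted.append(line + shift)
        shift += extra  # later values move down by this many extra lines
    return shifted


print(apply_shift_sketch([(1, 0), (2, 2), (3, 0), (4, 1), (5, 0)]))
```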

test_line_numbers.py

I added 3 tests.

  • One test checks a document whose outputs field comes before its inputs field.
  • One test checks secondary files DSL expansion.
  • One test checks type DSL expansion.

acoleman2000, Jan 13 '23 16:01

Thank you @acoleman2000 for this! Can you run make cleanup?

mr-c, Jan 15 '23 14:01

To re-create metaschema.py, run:

schema-salad-tool --codegen=python schema_salad/metaschema/metaschema.yml > schema_salad/metaschema.py

tetron, Jan 17 '23 16:01

Codecov Report

All modified and coverable lines are covered by tests.

Comparison is base (138e249) 83.68% compared to head (be53207) 83.63%.

Current head be53207 differs from pull request most recent head 3afd4b0. Consider uploading reports for the commit 3afd4b0 to get more accurate results.
@@            Coverage Diff             @@
##             main     #647      +/-   ##
==========================================
- Coverage   83.68%   83.63%   -0.06%     
==========================================
  Files          22       22              
  Lines        4580     4497      -83     
  Branches     1239     1242       +3     
==========================================
- Hits         3833     3761      -72     
+ Misses        483      470      -13     
- Partials      264      266       +2     


codecov[bot], Jun 08 '23 17:06