tree-sitter-julia

Question: expand `for` and `struct` grammar with headers?

Open: simonmandlik opened this issue 1 year ago (3 comments)

Did you check existing issues?

  • [X] I have read all the tree-sitter docs if it relates to using the parser
  • [X] I have searched the existing issues of tree-sitter-julia

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

No response

Describe the bug

This is not a bug, but more like a question/feature request.

I'm trying to update/fix the Julia queries for Neovim, and I'm having a very hard time with for loops and structs.

This Python example

for a in range(10):
    pass

is parsed as follows:

(module ; [0, 0] - [2, 0]
  (for_statement ; [0, 0] - [1, 8]
    left: (identifier) ; [0, 4] - [0, 5]
    right: (call ; [0, 9] - [0, 18]
      function: (identifier) ; [0, 9] - [0, 14]
      arguments: (argument_list ; [0, 14] - [0, 18]
        (integer))) ; [0, 15] - [0, 17]
    body: (block ; [1, 4] - [1, 8]
      (pass_statement)))) ; [1, 4] - [1, 8]

and this Julia example

for a in 1:10, b in 1:10
    print(a)
end

is parsed as

(source_file ; [0, 0] - [3, 0]
  (for_statement ; [0, 0] - [2, 3]
    (for_binding ; [0, 4] - [0, 13]
      (identifier) ; [0, 4] - [0, 5]
      (range_expression ; [0, 9] - [0, 13]
        (integer_literal) ; [0, 9] - [0, 10]
        (integer_literal))) ; [0, 11] - [0, 13]
    (for_binding ; [0, 15] - [0, 24]
      (identifier) ; [0, 15] - [0, 16]
      (range_expression ; [0, 20] - [0, 24]
        (integer_literal) ; [0, 20] - [0, 21]
        (integer_literal))) ; [0, 22] - [0, 24]
    (call_expression ; [1, 4] - [1, 12]
      (identifier) ; [1, 4] - [1, 9]
      (argument_list ; [1, 9] - [1, 12]
        (identifier))))) ; [1, 10] - [1, 11]

Because the two for_binding nodes are not grouped together in any way and are siblings of the call_expression, I couldn't write a query that correctly selects the loop "header" (regardless of how many variables are iterated over), nor one that selects the body without the header. This may be because I'm no expert in TS queries, but for Python such queries are really simple.
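A query can't capture the header as a single node here, but a consumer can still recover it in code by spanning from the first for_binding to the last one. Here is a rough Python sketch of that idea. It models the named children of the for_statement as plain tuples (a hypothetical Child type, not py-tree-sitter's Node) using the positions from the Julia tree above:

```python
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[int, int]  # (row, column), as in the parse tree dumps
Range = Tuple[Point, Point]


class Child(NamedTuple):
    """Hypothetical stand-in for a named child node (not a py-tree-sitter type)."""
    type: str
    start: Point
    end: Point


def split_for_statement(children: List[Child]) -> Tuple[Optional[Range], List[Child]]:
    """Return (header_range, body) for the named children of a for_statement.

    The header spans from the first for_binding to the last one; every
    other named child is treated as part of the loop body.
    """
    bindings = [c for c in children if c.type == "for_binding"]
    body = [c for c in children if c.type != "for_binding"]
    if not bindings:
        return None, body
    return (bindings[0].start, bindings[-1].end), body


# Named children of the for_statement in the Julia parse tree above.
children = [
    Child("for_binding", (0, 4), (0, 13)),
    Child("for_binding", (0, 15), (0, 24)),
    Child("call_expression", (1, 4), (1, 12)),
]
header, body = split_for_statement(children)
print(header)                  # ((0, 4), (0, 24))
print([c.type for c in body])  # ['call_expression']
```

Filtering by type works here only because a for_binding never appears as a direct child of the loop body.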

The situation is similar with struct definitions:

struct A{B, C} <: D
    x
    y
end

is parsed as

(source_file ; [0, 0] - [4, 0]
  (struct_definition ; [0, 0] - [3, 3]
    name: (identifier) ; [0, 7] - [0, 8]
    (type_parameter_list ; [0, 8] - [0, 14]
      (identifier) ; [0, 9] - [0, 10]
      (identifier)) ; [0, 12] - [0, 13]
    (type_clause ; [0, 15] - [0, 19]
      (operator) ; [0, 15] - [0, 17]
      (identifier)) ; [0, 18] - [0, 19]
    (identifier) ; [1, 4] - [1, 5]
    (identifier))) ; [2, 4] - [2, 5]

Again, the struct header nodes type_parameter_list and type_clause are siblings of the struct body.
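Filtering by node type alone isn't enough for structs, because the struct name and the fields are all plain (identifier) nodes, and only the name: field tells them apart. A rough Python sketch of the split, again using a hypothetical Child tuple (not py-tree-sitter's Node) with the positions from the tree above:

```python
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[int, int]  # (row, column)


class Child(NamedTuple):
    """Hypothetical stand-in for a named child node (not a py-tree-sitter type)."""
    type: str
    field: Optional[str]  # field name such as "name", or None if unlabeled
    start: Point
    end: Point


# Node types that belong to the struct header in the tree shown above.
HEADER_TYPES = {"type_parameter_list", "type_clause"}


def split_struct(children: List[Child]) -> Tuple[List[Child], List[Child]]:
    """Partition the named children of a struct_definition into (header, body)."""
    header: List[Child] = []
    body: List[Child] = []
    for c in children:
        # The name is an identifier too, so its field label marks it as header.
        if c.field == "name" or c.type in HEADER_TYPES:
            header.append(c)
        else:
            body.append(c)
    return header, body


children = [
    Child("identifier", "name", (0, 7), (0, 8)),
    Child("type_parameter_list", None, (0, 8), (0, 14)),
    Child("type_clause", None, (0, 15), (0, 19)),
    Child("identifier", None, (1, 4), (1, 5)),
    Child("identifier", None, (2, 4), (2, 5)),
]
header, body = split_struct(children)
print([c.type for c in header])  # ['identifier', 'type_parameter_list', 'type_clause']
print([c.start for c in body])   # [(1, 4), (2, 4)]
```

This depends on the exact shape of the current tree, which is why a grouping node in the grammar would be more robust.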

Is there a reason not to group struct and loop "headers" together, similar to how Python is parsed?

simonmandlik, Jun 21 '24 20:06

If statements in Python also provide a consequence child:

if True:
    pass
elif False:
    pass
else:
    pass

(module ; [0, 0] - [6, 0]
  (if_statement ; [0, 0] - [5, 8]
    condition: (true) ; [0, 3] - [0, 7]
    consequence: (block ; [1, 4] - [1, 8]
      (pass_statement)) ; [1, 4] - [1, 8]
    alternative: (elif_clause ; [2, 0] - [3, 8]
      condition: (false) ; [2, 5] - [2, 10]
      consequence: (block ; [3, 4] - [3, 8]
        (pass_statement))) ; [3, 4] - [3, 8]
    alternative: (else_clause ; [4, 0] - [5, 8]
      body: (block ; [5, 4] - [5, 8]
        (pass_statement))))) ; [5, 4] - [5, 8]

whereas in Julia all "consequence" lines are siblings of the condition:

if true
    1
    1
elseif false
    1
else
    1
end

(source_file ; [0, 0] - [8, 0]
  (if_statement ; [0, 0] - [7, 3]
    condition: (boolean_literal) ; [0, 3] - [0, 7]
    (integer_literal) ; [1, 4] - [1, 5]
    (integer_literal) ; [2, 4] - [2, 5]
    alternative: (elseif_clause ; [3, 0] - [5, 0]
      condition: (boolean_literal) ; [3, 7] - [3, 12]
      (integer_literal)) ; [4, 4] - [4, 5]
    alternative: (else_clause ; [5, 0] - [7, 0]
      (integer_literal)))) ; [6, 4] - [6, 5]
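Since the Julia if_statement does label its condition and alternative children, the consequence can still be recovered as "every unlabeled child". A rough Python sketch with the same kind of hypothetical Child tuple (not py-tree-sitter's Node), using the positions from the Julia tree above:

```python
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[int, int]  # (row, column)


class Child(NamedTuple):
    """Hypothetical stand-in for a named child node (not a py-tree-sitter type)."""
    type: str
    field: Optional[str]  # "condition", "alternative", or None
    start: Point
    end: Point


def split_if(children: List[Child]):
    """Return (condition, consequence, alternatives) for an if_statement's named children."""
    condition: Optional[Child] = None
    consequence: List[Child] = []
    alternatives: List[Child] = []
    for c in children:
        if c.field == "condition":
            condition = c
        elif c.field == "alternative":
            alternatives.append(c)
        else:
            # Unlabeled children are the statements of the implicit block.
            consequence.append(c)
    return condition, consequence, alternatives


children = [
    Child("boolean_literal", "condition", (0, 3), (0, 7)),
    Child("integer_literal", None, (1, 4), (1, 5)),
    Child("integer_literal", None, (2, 4), (2, 5)),
    Child("elseif_clause", "alternative", (3, 0), (5, 0)),
    Child("else_clause", "alternative", (5, 0), (7, 0)),
]
condition, consequence, alternatives = split_if(children)
print((consequence[0].start, consequence[-1].end))  # ((1, 4), (2, 5))
print([a.type for a in alternatives])               # ['elseif_clause', 'else_clause']
```

The same function can be applied to the children of an elseif_clause, which also has a condition: field.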

simonmandlik, Jun 21 '24 20:06

There are two separate issues here, so I'll address them separately.

Querying inner blocks

The block rule used in the grammar is not visible (see #73). There's no technical limitation here, but making it visible is a breaking change that would require updating almost all tests.

Querying "headers"

If blocks were visible, querying headers would be really simple, since they're always "the thing before the block".

For now, I can only think of a couple of workarounds:

  • if and while conditions are a single expression, so this would work:
    (if_statement . (_) @condition)
    
  • for and let have their own rules for bindings, so this would work:
    (for_statement ((for_binding) ("," (for_binding))*) @bindings)
    

In the case of structs, the way they're currently parsed is awful. I took a much simpler approach in the lezer-julia grammar, and that should probably be ported here.

savq, Jun 23 '24 00:06

@savq thanks for the reply!

I prepared a PR, https://github.com/nvim-treesitter/nvim-treesitter-textobjects/pull/639, and any comments would be greatly appreciated!

> The block rule used in the grammar is not visible (see https://github.com/tree-sitter/tree-sitter-julia/issues/73). There's no technical limitation here, but making it visible is a breaking change that would require updating almost all tests.

Yes, this would really help. For if statements, conditions are easy to select because they sit under the condition field, but selecting blocks is harder: it would have to rely on the matching algorithm, since, e.g., the elseif_clause is a sibling of all nodes in the block.

> (for_statement ((for_binding) ("," (for_binding))*) @bindings)

I tested this, and it selects only one for_binding at a time, not all of them.
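One way around this is to merge the per-binding captures after the fact: group them by their enclosing for_statement and take the smallest start and the largest end. A rough Python sketch; the (parent_key, range) pairs are a hypothetical format (here keyed by the loop's start point), not what any particular query API returns:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (row, column)
Range = Tuple[Point, Point]


def merge_captures(captures: List[Tuple[Point, Range]]) -> Dict[Point, Range]:
    """Merge single-node captures into one range per enclosing node.

    Each capture is (parent_key, (start, end)); parent_key is a hypothetical
    identifier for the enclosing for_statement, here its start point.
    Points compare as (row, column) tuples, so min/max give the outer bounds.
    """
    grouped: Dict[Point, List[Range]] = defaultdict(list)
    for parent_key, rng in captures:
        grouped[parent_key].append(rng)
    return {
        key: (min(r[0] for r in ranges), max(r[1] for r in ranges))
        for key, ranges in grouped.items()
    }


# Two for_binding captures from the loop above, plus a hypothetical second loop.
captures = [
    ((0, 0), ((0, 4), (0, 13))),
    ((0, 0), ((0, 15), (0, 24))),
    ((4, 0), ((4, 4), (4, 12))),
]
print(merge_captures(captures))
# {(0, 0): ((0, 4), (0, 24)), (4, 0): ((4, 4), (4, 12))}
```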

simonmandlik, Jun 23 '24 08:06