Tree Sitter query from stdin instead of a file

Open gwerbin opened this issue 3 years ago • 7 comments

It would be very useful to be able to use the Tree Sitter CLI query command on chunks of code provided on standard input.

~~It would also be useful to be able to manually select a language, instead of attempting to detect it automatically from the input file.~~ Done! See https://github.com/tree-sitter/tree-sitter/issues/1511#issuecomment-1040733058

The ability to manually select a language would also facilitate reading from stdin, because the only failure mode would then be a parse error, rather than accidentally detecting the wrong language.

I am not a Rust developer at all, but the code doesn't look that difficult to modify. I can try to put together a PR if the Tree Sitter authors are interested (or maybe this would be a "good first issue" for someone in general).

Apologies if this has been discussed elsewhere; I didn't see it in the issue tracker.

gwerbin avatar Nov 29 '21 16:11 gwerbin

+1 to this.

I've tried to use tree-sitter-cli parse on stdin, either by passing /dev/stdin or - as the filename. The former "works" but the CLI tool believes the input is zero-length (perhaps it attempts to stat() the file before reading to determine its length?); the latter just tries to open a regular file called "-". Neither does what is required.

$ node_modules/tree-sitter-cli/tree-sitter parse /dev/stdin
print "Hello"
(source_file [0, 0] - [0, 0])

Workaround:

$ cat >tmpfile
print "Hello"

$ node_modules/tree-sitter-cli/tree-sitter parse tmpfile 
(source_file [0, 0] - [1, 0]
  (ERROR [0, 0] - [0, 13]
    (call_expression [0, 0] - [0, 13]
      function_name: (identifier [0, 0] - [0, 5])
      args: (argument [0, 6] - [0, 13]
        (string_double_quoted [0, 6] - [0, 13])))))
tmpfile 0 ms    (ERROR [0, 0] - [0, 13])
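
The temp-file workaround above can be wrapped in a small shell function. This is only a sketch: `with_stdin_file` is a hypothetical helper name, not part of the tree-sitter CLI, and it assumes `mktemp` is available on the system.

```shell
# with_stdin_file CMD [ARGS...]
# Hypothetical helper: save standard input to a temporary file, run
# CMD ARGS... TMPFILE, then delete the file and return CMD's exit status.
with_stdin_file() {
  tmp=$(mktemp) || return 1   # create an empty temporary file
  cat >"$tmp"                 # copy all of stdin into it
  "$@" "$tmp"                 # run the command with the file as last argument
  status=$?
  rm -f "$tmp"                # clean up regardless of the outcome
  return "$status"
}
```

For example, `echo 'print "Hello"' | with_stdin_file tree-sitter parse` for parsing, or `with_stdin_file tree-sitter query my-query.scm` for the query case (`my-query.scm` is a placeholder query file). The temporary file has no extension, so language detection by file name will not apply and a `--scope` argument may be needed.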

leonerd avatar Feb 15 '22 17:02 leonerd

> It would also be useful to be able to manually select a language, instead of attempting to detect it automatically from the input file.

You can use the --scope option to do this:

$ cat >python.txt
import foo
print(foo.x)

$ tree-sitter parse --scope source.python python.txt

The particular --scope value to use comes from the tree-sitter.scope field of the grammar's package.json file (e.g. python).
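
To find the scope value without opening the file by hand, something like the following might work. It is a rough sketch: `list_scopes` is a made-up helper name, and it assumes each scope is written as `"scope": "..."` on a single line of package.json. A JSON-aware tool would be more robust.

```shell
# list_scopes FILE
# Hypothetical helper: print every "scope" value found in a grammar's
# package.json. Pure text matching; assumes "scope": "value" sits on one line.
list_scopes() {
  grep -o '"scope"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
    sed 's/.*"\([^"]*\)"$/\1/'   # keep only the quoted value
}
```

Usage would be along the lines of `list_scopes path/to/grammar/package.json`, with the printed value then passed to `--scope`.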

dcreager avatar Feb 15 '22 19:02 dcreager

Thanks @dcreager, that at least covers the 1st request! Although it's a bit clunky to have to browse through the output of tree-sitter dump-languages to make sure that you have the scope names right.

gwerbin avatar Aug 22 '22 16:08 gwerbin

With respect to:

> I've tried to use tree-sitter-cli parse on stdin, either by passing /dev/stdin or - as the filename. The former "works" but the CLI tool believes the input is zero-length (perhaps it attempts to stat() the file before reading to determine its length?);

As a slightly different take, the following:

$ echo "(def a 1)" | tree-sitter parse /dev/stdin

or:

$ echo "(def a 1)" | tree-sitter parse /dev/fd/0

produces the output:

(source [0, 0] - [1, 0]
  (list_lit [0, 0] - [0, 9]
    value: (sym_lit [0, 1] - [0, 4]
      name: (sym_name [0, 1] - [0, 4]))
    value: (sym_lit [0, 5] - [0, 6]
      name: (sym_name [0, 5] - [0, 6]))
    value: (num_lit [0, 7] - [0, 8])))

Admittedly, that might be kind of awkward for longer source.

Perhaps this is not platform agnostic though.
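
Since not every platform provides these device paths, a script could probe for them first. A minimal sketch, where `first_existing` is a hypothetical helper name; note that a path existing does not guarantee the CLI reads it correctly, as the zero-length result reported earlier in the thread shows.

```shell
# first_existing PATH...
# Hypothetical helper: print the first argument that exists on this
# system; return non-zero if none of them do.
first_existing() {
  for p in "$@"; do
    if [ -e "$p" ]; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1
}
```

It could be used as `src=$(first_existing /dev/stdin /dev/fd/0) && echo "(def a 1)" | tree-sitter parse "$src"`, falling back to a temporary file when neither path is present.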

In any case, I think this feature would be a nice one to have working via some means. Maybe treating - as a filename as suggested, or an option like --stdin (perhaps even without an argument?).

For query, I think it is less obvious what the content of standard input should be treated as. My initial thought was that it should be the query rather than the source, but I suppose there might be a use for the reverse case...

sogaiu avatar Jan 26 '23 01:01 sogaiu

There is an implementation of parsing via stdin in ahlinc's alpha branch mentioned here [1].

It has been working well for me and I'm glad to be able to use it :)

Sample uses:

$ echo ":a" | tree-sitter parse -
(source [0, 0] - [1, 0]
  (kwd_lit [0, 0] - [0, 2]
    name: (kwd_name [0, 1] - [0, 2])))

and:

$ tree-sitter parse -
(def a 1)
(source [0, 0] - [1, 0]
  (list_lit [0, 0] - [0, 9]
    value: (sym_lit [0, 1] - [0, 4]
      name: (sym_name [0, 1] - [0, 4]))
    value: (sym_lit [0, 5] - [0, 6]
      name: (sym_name [0, 5] - [0, 6]))
    value: (num_lit [0, 7] - [0, 8])))

Note that in the second example, after the invocation I typed (def a 1) followed by Enter and then Ctrl-D.


[1] If you already have rustup and friends, installation is straightforward (see link above for details).

sogaiu avatar Feb 03 '23 02:02 sogaiu