Question: status of Go version of Textmapper?

Open mewmew opened this issue 6 years ago • 7 comments

Hi Evgeny,

I just came across Textmapper, and having read the Language Reference and the motivation behind the project, it seems to be exactly what I was looking for: essentially an LR version of ANTLR for Go. I can tell that you have a lot of experience in this domain, as the architecture is well thought out. I still have to dive deep and examine the minute details of the implementation, but my initial reaction to Textmapper is very positive!

Now, of course, I'd like to take tm out for a spin! However, looking at the implementation of tm-go/cmd/textmapper/generate.go, I noticed a TODO in the generate function.

I noticed that you recently ported Tarjan's algorithm for detecting strongly connected components (in rev 78fc54eedf048929cd4a42ab40b2e1a6160ea31e). My question is, how far is the Go version of Textmapper from being ready for use?

I'd love to try it out!

Cheerful regards, Robin

mewmew avatar Oct 11 '18 11:10 mewmew

The Go version is very far from being complete. I think it will take me two more months to finish porting the lexer generator from Java, and then another two quarters for the parser generator. It is not that it is much work per se but rather my lack of time between work and family. I'm committed though. The main thing I want to get from this rewrite is the support of declarative (and transparent) nonterminal inlining, which should become the main tool in resolving grammar ambiguities. I'm also looking into better compression for generated tables. The compression scheme Textmapper currently uses is the same as in Bison, and it does not scale well to large templated grammars. The problem of generating performant static hash maps seems very interesting to me but I don't want to do this in Java.

Meanwhile, use the Java version. It is stable and generates very performant code. On real-world languages, generated parsers in Go gave me ~100-230MB/sec of lexing throughput and 20-60MB/sec of parsing throughput. It gets slightly better with each Go release, mostly because of improved register allocation within the Go compiler.

I will refresh the documentation in the upcoming weeks to better cover Textmapper's advanced features, such as templates, grammar lookaheads, token sets, error recovery best practices, and the arrow notation for producing ASTs.

inspirer avatar Oct 11 '18 19:10 inspirer

Thanks a lot for the writeup! It's good to know roughly what stage the Go port is at, what your plans are for future releases, and in particular that you are committed to it!

Performance was actually why I started looking at Textmapper. The intention is to evaluate using Textmapper for parsing LLVM IR assembly, and thus switch from using Gocc to Textmapper in the upcoming release of https://github.com/llir/llvm.

There is still quite a bit to do, but I'd say about 80% of the grammar has been ported from Gocc to Textmapper https://github.com/mewmew/l-tm/blob/master/parser/ll.tm

There are still production actions to write, and that will take the other 80% of the project :)

Once more, thanks for releasing Textmapper to the public!

Cheers, Robin

mewmew avatar Oct 12 '18 00:10 mewmew

> There is still quite a bit to do, but I'd say about 80% of the grammar has been ported from Gocc to Textmapper https://github.com/mewmew/l-tm/blob/master/parser/ll.tm

The port is now done. And the performance looks very promising.

> On real-world languages, generated parsers in Go gave me ~100-230MB/sec of lexing throughput and 20-60MB/sec of parsing throughput.

I can validate this claim, as I get a parsing throughput of roughly 45 MB/s. I have not yet written the semantic actions for constructing the AST, though, so I hope that won't bring the performance down too much.

Extract from https://github.com/mewmew/l/issues/6#issuecomment-429498923:

Parsing 1,733,842 lines and 135 MB of LLVM IR assembly, as contained in the 107 source files at decomp/testdata took ~3 seconds; thus ~30ms was used per file, or ~45 MB/s.

mewmew avatar Oct 13 '18 01:10 mewmew

Just a note: the more I use Textmapper, the more remarkable I think it is. Evgeny, what you have managed to do is quite an achievement! I've never before come across a parser generator where the grammar ends up being as readable as the one in Textmapper. I'm quite amazed at how well the LLVM IR grammar seems to have turned out.

Simply wanted to extend a thank you!

Hats off and with respect. Robin

mewmew avatar Oct 14 '18 19:10 mewmew

Thanks for the kind words, Robin!

A quick update from me: the Go version reached feature parity with its Java counterpart in lexer generation. It produces byte-for-byte identical output for most grammars, and I'm now working on porting the parser generator. I believe I'm past the midpoint of the rewrite.

inspirer avatar Feb 09 '19 22:02 inspirer

> A quick update from me: the Go version reached feature parity with its Java counterpart in lexer generation. It produces byte-for-byte identical output for most grammars, and I'm now working on porting the parser generator. I believe I'm past the midpoint of the rewrite.

That is really wonderful to hear! Thanks for the update.

Wish you the best of springs and happy coding ahead :)

mewmew avatar Feb 09 '19 23:02 mewmew

I'm trying to start using the golang textmapper, and I'm not sure if there's a feature missing or I'm doing something wrong.

I started by simply trying to regenerate the simple parser, but the parser.go and listener.go files are not being generated:

$ cd tm-go/parsers/simple
$ rm *.go
$ ../../cmd/textmapper/textmapper generate simple.tm
$ git status
Changes not staged for commit:
	deleted:    listener.go
	deleted:    parser.go

What am I missing?

EDIT: I found an example with the correct commands here: https://github.com/llir/grammar/blob/5291534192d972964c2745b7c18ac47208dc6be5/Makefile#L5-L7

tmm1 avatar Aug 02 '21 23:08 tmm1

Textmapper is fully rewritten in Go.

Run go install github.com/inspirer/textmapper/cmd/textmapper@latest to install it locally.

In most cases the rewrite is a drop-in replacement for the Java version, but there are a few places where the new tool produces slightly different output (mostly in identifiers) or is stricter about grammar errors. Expect the following differences:

  • similar names in the grammar (capitalization, camel vs snake case, etc.) cause a grammar compilation error to avoid confusion and actual compilation errors down the road
  • declarative lookaheads are properly checked to be mutually exclusive (the previous implementation was too lenient)
  • unused patterns get reported
  • syntax sugar is processed in a slightly different order, which in some cases produces a different output
  • (label? -> Foo) is now correctly reported as an empty node when 'label' is missing. Rewrite it as (label -> Foo)?.

There is a new flag --compat which tries to reduce the variation in the generated code between the versions.

Important: the new version uses https://pkg.go.dev/text/template as the templating language. If you override any templates in your grammar, you'll have to update them. Under the --compat flag Textmapper tries to translate previous templates into new templates, but this breaks pretty quickly on advanced grammars.

As the first step during the migration, run textmapper generate --diff --compat to see any new errors and the difference in generated code (compares generated code vs the on-disk state).

Bonus: a new grammar option optimizeTables = true speeds up large grammars by 30-80%.

I've successfully migrated dozens of grammars recently and the new version is handling them well. Please let me know if you run into any issues.

inspirer avatar Aug 27 '23 22:08 inspirer