grammars-v4
Parser for text-based FBX files
FBX (Filmbox) is a proprietary file format (.fbx), originally developed by Kaydara and owned by Autodesk since 2006. It is used to provide interoperability between digital content creation applications, although the text-based format appears to be less common than the alternative binary format.
There are actually two parsers, mostly because I wasn't sure which way I wanted to go with supporting this format yet:
- FBX - parses just the nested node structures, so you have to figure the names out on your own. But it should work for all files you're likely to encounter, because the structure of the format seems fairly simple and unchanging across major versions.
- FBXSemantic - tries to recognise the individual node names and more strictly checks their contents. I think this is likely to be the more useful way to parse this format, but the parser is likely very incomplete as we don't have official documentation for the format. It at least parses all current examples fine.
@hakanai thanks. Please add this grammar into the root pom.xml
Ah yes, I had totally missed that because I had the project open at just the one subproject so that it didn't consume all my memory.
Go failure is,
---------- Testing grammar .\fbx ----------
Building
trgen -t Go --template-sources-directory D:\a\grammars-v4\grammars-v4\_scripts\templates\
Parse succeeded.Parse succeeded.Parse succeeded.D:/a/grammars-v4/grammars-v4/fbx/
No code generated
.\fbx failed
JavaScript is similar
Building
trgen -t JavaScript --template-sources-directory /home/runner/work/grammars-v4/grammars-v4/_scripts/templates/
/home/runner/work/grammars-v4/grammars-v4/fbx/
No code generated
./fbx failed
I assume there's maybe something else I have to add in to make languages other than Java work.
Actually it looks like this is a bug in trgen.
https://github.com/kaby76/Domemtech.Trash/blob/c3113e89ba40a626698b4b8239b26c1a048f2b27/trgen/CGen.cs#L315,L319
var pom_grammar_name = navigator
.Select("//plugins/plugin[artifactId=\"antlr4test-maven-plugin\"]/configuration/grammarName", nsmgr)
.Cast<XPathNavigator>()
.Select(t => t.Value)
.ToList();
Looks like it assumes that the <grammarName> element must be directly inside <configuration>, but in this case there was more than one scenario to run so it isn't. :(
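To illustrate the failure mode described here, the sketch below runs both a direct-child lookup and a descendant lookup against a cut-down pom. It uses Python's xml.etree rather than trgen's C# XPath code, and the pom fragment (including the scenario and scenarioName element names, and the absence of an XML namespace) is an assumed simplification, not the actual fbx/pom.xml.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified pom: grammarName sits inside <scenarios>/<scenario>,
# not directly under <configuration>.
POM = """<project>
  <build><plugins><plugin>
    <artifactId>antlr4test-maven-plugin</artifactId>
    <configuration>
      <scenarios>
        <scenario>
          <scenarioName>FBX</scenarioName>
          <grammarName>FBX</grammarName>
        </scenario>
        <scenario>
          <scenarioName>FBXSemantic</scenarioName>
          <grammarName>FBXSemantic</grammarName>
        </scenario>
      </scenarios>
    </configuration>
  </plugin></plugins></build>
</project>"""

root = ET.fromstring(POM)
plugin = ".//plugins/plugin[artifactId='antlr4test-maven-plugin']"

# Direct child only: mirrors the shape of the query quoted above.
direct = [e.text for e in root.findall(plugin + "/configuration/grammarName")]

# Any depth below <configuration>: also finds the per-scenario names.
nested = [e.text for e in root.findall(plugin + "/configuration//grammarName")]

print("direct:", direct)
print("nested:", nested)
```

With this input the direct-child query returns nothing, which would be consistent with a "No code generated" outcome, while the descendant query finds both grammar names.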
@kaby76 could you take a look at this please?
Building
trgen --antlr-tool-path /Users/runner/work/grammars-v4/grammars-v4/antlr4-4.11.1-complete.jar -t CSharp --template-sources-directory /Users/runner/work/grammars-v4/grammars-v4/_scripts/templates/
/Users/runner/work/grammars-v4/grammars-v4/fbx/
Invalid element "//plugins/plugin[artifactId='antlr4test-maven-plugin']/configuration/scenarios. Correct the pom.xml!
No code generated
./fbx failed
@teverett It fails because of the
@kaby76 thanks!
@kaby76 are you planning to return to this PR?
Sorry I've been working on other things. I will return to this at some point.
I'm going to be modifying the desc.xml format and trgen to include "scenarios" in the next week. After completing that, this PR can be checked in, then integrated into the builds with a separate PR.
The reason for the change to desc.xml is to allow testing of grammars that would normally fail because of time outs. In sql/plsql, the grammar is not particularly efficient on some of the tests, i.e., over 20 seconds. These "long-running" tests cause the tester to fail because the tester doesn't allow any test to run over 5 minutes. (In the old days, some JavaScript parses would hang, and the build had to be killed manually.) This shouldn't mean that the grammar doesn't work, but just that it's really slow, and needs special handling. The new desc.xml files will allow one to say how long the time out is, what machines to test on, whether to use group parsing or individual parsing, etc. https://github.com/antlr/grammars-v4/issues/3138
I took all the suggestions aside from moving the optional because it changes the semantics for lists of things. There are many places in the spec which document that a list of things occurs, but then says that the list can contain zero elements.
If there is an overridingly good technical reason to do it the other way then I can change it, but it really looks like introducing a nullable list for the sake of introducing a nullable list, and I really effing hate nullability, especially for collection types. ;)
Thanks for working on this.
I tested this change by adding in a desc.xml file to test the two scenarios listed in the pom.xml. (Note, we don't use the mvn tester because it tests only the Java target. And, although the test warns it's trying to use Antlr 4.11.1 with a newer tool version, one can still use it for the moment.)
While the FBX.g4 grammar works, the FBXSemantic.g4 parser does not parse the input. It may be that there should be different input tests for the FBXSemantic.g4 grammar.
04/14-06:05:09 ~/fbx/grammars-v4/fbx
$ trgen -t CSharp
C:\msys64\home\Kenne\fbx\grammars-v4\fbx
CSharp FBX.g4 success 0.0568966
CSharp FBXSemantic.g4 success 0.0075688
CSharp FBX.g4 success 0.0005988
CSharp FBXSemantic.g4 success 0.0060509
Rendering template file from CSharp/build.ps1 to Generated-CSharp-fbx/build.ps1
Rendering template file from CSharp/build.sh to Generated-CSharp-fbx/build.sh
Rendering template file from CSharp/clean.ps1 to Generated-CSharp-fbx/clean.ps1
Rendering template file from CSharp/clean.sh to Generated-CSharp-fbx/clean.sh
Rendering template file from CSharp/Encodings.cs to Generated-CSharp-fbx/Encodings.cs
Rendering template file from CSharp/ErrorListener.cs to Generated-CSharp-fbx/ErrorListener.cs
Rendering template file from CSharp/makefile to Generated-CSharp-fbx/makefile
Rendering template file from CSharp/perf.sh to Generated-CSharp-fbx/perf.sh
Rendering template file from CSharp/run.ps1 to Generated-CSharp-fbx/run.ps1
Rendering template file from CSharp/run.sh to Generated-CSharp-fbx/run.sh
Rendering template file from CSharp/Test.cs to Generated-CSharp-fbx/Test.cs
Rendering template file from CSharp/Test.csproj to Generated-CSharp-fbx/Test.csproj
Rendering template file from CSharp/test.ps1 to Generated-CSharp-fbx/test.ps1
Rendering template file from CSharp/test.sh to Generated-CSharp-fbx/test.sh
Copying source file from C:/msys64/home/Kenne/fbx/grammars-v4/fbx/README.md to Generated-CSharp-fbx/README.md
Copying source file from C:/msys64/home/Kenne/fbx/grammars-v4/fbx/FBXSemantic.g4 to Generated-CSharp-fbx/FBXSemantic.g4
Copying source file from C:/msys64/home/Kenne/fbx/grammars-v4/fbx/FBX.g4 to Generated-CSharp-fbx/FBX.g4
Copying source file from C:/msys64/home/Kenne/fbx/grammars-v4/fbx/desc.xml to Generated-CSharp-fbx/desc.xml
Rendering template file from CSharp/build.ps1 to Generated-CSharp-fbxsemantic/build.ps1
Rendering template file from CSharp/build.sh to Generated-CSharp-fbxsemantic/build.sh
Rendering template file from CSharp/clean.ps1 to Generated-CSharp-fbxsemantic/clean.ps1
Rendering template file from CSharp/clean.sh to Generated-CSharp-fbxsemantic/clean.sh
Rendering template file from CSharp/Encodings.cs to Generated-CSharp-fbxsemantic/Encodings.cs
Rendering template file from CSharp/ErrorListener.cs to Generated-CSharp-fbxsemantic/ErrorListener.cs
Rendering template file from CSharp/makefile to Generated-CSharp-fbxsemantic/makefile
Rendering template file from CSharp/perf.sh to Generated-CSharp-fbxsemantic/perf.sh
Rendering template file from CSharp/run.ps1 to Generated-CSharp-fbxsemantic/run.ps1
Rendering template file from CSharp/run.sh to Generated-CSharp-fbxsemantic/run.sh
Rendering template file from CSharp/Test.cs to Generated-CSharp-fbxsemantic/Test.cs
Rendering template file from CSharp/Test.csproj to Generated-CSharp-fbxsemantic/Test.csproj
Rendering template file from CSharp/test.ps1 to Generated-CSharp-fbxsemantic/test.ps1
Rendering template file from CSharp/test.sh to Generated-CSharp-fbxsemantic/test.sh
Copying source file from C:/msys64/home/Kenne/fbx/grammars-v4/fbx/README.md to Generated-CSharp-fbxsemantic/README.md
Copying source file from C:/msys64/home/Kenne/fbx/grammars-v4/fbx/FBXSemantic.g4 to Generated-CSharp-fbxsemantic/FBXSemantic.g4
Copying source file from C:/msys64/home/Kenne/fbx/grammars-v4/fbx/FBX.g4 to Generated-CSharp-fbxsemantic/FBX.g4
Copying source file from C:/msys64/home/Kenne/fbx/grammars-v4/fbx/desc.xml to Generated-CSharp-fbxsemantic/desc.xml
04/14-06:05:25 ~/fbx/grammars-v4/fbx
$ cd Generated-CSharp-fbx
04/14-06:05:30 ~/fbx/grammars-v4/fbx/Generated-CSharp-fbx
$ make; make test
bash build.sh
Determining projects to restore...
Restored C:\msys64\home\Kenne\fbx\grammars-v4\fbx\Generated-CSharp-fbx\Test.csproj (in 332 ms).
MSBuild version 17.6.0-preview-23122-01+4b5e303b3 for .NET
Determining projects to restore...
All projects are up-to-date for restore.
C:\Program Files\dotnet\sdk\7.0.300-preview.23122.5\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.RuntimeIdentifierInference.targets(
287,5): message NETSDK1057: You are using a preview version of .NET. See: https://aka.ms/dotnet-support-policy [C:\msys64\home\Kenne
\fbx\grammars-v4\fbx\Generated-CSharp-fbx\Test.csproj]
Test -> C:\msys64\home\Kenne\fbx\grammars-v4\fbx\Generated-CSharp-fbx\bin\Debug\net7.0\Test.dll
Build succeeded.
0 Warning(s)
0 Error(s)
Time Elapsed 00:00:06.52
bash test.sh
dos2unix: converting file ../examples/6_1_0_rock_medium_spr.fbx.errors to Unix format...
dos2unix: converting file ../examples/6_1_0_rock_medium_spr.fbx.tree to Unix format...
dos2unix: converting file ../examples/7_3_0_so_called_minimal.fbx.errors to Unix format...
dos2unix: converting file ../examples/7_3_0_so_called_minimal.fbx.tree to Unix format...
dos2unix: converting file ../examples/7_5_0_walk_test.fbx.errors to Unix format...
dos2unix: converting file ../examples/7_5_0_walk_test.fbx.tree to Unix format...
Test succeeded.
04/14-06:05:45 ~/fbx/grammars-v4/fbx/Generated-CSharp-fbx
$ cd ../Generated-CSharp-fbxsemantic/
04/14-06:05:53 ~/fbx/grammars-v4/fbx/Generated-CSharp-fbxsemantic
$ make; make test
bash build.sh
Determining projects to restore...
Restored C:\msys64\home\Kenne\fbx\grammars-v4\fbx\Generated-CSharp-fbxsemantic\Test.csproj (in 334 ms).
MSBuild version 17.6.0-preview-23122-01+4b5e303b3 for .NET
Determining projects to restore...
All projects are up-to-date for restore.
C:\Program Files\dotnet\sdk\7.0.300-preview.23122.5\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.RuntimeIdentifierInference.targets(
287,5): message NETSDK1057: You are using a preview version of .NET. See: https://aka.ms/dotnet-support-policy [C:\msys64\home\Kenne
\fbx\grammars-v4\fbx\Generated-CSharp-fbxsemantic\Test.csproj]
Test -> C:\msys64\home\Kenne\fbx\grammars-v4\fbx\Generated-CSharp-fbxsemantic\bin\Debug\net7.0\Test.dll
Build succeeded.
0 Warning(s)
0 Error(s)
Time Elapsed 00:00:05.26
bash test.sh
dos2unix: converting file ../examples/6_1_0_rock_medium_spr.fbx.errors to Unix format...
dos2unix: converting file ../examples/6_1_0_rock_medium_spr.fbx.tree to Unix format...
dos2unix: converting file ../examples/7_3_0_so_called_minimal.fbx.errors to Unix format...
dos2unix: converting file ../examples/7_3_0_so_called_minimal.fbx.tree to Unix format...
dos2unix: converting file ../examples/7_5_0_walk_test.fbx.errors to Unix format...
dos2unix: converting file ../examples/7_5_0_walk_test.fbx.tree to Unix format...
New errors in output.
6_1_0_rock_medium_spr.fbx.errors
line 6:0 mismatched input 'FBXHeaderExtension' expecting 'FBXHeaderExtension'
line 6:21 mismatched input '{' expecting ':'
7_3_0_so_called_minimal.fbx.errors
line 6:0 mismatched input 'FBXHeaderExtension' expecting 'FBXHeaderExtension'
line 6:21 mismatched input '{' expecting ':'
7_5_0_walk_test.fbx.errors
line 6:0 mismatched input 'FBXHeaderExtension' expecting 'FBXHeaderExtension'
line 6:21 mismatched input '{' expecting ':'
Test failed.
mingw32-make: *** [makefile:8: test] Error 1
04/14-06:06:06 ~/fbx/grammars-v4/fbx/Generated-CSharp-fbxsemantic
Here's the desc.xml that I used to test the grammars. Basically, it specifies the grammars can work with any target (CSharp, Cpp, Java, etc), and there are two parsers to test, one for FBX.g4, and the other for FBXSemantic.g4.
<?xml version="1.0" encoding="UTF-8" ?>
<desc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../_scripts/desc.xsd">
<targets>CSharp;Cpp;Dart;Go;Java;JavaScript;PHP;Python3;TypeScript</targets>
<grammar-files>FBX.g4;FBXSemantic.g4</grammar-files>
<test>
<name>fbx</name>
<grammar-name>FBX</grammar-name>
<entry-point>start</entry-point>
<inputs>examples</inputs>
</test>
<test>
<name>fbxsemantic</name>
<grammar-name>FBXSemantic</grammar-name>
<entry-point>start</entry-point>
<inputs>examples</inputs>
</test>
</desc>
To set a different input test for FBXSemantic.g4, create a new directory for the files, and change the <inputs> element to point to it.
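As a quick sanity check after editing the file, something like the following can list which tests a desc.xml declares and which inputs directory each one uses. This is only a sketch with Python's standard library, not how trgen itself reads the file; the embedded XML is the desc.xml shown above minus the XML declaration.

```python
import xml.etree.ElementTree as ET

# The desc.xml from above, embedded as a string for a self-contained example.
DESC = """<desc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../_scripts/desc.xsd">
  <targets>CSharp;Cpp;Dart;Go;Java;JavaScript;PHP;Python3;TypeScript</targets>
  <grammar-files>FBX.g4;FBXSemantic.g4</grammar-files>
  <test>
    <name>fbx</name>
    <grammar-name>FBX</grammar-name>
    <entry-point>start</entry-point>
    <inputs>examples</inputs>
  </test>
  <test>
    <name>fbxsemantic</name>
    <grammar-name>FBXSemantic</grammar-name>
    <entry-point>start</entry-point>
    <inputs>examples</inputs>
  </test>
</desc>"""

root = ET.fromstring(DESC)

# Semicolon-separated list of targets the grammars claim to support.
targets = root.findtext("targets").split(";")

# One (name, grammar, inputs) tuple per <test> element.
tests = [
    (t.findtext("name"), t.findtext("grammar-name"), t.findtext("inputs"))
    for t in root.findall("test")
]

for name, grammar, inputs in tests:
    print(f"{name}: grammar={grammar} inputs={inputs}")
```

Pointing the second test at a different directory would then show up as a different inputs value for fbxsemantic.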
There are some "symbol conflicts" in the grammar for Go and PHP. For example, "start" should be "start_".
Regarding symbol conflicts in some languages: yes, this is one of my main complaints about ANTLR. If ANTLR is going to target some language where some word is a keyword, then I think the converter for that language should be the one responsible for bumping keywords to ensure they don't collide.
Languages which are already supported might add keywords in future versions. On top of that, someone adding support for a new language shouldn't have to ask all existing authors of grammar files to rename all their things.
Or, for the atomic argument: I could make a new language where any sequence of letters is considered a keyword. Now what?
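The kind of per-target renaming being argued for could look roughly like the sketch below. It is purely illustrative: the RESERVED table is hypothetical (the real reserved symbols for each ANTLR target are not listed in this thread), and safe_rule_name is a made-up helper, not part of ANTLR or trgen.

```python
# Hypothetical reserved-symbol table, for illustration only.
# The real lists would live with each target's code generator.
RESERVED = {
    "Go": {"start", "type", "func"},
    "PHP": {"start", "list", "function"},
}


def safe_rule_name(name: str, target: str) -> str:
    """Append '_' until the rule name no longer collides for this target."""
    reserved = RESERVED.get(target, set())
    while name in reserved:
        name += "_"
    return name


for target in ("Go", "PHP", "Java"):
    print(target, safe_rule_name("start", target), safe_rule_name("node", target))
```

The point of the design is that the grammar keeps the name start everywhere, and only the targets that actually collide see start_.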
Anyway, that aside, the last time I did any major work on this, it was passing for all the current examples, so what has changed since? There were the few minor review suggestions, but none of them looked like they were going to break anything.
Not to mention, this error is nonsense:
line 6:0 mismatched input 'FBXHeaderExtension' expecting 'FBXHeaderExtension'
What's the difference between "FBXHeaderExtension" and "FBXHeaderExtension" here? They read the same to me.
FBXHeaderExtension is the very first rule in the grammar, so something very odd is going on.
OK, the tests still pass over here, warnings aside.
[INFO] Evaluating Scenario: FBX
[INFO] Parsing :path\to\grammars-v4\fbx\examples\6_1_0_rock_medium_spr.fbx
ANTLR Tool version 4.12.0 used for code generation does not match the current runtime version 4.11.1
ANTLR Runtime version 4.12.0 used for parser compilation does not match the current runtime version 4.11.1
ANTLR Tool version 4.12.0 used for code generation does not match the current runtime version 4.11.1
ANTLR Runtime version 4.12.0 used for parser compilation does not match the current runtime version 4.11.1
[INFO] Parsing :path\to\grammars-v4\fbx\examples\7_3_0_so_called_minimal.fbx
[INFO] Parsing :path\to\grammars-v4\fbx\examples\7_5_0_walk_test.fbx
[INFO] Evaluating Scenario: FBXSemantic
[INFO] Parsing :path\to\grammars-v4\fbx\examples\6_1_0_rock_medium_spr.fbx
ANTLR Tool version 4.12.0 used for code generation does not match the current runtime version 4.11.1
ANTLR Runtime version 4.12.0 used for parser compilation does not match the current runtime version 4.11.1
ANTLR Tool version 4.12.0 used for code generation does not match the current runtime version 4.11.1
ANTLR Runtime version 4.12.0 used for parser compilation does not match the current runtime version 4.11.1
[INFO] Parsing :path\to\grammars-v4\fbx\examples\7_3_0_so_called_minimal.fbx
[INFO] Parsing :path\to\grammars-v4\fbx\examples\7_5_0_walk_test.fbx
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
So somehow the grammar works when using the Java target, but fails when using C#?
@hakanai @teverett @KvanTTT Could we just merge the code as is? It won't be tested, but that's okay because the tester will need to be fixed. It has a bug with respect to multiple grammars in one directory: the generated code uses the FBXParser with the FBXSemanticLexer, which is why it wasn't working. After this PR is merged, I'll fix the tester and add it to the build in one PR later on today. And, I'll add in an issue to https://github.com/antlr/antlr4/issues for any new symbol conflicts so the fix gets implemented in the Antlr4 tool.
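A mismatched lexer/parser pair would also explain the odd "mismatched input 'FBXHeaderExtension' expecting 'FBXHeaderExtension'" message: the left side prints the token's text, the right side prints the display name of the expected token type, and the two can read the same while the type numbers differ. The toy model below shows one way that can happen. The token type numbers and names are hypothetical, and this is hand-written Python, not ANTLR runtime code.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: int
    text: str


# Hypothetical numbering: one lexer emits a generic IDENTIFIER token...
GENERIC_LEXER_TYPES = {"IDENTIFIER": 7}

# ...while the other grammar's parser expects a dedicated literal token type.
PARSER_DISPLAY_NAMES = {1: "'FBXHeaderExtension'", 2: "':'"}


def lex_generic(word: str) -> Token:
    """Tokenize a word the way the generic lexer would: as IDENTIFIER."""
    return Token(GENERIC_LEXER_TYPES["IDENTIFIER"], word)


def match(token: Token, expected_type: int):
    """Return None on success, or an ANTLR-style error message on mismatch."""
    if token.type == expected_type:
        return None
    expected = PARSER_DISPLAY_NAMES[expected_type]
    return f"mismatched input '{token.text}' expecting {expected}"


message = match(lex_generic("FBXHeaderExtension"), expected_type=1)
print(message)
```

The comparison is on token type, not text, so identical spellings on both sides of the message do not mean the tokens matched.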
All the parse trees for the three .fbx test files using grammar FBX.g4 are (start (node FBXHeaderExtension attributes)). The parser quits on the ':' character immediately following the string 'FBXHeaderExtension' (FBX.g4 does not have COLON). start is not an EOF-terminated rule so the parser is not forced to consume the entire input. I don't understand the purpose of the FBX.g4 grammar. It can only find the starting IDENTIFIER of the first node.
All the parse trees for the three .fbx test files using grammar FBX.g4 are
(start (node FBXHeaderExtension attributes)). The parser quits on the ':' character immediately following the string 'FBXHeaderExtension' (FBX.g4 does not have COLON).
No, you're right, somehow FBX.g4 doesn't have COLON. Attempting a fix, but tests fail here with unhelpful messages.
C:\Users\trejkaz\grammars-v4\fbx\examples\6_1_0_rock_medium_spr.fbx.errors (The system cannot find the file specified)
IMHO, failure to find an error file.............is success.
Try using mvn clean test.
clean test didn't seem to help. But I realised what was happening was that a parse error was being encountered and the non-existent file made it look like the error was being swallowed. I can still see the actual error, it's just printed in the console output further up. So I fixed those errors, and pushed.
So it seems, if the test passes, no issue. If the test fails, it expects to find an ".errors" file, but that file does not exist.
For FBX.g4, I tried about a dozen of the first models listed in https://free3d.com/3d-models/fbx with the grammar. I had to first convert the binary fbx files into ascii using https://github.com/BobbyAnguelov/FbxFormatConverter and the FBX SDK at https://www.autodesk.com/developer-network/platform-technologies/fbx-sdk-2020-0. Whether the ascii fbx files are valid I don't know because I don't have Autodesk.
But, the grammar works fine with a couple of changes.
- It looks like 'T' (e.g., "Shading: T") and 'R' (see below) are attribute values. I widened attribute to
  attribute : value=ASTERISK? (INTEGER | DECIMAL | STRING | BOOLEAN | IDENTIFIER) ;
  because I wasn't sure of all the additional values.
- Changed IDENTIFIER to
  IDENTIFIER : ( 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_' | '-' )+;
  because of attribute names with dash and underscore in a couple of the examples, which I've attached below.
Pre-Extrapolation: {
Type: R
Repetition: -1
}
Post-Extrapolation: {
Type: R
Repetition: -1
}
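The widened IDENTIFIER rule corresponds roughly to the character class [A-Za-z0-9_-]+. A quick check of that class against the names and values in the snippet above, using a Python regular expression as a stand-in for the ANTLR lexer rule (so this tests only the character class, not lexer rule ordering):

```python
import re

# Character class equivalent to the widened IDENTIFIER rule quoted above.
IDENTIFIER = re.compile(r"[A-Za-z0-9_-]+")


def is_identifier(text: str) -> bool:
    """True if the whole string is matched by the widened character class."""
    return IDENTIFIER.fullmatch(text) is not None


names = ["Pre-Extrapolation", "Post-Extrapolation", "Type", "R", "-1"]
results = {name: is_identifier(name) for name in names}

for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Note that "-1" is accepted by the class as well. In ANTLR, when two lexer rules match the same length of input, the rule that appears first in the grammar wins, so whether Repetition: -1 still lexes as INTEGER would depend on the rule order in FBX.g4, which is not shown here.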
I also tested the FBX.g4 grammar where start is terminated with EOF, which is generally recommended because it forces Antlr to read to the end of the input file rather than stop prematurely and report success. But, there were no errors either way.
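Why the EOF terminator matters can be seen in a toy model of the earlier situation, where the grammar had no COLON and the parser stopped after the first identifier. This is a hand-written sketch, not ANTLR-generated code; the token names are placeholders and node is reduced to an IDENTIFIER with zero attributes.

```python
def parse_start(tokens, require_eof):
    """Toy version of `start : node` (optionally followed by EOF).

    node is modelled as a single IDENTIFIER followed by zero attributes.
    Returns (ok, tokens_consumed).
    """
    pos = 0
    if pos < len(tokens) and tokens[pos] == "IDENTIFIER":
        pos += 1
    else:
        return False, pos
    # Without EOF the rule is satisfied here, whatever input remains.
    if require_eof and pos != len(tokens):
        return False, pos
    return True, pos


# 'FBXHeaderExtension' ':' '{' '}' as a token stream.
tokens = ["IDENTIFIER", "COLON", "LBRACE", "RBRACE"]

without_eof = parse_start(tokens, require_eof=False)
with_eof = parse_start(tokens, require_eof=True)

print("without EOF:", without_eof)
print("with EOF:   ", with_eof)
```

Without the EOF requirement the toy parser reports success after consuming one token out of four; with it, the same input is rejected, which is the behaviour that would have exposed the missing COLON immediately.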