Import support

Open ccleve opened this issue 3 years ago • 10 comments

ANTLR supports "import" statements, which let you import an external file and treat its contents as if they were part of the current file. I use this a lot to create multiple parsers with different capabilities that share common functionality.

See https://github.com/antlr/antlr4/blob/master/doc/grammars.md#grammar-imports

Does PackCC have similar functionality?

ccleve avatar Oct 13 '21 17:10 ccleve

PackCC does not have functionality like "import". Thanks for the suggestion.

I understand its usefulness, but I'm currently reluctant to implement it, since it would add complexity to PackCC and compromise its simplicity. For the time being, let me just add this issue to the wish list. If several users strongly request it, I'll begin to implement it.

arithy avatar Oct 16 '21 02:10 arithy

I believe import functionality could be implemented quite easily with a custom PCC_GETCHAR. It would just have to check whether the next few characters are an import statement and, if so, start reading another file. At the same time, it would have to keep track of the "stack" in the *auxil structure, so that it knows which file to return to when the current one has been parsed.

I don't think support for this should be built directly into PackCC, as it only works with a stream of characters. It doesn't know anything about files. At best, there could be a helper function for keeping the stack and/or some examples of how to implement this.
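The stack-keeping part of this idea can be sketched in C. This is only an illustration under stated assumptions: `input_stack_t`, `input_push`, `input_getchar`, and `MAX_DEPTH` are hypothetical names, the sources are in-memory strings rather than files, and detecting the import statement itself is left out. Note that this handles input at parse time, which, as clarified in the following comment, is not what the issue asks for.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16 /* hypothetical limit on nesting depth */

/* Hypothetical auxil structure: a stack of in-memory input sources. */
typedef struct {
    const char *cursor[MAX_DEPTH]; /* read position in each source */
    int depth;                     /* number of sources on the stack */
} input_stack_t;

/* Starts reading from a new source; returns -1 if the stack is full. */
static int input_push(input_stack_t *s, const char *text) {
    assert(s != NULL && text != NULL);
    if (s->depth >= MAX_DEPTH) return -1;
    s->cursor[s->depth++] = text;
    return 0;
}

/* Returns the next character, or -1 once every source is exhausted. */
static int input_getchar(input_stack_t *s) {
    assert(s != NULL);
    while (s->depth > 0) {
        const char **top = &s->cursor[s->depth - 1];
        if (**top != '\0') return (unsigned char)*(*top)++;
        s->depth--; /* source finished: resume the one that included it */
    }
    return -1;
}
```

In a grammar file, such a function could presumably be hooked in with something like `#define PCC_GETCHAR(auxil) input_getchar(auxil)`, provided the auxil type is a pointer to the structure above.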

dolik-rce avatar Oct 16 '21 07:10 dolik-rce

@dolik-rce This isn't a suggestion to import data being parsed. It's a suggestion to import specs into the parser itself while the parser is being generated. For example, imagine that we're parsing a query language that must handle standard keywords like AND, OR, and NOT, but must also have special handling for different languages like French or Arabic. In that case it would be really helpful to create a separate .peg file for every language, but then import a standard, common .peg file that recognizes keywords. It would save a lot of copying and pasting.

ccleve avatar Oct 16 '21 19:10 ccleve

@ccleve Oh, I see. My bad, I guess I wasn't paying enough attention when I read the issue :roll_eyes:

dolik-rce avatar Oct 17 '21 17:10 dolik-rce

Please please implement this. I'm writing a parser for a language that is case-insensitive and needs UCD categories. I've generated both the categories and permutations for keywords but this creates an unwieldy grammar file that's more than 15K LOC. (I'm uncertain how packcc is going to handle this but we'll see....) Also it might be a good idea to add a way of telling packcc "Hey this string literal should be matched case-insensitively" because I don't like generating thousands of word permutations even if it is fast.

ethindp avatar Apr 01 '24 17:04 ethindp

> Also it might be a good idea to add a way of telling packcc "Hey this string literal should be matched case-insensitively" because I don't like generating thousands of word permutations even if it is fast.

@ethindp: This is slightly off-topic, but did you know that you can use character classes to match keywords case-insensitively? E.g.: [kK] [eE] [yY] [wW] [oO] [rR] [dD] will match "keyword", "Keyword", "KEYWORD" as well as "kEyWoRd" and all other weird permutations.
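If the keywords are generated anyway, the character-class form can be produced mechanically instead of enumerating permutations. Here is a minimal sketch in C; `ci_pattern` is a hypothetical helper, not part of PackCC, and it assumes keywords made only of ASCII letters, digits, and underscores (anything else is rejected rather than escaped).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Writes a PEG sequence that matches `word` case-insensitively into `out`.
   Letters become two-character classes; digits and '_' become literals.
   Returns 0 on success, -1 on an unsupported character or a too-small buffer. */
static int ci_pattern(const char *word, char *out, size_t out_size) {
    size_t used = 0;
    assert(word != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; word[i] != '\0'; i++) {
        unsigned char c = (unsigned char)word[i];
        const char *sep = (i > 0) ? " " : "";
        int n;
        if (isalpha(c)) {
            n = snprintf(out + used, out_size - used, "%s[%c%c]",
                         sep, tolower(c), toupper(c));
        } else if (isdigit(c) || c == '_') {
            n = snprintf(out + used, out_size - used, "%s'%c'", sep, c);
        } else {
            return -1; /* would need escaping; out of scope for this sketch */
        }
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }
    return 0;
}
```

For the keyword "key", this yields one class per letter, so the grammar grows linearly with the keyword length rather than with the number of case permutations.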

dolik-rce avatar Apr 01 '24 18:04 dolik-rce

I... Stupidly didn't think about that, thank you for the reminder!

ethindp avatar Apr 01 '24 20:04 ethindp

@ccleve, @ethindp, I have introduced the import functionality. Please check it.

%import "import file name"

The content of the specified import file is expanded at the text location of %import (version 2.0.0 or later). This directive can be used multiple times, anywhere in the file, and can also be used in imported files. The import file name can be an absolute path or a path relative to the current directory. If it is a relative path, the directories listed below are searched for the import file in the listed order.

  1. the directory where the file that imports the import file is located
  2. the directories specified with -I options
    • They are prioritized in order of their appearance in the command line.
  3. the directories specified by the environment variable PCC_IMPORT_PATH
    • They are prioritized in order of their appearance in the value of this variable.
    • The character used as a delimiter between directory names is the colon ':' if PackCC is built for a Unix-like platform such as Linux, macOS, and MinGW. The character is the semicolon ';' if PackCC is built as a native Windows executable. (This is exactly the same manner as the environment variable PATH.)
  4. the per-user default directory
    • This is the subdirectory .packcc/import in the home directory if PackCC is built for a Unix-like platform, and in the user profile directory, "C:\Users\username" for example, if PackCC is built as a native Windows executable.
  5. the system-wide default directory
    • This is the directory /usr/share/packcc/import if PackCC is built for a Unix-like platform, and the subdirectory packcc/import in the common application data directory, "C:\ProgramData" for example, if PackCC is built as a native Windows executable.

Note that a file that has already been imported is silently ignored if an attempt is made to import it again.
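The search order above can be illustrated with a small C sketch that assembles the candidate directories in priority order. This is not PackCC's actual code: `search_list_t`, `build_search_list`, and the fixed-size limits are hypothetical, the per-user and system-wide directories are passed in by the caller, and skipping empty entries in the path list is an assumption.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS 32  /* hypothetical limits for this sketch */
#define MAX_LEN 256

typedef struct {
    char dirs[MAX_DIRS][MAX_LEN]; /* candidate directories, highest priority first */
    int count;
} search_list_t;

/* Appends one directory name of the given length; empty names are skipped. */
static int add_dir(search_list_t *list, const char *dir, size_t len) {
    assert(list != NULL && dir != NULL);
    if (len == 0) return 0;
    if (list->count >= MAX_DIRS || len >= MAX_LEN) return -1;
    memcpy(list->dirs[list->count], dir, len);
    list->dirs[list->count][len] = '\0';
    list->count++;
    return 0;
}

/* Splits a PATH-style value on `delim` and appends the entries in order. */
static int add_path_list(search_list_t *list, const char *value, char delim) {
    assert(list != NULL);
    if (value == NULL) return 0; /* variable not set */
    for (;;) {
        const char *end = strchr(value, delim);
        size_t len = end ? (size_t)(end - value) : strlen(value);
        if (add_dir(list, value, len) != 0) return -1;
        if (end == NULL) return 0;
        value = end + 1;
    }
}

/* Builds the list in the documented order: importer's directory, -I
   directories, PCC_IMPORT_PATH entries, per-user default, system default. */
static int build_search_list(search_list_t *list, const char *importer_dir,
                             const char *const *i_dirs, int n_i_dirs,
                             const char *env_value, char delim,
                             const char *user_dir, const char *system_dir) {
    assert(list != NULL);
    list->count = 0;
    if (add_dir(list, importer_dir, strlen(importer_dir)) != 0) return -1;
    for (int i = 0; i < n_i_dirs; i++)
        if (add_dir(list, i_dirs[i], strlen(i_dirs[i])) != 0) return -1;
    if (add_path_list(list, env_value, delim) != 0) return -1;
    if (add_dir(list, user_dir, strlen(user_dir)) != 0) return -1;
    return add_dir(list, system_dir, strlen(system_dir));
}
```

A caller would pass ':' or ';' as `delim` depending on the platform, then try each entry of `dirs` in turn until the import file is found.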

arithy avatar Apr 21 '24 14:04 arithy

I have just quickly tested the imports. It's very intuitive and seems to work very well. If I understand correctly, there is a slight difference in behavior: unused rules in imported files are ignored, while in the main parsed file they result in an error.

This is totally understandable and good if you wish to use a prepared library (as is the case with the bundled ascii and unicode classes), but it might be surprising if someone uses import just to break a single grammar into multiple smaller files for better readability. It should probably be documented in the README.

It might also make sense to let the user choose whether or not to check for unused rules in imported files (e.g. by using a different directive). I'm not sure whether that is easy to implement. I haven't studied the new code enough to understand it yet :slightly_smiling_face:

EDIT: Oh, now I see. The warning for unused rules has been removed in 7b4aa2501dce2da1305e98228b86864d893b243e for all rules, not just those from imported files. That is a bit surprising. I liked that feature: it usually made me realize that I had made some stupid error in my grammar :slightly_smiling_face:

dolik-rce avatar Apr 21 '24 17:04 dolik-rce

Thank you for your immediate feedback!

arithy avatar Apr 21 '24 23:04 arithy