BFO-2020

Use of the '@' symbol in Common Logic files causes Hets to throw a parsing error

Open dillerm opened this issue 1 year ago • 6 comments

The cl:comment in the first few lines of each of these files contains an email address, which of course uses an '@' symbol. Whenever I try to load one of these files in the online Hets toolkit (rest.hets.eu), I get the following error message: "unexpected '@' / expecting ' ' '. [i.e., expecting a single quote]" (comment in brackets is my own).

I'm still not absolutely certain why this is an issue, but looking at the CLIF specification I noticed that the '@' symbol is not listed in Section A.2.2.4 under the characters that can be used to form lexical tokens (see attached). Because (1) this email address is part of a quoted string, (2) quoted strings are considered lexical tokens in CLIF, and (3) lexical tokens can only contain members of the sets of characters, delimiters, or whitespace that are defined in the specification, I believe this is why Hets is throwing this error.

Solution: Replacing the '@' symbol with '(at)' or something along those lines fixes this. Please note that, to my knowledge, you unfortunately cannot escape it with a backslash, because in CLIF the backslash is reserved for one special use: escaping single or double quotes within quoted strings.
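For anyone who wants to apply this workaround to several files, here is a minimal Python sketch. The function names and the sample comment (including the example.org address) are made up for illustration and are not taken from the BFO-2020 files. Note that it replaces every '@' in the text, not only those inside cl:comment strings, so check the reported positions before rewriting a file.

```python
def find_at_signs(text):
    """Return 1-indexed (line, column) pairs for every '@' in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "@":
                hits.append((lineno, col))
    return hits


def replace_at_signs(text, replacement="(at)"):
    """Replace every '@' in text with the given replacement string."""
    return text.replace("@", replacement)


# Hypothetical sample, not copied from the actual BFO-2020 CLIF files.
sample = "(cl:comment 'Contact: someone@example.org')"
print(find_at_signs(sample))
print(replace_at_signs(sample))
```

To process a real file, read it with `open(path, encoding="utf-8")`, pass the contents through `replace_at_signs`, and write the result back.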

Screen Shot 2023-03-27 at 6 42 47 PM

dillerm avatar Mar 29 '23 20:03 dillerm

This looks like a spec bug. It says: "This includes all the alphanumeric characters", but then that disagrees with the production. Who wins? It can't be an intentional omission.

alanruttenberg avatar Mar 29 '23 23:03 alanruttenberg

Not that it's a better option, but you can use any Unicode by escaping with \u or \U. Has HETS been updated for the 2018 Common Logic spec? If not there might be other problems. cl:outdiscourse is defined in 2018 but not 2007. Looks like cl:ttl is also new.

alanruttenberg avatar Mar 29 '23 23:03 alanruttenberg

I've changed my source so that it uses (at) from now on. If you want to submit a PR fixing the current files, that's welcome. Otherwise I'll get to it at some point.

alanruttenberg avatar Mar 29 '23 23:03 alanruttenberg

It was pointed out to me that @ isn't an alphanumeric character. But the sentence starts "char is all the remaining ASCII non-control characters", so that includes @

alanruttenberg avatar Mar 30 '23 00:03 alanruttenberg

@alanruttenberg, yeah, I find it very bizarre as well and thought it might have been omitted by mistake. I might reach out to the Hets folks to see if this is a feature or a bug on either their end or the spec's. I can also make the pull request tomorrow.

dillerm avatar Mar 30 '23 01:03 dillerm

Oh, it looks like I misread the spec and \u, like you said, can be used to escape any Unicode.
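A rough Python sketch of the \u route, for comparison with the (at) replacement. It assumes the CLIF escape is \u followed by four hex digits, or \U followed by six, which is how I read the grammar; check this against the spec. Whether Hets accepts the escaped form has not been tested in this thread. The function names are made up for illustration.

```python
def clif_unicode_escape(ch):
    """Escape one character as \\uXXXX, or \\UXXXXXX above U+FFFF.

    Assumption: four hex digits after \\u and six after \\U.
    """
    code_point = ord(ch)
    if code_point <= 0xFFFF:
        return "\\u{:04X}".format(code_point)
    return "\\U{:06X}".format(code_point)


def escape_at_signs(text):
    """Replace every '@' in text with its escaped form."""
    return text.replace("@", clif_unicode_escape("@"))


# Hypothetical address, used only to show the transformation.
print(escape_at_signs("someone@example.org"))
```

Since '@' is U+0040, the escaped form is \u0040.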

dillerm avatar Mar 30 '23 01:03 dillerm