¤ cannot be used

Open triska opened this issue 2 years ago • 9 comments

Currently, I get:

?- X = ¤.
   error(syntax_error(unexpected_char),read_term/3:0).
?- X = '¤'.
   error(syntax_error(invalid_single_quoted_character),read_term/3).
?- X = "¤".
   error(syntax_error(missing_quote),read_term/3:0).

Is there anything special about this character? Why can't it be used like other symbols or currency characters such as $?

For comparison, I get with GNU Prolog:

| ?- X = '¤'.

X = '¤'

yes

triska, Jun 19 '22 08:06

On the other hand, the toplevel reports the character when asked for it in a different way:

?- char_code(C, 164).
   C = '¤'.

This means that such answer substitutions currently cannot be pasted back as queries.

triska, Jun 26 '22 08:06

We need to identify the class of characters that can be written directly within single or double quotes.

non quote char (* 6.4.2.1 *)
   = graphic char (* 6.5.1 *)
   | alphanumeric char (* 6.5.2 *)
   | solo char (* 6.5.3 *)
   | space char (* 6.5.4 *)
   | meta escape sequence (* 6.4.2.1 *)
   | control escape sequence (* 6.4.2.1 *)
   | octal escape sequence (* 6.4.2.1 *)
   | hexadecimal escape sequence (* 6.4.2.1 *) ;

This is the production that defines which characters can appear in such a quoted context. So extended characters (6.5) that look like characters may be added to non quote char but neither to one of those defined above like graphic char or alphanumeric char.
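The production above also lists the escape sequences, so in a quoted context a character can be written by its code rather than literally. Here is a minimal sketch of that, assuming the reader implements ISO hexadecimal escape sequences. The predicate names `currency_sign/1` and `same_as_code_164/0` are hypothetical and only serve the illustration:

```prolog
% Sketch: write U+00A4 (code 164) with an ISO hexadecimal escape
% sequence instead of the literal character.
% currency_sign/1 is a hypothetical name used only for illustration.
currency_sign('\xA4\').

% The escaped atom is the same one-character atom that char_code/2
% relates to code 164.
same_as_code_164 :-
    currency_sign(C),
    char_code(C, 164).
```

The query `?- same_as_code_164.` should succeed on a system that accepts the escape, independently of how the literal character is classified.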

UWN, Aug 23 '22 06:08

You can also extend graphic char and alphanumeric char. Just read the second paragraph of extended characters (6.5):

[image "Unbenannt" (untitled): screenshot not preserved]

It says the contrary of what you wrote, when you wrote "but neither to one of those defined above like graphic char or alphanumeric char".

If you use the Unicode database, the character is classified as CURRENCY_SYMBOL:

[image "Unbenannt2" (untitled 2): screenshot not preserved]

Jean-Luc-Picard-2021, Aug 23 '22 16:08

SWI-Prolog and Dogelog Player put CURRENCY_SYMBOL in the category graphic char. A typical currency symbol is $, which behaves like this:

?- X = $$$ .
X = $$$ .

?- X = $$$6 .
ERROR: Syntax error: Operator expected

?- X = '$$$6' .
X = '$$$6'.

Now ¤ is also a currency symbol just like $, so it could behave the same. That's why both Prolog systems behave as follows:

?- X = ¤¤¤ .
X = ¤¤¤ .

?- X = ¤¤¤6 .
ERROR: Syntax error: Operator expected

?- X = '¤¤¤6' .
X = '¤¤¤6'.

So when you put it into the category graphic char, there is no need to use quotes around it, as long as the atom is only a sequence of graphic characters.
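The rule can be modelled in a few lines. This is only a sketch under stated assumptions, not how either system implements its reader: `graphic_char/1` lists the ISO graphic chars of 6.5.1 and adds code 164 (¤) as an assumed extension, the backslash and the special treatment of `/*` and a lone `.` are ignored, and all predicate names are hypothetical.

```prolog
:- use_module(library(lists)).

% ISO graphic chars (6.5.1), simplified: backslash is left out.
graphic_char(C) :-
    atom_chars('#$&*+-./:<=>?@^~', Cs),
    member(C, Cs).
% Assumed extension: U+00A4 (code 164) is treated as a graphic char.
graphic_char(C) :-
    char_code(C, 164).

% True if Atom is a non-empty sequence of graphic chars only, i.e.
% it could be written without quotes in this simplified model.
unquoted_graphic_atom(Atom) :-
    atom_chars(Atom, [C|Cs]),
    all_graphic([C|Cs]).

all_graphic([]).
all_graphic([C|Cs]) :-
    graphic_char(C),
    all_graphic(Cs).
```

With these definitions, `unquoted_graphic_atom('$$$')` succeeds and `unquoted_graphic_atom('$$$6')` fails, mirroring the queries above. The same holds for the ¤ variants once code 164 is in the class.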

Interestingly, Trealla has no problem reading it, unlike Scryer Prolog, but it seems to me there is also a little glitch in the writing:

https://github.com/trealla-prolog/trealla/issues/26

Jean-Luc-Picard-2021, Aug 23 '22 16:08

¬ has the same problem (see also #1591):

?- char_code(C, 172).
   C = '¬'.
?- C = '¬'.
   error(syntax_error(invalid_single_quoted_character),read_term/3).

triska, Aug 23 '22 20:08

A similar problem, but not exactly the same solution: if Scryer Prolog used some Unicode database, you would need to map a different Unicode general category. You can check this in SWI-Prolog:

?- unicode_property(¬, category(X)).
X = 'Sm'.

?- unicode_property(¤, category(X)).
X = 'Sc'.
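A database-driven reader would then need a table from general categories to the character classes of the standard. The following is a hypothetical sketch of such a table, not taken from the source of any of the systems mentioned. `category_class/2` and the class names are assumptions chosen for illustration:

```prolog
% Hypothetical mapping from Unicode general categories to character
% classes of the standard. Only a few categories are listed.
category_class('Sc', graphic_char).         % currency symbols, e.g. U+0024, U+00A4
category_class('Sm', graphic_char).         % math symbols, e.g. U+002B, U+00AC
category_class('Lu', capital_letter_char).  % uppercase letters
category_class('Ll', small_letter_char).    % lowercase letters
```

With this table, the two categories reported above, 'Sm' and 'Sc', both end up in graphic_char, although they have to be mapped by separate entries.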

Jean-Luc-Picard-2021, Aug 24 '22 00:08

> You can also extend graphic char and alphanumeric char.

That is possible, and for alphanumeric chars Scryer already does this. The question here is rather whether it makes sense to extend graphic char, which may make source code much less readable and reliable. Libraries like TPTP have refrained from this, even though it was suggested in related discussions.

The other question is which non-terminals may be extended. In 6.5, graphic char, alphanumeric char, solo char, layout char and meta char are mentioned. From NOTE 2 it becomes evident that small letter char and capital letter char can also be extended separately. So far, the standard has no character that can be used in a quoted context but not in any other context. Still, the non quote char mentioned above may be a safer choice for extension. This would also be a bit closer to the way other programming languages like C (WG14 N1518) do it.

Things would get even more complex if solo char were also extended: some new characters would then be graphic and others solo. Sticking to a more conservative extension seems preferable. Such an extension does not rule out the use of these symbols, but requires that quotes are used to make them more visible.

UWN, Aug 24 '22 06:08

TPTP (as of v8.1.0.0) is not a Unicode adaptation; it doesn't make use of the Unicode database.

For example it has:

<lower_alpha>          ::: [a-z]

https://www.tptp.org/TPTP/SyntaxBNF.html#lower_alpha

But in Unicode one has, for the range 0-255 (Basic Latin & Latin-1 Supplement Block):

<lower_alpha>          ::: [abcdefghijklmnopqrstuvwxyzµßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]

Hope this helps!

There is no TPTP-Unicode yet. But there are many Prolog-Unicode already!

Jean-Luc-Picard-2021, Aug 24 '22 12:08

If you would like to be TPTP compatible (with v8.1.0.0), you would need to ban this:

$ target/release/scryer-prolog -v
"v0.9.0-181-g8e9302ea"
$ target/release/scryer-prolog
?- X = hörgerät.
   X = hörgerät.

So Scryer Prolog is now somewhere between TPTP and a Unicode Prolog, neither fish nor fowl. For example, Scryer Prolog cannot parse this file:

$ cat text.pl
text("«The Logos of Cybele is the idea that the Great Mother creates\n\
and kills everything. It is not eternity (Apollo) or the circle\n\
(Dionysus), but something that acts in her way with blind\n\
and absolute power. A form of progress: bottom-up growth.\n\
We are experiencing the final attack of Cybele, of the Great Risen\n\
Mother, with feminism, artificial intelligence, globalization,\n\
democracy, liberalism, and so on»").

I get this error:

$ target/release/scryer-prolog
?- ['text.pl'].
   error(syntax_error(missing_quote),read_term/3:0).
   false.

Works fine in SWI-Prolog:

?- text(X), write(X), nl, fail; true.
«The Logos of Cybele is the idea that the Great Mother creates
and kills everything. It is not eternity (Apollo) or the circle
(Dionysus), but something that acts in her way with blind
and absolute power. A form of progress: bottom-up growth.
We are experiencing the final attack of Cybele, of the Great Risen
Mother, with feminism, artificial intelligence, globalization,
democracy, liberalism, and so on»
true.

Jean-Luc-Picard-2021, Aug 24 '22 12:08

This works perfectly now, thank you a lot!

?- X = ¤.
   X = ¤.

triska, Mar 26 '23 08:03

I am reopening this because I think it is expected (https://github.com/mthom/scryer-prolog/issues/1749#issuecomment-1484061231) that the symbol cannot be part of a letter token:

?- X = ¤a.
   X = ¤a, unexpected.

triska, Mar 26 '23 16:03

Now it seems to work really perfectly, thank you a lot!

triska, Mar 26 '23 19:03