Mathics Named characters don't really work

After finishing the work at https://github.com/mathics/Mathics/pull/1077 and https://github.com/Mathics3/mathicsscript/pull/9 I did a little test in mathicsscript:

Mathicscript: 1.1.2, Mathics 2.0.0dev
on CPython 3.6.9 (default, Oct  8 2020, 12:12:24)
using SymPy 1.7.1, mpmath 1.1.0

Copyright (C) 2011-2020 The Mathics Team.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions.
See the documentation for the full license.

Quit by pressing CONTROL-D

In[1]:= \[Theta] = 4
Out[1]= θ == 4

In[2]:= a = b
Out[2]= a == b

In[3]:= "\[DifferentialD] is the differential"
Out[3]=  is the differential

There are a couple of issues here. First, it looks like = is being replaced by == in mathicsscript (see Out[1] and Out[2] above). This appears to be a mathicsscript-specific issue, though I couldn't check it against Mathics-Django: I haven't updated it since https://github.com/mathics/Mathics/pull/1077 was merged, and when I tried updating it just now I got the following:

python3 setup.py install
/home/pablo/.local/lib/python3.6/site-packages/setuptools/dist.py:452: UserWarning: Normalizing '2.0.0dev' to '2.0.0.dev0'
  warnings.warn(tmpl.format(**locals()))
running install
running bdist_egg
running egg_info
writing Mathics3.egg-info/PKG-INFO
writing dependency_links to Mathics3.egg-info/dependency_links.txt
writing entry points to Mathics3.egg-info/entry_points.txt
writing requirements to Mathics3.egg-info/requires.txt
writing top-level names to Mathics3.egg-info/top_level.txt
error: package directory 'mathics/algorithm' does not exist
Makefile:42: recipe for target 'install' failed

Anyway, this is probably a silly mistake in mathicsscript and it's probably my fault. I'll investigate this.

What's more concerning is that, even though mathicsscript correctly converts the named characters to their corresponding Unicode representations, the kernel doesn't seem to know what they are, and they aren't displayed properly. According to Wolfram's listing of named characters:

The Wolfram System provides systemwide support for a large number of special characters. Each character has a name and a number of shortcut aliases. They are fully supported by the standard Wolfram System fonts. For further information about named characters, including character interpretations and naming conventions, please see "Named Characters".

As far as I can tell, named characters should be valid identifiers and should be valid characters inside a string.

Jan 11 '21 21:01 GarkGarcia

I've now confirmed this wasn't mathicsscript-specific. It was an error in replace_unicode_with_wl, and it should be fixed by https://github.com/mathics/Mathics/pull/1108. The only reason it didn't show up in mathicsserver is that mathicsserver doesn't use replace_unicode_with_wl.

Jan 11 '21 22:01 GarkGarcia

I should also point out that there is some unsoundness going on with warnings. Essentially, named characters aren't escaped in warning messages:

Mathicscript: 1.1.2, Mathics 2.0.0dev
on CPython 3.6.9 (default, Oct  8 2020, 12:12:24)
using SymPy 1.7.1, mpmath 1.1.0

Copyright (C) 2011-2020 The Mathics Team.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions.
See the documentation for the full license.

Quit by pressing CONTROL-D

In[1]:= \[DifferentialD] := 4
Syntax::sntxb: Expression cannot begin with " := 4" (line 1 of "<stdin>").

As far as I understand, \[DifferentialD] should be a valid identifier, but even if we ignore that, the output should be:

Syntax::sntxb: Expression cannot begin with "𝐷 := 4" (line 1 of "<stdin>").

or

Syntax::sntxb: Expression cannot begin with "\[DifferentialD] := 4" (line 1 of "<stdin>").

This shows us two things: named characters should be added to the list of "valid characters in an identifier" (this is something that should be done in the kernel) and we need a function that maps named characters to their fully qualified names (i.e. maps \[DifferentialD] or U+F74C to the Python string '\\[DifferentialD]'). I'll detail the reasons for the latter in the following paragraph.

Named characters need to be properly displayed by our clients inside error messages and strings. We can't restrict ourselves to named characters that have a Unicode equivalent because:

1. The client may not support Unicode.
2. We still need to display named characters that don't have a Unicode equivalent (they may appear in the output of an expression).

The point is: we need to display named characters in a way our users can understand even if Unicode isn't available, and the only sensible way to do that is by their fully qualified name (i.e. display \[DifferentialD] as the Python string '\\[DifferentialD]').
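To make the idea concrete, here is a minimal sketch of such a mapping function. The table and the function name are made up for illustration and cover only two characters; a real version would need the full list of named characters.

```python
# Illustrative excerpt only: a real table would cover every named character.
# U+F74C for \[DifferentialD] is the code point mentioned above.
NAMED_CHARACTERS = {
    "\uf74c": "DifferentialD",
    "\u03b8": "Theta",
}


def to_fully_qualified_names(text: str) -> str:
    """Replace each known named character with its ASCII form \\[Name]."""
    return "".join(
        "\\[" + NAMED_CHARACTERS[ch] + "]" if ch in NAMED_CHARACTERS else ch
        for ch in text
    )


message = 'Expression cannot begin with "\uf74c := 4"'
print(to_fully_qualified_names(message))
```

Because the result is plain ASCII, it can be shown by any client, whether or not it supports Unicode.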

Jan 11 '21 22:01 GarkGarcia

@rocky @mmatera I can take care of all of this, but I'd like to know if you guys have any thoughts or concerns.

Jan 11 '21 22:01 GarkGarcia

@GarkGarcia no comments. What you report is fine.

I am glad you were able to figure out your problem on your own.

Jan 12 '21 16:01 rocky

> I should also point out that there is some unsoundness going on with warnings. Essentially, named characters aren't escaped in warning messages:
>
> In[1]:= \[DifferentialD] := 4
> Syntax::sntxb: Expression cannot begin with " := 4" (line 1 of "<stdin>").
>
> As far as I understand, \[DifferentialD] should be a valid identifier, but even if we ignore that, the output should be:

\[DifferentialD] is an operator (see https://reference.wolfram.com/language/ref/character/DifferentialD.html). For this reason, it is not a good identifier.

Regarding how the messages are shown: in WMA, the front-end can redefine the symbol Message to convert the characters before they are sent to the client, which is what I do in IWolfram to handle warning messages. We haven't implemented this in Mathics, but it shouldn't be difficult to do: we just need to modify evaluation.message. However, maybe it would be better to just modify the messages in the front-end routine that shows them (I don't remember where it is, and I couldn't find it with a quick search, but it shouldn't be difficult to locate). There, you can replace the characters that have a standard Unicode equivalent with that equivalent, and those that don't with the corresponding named-character expression.

> Syntax::sntxb: Expression cannot begin with "𝐷 := 4" (line 1 of "<stdin>").
>
> or
>
> Syntax::sntxb: Expression cannot begin with "\[DifferentialD] := 4" (line 1 of "<stdin>").
>
> This shows us two things: named characters should be added to the list of "valid characters in an identifier" (this is something that should be done in the kernel) and we need a function that maps named characters to their fully qualified names (i.e. maps \[DifferentialD] or U+F74C to the Python string '\\[DifferentialD]'). I'll detail the reasons for the latter in the following paragraph.
>
> Named characters need to be properly displayed by our clients inside error messages and strings. We can't restrict ourselves to named characters that have a Unicode equivalent because:
>
> 1. The client may not support Unicode.
> 2. We still need to display named characters that don't have a Unicode equivalent (they may appear in the output of an expression).

Indeed, I think it is a good idea to have a function that maps all the non-standard characters to their names.

> The point is: we need to display named characters in a way our users can understand even if Unicode isn't available, and the only sensible way to do that is by their fully qualified name (i.e. display \[DifferentialD] as the Python string '\\[DifferentialD]').
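For illustration, the front-end replacement suggested above could look roughly like the sketch below. Both tables and the function name are hypothetical. The glyph used for \[DifferentialD] is the one shown in the expected message quoted above, and the second entry is a made-up stand-in for a character with no Unicode equivalent.

```python
# Hypothetical tables for illustration; a real front-end would load the full
# list of named characters instead of hard-coding two entries.

# Wolfram-specific code point -> standard Unicode equivalent, where one exists.
WL_TO_UNICODE = {
    "\uf74c": "\U0001d437",  # \[DifferentialD], glyph taken from this thread
}

# Wolfram-specific code point -> character name, used as the fallback.
WL_TO_NAME = {
    "\uf74c": "DifferentialD",
    "\uf7ff": "NoEquivalentExample",  # made-up entry, no Unicode equivalent
}


def prepare_message_text(text: str) -> str:
    """Rewrite message text so it no longer depends on the Wolfram fonts."""
    out = []
    for ch in text:
        if ch in WL_TO_UNICODE:
            # Prefer the standard Unicode equivalent when there is one.
            out.append(WL_TO_UNICODE[ch])
        elif ch in WL_TO_NAME:
            # Otherwise fall back to the named-character expression.
            out.append("\\[" + WL_TO_NAME[ch] + "]")
        else:
            out.append(ch)
    return "".join(out)


print(prepare_message_text('Expression cannot begin with "\uf74c := 4"'))
```

Doing this in the routine that displays messages leaves the kernel untouched, at the cost of repeating the step in every front-end.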

Jan 13 '21 10:01 mmatera

> However, maybe it would be better to just modify the messages in the front-end routine that shows them (I don't remember where it is, and I couldn't find it with a quick search, but it shouldn't be difficult to locate). There, you can replace the characters that have a standard Unicode equivalent with that equivalent, and those that don't with the corresponding named-character expression.

Makes sense. I'm working on a patch for mathicsscript that takes care of this as you've described and I plan on releasing it after we're done with https://github.com/mathics/Mathics/pull/1109. I plan on doing the same for mathicsserver.

Jan 13 '21 15:01 GarkGarcia

Try this: in https://github.com/Mathics3/mathics-django/blob/dfe9aeeecb31552f1c7f488547929658261dd928/mathics_django/web/views.py#L155, replace lines 155-157

    result = {
        "results": [result.get_data() for result in results],
    }

with

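    # Convert Wolfram named characters to Unicode in the text of every
    # output message before building the response.
    outputs = []
    for result in results:
        data = result.get_data()
        for msg in data['out']:
            msg['text'] = replace_wl_to_unicode(msg['text'])
        outputs.append(data)

    result = {
        "results": outputs,
    }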

This also replaces the characters in Print and Message strings:

[image]

Jan 14 '21 12:01 mmatera
