Importing a library changes read syntax

Open lassik opened this issue 1 year ago • 14 comments

What causes the following?

#u16(...) syntax does not work:

stklos> #u16(1 2 3)
**** Error:
%read: bad uniform vector specification `u16'
	(type ",help" for more information)
stklos> **** Error:
%execute: bad function `1'. Cannot be applied
	(type ",help" for more information)

Import the right SRFI, and it does:

stklos> (import (srfi 4))
stklos> #u16(1 2 3)
#u16(1 2 3)

As far as I know, read syntax is only supposed to be changed by #! directives (e.g. #!r6rs or #!fold-case). Imports do not normally change it.

lassik avatar Jan 01 '23 18:01 lassik

Well, #u16(...) only makes sense when SRFI-4 is loaded... That's why it works this way. As far as I know the standard doesn't forbid it (at least R7RS doesn't - or did I miss this?)

jpellegrini avatar Jan 01 '23 18:01 jpellegrini

The idea (not sure that it is a good one) is to have a better message when using bad sharp syntaxes.

We have:

stklos> #s16
**** Error:
%read: bad sharp syntax in `"#s16"'
stklos> (import (srfi 4)) 
stklos> #s16
**** Error:
%read: bad uniform vector specification `s16'

Furthermore, defining a constant without the primitives to access its content is probably not very helpful. Anyway, changing that point is easy. Do you see any drawback to the current implementation?

egallesio avatar Jan 01 '23 20:01 egallesio

BTW @lassik, what are the bugs you have seen with the current implementation?

egallesio avatar Jan 01 '23 20:01 egallesio

BTW @lassik, what are the bugs you have seen with the current implementation?

The first bug is the above:

stklos> #s8(1 2 3)
**** Error:
%read: bad sharp syntax in `"#s8"'
	(type ",help" for more information)
stklos> **** Error:
%execute: bad function `1'. Cannot be applied
	(type ",help" for more information)

Since the reader does not recognize the #s8, it skips it, and then reads (1 2 3) and tries to evaluate it. The evaluation fails. An unrecognized #foo token should stop it from reading more stuff.
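A toy model of the behaviour described above, written in Python so it stays self-contained. It is not STklos's reader: the token pattern, `KNOWN_SHARP` and `repl_line` are hypothetical names made up for this illustration. It only shows how handling each token independently produces a read error followed by an unrelated eval error.

```python
import re

# Toy token pattern: a sharp token such as "#s8", a flat parenthesised
# list, or any other run of non-space characters.
TOKEN = re.compile(r'#\w+|\([^()]*\)|[^\s()]+')

# Assumption for the sketch: no SRFI 4 prefixes are registered yet.
KNOWN_SHARP = {"#t", "#f"}

def repl_line(line):
    """Handle every token on the line independently, as in the transcript."""
    messages = []
    for tok in TOKEN.findall(line):
        if tok.startswith("#") and tok not in KNOWN_SHARP:
            # The reader rejects the prefix but keeps going.
            messages.append(f"read error: bad sharp syntax in {tok!r}")
        elif tok.startswith("("):
            # The leftover list is read as a datum and evaluated as a call.
            parts = tok[1:-1].split()
            head = parts[0] if parts else ""
            messages.append(f"eval error: bad function {head!r}")
        else:
            messages.append(f"value: {tok}")
    return messages

for message in repl_line("#s8(1 2 3)"):
    print(message)
```

In this model the second message is a pure side effect of the first error, which is the point of the complaint.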

The second bug is:

stklos> #s16"abc"
#u8(97 98 99)

Any numeric vector prefix can be used to read the #u8"..." bytevector syntax.
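One way to picture the stricter check, as a Python sketch and not the code in STklos: after a uniform vector tag, a `(` is accepted for any SRFI 4 tag, but a `"` is accepted only after `u8`. The function name `classify_sharp` and the returned labels are hypothetical; the error strings only imitate the messages quoted in this thread.

```python
# The ten uniform vector tags defined by SRFI 4.
UVECTOR_TAGS = {"u8", "s8", "u16", "s16", "u32", "s32", "u64", "s64", "f32", "f64"}

def classify_sharp(tag, next_char):
    """Decide what '#<tag>' introduces, given the character that follows it."""
    if tag not in UVECTOR_TAGS:
        raise SyntaxError(f"bad sharp syntax in #{tag}")
    if next_char == "(":
        return "uniform vector"
    if next_char == '"' and tag == "u8":
        # Only #u8"..." is a string-notated bytevector.
        return "bytevector string"
    raise SyntaxError(f"bad uniform vector specification {tag}")

print(classify_sharp("u8", '"'))
try:
    classify_sharp("s16", '"')
except SyntaxError as exc:
    print("rejected:", exc)
```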

lassik avatar Jan 01 '23 22:01 lassik

An unrecognized #foo token should stop it from reading more stuff.

This may require special handling when reading from a terminal (as opposed to a file)?

lassik avatar Jan 01 '23 22:01 lassik

The second bug is:

I think the second bug doesn't happen anymore - am I wrong?

stklos> #s16"abc"
**** Error:
%read: bad sharp syntax in `"#s16"'
	(type ",help" for more information)
"abc"

jpellegrini avatar Mar 06 '23 10:03 jpellegrini

And I'm not sure it's possible to fix the first one... The reader sees #s8 and (1 2 3) as two separate tokens, and complains about the first. It keeps reading...

"a"x"b"       ;;  with x unbound

will, in most schemes(*), print "a", then trigger an exception and print the error message, and then print "b"... So errors like that won't make the reader stop -- the behavior of STklos for sharp syntax seems OK, I guess.

(*) Kawa stops at the error. The Chez REPL expects the user to hit enter for each expression entered, so it's totally different. The others I tried behave as I mentioned.

Also, I see that only STklos and Chicken implement SRFI 207, and Chicken does not implement the sharp syntax part.

jpellegrini avatar Mar 06 '23 11:03 jpellegrini

The reader sees #s8 and (1 2 3) as two separate tokens, and complains about the first. It keeps reading...

You could first read all available input from the terminal into a buffer. Then (read) from that buffer, instead of (read) directly from the terminal. This should be doable by using a read with a timeout, either using poll() or a non-blocking terminal fd.

lassik avatar Mar 06 '23 14:03 lassik

I.e. something like this pseudo-code:

while (terminal_fd_is_not_closed) {
    buffer_clear();
    while (poll(terminal_fd)) {
        char input_char;
        read(terminal_fd, &input_char, 1);
        buffer_putc(input_char);
    }
    try {
        input_port = make_string_input_port(buffer);
        while (datum = read_scheme_datum(input_port)) {
            eval(datum);
        }
    } except {
        display_error();
    } finally {
        close(input_port);
    }
}

lassik avatar Mar 06 '23 14:03 lassik

But the same code that reads from a terminal also reads from a string. Maybe it's possible to make STklos ignore everything that follows an error on the same line: a read error would unwind all the way out to the point where the whole line has been consumed. Not sure.

jpellegrini avatar Mar 06 '23 14:03 jpellegrini

Note that partial expressions may be passed in a line (a string):

stklos> (define x  ;; the reader knows this newline did not end the expression!
1)                 ;; but this one did!

I don't remember how STklos deals with this, but you can take a look at src/read.c. Maybe @egallesio can help?
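For illustration only, here is a Python sketch of one common way a REPL can tell that a line did not finish an expression: track parenthesis depth while skipping strings and `;` comments. This is an assumption about how such a check could look, not a description of src/read.c, and `needs_more_input` is a hypothetical name. It deliberately ignores character literals such as `#\(` and block comments.

```python
def needs_more_input(text):
    """Return True if text ends inside an open list or an open string."""
    depth = 0
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1              # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            # Line comment: ignore everything up to the end of the line.
            newline = text.find("\n", i)
            if newline == -1:
                break
            i = newline
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        i += 1
    return in_string or depth > 0

first_line = "(define x  ;; the reader knows this newline did not end it\n"
print(needs_more_input(first_line))          # still inside (define ...
print(needs_more_input(first_line + "1)"))   # the list is now closed
```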

jpellegrini avatar Mar 06 '23 14:03 jpellegrini

Reading until end-of-line is not the best way to do it. If you paste 50 lines of code into a terminal, and the 5th line raises an error, then the interpreter will continue to process the next 45 lines. It should not do that.

It's better to read with a timeout. IIRC the conventional timeout for reading from a TTY is something like 50 milliseconds. If you paste a lot of text into a terminal window, the terminal emulator will send it all instantly to the program running in the terminal. So the program will receive all the pasted text, and no more, if you use a reasonable timeout.
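A minimal sketch of the timeout idea in Python, assuming a POSIX system where `select.select` works on terminal and pipe file descriptors. The name `read_burst`, the chunk size, and the 50 ms default (taken from the recollection above, not from any specification) are assumptions. The first wait blocks until something arrives; after that, the burst ends as soon as the descriptor stays quiet for `timeout` seconds.

```python
import os
import select

def read_burst(fd, timeout=0.05, chunk=4096):
    """Collect bytes from fd until no more arrive within `timeout` seconds."""
    data = bytearray()
    wait = None                      # block until the first byte arrives
    while True:
        ready, _, _ = select.select([fd], [], [], wait)
        if not ready:
            break                    # quiet for `timeout` seconds: burst over
        piece = os.read(fd, chunk)
        if not piece:
            break                    # end of file
        data += piece
        wait = timeout               # later waits are bounded
    return bytes(data)

# Simulate a paste with a pipe: everything written before the read is
# returned as one burst.
r, w = os.pipe()
os.write(w, b"(display 1)\n(display 2)\n")
print(read_burst(r))
os.close(r)
os.close(w)
```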

lassik avatar Mar 06 '23 15:03 lassik

In other words:

  • Buffer all the text that comes within the given timeout - whether it's one line or 500 lines.
  • Read and eval the text in the buffer.
  • If any of it raises an error, throw away the rest of the buffer without reading or evaluating it.
  • Clear the buffer and repeat until the end of file.

lassik avatar Mar 06 '23 15:03 lassik

It's probably good to add some extra logic to that, so that a partial line is never read and evaluated, even if it comes within the timeout.
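The buffering steps, including the partial-line rule, can be sketched in Python as follows. This is a toy under loud assumptions: each whitespace-separated token stands in for a Scheme datum, `evaluate` is whatever callable the caller passes in, and the names `split_complete_lines` and `run_burst` are hypothetical. A real reader would read data from a string port instead of splitting on whitespace.

```python
def split_complete_lines(buffer):
    """Return (complete, partial): text up to the last newline, and the rest."""
    cut = buffer.rfind("\n") + 1
    return buffer[:cut], buffer[cut:]

def run_burst(buffer, evaluate):
    """Evaluate the complete lines of one burst, stopping at the first error.

    Returns (results, error, partial). `partial` is the trailing text with
    no newline yet; it is held back for the next burst, or dropped with
    everything else if an error occurred.
    """
    complete, partial = split_complete_lines(buffer)
    results = []
    for token in complete.split():
        try:
            results.append(evaluate(token))
        except Exception as exc:
            # Throw away the rest of the burst without reading it.
            return results, exc, ""
    return results, None, partial

# With int() as a stand-in evaluator, "#s8" raises and "3" is never seen.
print(run_burst("1 2\n#s8 3\n4", int))
# Without errors, the unfinished last line is kept for later.
print(run_burst("1 2\n3", int))
```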

lassik avatar Mar 06 '23 15:03 lassik