scryer-prolog icon indicating copy to clipboard operation
scryer-prolog copied to clipboard

Add `chars_stream/1` and `chars_to_stream/{2,3}` for REBIS

Open jjtolton opened this issue 7 months ago • 23 comments

Intended usage:

:- use_module(library(charsio)).
:- use_module(library(lists)).

?-   (
      Phrase="can convert string to char stream",
      length(Phrase, N),
      chars_to_stream(Phrase, Stream),
      get_n_chars(Stream, N, Chars),
      Phrase=Chars
     ).

~Currently, I have implemented this in the dumbest possible way.~ Suggestions appreciated.

jjtolton avatar May 24 '25 04:05 jjtolton

Thank you a lot for working on this!

What is the source/target of the resulting stream? Would it be more useful to make it so that the source/target is a string?

triska avatar May 24 '25 05:05 triska

Thank you a lot for working on this!

What is the source/target of the resulting stream? Would it be more useful to make it so that the source/target is a string?

Oh interesting. What did you have in mind? memory_stream/2 where the first argument is a string?

jjtolton avatar May 24 '25 05:05 jjtolton

memory_stream/2 where the first argument is a string?

Yes exactly, chars_stream(+Chars, -Stream, +Options) where we can open Chars as a stream and then read from it. It would be useful as a building block already to bootstrap existing predicates from library(charsio).

Alternatively, maybe open/3 with chars(Chars) as admissible source and even sink?

triska avatar May 24 '25 06:05 triska

Interesting. Would the 2 arrity be string and stream or stream and opts?

jjtolton avatar May 24 '25 14:05 jjtolton

Tests are extremely flakey, I assume it has something to do with the hack I used to implement the memory stream.

$ scryer-prolog -f --no-add-history src/tests/charsio.pl -f -g "use_module(library(charsio_tests)), charsio_tests:main(charsio_tests)"
Running test "can create string char stream"
Running test "can spell simple word with char stream"

thread 'main' panicked at src/arena.rs:210:5:
value contains invalid bit pattern for field ArenaHeader.tag: InvalidBitPattern { invalid_bytes: 0 }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
$ scryer-prolog -f --no-add-history src/tests/charsio.pl -f -g "use_module(library(charsio_tests)), charsio_tests:main(charsio_tests)"
Running test "can create string char stream"
Running test "can spell simple word with char stream"
Running test "can read from and write to char stream"
Running test "can convert string to char stream"
Running test "can convert string to char stream with options"

Edit: tests fine now after better implementation

jjtolton avatar May 24 '25 19:05 jjtolton

For instance, even chars_utf8bytes/2 (chars instantiated) should be bootstrappable with this and should even become much faster: open the string as binary, read it as bytes!

triska avatar May 25 '25 01:05 triska

For instance, even chars_utf8bytes/2 (chars instantiated) should be bootstrappable with this and should even become much faster: open the string as binary, read it as bytes!

Interesting, do you suppose $memory_stream/1 could opened as binary?

This line of code is very mysterious to me but I will investigate.

 let stream = Stream::from_owned_string("".to_string(), &mut self.machine_st.arena);

jjtolton avatar May 25 '25 03:05 jjtolton

Does anyone know that the self.machine_st.arena is and why it's getting passed to Stream::from_owned_string?

jjtolton avatar May 25 '25 03:05 jjtolton

For instance, even chars_utf8bytes/2 (chars instantiated) should be bootstrappable with this and should even become much faster: open the string as binary, read it as bytes!

Interesting, do you suppose $memory_stream/1 could opened as binary?

This line of code is very mysterious to me but I will investigate.

 let stream = Stream::from_owned_string("".to_string(), &mut self.machine_st.arena);

Oh, perhaps it already is in binary?

    #[inline]
    pub fn from_owned_string(string: String, arena: &mut Arena) -> Stream {
        Stream::Byte(arena_alloc!(
            StreamLayout::new(CharReader::new(ByteStream(Cursor::new(
                string.into_bytes()
            )))),
            arena
        ))
    }

jjtolton avatar May 25 '25 03:05 jjtolton

The options must be useable to specify the type (text or binary) of the resulting stream. In this way, we can access the encoded bytes directly, and also get Unicode characters from the stream when needed. Reading a term from characters could be bootstrapped from this.

triska avatar May 25 '25 03:05 triska

In the original design, what was supposed to be the source of the memory stream, i.e., where did the characters come from? (The original predicate only had a single argument, the stream.)

triska avatar May 25 '25 03:05 triska

The options must be useable to specify the type (text or binary) of the resulting stream. In this way, we can access the encoded bytes directly, and also get Unicode characters from the stream when needed. Reading a term from characters could be bootstrapped from this.

Sounds good, I will investigate in the morning

jjtolton avatar May 25 '25 03:05 jjtolton

In the original design, what was supposed to be the source of the memory stream, i.e., where did the characters come from? (The original predicate only had a single argument, the stream.)

Semantically I just assumed it would be an empty stream for writing into.

jjtolton avatar May 25 '25 03:05 jjtolton

Does anyone have any examples of byte semantics so I can write a spec to work against? i.e., what would it look like to write a byte to (or read from) a stream vs text?

jjtolton avatar May 25 '25 19:05 jjtolton

Bytes would allow having 0xFF bytes for example, which is invalid UTF-8. I guess just generate some random uniform bytes and see if they round-trip.

bakaq avatar May 25 '25 20:05 bakaq

You can use open/3 to open a file with type(binary) to test it!

triska avatar May 25 '25 23:05 triska

I settled on this as an acceptable spec:

test("can read bytes",
     (
      chars_to_stream("abc", Stream, [type(binary)]),
      get_byte(Stream, A),
      get_byte(Stream, B),
      get_byte(Stream, C),
      A=97,
      B=98,
      C=99
      )).

In studying how open/4 processes stream options, it's really not clear to me how the values make it from the options to here. There must be some preprocessing step in prolog I missing, I can't find it in the rust code. Maybe in loader? :thinking:

jjtolton avatar May 26 '25 17:05 jjtolton

In studying how open/4 processes stream options, it's really not clear to me how the values make it from the options to here. There must be some preprocessing step in prolog I missing, I can't find it in the rust code. Maybe in loader? :thinking:

The options appear to be parsed in Prolog code at https://github.com/mthom/scryer-prolog/blob/rebis-dev/src%2Flib%2Fbuiltins.pl#L1964

Skgland avatar May 26 '25 23:05 Skgland

In studying how open/4 processes stream options, it's really not clear to me how the values make it from the options to here. There must be some preprocessing step in prolog I missing, I can't find it in the rust code. Maybe in loader? 🤔

The options appear to be parsed in Prolog code at https://github.com/mthom/scryer-prolog/blob/rebis-dev/src%2Flib%2Fbuiltins.pl#L1964

amazing... I thought builtins.pl was just autogenerated stub code until now!

jjtolton avatar May 27 '25 01:05 jjtolton

Ok I ripped off the parser from builtins.pl as a proof-of-concept, and managed to get the memory stream reading/writing bytes.

Is it ok that this means that the "chars" could also be bytes, depending on the options?

test("can read/write bytes",
     (
      A=97,
      B=98,
      C=99,
      chars_to_stream([A,B,C], Stream, [type(binary)]),
      get_byte(Stream, A),
      get_byte(Stream, B),
      get_byte(Stream, C),
      put_byte(Stream, A),
      put_byte(Stream, B),
      put_byte(Stream, C),
      get_byte(Stream, A),
      get_byte(Stream, B),
      get_byte(Stream, C)
      )).

or should we come up with different semantics? Because I'm not sure it's always "chars" now.

Also, if anyone has any better suggestions for the implementation of parsing the options, I'd certainly be open to that feedback.

jjtolton avatar May 28 '25 01:05 jjtolton

Ok I cleaned up the parsing code and the validation logic. The last two commits are a little outside my comfort/confidence zone in terms of approach and style, so if anyone has any suggestions I'd certainly be interested. I'll take at least one more refactoring pass to clean up white space issues and layout.

Especially the error messages, I just totally made those up.

jjtolton avatar May 30 '25 01:05 jjtolton

Updated for Rebis :partying_face:

jjtolton avatar Aug 02 '25 17:08 jjtolton

Thank you a lot for this interesting contribution!

My hope was that with this building-block, we can bootstrap chars_utf8bytes/2 by "opening" the characters (which Scryer stores in UTF-8 encoding) as type(binary), and then reading the bytes from the stream with get_byte/2. I therefore tried:

?- chars_to_stream("💜", S, [type(binary)]).
   error(type_error(integer,'💜'),must_be/2).

So, this does not work as intended, at least not yet, even though the first argument is a list of characters as the name suggests! Do you think this could be made to work?

triska avatar Aug 02 '25 19:08 triska