Add `chars_stream/1` and `chars_to_stream/{2,3}` for REBIS
Intended usage:
:- use_module(library(charsio)).
:- use_module(library(lists)).
?- (
Phrase="can convert string to char stream",
length(Phrase, N),
chars_to_stream(Phrase, Stream),
get_n_chars(Stream, N, Chars),
Phrase=Chars
).
~Currently, I have implemented this in the dumbest possible way.~ Suggestions appreciated.
Thank you a lot for working on this!
What is the source/target of the resulting stream? Would it be more useful to make it so that the source/target is a string?
Thank you a lot for working on this!
What is the source/target of the resulting stream? Would it be more useful to make it so that the source/target is a string?
Oh interesting. What did you have in mind? memory_stream/2 where the first argument is a string?
memory_stream/2where the first argument is a string?
Yes exactly, chars_stream(+Chars, -Stream, +Options) where we can open Chars as a stream and then read from it. It would be useful as a building block already to bootstrap existing predicates from library(charsio).
Alternatively, maybe open/3 with chars(Chars) as admissible source and even sink?
Interesting. Would the 2 arrity be string and stream or stream and opts?
Tests are extremely flakey, I assume it has something to do with the hack I used to implement the memory stream.
$ scryer-prolog -f --no-add-history src/tests/charsio.pl -f -g "use_module(library(charsio_tests)), charsio_tests:main(charsio_tests)"
Running test "can create string char stream"
Running test "can spell simple word with char stream"
thread 'main' panicked at src/arena.rs:210:5:
value contains invalid bit pattern for field ArenaHeader.tag: InvalidBitPattern { invalid_bytes: 0 }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
$ scryer-prolog -f --no-add-history src/tests/charsio.pl -f -g "use_module(library(charsio_tests)), charsio_tests:main(charsio_tests)"
Running test "can create string char stream"
Running test "can spell simple word with char stream"
Running test "can read from and write to char stream"
Running test "can convert string to char stream"
Running test "can convert string to char stream with options"
Edit: tests fine now after better implementation
For instance, even chars_utf8bytes/2 (chars instantiated) should be bootstrappable with this and should even become much faster: open the string as binary, read it as bytes!
For instance, even
chars_utf8bytes/2(chars instantiated) should be bootstrappable with this and should even become much faster: open the string as binary, read it as bytes!
Interesting, do you suppose $memory_stream/1 could opened as binary?
This line of code is very mysterious to me but I will investigate.
let stream = Stream::from_owned_string("".to_string(), &mut self.machine_st.arena);
Does anyone know that the self.machine_st.arena is and why it's getting passed to Stream::from_owned_string?
For instance, even
chars_utf8bytes/2(chars instantiated) should be bootstrappable with this and should even become much faster: open the string as binary, read it as bytes!Interesting, do you suppose
$memory_stream/1could opened as binary?This line of code is very mysterious to me but I will investigate.
let stream = Stream::from_owned_string("".to_string(), &mut self.machine_st.arena);
Oh, perhaps it already is in binary?
#[inline]
pub fn from_owned_string(string: String, arena: &mut Arena) -> Stream {
Stream::Byte(arena_alloc!(
StreamLayout::new(CharReader::new(ByteStream(Cursor::new(
string.into_bytes()
)))),
arena
))
}
The options must be useable to specify the type (text or binary) of the resulting stream. In this way, we can access the encoded bytes directly, and also get Unicode characters from the stream when needed. Reading a term from characters could be bootstrapped from this.
In the original design, what was supposed to be the source of the memory stream, i.e., where did the characters come from? (The original predicate only had a single argument, the stream.)
The options must be useable to specify the type (text or binary) of the resulting stream. In this way, we can access the encoded bytes directly, and also get Unicode characters from the stream when needed. Reading a term from characters could be bootstrapped from this.
Sounds good, I will investigate in the morning
In the original design, what was supposed to be the source of the memory stream, i.e., where did the characters come from? (The original predicate only had a single argument, the stream.)
Semantically I just assumed it would be an empty stream for writing into.
Does anyone have any examples of byte semantics so I can write a spec to work against? i.e., what would it look like to write a byte to (or read from) a stream vs text?
Bytes would allow having 0xFF bytes for example, which is invalid UTF-8. I guess just generate some random uniform bytes and see if they round-trip.
You can use open/3 to open a file with type(binary) to test it!
I settled on this as an acceptable spec:
test("can read bytes",
(
chars_to_stream("abc", Stream, [type(binary)]),
get_byte(Stream, A),
get_byte(Stream, B),
get_byte(Stream, C),
A=97,
B=98,
C=99
)).
In studying how open/4 processes stream options, it's really not clear to me how the values make it from the options to here. There must be some preprocessing step in prolog I missing, I can't find it in the rust code. Maybe in loader? :thinking:
In studying how
open/4processes stream options, it's really not clear to me how the values make it from the options to here. There must be some preprocessing step in prolog I missing, I can't find it in the rust code. Maybe in loader? :thinking:
The options appear to be parsed in Prolog code at https://github.com/mthom/scryer-prolog/blob/rebis-dev/src%2Flib%2Fbuiltins.pl#L1964
In studying how
open/4processes stream options, it's really not clear to me how the values make it from the options to here. There must be some preprocessing step in prolog I missing, I can't find it in the rust code. Maybe in loader? 🤔The options appear to be parsed in Prolog code at https://github.com/mthom/scryer-prolog/blob/rebis-dev/src%2Flib%2Fbuiltins.pl#L1964
amazing... I thought builtins.pl was just autogenerated stub code until now!
Ok I ripped off the parser from builtins.pl as a proof-of-concept, and managed to get the memory stream reading/writing bytes.
Is it ok that this means that the "chars" could also be bytes, depending on the options?
test("can read/write bytes",
(
A=97,
B=98,
C=99,
chars_to_stream([A,B,C], Stream, [type(binary)]),
get_byte(Stream, A),
get_byte(Stream, B),
get_byte(Stream, C),
put_byte(Stream, A),
put_byte(Stream, B),
put_byte(Stream, C),
get_byte(Stream, A),
get_byte(Stream, B),
get_byte(Stream, C)
)).
or should we come up with different semantics? Because I'm not sure it's always "chars" now.
Also, if anyone has any better suggestions for the implementation of parsing the options, I'd certainly be open to that feedback.
Ok I cleaned up the parsing code and the validation logic. The last two commits are a little outside my comfort/confidence zone in terms of approach and style, so if anyone has any suggestions I'd certainly be interested. I'll take at least one more refactoring pass to clean up white space issues and layout.
Especially the error messages, I just totally made those up.
Updated for Rebis :partying_face:
Thank you a lot for this interesting contribution!
My hope was that with this building-block, we can bootstrap chars_utf8bytes/2 by "opening" the characters (which Scryer stores in UTF-8 encoding) as type(binary), and then reading the bytes from the stream with get_byte/2. I therefore tried:
?- chars_to_stream("💜", S, [type(binary)]).
error(type_error(integer,'💜'),must_be/2).
So, this does not work as intended, at least not yet, even though the first argument is a list of characters as the name suggests! Do you think this could be made to work?