scryer-prolog

must_be/2 to support more than types (?)

Open UWN opened this issue 3 years ago • 7 comments

Currently, must_be/2 supports some of the types of 7.12.2 b and some informal ones such as chars. Further candidates would be those in 8.1.2.1 and, in general, the domains of the domain errors in 7.12.2 c. This would help to make errors more uniform, in particular the differing error reports for list and character in the case of chars and the like.

The following have occurred so far:

in_character (type error)

not_less_than_zero (type_error(integer, I) and domain_error)
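To make the type/domain split for not_less_than_zero concrete, here is a minimal model in Rust (the implementation language of Scryer Prolog). It is only a sketch of the conventional ISO ordering of errors, not the code of library(error): `Term`, `PrologError` and `must_be_not_less_than_zero` are hypothetical names invented for this illustration.

```rust
// Hypothetical, simplified term model; not Scryer Prolog's heap representation.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    Var,
    Integer(i64),
    Atom(String),
}

// Hypothetical error model mirroring the ISO error terms by name.
#[derive(Debug, PartialEq)]
enum PrologError {
    Instantiation,
    Type { expected: &'static str, culprit: Term },
    Domain { domain: &'static str, culprit: Term },
}

// Checks a term against the not_less_than_zero domain:
// an unbound variable gives an instantiation error, a non-integer gives
// type_error(integer, T), and a negative integer gives
// domain_error(not_less_than_zero, I).
fn must_be_not_less_than_zero(term: &Term) -> Result<(), PrologError> {
    match term {
        Term::Var => Err(PrologError::Instantiation),
        Term::Integer(i) if *i >= 0 => Ok(()),
        Term::Integer(_) => Err(PrologError::Domain {
            domain: "not_less_than_zero",
            culprit: term.clone(),
        }),
        other => Err(PrologError::Type {
            expected: "integer",
            culprit: other.clone(),
        }),
    }
}
```

In this model a single check can report either error class, which is why the entry above lists both type_error(integer, I) and domain_error.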

UWN avatar Feb 27 '22 16:02 UWN

Another example that currently occurs in library(crypto) and library(charsio) is:

byte_char

I use it to denote a character whose code is in 0..255. It is like char, except that it raises a domain error if the code of the character is greater than 255. This is useful when using strings to compactly represent octet sequences in memory. The internal predicate '$first_non_octet'/2 can be used to efficiently locate the first "non-octet" in strings. Maybe this could be a potential candidate for inclusion in library(error)? For example, as:

must_be(single_octet_chars, Cs)
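As a rough illustration of what such a check involves, the following Rust sketch scans a string for the first character whose code exceeds 255. It is an assumption-laden model: `first_non_octet`, `DomainError` and `check_single_octet_chars` are hypothetical names, and the real '$first_non_octet'/2 may behave differently in its details.

```rust
// Hypothetical error value, loosely mirroring domain_error(byte_char, C).
#[derive(Debug, PartialEq)]
struct DomainError {
    domain: &'static str,
    culprit: char,
}

// Returns the index (counted in characters, not bytes) and the value of the
// first character whose code is greater than 255, or None if there is none.
fn first_non_octet(s: &str) -> Option<(usize, char)> {
    s.chars().enumerate().find(|&(_, c)| c as u32 > 0xFF)
}

// Succeeds if every character of `s` has a code in 0..=255; otherwise
// reports the first offending character as the culprit.
fn check_single_octet_chars(s: &str) -> Result<(), DomainError> {
    match first_non_octet(s) {
        None => Ok(()),
        Some((_, c)) => Err(DomainError { domain: "byte_char", culprit: c }),
    }
}
```

Under this sketch, a string such as "h\u{e9}llo" passes, because U+00E9 is below 256, while any string containing U+0100 or above is rejected with that character as the culprit.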

triska avatar Mar 27 '22 19:03 triska

How are lists of single octet characters represented in memory? If chars is utf8, then any char value between 128-255 would be represented with two bytes. Is there a special-cased octet-list representation (u8 vec) akin to the char-list representation (utf8 string I assume)?

infogulch avatar Mar 27 '22 22:03 infogulch

@infogulch: The internal representation is UTF-8, so indeed the characters with codes in 128-255 are represented by 2 bytes each!
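The size difference can be checked with a few lines of Rust, using only the standard library's `char::from` and `char::len_utf8`. The helper names `utf8_len_of_octet` and `utf8_size_of_octets` are made up for this sketch and are not part of Scryer Prolog.

```rust
// Number of bytes UTF-8 needs for the character whose code is `code`.
// Codes 0..=127 take one byte, codes 128..=255 take two.
fn utf8_len_of_octet(code: u8) -> usize {
    char::from(code).len_utf8()
}

// Total UTF-8 size of an octet sequence when each octet is stored as the
// character with the same code.
fn utf8_size_of_octets(octets: &[u8]) -> usize {
    octets.iter().map(|&b| utf8_len_of_octet(b)).sum()
}
```

Summed over all 256 possible octet values this gives 384 bytes, so uniformly distributed binary data costs 1.5 bytes per octet on average in this representation, while pure ASCII data costs exactly one.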

triska avatar Mar 27 '22 22:03 triska

Being 'slightly inefficient' (1.5 bytes per 'octet char' on average?) isn't much of an issue for general byte manipulation, especially compared to other representations (24+ bytes per element, oof). But for cryptography in particular, I'm concerned that using a nonlinear representation could expose the plaintext and intermediates to side channel attacks, maybe leaking one bit per octet (the high bit). Has this potential issue been considered already?

infogulch avatar Mar 27 '22 22:03 infogulch

When encrypting binary data by using the encoding(octet) option of library(crypto), the characters are first transformed to actual bytes (u8), all in the range 0..255:

https://github.com/mthom/scryer-prolog/blob/c45cdd6ea0c4f94269839d157bf2221c640f9b12/src/machine/system_calls.rs#L5995
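The idea of that conversion can be sketched as follows. This is not the code at the linked line, only a minimal stand-alone model of it; `chars_to_octets` and `octets_to_chars` are hypothetical names chosen for the example.

```rust
// Converts a string of "octet characters" into raw bytes.
// Each character must have a code in 0..=255; the first character outside
// that range is returned as the error value.
fn chars_to_octets(s: &str) -> Result<Vec<u8>, char> {
    s.chars()
        .map(|c| if (c as u32) <= 0xFF { Ok(c as u8) } else { Err(c) })
        .collect()
}

// Inverse direction: each byte becomes the character with the same code,
// so bytes 128..=255 occupy two bytes each in the resulting UTF-8 string.
fn octets_to_chars(octets: &[u8]) -> String {
    octets.iter().map(|&b| char::from(b)).collect()
}
```

After such a step the data handed to the cipher is a flat `Vec<u8>` with exactly one byte per octet, regardless of how many bytes each character occupied in the UTF-8 string.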

triska avatar Mar 27 '22 23:03 triska

It seems this issue has gone off on a bit of a tangent. Any other types?

UWN avatar Apr 09 '22 17:04 UWN

not_less_than_zero is made available as part of #1593!

triska avatar Aug 24 '22 19:08 triska