rgbds icon indicating copy to clipboard operation
rgbds copied to clipboard

[Feature request] Functions to read the contents of text and binary files

Open Rangi42 opened this issue 3 years ago • 8 comments

INCLUDE and INCBIN can include a file's contents directly as assembly code or data, but there are times when it would be nice to access a file's contents arbitrarily. For example, you might want to do something different depending on the size of the file; or reuse a tilemap with some offset added to all the bytes; or use part of a file depending on an rgbasm calculation; or so on. This currently has to be handled by including files generated with custom tools according to particular Makefile rules.

If we had arbitrary-length strings (#650) and arrays (#67), this could be accomplished with just two functions:

  • READFILE("file.txt") would return a string of the file's characters
  • READBIN("file.dat") would return an array of the file's bytes

For both functions, if the file doesn't exist they would abort with a "No such file" error, or under -M dependency-generating mode they would act consistent with INCLUDE and INCBIN after #903 gets resolved.

Note that READFILE would read the file's bytes as-is, which may not be valid UTF-8. It's already possibly to create a non-UTF-8 .asm file with arbitrary bytes in a string literal, so this is not a new issue. Some functions like STRCAT and STRIN wouldn't care about invalid UTF-8, others like STRLEN and STRSUB would print a "Invalid UTF-8 byte" error.

Rangi42 avatar Nov 11 '21 22:11 Rangi42

https://github.com/gbdev/rgbds/issues/67#issuecomment-966679756 discusses some possible syntaxes and issues with native arrays. If we ended up not adding them, another approach to reading binary file content would be to just read it as a string too, and just trust the user not to do anything with the string that expects valid UTF-8.

This would require adding a few byte-based functions (STRBYTELEN and STRBYTESUB at least) to use on "binary strings". String functions like STRCAT and STRRPL currently don't care about how a string is encoded, so they could be used on binary strings too...

...except that even they are implemented with C's own strings functions, so would behave incorrectly if the binary string has any $00 bytes. Even the ongoing PR #650 to have dynamic strings would probably still have that limitation.

(I'm also not really a fan of adding this kind of implicit limitation, where the notion of "valid string" vs "binary string" is all up to the user's understanding and has no type-level enforcement in the rgbasm language. It's also less flexible than arrays, which could store any numbers, not just bytes. But I'm bringing it up because arrays have issues of their own, and would be a large feature to add just for the same of READBIN.)

Rangi42 avatar Nov 11 '21 22:11 Rangi42

C++ std::string does support storing NUL bytes, so presumably if we used C++ std lib algorithms instead of C's string.h functions, they would work as expected for "binary strings".

#include <iostream>
#include <string>
int main() {
    std::string s("hello");
    std::cout << s.size() << " " << s << std::endl;
    s[2] = '\0';
    std::cout << s.size() << " " << s << std::endl;
    return 0;
}
5 hello
5 helo

Rangi42 avatar Nov 12 '21 16:11 Rangi42

Currently pokegold uses a custom tool to measure the size of a 5x5, 6x6, or 7x7 PNG image and encode that as $55, $66, or $77. This could replace its INCBIN "front.dimensions" lines with a simpler INCDIMENSIONS "front.2bpp" macro (named that way so scan_includes could easily be adapted to handle it):

MACRO INCDIMENSIONS
    def n = STRBYTELEN(READBIN(\1)) ; or STRLEN(READFILE(\1)) pending #938
    if n == 5 * 5 tiles
        db $55
    elif n == 6 * 6 tiles
        db $66
    elif n == 7 * 7 tiles
        db $77
    else
        fail "Invalid front pic size for \1"
    endc
ENDM

(pokecrystal can't do this because its front pics have extra animation tiles.)

Rangi42 avatar Nov 12 '21 17:11 Rangi42

READFILE could also take optional start and length arguments like INCBIN. This would be more efficient for use cases that only need part of a file, and would let pokecrystal do this (with INCDIMENSIONS "front.png" instead of INCDIMENSIONS "front.2bpp"):

MACRO INCDIMENSIONS
    pushc
    setcharmap ascii
    DEF png_ihdr_width EQUS READFILE(\1, 16, 4)
    DEF n = STRSUB("{png_ihdr_width}", 1, 1) << 24 | \
            STRSUB("{png_ihdr_width}", 2, 1) << 16 | \
            STRSUB("{png_ihdr_width}", 4, 1) <<  8 | \
            STRSUB("{png_ihdr_width}", 5, 1)
    if n == 5 * TILE_LENGTH
        db $55
    elif n == 6 * TILE_LENGTH
        db $66
    elif n == 7 * TILE_LENGTH
        db $77
    else
        fail "Invalid front pic size for \1"
    endc
    PURGE png_ihdr_width
    popc
ENDM

Rangi42 avatar Nov 12 '21 17:11 Rangi42

We could compare this proposal with WLA-DX's .FOPEN, etc directives.

.fopen "data.bin" fp
.fsize fp t
.repeat t
.fread fp d
.db d+26
.endr
.undefine t, d

Rangi42 avatar Dec 14 '23 03:12 Rangi42

I don't like exposing arbitrary file I/O because it's just complicated:

  • Try eagerly reading all of the file, and potentially OOM for large files that could've been processed piecemeal, or block infinitely from e.g. pipes;
  • Try exposing piecemeal read functions (à la WLA-DX), and shift all of the complexity onto the user, and still occasionally block infinitely.

...so for the time being, I still prefer people building ad-hoc tools and solutions when such needs arise. Please understand that I'm not trying to be a grouch raining on everyone's parade, but also reining in what I fear to be scope creep.

[!NOTE] I am however leaving this issue open so that people can still voice any support, use cases, or design suggestions.

ISSOtm avatar Dec 16 '23 16:12 ISSOtm

Try eagerly reading all of the file, and potentially OOM for large files that could've been processed piecemeal, or block infinitely from e.g. pipes;

Don't we have exactly this theoretical problem already, with INCBIN? I'd expect a READFILE function to work just like that, except putting the data in an array, not directly in the ROM.

Rangi42 avatar Dec 16 '23 16:12 Rangi42

In theory yes, but not really because INCBIN means that all of the file's contents must be dumped into the ROM anyway, and we have to allocate that space anyway for a few reasons.

ISSOtm avatar Dec 16 '23 16:12 ISSOtm