Add support for non-UTF8 binary-like strings (or add a format for working with binary data)
What kind of features would you like to see supported in Tact?
The problem
Right now there is no way to store binary data in Tact, because the language represents every string as UTF-8. For example, I want to store a small JPEG image (a few kilobytes) inside a smart contract (I want to HARDCODE it) and then return it from the get_nft_data contract getter (in other words, hardcode on-chain NFT data). Yes, I understand that I could store it in a cell variable and set it during initialization, but that is not as safe as using a constant (and, by the way, it needs more deployment code).
Example
Here is my smart contract, which stores on-chain NFT metadata and returns it in a getter; it includes an example of binary data (which Tact nevertheless stores as UTF-8). Unfortunately, I'm not able to hardcode it in Tact: the only way to build a "string cell" is to use FunC with Tact wrappers (something like slice __gen_binary_string() asm "B{generated_} B>boc <s PUSHSLICE";).
Here are unit tests that show the problem.
Possible solutions
I see a few options that could help with this situation:
- Make it possible to create strings using binary (or other non-UTF-8) encodings. For example, like in Python, using a b literal: b"binary-data-string\x12\xFF"
- Make a different "binary string" structure (for example, BinaryString instead of string).
- Create a compile-time function readFile(filename, encoding), which would read the specified file's contents as a string with the specified encoding.
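To make the first option concrete, here is a small Python illustration of why a bytes literal and a UTF-8 string are not interchangeable. It only shows Python's behaviour; it is not Tact syntax and says nothing about how Tact would implement such a literal.

```python
# Raw bytes written with a Python bytes literal: each \x escape is exactly one byte.
raw = b"binary-data-string\x12\xFF"

# The same characters written as a text string, then encoded as UTF-8.
text = "binary-data-string\x12\xFF"
as_utf8 = text.encode("utf-8")

# The code point U+00FF takes two bytes in UTF-8 (0xC3 0xBF),
# so the encoded payload no longer matches the intended raw bytes.
print(len(raw), len(as_utf8))
print(raw[-1:].hex(), as_utf8[-2:].hex())

# Arbitrary binary data is often not valid UTF-8 at all: 0xFF never is.
try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)
```

This is the same mismatch a JPEG payload runs into when the only available literal is a UTF-8 string.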
Why is it convenient to use String objects for binary data?
Tact encodes and decodes long strings into cells well, using the standard StringBuilder and toCell methods, which are really useful when serializing binary data.
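As a rough sketch of what "encoding a long string into cells" involves, the Python snippet below splits a binary payload into cell-sized chunks and joins them back. It assumes 127 bytes per chunk, because a cell holds at most 1023 data bits, which is 127 whole bytes. The names split_into_chunks and join_chunks are hypothetical helpers. The snippet does not build real cells or references, and it is not a description of Tact's actual layout.

```python
from typing import List

# Assumption: a cell holds at most 1023 data bits, i.e. 127 whole bytes.
CELL_DATA_BYTES = 127


def split_into_chunks(data: bytes, chunk_size: int = CELL_DATA_BYTES) -> List[bytes]:
    """Split data into consecutive chunks, one per cell in the chain."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks: List[bytes]) -> bytes:
    """Reassemble the original payload by concatenating the chunks in order."""
    return b"".join(chunks)


# 512 bytes of arbitrary binary data, including bytes that are not valid UTF-8.
payload = bytes(range(256)) * 2
chunks = split_into_chunks(payload)

print([len(c) for c in chunks])
print(join_chunks(chunks) == payload)
```

Nothing in the chunking depends on the bytes being valid UTF-8, which is why a string-style builder is attractive for binary payloads.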
the only way to build a "string cell", is to use func with tact wrappers (something like
slice __gen_binary_string() asm "B{generated_} B>boc <s PUSHSLICE";
This is not quite true. You can define a constant cell with arbitrary data from a base64 BOC using the compile-time cell("base64 goes here") function.
Maybe, but it is still inconvenient: you have to build the cell yourself and serialize it in the code. I don't understand why I have to do this when I could just use a readFile function.
@imartemy1524 Usually language designs cannot incorporate all the special cases unless those prove to be very frequent. Once a language feature creeps in we are stuck with it essentially forever (or until the next major release). Regarding your implementation attempt, I especially don't like when the interpreter needs to deal with the file system, when a user can just run a script and present all the data in an already supported format.
@anton-trunov, on the one hand, I can't disagree with you that language features need to be discussed and properly reviewed before "injecting" them into a major version. On the other hand, when you say that
a user can just run a script and present all the data in an already supported format
then what's the purpose of Tact, if one still needs to create and manipulate binary cells outside of it with some script (and then inject them into the language)? If we stick to this idea, it would be better to just code in FunC instead of using Tact (and inject all the needed features in already supported formats :+1:). I thought Tact was created to simplify this process... Also, isn't support for different encodings a must-have for modern programming languages?
If you don't like the idea of filesystem communication (if this is the problem), then maybe one could introduce something like b"binary" or bytes("binary")?
@imartemy1524
the thing is that your usage example (some plain binary data that should be stored in a snake format) is too narrowly scoped
@imartemy1524 We might support something like the Bytes type, but it has to be designed first: we cannot add new half-baked language features and then later try to retrofit more related things into Tact. Your help on this would be greatly appreciated, though. You could start by developing a small standard library module that actually works with the Bytes type. Special attention would be needed for language feature interactions.