vyper icon indicating copy to clipboard operation
vyper copied to clipboard

typecheck failure for assignment of `empty(DynArray[..., N]` with mismatched length

Open pcaversaccio opened this issue 1 year ago • 5 comments

Version Information

  • vyper Version (output of vyper --version): 0.3.8+commit.036f153
  • OS: linux
  • Python Version (output of python --version): 3.11.3

What's your issue about?

This behaviour has been introduced in Vyper 0.3.8. So TL;DR: Vyper 0.3.8 throws if returned type of DynArray is only a subset wrt to length of allocated return type:

vyper.exceptions.TypeCheckFailure: Bad type for clearing bytes: expected DynArray[String[4], 1368] but got DynArray[String[4], 1]

Example (encode):

# @dev Sets the maximum input and output length
# allowed. For an n-byte input to be encoded, the
# space required for the Base64-encoded content
# (without line breaks) is "4 * ceil(n/3)" characters.
_DATA_INPUT_BOUND: constant(uint256) = 1024
_DATA_OUTPUT_BOUND: constant(uint256) = 1368


# @dev Defines the Base64 encoding tables. For encoding
# with a URL and filename-safe alphabet, please refer to:
# https://www.rfc-editor.org/rfc/rfc4648#section-5.
_TABLE_STD_CHARS: constant(String[65]) = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
_TABLE_URL_CHARS: constant(String[65]) = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_="


@external
@pure
def encode(data: Bytes[_DATA_INPUT_BOUND], base64_url: bool) -> DynArray[String[4], _DATA_OUTPUT_BOUND]:
    """
    @dev Encodes a `Bytes` array using the Base64
         binary-to-text encoding scheme.
    @notice Due to the Vyper design with fixed-size
            string parameters, string concatenations
            with itself in a loop can lead to length
            mismatches (the underlying issue is that
            Vyper does not support a mutable `Bytes`
            type). To circumvent this issue, we choose
            a dynamic array as the return type.
    @param data The maximum 1024-byte data to be
           Base64-encoded.
    @param base64_url The Boolean variable that specifies
           whether to use a URL and filename-safe alphabet
           or not.
    @return DynArray The maximum 4-character user-readable
            string array that combined results in the Base64
            encoding of `data`.
    """
    data_length: uint256 = len(data)
    if (data_length == empty(uint256)):
        return empty(DynArray[String[4], 1])

    # If the length of the unencoded input is not
    # a multiple of three, the encoded output must
    # have padding added so that its length is a
    # multiple of four.
    padding: uint256 = data_length % 3
    data_padded: Bytes[_DATA_INPUT_BOUND + 2] = b""
    if (padding == 1):
        data_padded = concat(data, b"\x00\x00")
    elif (padding == 2):
        data_padded = concat(data, b"\x00")
    else:
        data_padded = data

    char_chunks: DynArray[String[4], _DATA_OUTPUT_BOUND] = []
    idx: uint256 = 0
    for _ in range(_DATA_INPUT_BOUND):
        # For the Base64 encoding, three bytes (= chunk)
        # of the bytestream (= 24 bits) are divided into
        # four 6-bit blocks.
        chunk: uint256 = convert(slice(data_padded, idx, 3), uint256)

        # To write each character, we right shift the 3-byte
        # chunk (= 24 bits) four times in blocks of six bits
        # for each character (18, 12, 6, 0). Note that masking
        # is not required for the first part of the block, as
        # 6 bits are already extracted when the chunk is shifted
        # to the right by 18 bits (out of 24 bits). To illustrate
        # why, here is an example:
        # Example case for `c1`:
        #   6bit   6bit   6bit   6bit
        # │------│------│------│------│
        #  011100 000111 100101 110100
        #
        # `>> 18` (right shift `c1` by 18 bits)
        #   6bit   6bit   6bit   6bit
        # │------│------│------│------│
        #  000000 000000 000000 011100
        #
        # 63 (or `0x3F`) is `000000000000000000111111` in binary.
        # Thus, the bitwise `AND` operation is redundant.
        c1: uint256 = shift(chunk, -18)
        c2: uint256 = shift(chunk, -12) & 63
        c3: uint256 = shift(chunk, -6) & 63
        c4: uint256 = chunk & 63

        # Base64 encoding with an URL and filename-safe
        # alphabet.
        if (base64_url):
            char_chunks.append(concat(slice(_TABLE_URL_CHARS, c1, 1), slice(_TABLE_URL_CHARS, c2, 1), slice(_TABLE_URL_CHARS, c3, 1),\
                                      slice(_TABLE_URL_CHARS, c4, 1)))
        # Base64 encoding using the standard characters.
        else:
            char_chunks.append(concat(slice(_TABLE_STD_CHARS, c1, 1), slice(_TABLE_STD_CHARS, c2, 1), slice(_TABLE_STD_CHARS, c3, 1),\
                                      slice(_TABLE_STD_CHARS, c4, 1)))

        # The following line cannot overflow because we have
        # limited the for loop by the `constant` parameter
        # `_DATA_INPUT_BOUND`, which is bounded by the
        # maximum value of `1024`.
        idx = unsafe_add(idx, 3)

        # We break the loop once we reach the end of `data`
        # (including padding).
        if (idx == len(data_padded)):
            break

    # Case 1: padding of "==" added.
    if (padding == 1):
        last_chunk: String[2] = slice(char_chunks.pop(), 0, 2)
        char_chunks.append(concat(last_chunk, "=="))
    # Case 2: padding of "=" added.
    elif (padding == 2):
        last_chunk: String[3] = slice(char_chunks.pop(), 0, 3)
        char_chunks.append(concat(last_chunk, "="))

    return char_chunks

The line that throws is return empty(DynArray[String[4], 1]). Using return [] or return empty(DynArray[String[4], _DATA_OUTPUT_BOUND]) would fix this but am wondering why this behavior was introduced?

How can it be fixed?

Allow returning subset DynArray using empty.

pcaversaccio avatar May 26 '23 14:05 pcaversaccio

This issue was introduced in 3abe588e05be6f6cbfac283739d797ef6c485756.

Arguably, this appears that it should have been invalid syntax and therefore a bug with v0.3.7 rather than v0.3.8 - see this failing test for the bytes equivalent https://github.com/vyperlang/vyper/blob/64733b9d15935ecd2bfcfdfbb9606d5ab500d70c/tests/parser/functions/test_empty.py#L609-L613

However, _check_assign_list in v0.3.7 did not catch this situation because the condition in this line was not met - in the case of this contract below, left would be an IRnode instance of fake_node instead of a codegen type. Therefore, this was not caught as an exception.

@external
def bar() -> DynArray[uint256, 3]:
    return empty(DynArray[uint256, 1])

tserg avatar May 27 '23 10:05 tserg

@tserg thanks for the details. However, I still don't understand why this should be invalid syntax tbh. Since it's a subset wrt the lenght, the compiler should be implicitly be able to convert this. I mean if I do return simply [] the compiler also converts it to the correct return type. So why does empty have another behaviour?

pcaversaccio avatar May 27 '23 10:05 pcaversaccio

I think it makes sense to support a direct return in this case since the concern with zeroing an existing value does not apply in a return statement, will wait for @charles-cooper to comment on this.

tserg avatar May 27 '23 11:05 tserg

I think the assertion is there for historical implementation reasons. There used to be a clear() function which performed what assigning to empty does now. The type was generated internally and so there was a sanity check assertion to make sure the internally generated type matched exactly. Since now we have removed clear and use empty() which has a user assigned type, we can probably safely remove the internal assertion.

All that being said, I think returning the empty list literal [] is probably stylistically best.

charles-cooper avatar May 28 '23 15:05 charles-cooper

Might be a dup of #3274

trocher avatar Jun 02 '23 09:06 trocher