vyper
vyper copied to clipboard
typecheck failure for assignment of `empty(DynArray[..., N]` with mismatched length
Version Information
- vyper Version (output of
vyper --version):0.3.8+commit.036f153 - OS:
linux - Python Version (output of
python --version):3.11.3
What's your issue about?
This behaviour has been introduced in Vyper 0.3.8. So TL;DR: Vyper 0.3.8 throws if returned type of DynArray is only a subset wrt to length of allocated return type:
vyper.exceptions.TypeCheckFailure: Bad type for clearing bytes: expected DynArray[String[4], 1368] but got DynArray[String[4], 1]
Example (encode):
# @dev Sets the maximum input and output length
# allowed. For an n-byte input to be encoded, the
# space required for the Base64-encoded content
# (without line breaks) is "4 * ceil(n/3)" characters.
_DATA_INPUT_BOUND: constant(uint256) = 1024
_DATA_OUTPUT_BOUND: constant(uint256) = 1368
# @dev Defines the Base64 encoding tables. For encoding
# with a URL and filename-safe alphabet, please refer to:
# https://www.rfc-editor.org/rfc/rfc4648#section-5.
_TABLE_STD_CHARS: constant(String[65]) = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
_TABLE_URL_CHARS: constant(String[65]) = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_="
@external
@pure
def encode(data: Bytes[_DATA_INPUT_BOUND], base64_url: bool) -> DynArray[String[4], _DATA_OUTPUT_BOUND]:
"""
@dev Encodes a `Bytes` array using the Base64
binary-to-text encoding scheme.
@notice Due to the Vyper design with fixed-size
string parameters, string concatenations
with itself in a loop can lead to length
mismatches (the underlying issue is that
Vyper does not support a mutable `Bytes`
type). To circumvent this issue, we choose
a dynamic array as the return type.
@param data The maximum 1024-byte data to be
Base64-encoded.
@param base64_url The Boolean variable that specifies
whether to use a URL and filename-safe alphabet
or not.
@return DynArray The maximum 4-character user-readable
string array that combined results in the Base64
encoding of `data`.
"""
data_length: uint256 = len(data)
if (data_length == empty(uint256)):
return empty(DynArray[String[4], 1])
# If the length of the unencoded input is not
# a multiple of three, the encoded output must
# have padding added so that its length is a
# multiple of four.
padding: uint256 = data_length % 3
data_padded: Bytes[_DATA_INPUT_BOUND + 2] = b""
if (padding == 1):
data_padded = concat(data, b"\x00\x00")
elif (padding == 2):
data_padded = concat(data, b"\x00")
else:
data_padded = data
char_chunks: DynArray[String[4], _DATA_OUTPUT_BOUND] = []
idx: uint256 = 0
for _ in range(_DATA_INPUT_BOUND):
# For the Base64 encoding, three bytes (= chunk)
# of the bytestream (= 24 bits) are divided into
# four 6-bit blocks.
chunk: uint256 = convert(slice(data_padded, idx, 3), uint256)
# To write each character, we right shift the 3-byte
# chunk (= 24 bits) four times in blocks of six bits
# for each character (18, 12, 6, 0). Note that masking
# is not required for the first part of the block, as
# 6 bits are already extracted when the chunk is shifted
# to the right by 18 bits (out of 24 bits). To illustrate
# why, here is an example:
# Example case for `c1`:
# 6bit 6bit 6bit 6bit
# │------│------│------│------│
# 011100 000111 100101 110100
#
# `>> 18` (right shift `c1` by 18 bits)
# 6bit 6bit 6bit 6bit
# │------│------│------│------│
# 000000 000000 000000 011100
#
# 63 (or `0x3F`) is `000000000000000000111111` in binary.
# Thus, the bitwise `AND` operation is redundant.
c1: uint256 = shift(chunk, -18)
c2: uint256 = shift(chunk, -12) & 63
c3: uint256 = shift(chunk, -6) & 63
c4: uint256 = chunk & 63
# Base64 encoding with an URL and filename-safe
# alphabet.
if (base64_url):
char_chunks.append(concat(slice(_TABLE_URL_CHARS, c1, 1), slice(_TABLE_URL_CHARS, c2, 1), slice(_TABLE_URL_CHARS, c3, 1),\
slice(_TABLE_URL_CHARS, c4, 1)))
# Base64 encoding using the standard characters.
else:
char_chunks.append(concat(slice(_TABLE_STD_CHARS, c1, 1), slice(_TABLE_STD_CHARS, c2, 1), slice(_TABLE_STD_CHARS, c3, 1),\
slice(_TABLE_STD_CHARS, c4, 1)))
# The following line cannot overflow because we have
# limited the for loop by the `constant` parameter
# `_DATA_INPUT_BOUND`, which is bounded by the
# maximum value of `1024`.
idx = unsafe_add(idx, 3)
# We break the loop once we reach the end of `data`
# (including padding).
if (idx == len(data_padded)):
break
# Case 1: padding of "==" added.
if (padding == 1):
last_chunk: String[2] = slice(char_chunks.pop(), 0, 2)
char_chunks.append(concat(last_chunk, "=="))
# Case 2: padding of "=" added.
elif (padding == 2):
last_chunk: String[3] = slice(char_chunks.pop(), 0, 3)
char_chunks.append(concat(last_chunk, "="))
return char_chunks
The line that throws is return empty(DynArray[String[4], 1]). Using return [] or return empty(DynArray[String[4], _DATA_OUTPUT_BOUND]) would fix this but am wondering why this behavior was introduced?
How can it be fixed?
Allow returning subset DynArray using empty.
This issue was introduced in 3abe588e05be6f6cbfac283739d797ef6c485756.
Arguably, this appears that it should have been invalid syntax and therefore a bug with v0.3.7 rather than v0.3.8 - see this failing test for the bytes equivalent https://github.com/vyperlang/vyper/blob/64733b9d15935ecd2bfcfdfbb9606d5ab500d70c/tests/parser/functions/test_empty.py#L609-L613
However, _check_assign_list in v0.3.7 did not catch this situation because the condition in this line was not met - in the case of this contract below, left would be an IRnode instance of fake_node instead of a codegen type. Therefore, this was not caught as an exception.
@external
def bar() -> DynArray[uint256, 3]:
return empty(DynArray[uint256, 1])
@tserg thanks for the details. However, I still don't understand why this should be invalid syntax tbh. Since it's a subset wrt the lenght, the compiler should be implicitly be able to convert this. I mean if I do return simply [] the compiler also converts it to the correct return type. So why does empty have another behaviour?
I think it makes sense to support a direct return in this case since the concern with zeroing an existing value does not apply in a return statement, will wait for @charles-cooper to comment on this.
I think the assertion is there for historical implementation reasons. There used to be a clear() function which performed what assigning to empty does now. The type was generated internally and so there was a sanity check assertion to make sure the internally generated type matched exactly. Since now we have removed clear and use empty() which has a user assigned type, we can probably safely remove the internal assertion.
All that being said, I think returning the empty list literal [] is probably stylistically best.
Might be a dup of #3274