vyper
vyper copied to clipboard
typecheck failure for assignment of `empty(DynArray[..., N]` with mismatched length
Version Information
- vyper Version (output of
vyper --version
):0.3.8+commit.036f153
- OS:
linux
- Python Version (output of
python --version
):3.11.3
What's your issue about?
This behaviour has been introduced in Vyper 0.3.8
. So TL;DR: Vyper 0.3.8
throws if returned type of DynArray
is only a subset wrt to length of allocated return type:
vyper.exceptions.TypeCheckFailure: Bad type for clearing bytes: expected DynArray[String[4], 1368] but got DynArray[String[4], 1]
Example (encode
):
# @dev Sets the maximum input and output length
# allowed. For an n-byte input to be encoded, the
# space required for the Base64-encoded content
# (without line breaks) is "4 * ceil(n/3)" characters.
_DATA_INPUT_BOUND: constant(uint256) = 1024
_DATA_OUTPUT_BOUND: constant(uint256) = 1368
# @dev Defines the Base64 encoding tables. For encoding
# with a URL and filename-safe alphabet, please refer to:
# https://www.rfc-editor.org/rfc/rfc4648#section-5.
_TABLE_STD_CHARS: constant(String[65]) = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
_TABLE_URL_CHARS: constant(String[65]) = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_="
@external
@pure
def encode(data: Bytes[_DATA_INPUT_BOUND], base64_url: bool) -> DynArray[String[4], _DATA_OUTPUT_BOUND]:
"""
@dev Encodes a `Bytes` array using the Base64
binary-to-text encoding scheme.
@notice Due to the Vyper design with fixed-size
string parameters, string concatenations
with itself in a loop can lead to length
mismatches (the underlying issue is that
Vyper does not support a mutable `Bytes`
type). To circumvent this issue, we choose
a dynamic array as the return type.
@param data The maximum 1024-byte data to be
Base64-encoded.
@param base64_url The Boolean variable that specifies
whether to use a URL and filename-safe alphabet
or not.
@return DynArray The maximum 4-character user-readable
string array that combined results in the Base64
encoding of `data`.
"""
data_length: uint256 = len(data)
if (data_length == empty(uint256)):
return empty(DynArray[String[4], 1])
# If the length of the unencoded input is not
# a multiple of three, the encoded output must
# have padding added so that its length is a
# multiple of four.
padding: uint256 = data_length % 3
data_padded: Bytes[_DATA_INPUT_BOUND + 2] = b""
if (padding == 1):
data_padded = concat(data, b"\x00\x00")
elif (padding == 2):
data_padded = concat(data, b"\x00")
else:
data_padded = data
char_chunks: DynArray[String[4], _DATA_OUTPUT_BOUND] = []
idx: uint256 = 0
for _ in range(_DATA_INPUT_BOUND):
# For the Base64 encoding, three bytes (= chunk)
# of the bytestream (= 24 bits) are divided into
# four 6-bit blocks.
chunk: uint256 = convert(slice(data_padded, idx, 3), uint256)
# To write each character, we right shift the 3-byte
# chunk (= 24 bits) four times in blocks of six bits
# for each character (18, 12, 6, 0). Note that masking
# is not required for the first part of the block, as
# 6 bits are already extracted when the chunk is shifted
# to the right by 18 bits (out of 24 bits). To illustrate
# why, here is an example:
# Example case for `c1`:
# 6bit 6bit 6bit 6bit
# │------│------│------│------│
# 011100 000111 100101 110100
#
# `>> 18` (right shift `c1` by 18 bits)
# 6bit 6bit 6bit 6bit
# │------│------│------│------│
# 000000 000000 000000 011100
#
# 63 (or `0x3F`) is `000000000000000000111111` in binary.
# Thus, the bitwise `AND` operation is redundant.
c1: uint256 = shift(chunk, -18)
c2: uint256 = shift(chunk, -12) & 63
c3: uint256 = shift(chunk, -6) & 63
c4: uint256 = chunk & 63
# Base64 encoding with an URL and filename-safe
# alphabet.
if (base64_url):
char_chunks.append(concat(slice(_TABLE_URL_CHARS, c1, 1), slice(_TABLE_URL_CHARS, c2, 1), slice(_TABLE_URL_CHARS, c3, 1),\
slice(_TABLE_URL_CHARS, c4, 1)))
# Base64 encoding using the standard characters.
else:
char_chunks.append(concat(slice(_TABLE_STD_CHARS, c1, 1), slice(_TABLE_STD_CHARS, c2, 1), slice(_TABLE_STD_CHARS, c3, 1),\
slice(_TABLE_STD_CHARS, c4, 1)))
# The following line cannot overflow because we have
# limited the for loop by the `constant` parameter
# `_DATA_INPUT_BOUND`, which is bounded by the
# maximum value of `1024`.
idx = unsafe_add(idx, 3)
# We break the loop once we reach the end of `data`
# (including padding).
if (idx == len(data_padded)):
break
# Case 1: padding of "==" added.
if (padding == 1):
last_chunk: String[2] = slice(char_chunks.pop(), 0, 2)
char_chunks.append(concat(last_chunk, "=="))
# Case 2: padding of "=" added.
elif (padding == 2):
last_chunk: String[3] = slice(char_chunks.pop(), 0, 3)
char_chunks.append(concat(last_chunk, "="))
return char_chunks
The line that throws is return empty(DynArray[String[4], 1])
. Using return []
or return empty(DynArray[String[4], _DATA_OUTPUT_BOUND])
would fix this but am wondering why this behavior was introduced?
How can it be fixed?
Allow returning subset DynArray
using empty
.
This issue was introduced in 3abe588e05be6f6cbfac283739d797ef6c485756.
Arguably, this appears that it should have been invalid syntax and therefore a bug with v0.3.7 rather than v0.3.8 - see this failing test for the bytes equivalent https://github.com/vyperlang/vyper/blob/64733b9d15935ecd2bfcfdfbb9606d5ab500d70c/tests/parser/functions/test_empty.py#L609-L613
However, _check_assign_list
in v0.3.7 did not catch this situation because the condition in this line was not met - in the case of this contract below, left
would be an IRnode
instance of fake_node
instead of a codegen type. Therefore, this was not caught as an exception.
@external
def bar() -> DynArray[uint256, 3]:
return empty(DynArray[uint256, 1])
@tserg thanks for the details. However, I still don't understand why this should be invalid syntax tbh. Since it's a subset wrt the lenght, the compiler should be implicitly be able to convert this. I mean if I do return simply []
the compiler also converts it to the correct return type. So why does empty
have another behaviour?
I think it makes sense to support a direct return in this case since the concern with zeroing an existing value does not apply in a return statement, will wait for @charles-cooper to comment on this.
I think the assertion is there for historical implementation reasons. There used to be a clear()
function which performed what assigning to empty does now. The type was generated internally and so there was a sanity check assertion to make sure the internally generated type matched exactly. Since now we have removed clear and use empty()
which has a user assigned type, we can probably safely remove the internal assertion.
All that being said, I think returning the empty list literal []
is probably stylistically best.
Might be a dup of #3274