ethers.js
`ethers.utils.defaultAbiCoder.decode` can not decode if a string has certain chars in it
Ethers Version
@ethersproject/abi::5.6.4
Search Terms
abi, decoder
Describe the Problem
Hey, first off ethers is the best thanks for all you do @ricmoo!
Issue
When you try to decode a log whose string contains certain characters, such as �, the decoder fails with an error:
thrown: [null: invalid codepoint at offset 97; unexpected continuation byte (argument="bytes", value=Uint8Array....
This causes a bailout whenever you use ethers to decode anything that may contain such characters. You can see Tenderly decoding it successfully here.
Here are two example unit tests: one with normal characters in the data (tx here) and one with invalid character data (tx here).
Working test
Here we decode the unindexed log info of the PostCreated event (you can see the tx here).
import { ethers } from 'ethers';
it('example working', () => {
const unindexedData = [
{ indexed: false, internalType: 'string', name: 'contentURI', type: 'string' },
{ indexed: false, internalType: 'address', name: 'collectModule', type: 'address' },
{ indexed: false, internalType: 'bytes', name: 'collectModuleReturnData', type: 'bytes' },
{ indexed: false, internalType: 'address', name: 'referenceModule', type: 'address' },
{ indexed: false, internalType: 'bytes', name: 'referenceModuleReturnData', type: 'bytes' },
{ indexed: false, internalType: 'uint256', name: 'timestamp', type: 'uint256' },
];
const workingData =
'0x00000000000000000000000000000000000000000000000000000000000000c000000000000000000000000023b9467334beb345aaa6fd1545538f3d54436e960000000000000000000000000000000000000000000000000000000000000140000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001800000000000000000000000000000000000000000000000000000000062e6cbac000000000000000000000000000000000000000000000000000000000000005068747470733a2f2f646174612e6c656e732e7068617665722e636f6d2f6170692f6c656e732f706f7374732f65323936633839662d353364652d346332632d623237372d37306163653361643632336100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000';
const result = ethers.utils.defaultAbiCoder.decode(
unindexedData.map((i) => i.type),
workingData
);
expect(result).toEqual([
'https://data.lens.phaver.com/api/lens/posts/e296c89f-53de-4c2c-b277-70ace3ad623a',
'0x23b9467334bEb345aAa6fd1545538F3d54436e96',
'0x0000000000000000000000000000000000000000000000000000000000000001',
'0x0000000000000000000000000000000000000000',
'0x',
{ _hex: '0x62e6cbac', _isBigNumber: true },
]);
});
Broken test
Here we decode the unindexed log info of the PostCreated event (you can see the tx here).
it('example not working', () => {
const unindexedData = [
{ indexed: false, internalType: 'string', name: 'contentURI', type: 'string' },
{ indexed: false, internalType: 'address', name: 'collectModule', type: 'address' },
{ indexed: false, internalType: 'bytes', name: 'collectModuleReturnData', type: 'bytes' },
{ indexed: false, internalType: 'address', name: 'referenceModule', type: 'address' },
{ indexed: false, internalType: 'bytes', name: 'referenceModuleReturnData', type: 'bytes' },
{ indexed: false, internalType: 'uint256', name: 'timestamp', type: 'uint256' },
];
const badData =
'0x00000000000000000000000000000000000000000000000000000000000000c000000000000000000000000023b9467334beb345aaa6fd1545538f3d54436e960000000000000000000000000000000000000000000000000000000000000220000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002600000000000000000000000000000000000000000000000000000000062e6cbbe0000000000000000000000000000000000000000000000000000000000000131646174613a2c7b2276657273696f6e223a22312e302e30222c226d657461646174615f6964223a2235623433383734632d393831392d343637652d396638652d386133326631653430356663222c226465736372697074696f6e223a22676d2028bf8cf09f2c20bf8cf09f29222c22636f6e74656e74223a22676d2028bf8cf09f2c20bf8cf09f29222c2265787465726e616c5f75726c223a6e756c6c2c22696d616765223a6e756c6c2c22696d6167654d696d6554797065223a6e756c6c2c226e616d65223a22506f73742062792040646f6e6f736f6e61756d637a756b222c2261747472696275746573223a5b7b22747261697454797065223a2274797065222c2276616c7565223a22706f7374227d5d2c226d65646961223a5b5d2c226170704964223a224c656e73746572227d000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000';
const result = ethers.utils.defaultAbiCoder.decode(
unindexedData.map((i) => i.type),
badData
);
expect(result).toEqual([
'"data:,{"version":"1.0.0","metadata_id":"5b43874c-9819-467e-9f8e-8a32f1e405fc","description":"gm (����, ����)","content":"gm (����, ����)","external_url":null,"image":null,"imageMimeType":null,"name":"Post by @donosonaumczuk","attributes":[{"traitType":"type","value":"post"}],"media":[],"appId":"Lenster"}"',
'0x23b9467334bEb345aAa6fd1545538F3d54436e96',
'0x0000000000000000000000000000000000000000000000000000000000000001',
'0x0000000000000000000000000000000000000000',
'0x',
{ _hex: '0x62e6cbbe', _isBigNumber: true },
]);
});
As you can see, this throws the `null: invalid codepoint at offset 97;` error!
It would be great to understand why this happens, and how Tenderly's and Etherscan's decoders manage to work it out. Are they doing something different from what ethers does here?
Bailing out on cases like this forced us to write bespoke code for events that emit a string which could include these characters: we catch the failure from the ethers decoder and run our own logic so that our indexer can carry on. Even when a value cannot be decoded, it would be nice to have something assigned to the array index that failed, so you can still access the other decoded information.
Let me know if you need any more info.
Thanks
Errors
`thrown: [null: invalid codepoint at offset 97; unexpected continuation byte (argument="bytes", value=Uint8Array....`
Environment
node.js (v12 or newer)
any idea @ricmoo
That error means the data is not valid UTF8 data.
You can use the recoverable error API to access it with a different strategy (such as ignore or replace), but I'm not at a computer to type in demo code right now. Basically, you can get the bytes from the error and pass them to the toUtf8String function, along with the strategy callback for errors.
Keep in mind when processing invalid UTF8 data, changing things using non-error strategies can result in exploits. It allows multiple different strings to have the same hash, for example.
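To make the strict-versus-lenient distinction concrete, here is a minimal sketch that does not use ethers at all. It pulls the string bytes out of a small ABI-encoded `(uint256, string)` payload by hand and decodes them with the platform `TextDecoder`, once in strict mode and once in replacement mode. How Tenderly or Etherscan actually decode is not stated in this thread, so treating lenient decoding as the explanation is only an assumption. The helper names `word` and `readWord` are made up for this illustration.

```javascript
// Build ABI data for (uint256 34, string) where the string is the 11 bytes
// "Hello Wolr" followed by the invalid UTF-8 byte 0xff.
const word = (hex) => hex.padStart(64, "0"); // left-pad to one 32-byte slot
const data =
  "0x" +
  word("22") + // uint256 value 34
  word("40") + // offset of the string tail (64 bytes)
  word("0b") + // string length in bytes (11)
  "48656c6c6f20576f6c72ff".padEnd(64, "0"); // string bytes, right-padded

const buf = Buffer.from(data.slice(2), "hex");

// Read the low 4 bytes of a 32-byte big-endian slot.
// Assumption: offsets and lengths fit in 32 bits, which holds for this payload.
const readWord = (buffer, position) => buffer.readUInt32BE(position + 28);

const stringOffset = readWord(buf, 32); // head slot 1 points at the string tail
const stringLength = readWord(buf, stringOffset);
const rawBytes = buf.subarray(stringOffset + 32, stringOffset + 32 + stringLength);

// Strict decoding rejects the payload, much like the default ethers behaviour.
let strictFailed = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(rawBytes);
} catch (error) {
  strictFailed = true;
}

// Lenient decoding substitutes U+FFFD for each invalid sequence.
const lenient = new TextDecoder("utf-8").decode(rawBytes);

console.log(strictFailed); // true
console.log(lenient === "Hello Wolr\uFFFD"); // true
```

A lenient decoder always returns something displayable, which is convenient for an explorer UI, but the warning above still applies: the returned string no longer identifies the original bytes.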
Would love to see some code for what you mean here. Do you think that's how Etherscan and Tenderly still manage to decode the log?
This came back up as part of some tech debt. Is there an elegant way to do this without losing the rest of the valid data you can decode? I tried a few things, but nothing worked every time, so I wondered whether you have solved it.
Ethers fully supports decoding data as long as the structure is correct and can be parsed.
In the case of an invalid string, only accessing that value within the result will throw, using the "deferred error API".
Here is an example of how to use alternate string decoding mechanisms if your data contains bad strings, but please keep in mind that care should be taken when using invalid strings, as they can be used for a variety of attacks:
import { ethers } from 'ethers'; // ethers v6
const data = '0x00000000000000000000000000000000000000000000000000000000000000220000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000000b48656c6c6f20576f6c72ff000000000000000000000000000000000000000000';
const result = ethers.AbiCoder.defaultAbiCoder().decode([ "uint", "string" ], data);
// This is fine, since the uint at index 0 is perfectly fine
console.log(result[0]);
// 34n
// Running console.log(result[1]) here would throw, since the string is invalid and you are *accessing* it
// Instead, to access the invalid data, perhaps to attempt decoding using an alternate error strategy, capture the error:
try {
  result[1];
} catch (e) {
  // The offending bytes, extracted from the error
  const badBytes = e.error.value;
  // Using the ignore strategy, invalid UTF-8 code points are discarded:
  console.log(ethers.toUtf8String(badBytes, ethers.Utf8ErrorFuncs.ignore));
  // "Hello Wolr"
  // Using the replacement strategy, invalid UTF-8 code points are replaced with the UTF-8 replacement character:
  console.log(ethers.toUtf8String(badBytes, ethers.Utf8ErrorFuncs.replace));
  // "Hello Wolr�"
}
The danger of these strategies is that invalid bytes get folded together: distinct byte sequences can decode to the same string, so you can have the following problem:
const bytesA = "0x48656c6c6f20576f6c72ff";
const bytesB = "0x48656c6c6f20576f6c72fe";
console.log(bytesA == bytesB);
// false
const strA = ethers.toUtf8String(bytesA, ethers.Utf8ErrorFuncs.ignore);
const strB = ethers.toUtf8String(bytesB, ethers.Utf8ErrorFuncs.ignore);
console.log(strA == strB);
// true
Does that make sense?
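For the earlier request to keep the other decoded values when one entry fails, a small helper along these lines might be enough. This is only a sketch: `readAllEntries` is a hypothetical name, not part of ethers, and it assumes the decoded result behaves like an array whose bad entries throw when accessed, as described above. The example uses a stand-in object rather than a real decode result so it can run without ethers installed.

```javascript
// Hypothetical helper (not an ethers API): read every entry of a decoded
// result, replacing entries whose access throws with a caller-supplied fallback.
function readAllEntries(result, onError) {
  const values = [];
  for (let i = 0; i < result.length; i++) {
    try {
      values.push(result[i]);
    } catch (error) {
      values.push(onError(error, i));
    }
  }
  return values;
}

// Stand-in for a decoded result: index 0 is fine, index 1 throws on access.
const fakeResult = [34];
Object.defineProperty(fakeResult, 1, {
  enumerable: true,
  get() {
    throw new Error("invalid codepoint");
  },
});

const recovered = readAllEntries(fakeResult, (error, index) => ({
  index,
  failed: error.message,
}));

console.log(recovered);
// [ 34, { index: 1, failed: 'invalid codepoint' } ]
```

The `onError` callback is where an alternate strategy could be applied to the bytes carried by the error, with the same caution about folding distinct byte sequences into one string.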