sdk Bug: Candid encoding not correctly assigning different variants for a `key: variant val` vector

for some reason dfx and didc is not working correctly for encoding a vec with the value which is a variant of different types

For simplicity, I stripped the argument down to only the offending structure, and provide an example with didc (didc 0.1.4) which is enough to replicate the issue

didc decode \
   "$(didc encode '(
      vec {
          record { "name"; variant { TextContent = "blah" } };
          record { "ttl"; variant { Nat64Content = 1 : nat64 } };
      }
   )')"

running this will give you an error:

Error: Trailing value after finishing deserialization

Caused by:
    input: 4449444c036d016c02007101026b01ecbfa8c90d71010002046e616d650004626c61680374746c000100_000000000000
    table: type table0 = vec table1
    type table1 = record { text; table2 }
    type table2 = variant { 3_643_416_556 : text }
    wire_type: text, expect_type: text

Which it seems to encode everything as that first type it comes across, instead of corresponding variants, as seen by table2

Meta

dfx --version:

dfx 0.11.2

didc --version:

didc 0.1.4

Oct 12 '22 20:10 ozwaldorf

Thank you for the report with such a nice repro! Dfx contains no code of its own for this operation - it uses didc/the candid crate under the hood. Therefore this is an issue related to candid only. I'll ping the relevant folks.

Oct 13 '22 06:10 viviveevee

The Candid type inference algorithm in didc is incomplete, I think this is a known issue.

I would expect it to work if you pass the expected type using the -t flag to didc --encode.

Alternatively, you can try hcandid, which handles this:

$ hcandid --encode '(
      vec {
          record { "name"; variant { TextContent = "blah" } };
          record { "ttl"; variant { Nat64Content = 1 : nat64 } };
      }
   )' | hcandid --decode --stdin
(vec {record {"name"; variant {fncrpob = "blah"}}; record {"ttl"; variant {abmwxiv = (1 : nat64)}}})

Oct 13 '22 06:10 nomeata

Yep, it's always recommended to provide the types while encoding/decoding a message. We don't have immediate plans to fix this particular issue.

Meanwhile, if you encounter a similar behavior in dfx, please do file a bug. Now that did file is embedded in the canister metadata, dfx should be able to fetch the did file for all canisters.

Oct 13 '22 16:10 chenyan-dfinity

Meanwhile, if you encounter a similar behavior in dfx, please do file a bug. Now that did file is embedded in the canister metadata, dfx should be able to fetch the did file for all canisters.

The issue is found in the latest dfx, linked to a defined canister (but referenced by id), as mentioned in the issue description, where this issue was originally created :disappointed: I referenced a didc example because it's fully self contained to reproduce the error.

Dfx is correctly using the candid provided in the config, even though it's a verbose canister id instead of a name, for other calls as deserialization is working correctly.

@sesi200 @chenyan-dfinity

Oct 14 '22 00:10 ozwaldorf

I think the problem is that dfx is not using the did file when you make canister calls. This can happen if you call the canister by its canister id, not the name defined in dfx.json, or you are calling an external canister.

Can you try dfx canister call <canister_id> <method> --candid <.did file>, and see if it works?

As for the real fix, I think dfx should always try to fetch the did file from metadata before sending message.

Oct 14 '22 01:10 chenyan-dfinity

Can you try dfx canister call <canister_id> <method> --candid <.did file>, and see if it works?

Manually linking the candid is a valid fix

I think the problem is that dfx is not using the did file when you make canister calls. This can happen if you call the canister by its canister id, not the name defined in dfx.json, or you are calling an external canister.

So it's actually weird, for deserialization of the response, dfx will correctly use the candid even if it's referenced by the id directly. It seems like for serialization, it's not linking the candid like deserialize does

As for the real fix, I think dfx should always try to fetch the did file from metadata before sending message.

This is a good idea. But still it's efficient to use the local candid to serialize like mentioned above

Oct 17 '22 02:10 ozwaldorf

Ok, let's call it a bug for now, and after the .did fetch is fixed convert it into a feature request for the metadata fetch. Internal ticket: SDK-783

Oct 17 '22 08:10 viviveevee

So it's actually weird, for deserialization of the response, dfx will correctly use the candid even if it's referenced by the id directly. It seems like for serialization, it's not linking the candid like deserialize does

No. Currently, when calling canister by id, did file is not used. You can confirm this by observing that the decoded record field is always a hash id instead of a string name. The difference is that the current serialization implementation without type is incomplete, it can produce a wrong type and causing errors with deserialization in the future. But we never encourage to use serialization without types, so we have no plans to fix this corner case.

With deserialization, you can always decode the message properly, but with types, you get a nicer field name.

Oct 17 '22 16:10 chenyan-dfinity

So it's actually weird, for deserialization of the response, dfx will correctly use the candid even if it's referenced by the id directly. It seems like for serialization, it's not linking the candid like deserialize does

No. Currently, when calling canister by id, did file is not used. You can confirm this by observing that the decoded record field is always a hash id instead of a string name. The difference is that the current serialization implementation without type is incomplete, it can produce a wrong type and causing errors with deserialization in the future. But we never encourage to use serialization without types, so we have no plans to fix this corner case.

With deserialization, you can always decode the message properly, but with types, you get a nicer field name.

Seems like it does a reverse lookup to the project for the types on responses? Couldn't we just do the same for inputs as well? @chenyan-dfinity

Oct 17 '22 17:10 ozwaldorf