cadence icon indicating copy to clipboard operation
cadence copied to clipboard

Remove access to fields by index

Open fxamacker opened this issue 2 years ago • 19 comments

Issue to be solved

Accessing fields by index (instead of by name) can create dependency on sort order.

For example, issue https://github.com/onflow/cadence/issues/2558:

Some client programs are accessing fields by index rather than name. This created a dependency on the sort order of fields which surfaced when we switched from JSON-CDC to CCF for encoding events because CCF explicitly sorts as specified in CCF specification for deterministic encodings.

PR https://github.com/onflow/cadence/pull/2559 resolved issue 2558 by updating CCF Codec to make CCF Determinisitic Encoding optional. It was a stop-gap solution to avoid abruptly breaking compatibility for encoded events.

For exported values, not allowing access to fields by index can help prevent Cadence developers from unintentionally creating dependency on sort order. This can prevent unintended consequences like blocking deterministic encoding, etc.

However, this issue can be viewed as broader topic than CCF, sort order, or access by field index.

Current Status

CCF encoding is not currently sorting the fields (in order to be compatible with old programs that access fields by index).

  • CCF specs (onflow/ccf) defines how to sort and makes it optional, so each CCF-based format can balance trade-offs by specifying which CCF options they use:

    CCF defines "Deterministic CCF Encoding Requirements" and makes it optional.

  • CCF codec (onflow/cadence) offers sorting as non-default option, but sorting isn't being used (e.g. for events encoding) due to compatibility reasons mentioned in:

    • #2558
    • #2559

I didn't find any programs using the sort option in CCF codec (as of Nov 28, 2023).

Suggested Solution

Programming languages before 1.0 can benefit from taking away the ability for programmers to (sometimes unintentionally) create dependencies on implementation details of compilers and interpreters.

For example, a few months before Go 1.0 (~12 years ago), https://codereview.appspot.com/5285042/ added a random offset for map iteration and changed Go language specification.

Index: doc/go_spec.html
===================================================================
--- a/doc/go_spec.html
+++ b/doc/go_spec.html
@@ -4085,7 +4085,8 @@
 </li>
 
 <li>
-The iteration order over maps is not specified.
+The iteration order over maps is not specified
+and is not guaranteed to be the same from one iteration to the next.
 If map entries that have not yet been reached are deleted during iteration,
 the corresponding iteration values will not be produced. If map entries are
 inserted during iteration, the behavior is implementation-dependent, but the

That change to Go intentionally broke some existing Go programs by taking away existing behavior that wouldn't be ideal to officially or unofficially support forever after Go 1.0 release.

In the same way, maybe we can determine if some existing features (accessing field by index) should be removed before Cadence 1.0 release.

EDIT: added Current Status section about CCF for more context.

fxamacker avatar Nov 21 '23 23:11 fxamacker

Problem:

  • cadence package's composite value Go types, like Struct, Resource, etc have a Fields field, an array of the fields' values
  • The order previously matched the order of how the fields are declared in the code
  • CCF encoding now sorts the fields
  • Developers assumed the order in the Fields array matches the order of how the fields are declared in the code, so they index directly into it using constants
  • However, this will break once field values are sorted

Considered:

  • Restoring the ordering fixes the issue, but leaves the brittle code: Future changes to type in deployed contract might break server-side decoding based on hard-coded ordering
  • Code should have performed a lookup by name in composite type' fields StructType's Fields array of name/type pairs

Proposed fix:

  • Unexport Fields in composite value types (Struct, Resource, etc.)
  • Introduce field value lookup function FieldByName(name string) Value for composite value types
  • The helps developers write less brittle code (removes assumption about order of fields in deployed code)

turbolent avatar Nov 27 '23 20:11 turbolent

Doesn't this lose information about ordering? Let's say it is an event signature from past.

bluesign avatar Nov 29 '23 23:11 bluesign

@fxamacker I assume both values and type information will be re-ordered?

turbolent avatar Dec 06 '23 00:12 turbolent

@fxamacker I assume both values and type information will be re-ordered?

@turbolent Yes. Both would be re-ordered, and maybe both should be unexported.

fxamacker avatar Dec 06 '23 19:12 fxamacker

@bluesign when would the order be required to know? Like I mentioned in my last comment, code that relies on the order is brittle anyways and should probably be updated

turbolent avatar Dec 06 '23 22:12 turbolent

@bluesign when would the order be required to know? Like I mentioned in my last comment, code that relies on the order is brittle anyways and should probably be updated

Yeah I agree on that part, but we had similar problem with event encoding order before. I think latest victim was @bjartek he can chime in.

But main problem is; if we make old code ( pre 1.0 ) compatible with new deterministic ccf, then there is a risk of breaking silently. If it breaks with new encoding with some error etc, it is ok I guess.

PS: I still have a feeling order of things carry a meaning ( in event signatures for example ) maybe I am thinking too much objective-c like in my head.

bluesign avatar Dec 06 '23 23:12 bluesign

Pre candidate7 the order of event field names was done alphabetically and not in definition order. Luckily it obly affected topshot and i made a manual fix

bjartek avatar Dec 07 '23 10:12 bjartek

Not sure if that last comment is relevant to this thread though. Ss long as the name of fields and content of fields are ordered correctly so they match I have no issue with it.

bjartek avatar Dec 07 '23 10:12 bjartek

I should have been more clear in my comment, I think order still has importance but ignoring that my main issue here is, if we have data in 2 versions ( non-sorted D0 and sorted D1 ) and functions to access ( F0 and F1 )

  • F1(D0) and F1(D1) will be ok, F0(D0) will be still ok ( as it is currently ), but the problem is F0(D1) will fail silently. ( that's why I gave example of old events in candidate7, they fail at some time silently too )

I think this is important thing to consider as Cadence is not backwards compatible, while accessing old data we need matching Cadence version.

bluesign avatar Dec 07 '23 10:12 bluesign

But main problem is; if we make old code ( pre 1.0 ) compatible with new deterministic ccf, then there is a risk of breaking silently. If it breaks with new encoding with some error etc, it is ok I guess.

The plan isn't to make pre-1.0 compatible with new behaviour. Just like before, when interacting with the chain or its data, the appropriate Cadence version must be used (I know this isn't ideal, just pointing out the fact).

turbolent avatar Jan 12 '24 00:01 turbolent

Doesn't this lose information about ordering? [...]

PS: I still have a feeling order of things carry a meaning ( in event signatures for example ) [...]

Events are just composites, so their field order is insignificant. We're probably not pointing that out sufficiently in documentation.

What probably adds to the assumption that event fields have some sort of order is that the syntax for event declarations is different from other composite declarations, and thus kind of suggests an order like in a parameter list.

Hyrum's law strikes again. This post from a couple days ago is very relevant: https://twitter.com/jimmykoppel/status/1744544779138978006

turbolent avatar Jan 12 '24 00:01 turbolent

I think "events are composites" is a bit unfair statement especially with the last link you gave (which focuses on observability)

"Events are composites" is not observable to the user of Cadence, basically it is an implementation detail. Alternatively they could be implemented as non cadence type even. You cannot instantiate in normal cases without emit, or save them etc.

As I pointed before, from outside perspective, they are function signatures ( constructor more likely ) so order of things matter ( as we don't have named arguments )

This is not so important ( in this case ) of course.

bluesign avatar Jan 12 '24 00:01 bluesign

@fxamacker is this purely related to CCF or also an problem with the introduction of the atree register inlining?

turbolent avatar Apr 23 '24 00:04 turbolent

@fxamacker Can you please verify that #3290 is sufficient? Or do we also need to unexport field types?

turbolent avatar Apr 30 '24 21:04 turbolent

@fxamacker Can you please verify that https://github.com/onflow/cadence/pull/3290 is sufficient? Or do we also need to unexport field types?

@turbolent I can probably take a look tomorrow at CCF after looking more closely into migration issue https://github.com/onflow/cadence/issues/3288 while it is still fresh.

BTW, I'm not sure if everyone already knows about CCF codec support for different "modes" at runtime. A single program can support different CCF-based formats, versions, etc. by using modes. A codec's mode is a set of encoding or decoding options made immutable during startup for concurrent use later.

// Programs can create and choose different CCF encoding modes (and decoding modes) at runtime.
// CCF modes are safe for concurrent use (e.g. create modes at startup and reuse from multiple goroutines).

if fooVersion == 0 {
  // Keep original sort order of fields provided by Cadence.
  b, err := ccfFoofUnsortedMode.Marshal(v)  
} else if foolVersion == 1 {
  // Sorts fields as specified in CCF Deterministic Encoding Requirements.
  b, err := ccfFooSortedMode.Marshal(v)  
} // else if etc.

CCF codec modes work like CBOR codec modes described at fxamacker/cbor Quick Start .

fxamacker avatar May 01 '24 02:05 fxamacker

@fxamacker Can you please verify that https://github.com/onflow/cadence/pull/3290 is sufficient? Or do we also need to unexport field types?

@turbolent I think we should also unexport field types for consistency with field values.

We can unexport field type and provide a new function FieldTypes() that returns map[string]Type. This way, there isn't any implied ordering for either field definitions or field values. It also gives us flexibility to change implementation details without breaking API.

fxamacker avatar May 08 '24 00:05 fxamacker

@fxamacker Sounds good, I was thinking the same. I'll work on that tomorrow

turbolent avatar May 08 '24 00:05 turbolent

@turbolent BTW, in PR #3290 it seems FieldsMappedByName() includes attachment as a field if present, while SearchFieldByName() only looks up field type definition. Is it OK for these two functions to differ in this way?

Sorry about reposting this comment :pray: I accidentally deleted it from merged PR 3290 and moved it here.

fxamacker avatar May 08 '24 00:05 fxamacker

@fxamacker Sounds good! I had started trying to change this, but it looks like the CCF codec relies on being able to set the type slice. Maybe we can have a short sync (not urgent, sometime next or the following week), to see how we could refactor this?

turbolent avatar May 10 '24 18:05 turbolent