fabric icon indicating copy to clipboard operation
fabric copied to clipboard

Chaincode shim `GetState` routine should return an indication of whether the key was found in the database

Open bcbrock opened this issue 8 years ago • 48 comments

If a chaincode calls stub.GetState with a key that does not exist, no error or presence indication is returned. This can lead to crashes if the chaincode was expecting the key to be present and non-empty. Since the empty byte array is valid data this can not be used as a check for the presence of of key. It seems that the "Go way" to handle this would be to include a 3rd, 'ok' return value separate from the error return that would be set true if and only if the key were actually found in the database.

bcbrock avatar Apr 01 '16 22:04 bcbrock

@srderson Can you add help wanted to this?

ibmmark avatar Apr 26 '16 18:04 ibmmark

I've encountered this as a challenge in writing chaincode.

For most applications protocols, I've been experimenting treating an array of len(0) as the equivalent to "key not found" has been worked fine.

The API is ambiguous on first encounter and I believe it will lead to errors.

I like the @bcbrock suggestion the third return value.

zmanian avatar Apr 26 '16 18:04 zmanian

@srderson - I'd like to volunteer to address this issue. I need to learn how the state database works for my research.

bcbrock avatar May 02 '16 02:05 bcbrock

@bcbrock - are you on this one?

mastersingh24 avatar May 06 '16 12:05 mastersingh24

Yes

bcbrock avatar May 06 '16 12:05 bcbrock

I can help. Can you assign the issue to me

murali-katipalli-dtcc avatar May 06 '16 15:05 murali-katipalli-dtcc

I can't assign it, but I understand you are going to do it.

bcbrock avatar May 06 '16 15:05 bcbrock

Would returning the third value be the appropriate way? If we do return the third value, wouldn't it break existing GetState API. And returning multiple error return types (error and ok) doesnt seem to be the 'Go' way of doing it. Instead can we do the following - When key is not found, shim returns as an Error. Client can check on that error and take appropriate action. This way we can maintain our existing API and provide the requisite functionality. If we choose this way, should we define GetStateError type? or should we keep it simple where it just returns the generic Error with appropriate message.

Pl. advice and I can implement it accordingly

Murali (DTCC)

murali-katipalli-dtcc avatar May 06 '16 18:05 murali-katipalli-dtcc

@murali-katipalli-dtcc I'm personally not worried about changing the API. This is a very new project that has not published a single release. The compiler will flag the error, people will update their chaincodes and move on. In fact the 3-return-value way will ensure that people really understand and code for the new semantics because the compiler won't allow them to continue until they have acknowledged the change in their code.

I'm not a Go programmer, so I can't say too much about the "Go way". However I think there is a good argument that not finding a datum is not an error. It is a normal course of events. Some applications may even be happy to accept the empty array as the default value and work properly. With the error-type suggestion these programs would have to check to make sure that the Error was not the non-initialized error that they didn't care about.

So I still say add another return value - and document and guarantee what value is returned if the key is not found.

bcbrock avatar May 06 '16 20:05 bcbrock

Regardless of approach, fix depends upon ledger.GetState on the peer side letting us know if key exists or not. This may need to percolate one level down the stack.

ledger.GetState returns ([]byte, error). Would "nil" bytes imply key-not-found ? That may not be true if ledger.SetState("key", nil) is allowed (ie, nil is a valid valid value).

@manish-sethi, @srderson comment please ?

muralisrini avatar May 08 '16 15:05 muralisrini

@muralisrini - If we write 'nil' (or a zero-length byte array) to rocksdb, it's hard to differentiate during read time (unless, we do some special replacement for nil values - which I guess we should refrain as of now).

I think that it would be better not to allow 'nil' and a zero-length byte array.

  • During write, we can return error if we receive either of these values.
  • During read, a 'nil' would implicitly mean 'key not found' ('ok' kind of third argument can also be implemented but that only make sense if a 'nil' is allowed, as you said)

manish-sethi avatar May 09 '16 13:05 manish-sethi

@manish-sethi I'm ok with that. Good to do an explicit check and prevent illegal PutState if the sematics of GetState are dependent on it. Can you open an issue for that please ?

muralisrini avatar May 09 '16 14:05 muralisrini

Hang on. There should be no restriction on what can be written to the data base. If nil and/or the empty array are legal values that can be written to the database, then they must be allowed to be written to the database and they must be handled correctly.

bcbrock avatar May 09 '16 15:05 bcbrock

I do think a 3 value return (value, ok, err) would be much better, there's just the pain of breaking all existing chancodes. Should we consider making a state stuct and have Get Put, etc. functions on that struct? It may make the godoc more readable and we could deprecate the old functions.

In terms of whether we allow nil keys and values, I could go either way. I know Java allows this. Not sure about Go. I don't see why we have to conform to the rules of RocksDB. In the future, we may use a different DB.

srderson avatar May 09 '16 16:05 srderson

I would agree that it probably never makes sense to put nil (or any other pointer, as-is) into a database. But the empty array or empty string are legitimately useful values I think.

Where would this new state object come from? Could there be more then one of them? I'm not disagreeing, just asking.

If we wanted to add more state APIs, either as functions or methods, other programing languages and systems include "Get" APIs that allow the call to include an explicit default value that is returned in the event the data is not found, and APIs that allow a simple existence query on one or more keys.

bcbrock avatar May 09 '16 16:05 bcbrock

I did some investigation on this. Like rocksdb, protobuf also does not distinguishes between a nil and an empty bytes. It behaves even strangely. E.g., (a) A serialization-deserialization cycle using proto.Buffer methods results a 'nil' into a []byte{} (b) If you use proto.Marshal method for marshalling a struct this converts an empty array field (in the struct ) to a nil value. Still would have to see in our code path where all this would impact us.

But, in a summary differentiating between a nil and an empty array would be cumbersome. At best, we can try to support either of the following (c) Treat both an empty array and a nil interchangeably (d) Allow only an empty array and throw exception on an attempt to store a nil.

Further, for user on the shim side we have two option

  • Return a third argument from the ledger in read calls (OK, boolean) which will change the APIs
  • Provide another API for checking whether a given key exists

I can make changes in the ledger and state code for these because, they are spread a many places as I found during my initial investigation. Please let me know if this sounds reasonable?

manish-sethi avatar May 10 '16 16:05 manish-sethi

When you say "throw an exception" in d) I assume you also mean that the chaincode would get an error return if it tried to store nil? If so that sounds reasonable to me, it doesn't make sense to store pointers.

I still believe that the most urgent requirement is for an API with a 3rd argument, whether direct or via a new object as Sheehan suggested. This is more efficient than a "key exists" call (a network round trip) followed by a "get" (another round trip).

New "key exists" or "get with default" APIs might be features in addition to a 3-return get; The 3-return get is sufficient to cover all possible use cases.

bcbrock avatar May 10 '16 17:05 bcbrock

On a side note - the only thing that sounds strange with returning 'ok' along with the value is that, it typically makes sense only if we allow storing 'nil' explicitly. In that case a user wants to differentiate between a non-existing key and a actual key with 'nil' value. However, on the other hand we plan to either not support nil or would say that we would be treating nil as an empty bytes.

That means ideally, if user gets a 'nil' during read, it's an indication of non-existing key. If he gets an empty array, the key exists.

At the level of ledger APIs this would work well. So, mostly, in our case 'OK' would be covering the issues if we encounter in protobuf layer when transferring a 'nil' to the shim side and if that happens to get converted in an empty array. (I am not sure whether we would hit into this issue based on how we are using it between handler and shim.)

manish-sethi avatar May 10 '16 17:05 manish-sethi

If I understand it correctly, protobuf is our lowest common denominator and the shim side, whether it's Java or Go should accommodate for it's short-coming (i.e. returning non-deterministic nil/empty array).

-On the ledger API side, I agree that it should return a third parameter, OK.

  • On the protobuf side, because of it's strange behavior (as illustrated by Manish), we should have a Get State struct like PutStateInfo, which includes the OK value (i.e. determining if a key is present or not)
  • On the shim side, Java or Go would look at the protobuf's GetStateInfo and provide appropriate response to it's clients : Go would return OK as an extra return parameter, where as Java may return the struct itself

Is the above correct - does that summarize it? Would that be agreeable to everyone ?

murali-katipalli-dtcc avatar May 10 '16 21:05 murali-katipalli-dtcc

I have submitted a PR https://github.com/hyperledger/fabric/pull/1471. In this PR, I have added the support for an empty array at the ledger side. Now, ledger behavior is a follows.

  • An attempt to write a 'nil' key/value results into an error. However, an empty byte array can be written.
  • GetState api returns a 'nil' if key not found, otherwise the value of the key (including an empty array) A nil check is sufficient to test whether a particular key was found or not. see method 'TestLedgerEmptyArrayValue' in ledger_test.go file.

@murali-katipalli-dtcc if you are interested in making changes for handler/shim transfer, please go ahead.

About adding a third variable at the ledger APIs level, personally I am not in the favour of this, because we are not supporting 'nil' values and a 'nil' check is sufficient for the caller to test whether key exists or not.

However, that should not stop from fixing the protobuf issue between handler and the shim. When transferring the data, we can set a flag on the handler side whether data exist or not and at the shim side appropriate action can be taken with out exposing a changed API with third value to the shim caller. Flag setting/checking would only be required if the value is an empty array/nil on the handler side; but it can also be set always. In the case of GetMultiKeys, multiple flags would need to be set.

Does that sound fine?

manish-sethi avatar May 13 '16 13:05 manish-sethi

Just to make it explicit, the flag would be required both way in the communication (setting the key and when getting the key).

manish-sethi avatar May 13 '16 13:05 manish-sethi

@manish-sethi About adding a third variable at the ledger APIs level, personally I am not in the favour of this, because we are not supporting 'nil' values and a 'nil' check is sufficient for the caller to test whether key exists or not.

Agreed as this is an internal implementation detail.

About protobuf ... are you saying a "nil" ChaincodeMessage.Payload will unmarshall into a 0 byte array in the shim ? We should try to avoid putting (any more) implementation specific fields in ChaincodeMessage if we can.

muralisrini avatar May 13 '16 16:05 muralisrini

No @muralisrini, protobuff does it other way around. It converts a field value from empty array into nil on the destination side. If you want to avoid any implementation specific field, a hack could be to pad the value with a zero byte and unpad at the receiving end.

manish-sethi avatar May 13 '16 16:05 manish-sethi

Sorry I think that my previous message was not clear. What I meant to say was that on the sending side, if the value is 'nil' it can be left as is and it would appear on the receiving side as a 'nil'. Otherwise, a byte (say, 0x0) can be padded and unpadded so that an empty byte array appears as an empty byte array on the receiving end.

manish-sethi avatar May 13 '16 17:05 manish-sethi

Ok. Basically guarantee non-nil payload would always have at least 1 byte and strip off the last 0 after the unmarshal. It is simple and avoids additional fields to ChaincodeMessage.

Comment @murali-katipalli-dtcc ?

muralisrini avatar May 13 '16 18:05 muralisrini

Agreed on the ledger API. I can take care of it on the shim side On the protobuf side - we have two options. Either artifically pad/unpad or have a specific field which determines if key exists or not?

Which route do we want to go? Pl. suggest and I can implement accordingly.

To clarify, if do a specific field for protobuf side, then we need a struct similar to PutStateInfo or have one struct StateInfo w/ a field identifying for key exists or not.

murali-katipalli-dtcc avatar May 13 '16 19:05 murali-katipalli-dtcc

@murali-katipalli-dtcc : I'd go with the pad/unpad. As you point out it is simpler (esp. if wrapped up nicely in methods on both sides).

More importantly, its an implementation detail (thanks to protobuf converting 0 bytes to nil) that we don't to expose protos.ChaincodeMessage.

muralisrini avatar May 13 '16 20:05 muralisrini

Before going with a padding approach, it is important to understand exactly what is the overhead in doing this. Both in terms of run time and garbage generated. What is the overhead?

bcbrock avatar May 13 '16 21:05 bcbrock

I think folks have a different view on this. @binhn and @mastersingh24 are not convinced for adding support for either nil/empty array. If we think that it is not important then I think, I can rollback the PR (it's not merged yet). Instead, we can simply add a check for empty array values also (as for nil) during write time. On the Shim side, it should work as it is. Does that sound reasonable?

manish-sethi avatar May 14 '16 03:05 manish-sethi

@manish-sethi @binhn @mastersingh24 I'm on the fence about allowing PutState(key, []byte{}). Could go either way.... but I do like not having to work around []byte to nil conversions.

@bcbrock on the padding approach (moot if we don't allow []byte{}) :

it would be a go "append" on oneside and a truncate on the other. I'd think negligible overhead for most cases. The pluses are simplicity and avoidance of bleeding implementation fields into protocol structures (proto.ChaincodeMessage). Are there reasonable alternatives I'm missing ?

muralisrini avatar May 14 '16 13:05 muralisrini