rest-layer ETag validity on composite resource

Let's suppose we have this request:

GET http://192.168.0.103:3001/api/clients/59a85126eaba9ebd0180ac55?fields=*,services,visits
If-None-Match: "11b22a28b07ddf482f8269050e9691d9"

If I execute this twice in a row, I get 200 the first time and then 304, which is as it is supposed to be. But then I change some field values of services and visits behind the scenes.

When I then retry the query above, I still get 304, although the result of the query HAS CHANGED.

ETag comparison has its place when updating concrete resources, but I think we should leave the generation of ETags for compound resources and for resource lists to a middleware function executed after rest-layer.

Oct 13 '17 16:10 Dragomir-Ivanov

Do you mean *,services{*},visits{*}? Because without {} you should just get the object ids.

This is a very valid point. What about computing a composite ETag?

cc @smyrman

Oct 14 '17 01:10 rs

@rs services and visits are bound to clients by schema.Reference and not schema.Connection, so *,services,visits results in the arrays services:[] and visits:[] being attached to the returned clients item.

A composite ETag can't be cached in the DB, so we need to recompute it on the fly, compare it with the supplied ETag and return 304 if they match. Is all this work related to rest-layer at all? Aren't we putting too many responsibilities on it? Other frameworks do this in a middleware function executed after the framework.

I guess we could export some context (but not Go's context.Context) in the request and put some important values there, so that the following middleware functions can make decisions based on them. I am against context.Context because it is hierarchical, and values added in a child ctx don't propagate to the parent ctx.

Oct 14 '17 09:10 Dragomir-Ivanov

Okay, I think I now understand why the middleware path is not considered here. In Go, http.ResponseWriter is single-shot: once you Write() there is no turning back. That is why Alice middleware can only be used for modifying the http.Request object, but not the response. Please confirm that my understanding is correct.
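
For comparison, this is how the single-shot behaviour looks in plain net/http: a wrapping middleware only gets a say over the response if it hands the inner handler a buffering writer instead of the real one. A minimal sketch, assuming an MD5 of the body is used as the ETag; bufferedWriter, withBodyETag and demo are hypothetical names and not part of rest-layer.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bufferedWriter (hypothetical) records the response instead of sending it.
type bufferedWriter struct {
	header http.Header
	status int
	body   bytes.Buffer
}

func (b *bufferedWriter) Header() http.Header         { return b.header }
func (b *bufferedWriter) WriteHeader(code int)        { b.status = code }
func (b *bufferedWriter) Write(p []byte) (int, error) { return b.body.Write(p) }

// withBodyETag runs next against a buffer, derives an ETag from the buffered
// body, and only then writes to the real ResponseWriter.
func withBodyETag(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		buf := &bufferedWriter{header: http.Header{}, status: http.StatusOK}
		next.ServeHTTP(buf, r)

		sum := md5.Sum(buf.body.Bytes())
		etag := `"` + hex.EncodeToString(sum[:]) + `"`

		for k, v := range buf.header {
			w.Header()[k] = v
		}
		w.Header().Set("ETag", etag)
		if buf.status == http.StatusOK && r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.WriteHeader(buf.status)
		w.Write(buf.body.Bytes())
	})
}

// demo issues the same request twice, the second time with If-None-Match.
func demo() (int, int) {
	h := withBodyETag(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"id":"1","services":[]}`)
	}))
	first := httptest.NewRecorder()
	h.ServeHTTP(first, httptest.NewRequest("GET", "/api/clients/1", nil))

	req := httptest.NewRequest("GET", "/api/clients/1", nil)
	req.Header.Set("If-None-Match", first.Header().Get("ETag"))
	second := httptest.NewRecorder()
	h.ServeHTTP(second, req)
	return first.Code, second.Code
}

func main() {
	fmt.Println(demo()) // 200 304
}
```

The cost of this approach is that the whole body is held in memory before anything is sent, and the ETag becomes tied to the serialized representation.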

Oct 14 '17 17:10 Dragomir-Ivanov

Yes. That’s why you can set a custom ResponseSender on rest.Handler.

Computing a checksum of all the Etags shouldn’t be too expensive.

Oct 14 '17 18:10 rs

@rs Thanks, then custom ResponseSender is the way to go.

> Computing a checksum of all the Etags shouldn’t be too expensive.

What do you mean by this? I thought the ETags are checksums themselves. My thinking is that computing an ETag for composite and list resources is the right way (and is cheap compared to network latency), and this might be done in the ResponseSender mentioned above. However, we shouldn't enforce that, but maybe put an example for this particular case in the examples.

Oct 14 '17 18:10 Dragomir-Ivanov

Etag is an opaque string. We can combine all etags of the (sub-)resources using the hash algo of our choosing to create a composite etag.

Why do you want to fix this issue outside of rest layer? It is a real problem that needs to be addressed.

Oct 14 '17 19:10 rs

Well, now that it is clear we can't make a post-rest-layer middleware function modify the response, we have no choice. I wanted to because I have seen some Node frameworks doing this in middleware functions, and it seemed like a self-contained problem.

Now, why would you want to combine all ETags? Is speed the only concern? Wouldn't it be the same if you only calculated the ETag (md5 hash) of the composite resource response? Combining multiple ETags seems like more work.

Oct 14 '17 20:10 Dragomir-Ivanov

Summing n strings is less expensive than summing n maps with an arbitrary number of fields (and subfields) stored as interface{}. We could checksum the JSON representation; it would be in between in terms of complexity, but it would also be too late and would tie the Etag computation to the representation, which is handled by the rest package instead of the resource package.

Functionally, I agree, both approaches are equivalent.

Oct 14 '17 21:10 rs

Just looked at the code. Accessing the Etag during projection evaluation might be tricky actually. I think query.Projection.Eval should return a list of the resources used during the evaluation, for instance, so the caller can do additional work. It means that Etag comparison won't be able to happen before projection evaluation.

We have the same problem with If-Modified-Since and Last-Modified actually. In this case, we should return the min(resource, sub-resources).

Oct 14 '17 21:10 rs

What summing algorithm do you propose for Etags? Append all Etags and then do final md5 on the whole byte array, or XORing them all?

Oct 20 '17 15:10 Dragomir-Ivanov

I would go for md5. The xor route would be interesting, but it would require either guaranteeing that all Etags are the same length or using padding. All rest-layer Etags are currently the same size, but that could change in the future, or we may want to support pre-existing Etags with a different format.
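
A minimal sketch of the "append all Etags, then md5" option, assuming the ETags are already collected in a stable order. compositeMD5 is a hypothetical helper, not an existing rest-layer function; the newline separator is an assumption added so that adjacent ETags of different lengths cannot run together.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// compositeMD5 hashes the root ETag followed by each sub-resource ETag.
// The result depends on the order of subs.
func compositeMD5(root string, subs []string) string {
	h := md5.New()
	h.Write([]byte(root))
	for _, s := range subs {
		h.Write([]byte{'\n'}) // keeps ("ab","c") distinct from ("a","bc")
		h.Write([]byte(s))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	root := "11b22a28b07ddf482f8269050e9691d9"
	before := compositeMD5(root, []string{"aaaa", "bbbb"})
	after := compositeMD5(root, []string{"aaaa", "cccc"})
	// A changed sub-resource ETag yields a different composite ETag.
	fmt.Println(before, after, before != after)
}
```

Because the hash is fed sequentially, the same set of ETags in a different order gives a different result.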

Oct 20 '17 16:10 rs

Will the md5 hashed Etag be set in the header, while the raw MD5s are made available as _etag in each resource? Will the original Etag be available as a different header? How does the system respond if a resource is missing an Etag, e.g. as discussed in rs/rest-layer-mongo#20?

One case to be aware of:

1. A user fetches a resource with some other resource embedded: GET http://192.168.0.103:3001/api/clients/59a85126eaba9ebd0180ac55?fields=*,services,visit
2. The user tries to update the root resource if it has not changed: PUT http://192.168.0.103:3001/api/clients/59a85126eaba9ebd0180ac55 If-Match X (X retrieved via 1)

Oct 20 '17 23:10 smyrman

One solution would be to have a composite Etag with a separator like <root resource etag>-<composite projection etags>. The full Etag would be used by browser for caching, but we could strip the second part when the item is edited.
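
A minimal sketch of that separator format, assuming neither part contains a "-" (true for hex MD5 strings). joinETag and rootPart are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// joinETag builds "<root resource etag>-<composite projection etags>".
// Without a projection part, the root ETag is returned unchanged.
func joinETag(root, projection string) string {
	if projection == "" {
		return root
	}
	return root + "-" + projection
}

// rootPart strips the projection part, leaving the ETag to compare against
// the stored item when it is edited.
func rootPart(etag string) string {
	if i := strings.Index(etag, "-"); i >= 0 {
		return etag[:i]
	}
	return etag
}

func main() {
	full := joinETag("11b22a28b07ddf482f8269050e9691d9", "0f343b0931126a20f133d67c2b018a3b")
	fmt.Println(full)           // value sent in the ETag header
	fmt.Println(rootPart(full)) // value used for If-Match on PUT/PATCH
}
```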

Oct 20 '17 23:10 rs

I think it is best if each resource brings its own _etag field, just like in the itemList case. Then we will have the ETags needed to freely update the root resource or each of the projected resources. I guess we will need to supply _updated as well. Also, when PUT/POST-ing a resource, we can look for the If-Match HTTP header; if there is none, we can fall back to looking for an _etag field within the resource and use that. This will not help browser caching, but it will prevent concurrent resource updates, which is more important.
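
The fallback could look roughly like this. It is only a sketch: expectedETag is a hypothetical helper, it assumes a single strong ETag in If-Match, and it ignores "*" and weak validators.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedETag prefers the If-Match header value and falls back to the
// "_etag" field of the submitted payload. The bool reports whether any
// ETag was found.
func expectedETag(ifMatch string, payload map[string]interface{}) (string, bool) {
	if v := strings.Trim(strings.TrimSpace(ifMatch), `"`); v != "" {
		return v, true
	}
	if v, ok := payload["_etag"].(string); ok && v != "" {
		return v, true
	}
	return "", false
}

func main() {
	payload := map[string]interface{}{"name": "x", "_etag": "from-payload"}
	fmt.Println(expectedETag(`"from-header"`, payload)) // header wins
	fmt.Println(expectedETag("", payload))              // falls back to _etag
	fmt.Println(expectedETag("", map[string]interface{}{}))
}
```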

Oct 21 '17 08:10 Dragomir-Ivanov

I’m not a fan of polluting the payload with metadata.

Oct 21 '17 09:10 rs

Okay, but we are polluting the payload in the itemList case, aren't we? Adding _etag to the referenced resources as well would serve the same purpose. Pollution of any kind is unwanted, but can't we make this configurable, or better, supply enough data so that a custom ResponseFormatter can do the job?

Oct 21 '17 12:10 Dragomir-Ivanov

What’s wrong with the composite etag described above?

Oct 21 '17 17:10 rs

There is one particular use case, where you get the root resource and one or more referenced resource items. Then you want to PUT/PATCH a referenced resource item. With your proposal we can't do that, because we don't have the ETag for this individual item, only the one for the root resource and the combined sum of the ETags of all projected resource items. If we want to update a referenced resource item, we will need to GET it again and then PUT. Not needing to do this is a nice optimization, and it would be in line with the itemList case, where we have an array of items, each with an _etag field. In the root resource we also have arrays of referenced resource items, but without an _etag field.

I understand that all this is optional, so maybe the best place to do this formatting is in the ResponseFormatter so the user can overwrite it.

Oct 22 '17 10:10 Dragomir-Ivanov

I suppose we have three or four cases for the composite e-tag:

list view : only composite etag ("root item?")
list view with embeds : another composite etag
detail view : only root item etag
detail view with embeds : root item and composite e-tag

I suppose this will let you update the root-resource without a data race, but not an embedded sub-resource, e.g. if you want to fetch all resources through one big request with embeds, and update individual resources later without a re-fetch.

> I’m not a fan of polluting the payload with metadata.

Neither am I. However, as @Dragomir-Ivanov points out, the rest package already includes an "_etag" field in some cases. Why not do so consistently, and let that field be used for updates, and the ETag header value for GET?

Oct 22 '17 10:10 smyrman

Btw, giving the header ETag the format <root resource etag>-<composite projection etags> when applicable is still nice, as it makes it less confusing to the user why the ETag in the header and the one in the payload sometimes do not match; at least they can see that the header ETag actually includes the resource ETag, and that it's of a longer format.

Oct 22 '17 10:10 smyrman

@rs Any thoughts on "polluting" the sub-resource payload with _etag?

Oct 25 '17 09:10 Dragomir-Ivanov

I have no better solution to propose so let’s do that :/

Oct 25 '17 14:10 rs

I was tinkering with the Etag for the composite resource, and it turned out that during the resource and sub-resource projection evaluation, different parts of the sub-resource projections can come from the DB at different times, because some sub-resources are obtained with goroutines. This is a problem because for the same request, sub-resources can sometimes come in A,B,C order and at other times in B,C,A order, so using the md5.Hash() function produces a different composite resource hash/ETag for exactly the same result. @rs Can you rethink your decision on using XOR of all ETags instead? It will produce the same result no matter the order.

Oct 25 '17 18:10 Dragomir-Ivanov

> Can you rethink your decision on using XOR of all ETags instead.

I suppose that means that rs/rest-layer-mongo#20 would need to change to a compatible format?

> sometimes sub-resources can come in A,B,C order, but for other times they can come in B,C,A order,

Have you actually observed this? From me just skimming the code, it looks like the order is pretty much guaranteed (unless they come in a random order from the Storer layer).

Expanded references are inserted at a given map field location, so evaluation order should not matter.

https://github.com/rs/rest-layer/blob/2c4c847a7e4d13e1245c93990665aaeb1a35cfe2/schema/query/projection_evaluator.go#L95

Connections are inserted at a given slice index:

https://github.com/rs/rest-layer/blob/2c4c847a7e4d13e1245c93990665aaeb1a35cfe2/schema/query/projection_evaluator.go#L121

Oct 25 '17 20:10 smyrman

I suppose it is true that they can come from the DB at different times, but does that matter?

Oct 25 '17 20:10 smyrman

Maybe I don't understand the code well enough, but for the composite resource's ETag we will need to hash with MD5: root resource ETag, sub-resource A ETag, sub-resource B ETag, etc. We will need to feed the MD5 hash in the same order every time to get the same composite resource ETag. We will need to either store the ETags and compute this hash after all sub-resources have been obtained from storage, or use an algorithm where ordering doesn't matter (XOR). @smyrman The snippets above seem to be executed by separate goroutines, so there are no guarantees about their scheduling or about which finishes when. I may be wrong, but that is my understanding for now.

Oct 25 '17 20:10 Dragomir-Ivanov

> We will need to either store the ETags and compute this hash after all sub-resources have been obtained from storage, or use an algorithm where ordering doesn't matter (XOR).

Ok, I follow you now. I was assuming you would do the first, compute the hash after all sub-resources have been fetched. If you want to do the incremental hash while you are fetching resources, then yes, you would need to use another algorithm than MD5.

Oct 25 '17 20:10 smyrman

> From me just skimming the code, it looks like the order is pretty much guaranteed (unless they come in a random order from the Storer layer).

I was thinking about the order being consistent in the final JSON response, not in terms of evaluation order, which is what you ask for -- sorry for the confusion.

Oct 25 '17 20:10 smyrman

In the light of this, xor is the way to go. We can handle different sizes using padding when necessary.
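
A minimal sketch of the XOR combination, assuming every ETag is a 32-character hex MD5 string (inputs of any other length are rejected here rather than padded). xorETags is a hypothetical helper name.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// xorETags XORs the decoded bytes of all ETags into one 16-byte value.
// XOR is commutative, so the arrival order of the ETags does not matter.
func xorETags(etags []string) (string, error) {
	acc := make([]byte, 16)
	for _, e := range etags {
		b, err := hex.DecodeString(e)
		if err != nil {
			return "", fmt.Errorf("etag %q is not hex: %v", e, err)
		}
		if len(b) != len(acc) {
			return "", fmt.Errorf("etag %q is not %d bytes long", e, len(acc))
		}
		for i := range acc {
			acc[i] ^= b[i]
		}
	}
	return hex.EncodeToString(acc), nil
}

func main() {
	a := "11b22a28b07ddf482f8269050e9691d9"
	b := "0f343b0931126a20f133d67c2b018a3b"
	c := "d41d8cd98f00b204e9800998ecf8427e"

	abc, _ := xorETags([]string{a, b, c})
	bca, _ := xorETags([]string{b, c, a})
	fmt.Println(abc == bca) // true: same result regardless of order
}
```

One property to keep in mind with plain XOR: two identical ETags cancel each other out, so a pair of sub-resources sharing the same ETag contributes nothing to the result.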

Oct 26 '17 00:10 rs

> We can handle different sizes using padding when necessary.

That only works if the non-conformant ETag is shorter than the typical MD5 sum. Maybe it is better to just do an MD5 sum of any non-conformant ETags (length not == 32 chars) right before the XOR?
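
That normalization step could be sketched like this, assuming "conformant" means exactly 32 hex characters. normalizeETag is a hypothetical helper; its output is the 16-byte value that would be fed into the XOR.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// normalizeETag returns 16 bytes for any ETag: a 32-character hex ETag is
// decoded as-is, anything else is replaced by the MD5 sum of the string.
func normalizeETag(etag string) [md5.Size]byte {
	var out [md5.Size]byte
	if len(etag) == 2*md5.Size {
		if _, err := hex.Decode(out[:], []byte(etag)); err == nil {
			return out
		}
	}
	// Wrong length or not valid hex: hash the raw string instead.
	return md5.Sum([]byte(etag))
}

func main() {
	conformant := normalizeETag("11b22a28b07ddf482f8269050e9691d9")
	other := normalizeETag("W/some-legacy-etag")
	fmt.Println(hex.EncodeToString(conformant[:])) // unchanged
	fmt.Println(hex.EncodeToString(other[:]))      // MD5 of the legacy string
}
```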

Oct 26 '17 08:10 smyrman
