rest-layer
ETag validity on composite resource
Let's suppose we have this request:
GET http://192.168.0.103:3001/api/clients/59a85126eaba9ebd0180ac55?fields=*,services,visits
If-None-Match: "11b22a28b07ddf482f8269050e9691d9"
If I execute this twice in a row, I get 200 the first time and then 304, which is as it is supposed to be.
But then I change some field values of services and visits behind the scenes.
When I then retry the query above, I still get 304, although the result of the query HAS CHANGED.
ETag comparison has its place when updating concrete resources, but I think we should leave the generation of ETags for compound resources and for resource lists to a middleware function executed after rest-layer.
Do you mean *,services{*},visits{*}? Because without {} you should just get the object ids.
This is a very valid point. What about computing a composite ETag?
cc @smyrman
@rs services and visits are bound to clients by schema.Reference and not schema.Connection, so *,services,visits results in arrays services:[] and visits:[] attached to the returned clients item.
Composite ETag can't be cached in the DB, so we need to recompute it on the fly, compare with the supplied ETag and return 304 if they match.
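The "recompute, compare, return 304" flow described above could look roughly like the sketch below. The helper names `etagMatches` and `statusFor` are hypothetical and not part of rest-layer; the comparison follows the weak matching that HTTP uses for If-None-Match, and splitting the header on commas assumes the ETags themselves contain no commas.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// etagMatches reports whether an If-None-Match header value lists the
// current ETag. It applies the weak comparison used for If-None-Match,
// so a leading "W/" is ignored on both sides. Splitting on commas is a
// simplification that assumes ETags contain no commas.
func etagMatches(ifNoneMatch, current string) bool {
	if strings.TrimSpace(ifNoneMatch) == "*" {
		return true
	}
	current = strings.TrimPrefix(current, "W/")
	for _, candidate := range strings.Split(ifNoneMatch, ",") {
		candidate = strings.TrimPrefix(strings.TrimSpace(candidate), "W/")
		if candidate != "" && candidate == current {
			return true
		}
	}
	return false
}

// statusFor returns the status a conditional GET would receive, given
// the header sent by the client and the ETag recomputed for this request.
func statusFor(ifNoneMatch, recomputed string) int {
	if etagMatches(ifNoneMatch, recomputed) {
		return http.StatusNotModified
	}
	return http.StatusOK
}

func main() {
	sent := `"11b22a28b07ddf482f8269050e9691d9"`
	// Nothing changed: the recomputed composite ETag equals the one sent.
	fmt.Println(statusFor(sent, sent))
	// A sub-resource changed: the recomputed composite ETag differs.
	fmt.Println(statusFor(sent, `"0123456789abcdef0123456789abcdef"`))
}
```

The important part is that `recomputed` must cover the sub-resources as well; comparing only the stored root ETag is what produces the stale 304 reported at the top of this thread.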
Is all this work related to rest-layer at all? Aren't we putting too many responsibilities on it? Other frameworks do this in a middleware function executed after the framework.
I guess we could have some context (but not Go's context.Context) exported in the request, and put some important values there to let the subsequent middleware functions make decisions based on them.
I am against context.Context because it is hierarchical, and adding values in a child ctx doesn't propagate to parent ctx.
Okay, I think I now understand why the middleware path is not considered here. In Go, http.ResponseWriter is single shot: once you Write() there is no turning back. That is why Alice middleware can be used only for modifying the http.Request object, but not the response.
Please confirm that my understanding is correct.
Yes. That’s why you can set a custom ResponseSender on rest.Handler.
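The single-shot behaviour discussed above can be observed with the standard net/http/httptest package. This is only an illustration: `etagSeenByClient` is a hypothetical helper, and the handler is a stand-in rather than rest-layer code. `ResponseRecorder.Result` reports the headers as they were at the first write, which mirrors what a real client would receive.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// etagSeenByClient runs a tiny handler and returns the ETag header that
// ends up in the response. The handler sets the header either before or
// after writing the body.
func etagSeenByClient(setBeforeWrite bool) string {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if setBeforeWrite {
			w.Header().Set("ETag", `"abc"`)
		}
		// The first Write implicitly sends the status line and headers.
		io.WriteString(w, "body")
		if !setBeforeWrite {
			// Too late: the headers have already been sent.
			w.Header().Set("ETag", `"abc"`)
		}
	})
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Result().Header.Get("ETag")
}

func main() {
	fmt.Printf("set before Write: %q\n", etagSeenByClient(true))
	fmt.Printf("set after Write:  %q\n", etagSeenByClient(false))
}
```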
Computing a checksum of all the Etags shouldn’t be too expensive.
@rs Thanks, then custom ResponseSender is the way to go.
Computing a checksum of all the Etags shouldn’t be too expensive.
What do you mean by this? I thought the ETags are checksums themselves. My thinking is that computing an ETag for composite and list resources is the right way (and is cheap compared to network latency), and this might be done in the ResponseSender mentioned above. However, we shouldn't enforce that, but maybe put an example for this particular case in the examples.
An Etag is an opaque string. We can combine the Etags of all the (sub-)resources using the hash algorithm of our choosing to create a composite Etag.
Why do you want to fix this issue outside of rest-layer? It is a real problem that needs to be addressed.
Well, now that it is clear we can't make a post-rest-layer middleware function modify the response, we have no choice. I wanted to because I have seen some Node frameworks do this in middleware functions, and it seemed like a self-contained problem.
Now, why would you want to combine all the ETags? Is speed the only concern? Wouldn't it be the same if you only calculated the ETag (MD5 hash) of the composite resource response? Combining multiple ETags seems like more work.
Summing n strings is less expensive than summing n maps with an arbitrary number of fields (and subfields) stored as interface{}. We could checksum the JSON representation; it would be in between in terms of complexity, but it would also be too late, and it would tie the Etag computation to the representation, which is handled by the rest package instead of the resource package.
Functionally, I agree, both approaches are equivalent.
Just looked at the code. Accessing the Etag during projection evaluation might actually be tricky. I think query.Projection.Eval should, for instance, return a list of the resources used during the evaluation, so the caller can do additional work. It means that Etag comparison won't be able to happen before projection evaluation.
We actually have the same problem with If-Modified-Since and Last-Modified. In this case, we should return the most recent timestamp, i.e. max(resource, sub-resources), so that a change to any sub-resource invalidates the cached response.
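For Last-Modified, one reading is that a composite response is only as old as its most recently changed part. Under that assumption, a sketch could look as follows; `latest` and `lastModifiedHeader` are hypothetical helpers, and Unix-second inputs are used only to keep the example easy to call.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// latest returns the most recent of the given timestamps.
func latest(root time.Time, subs ...time.Time) time.Time {
	out := root
	for _, t := range subs {
		if t.After(out) {
			out = t
		}
	}
	return out
}

// lastModifiedHeader formats the composite Last-Modified value from the
// root resource's update time and those of its sub-resources, all given
// as Unix seconds.
func lastModifiedHeader(rootUnix int64, subUnix ...int64) string {
	subs := make([]time.Time, len(subUnix))
	for i, s := range subUnix {
		subs[i] = time.Unix(s, 0)
	}
	return latest(time.Unix(rootUnix, 0), subs...).UTC().Format(http.TimeFormat)
}

func main() {
	// Root updated at the epoch, one sub-resource a day later,
	// another an hour later: the day-later timestamp wins.
	fmt.Println(lastModifiedHeader(0, 86400, 3600))
}
```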
What summing algorithm do you propose for Etags? Append all Etags and then do final md5 on the whole byte array, or XORing them all?
I would go for md5. The xor route would be interesting, but it would require either guaranteeing that all Etags are the same length or using padding. All rest-layer Etags are currently the same size, but that could change in the future, or we may want to support pre-existing Etags with a different format.
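A minimal sketch of the "append all Etags, then md5" option, using only the standard library. `compositeMD5` is a hypothetical name, and the zero byte written after each Etag is an assumption added here so that ("ab", "c") and ("a", "bc") do not hash to the same value.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
)

// compositeMD5 hashes the given Etags in the order supplied and returns
// the hex-encoded MD5 sum. A zero byte terminates each Etag so that the
// boundaries between Etags are part of the hashed input.
func compositeMD5(etags ...string) string {
	h := md5.New()
	for _, e := range etags {
		io.WriteString(h, e)
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	root := "11b22a28b07ddf482f8269050e9691d9"
	sub := "0123456789abcdef0123456789abcdef"
	fmt.Println(compositeMD5(root, sub))
	// The result depends on the order in which the Etags are fed in.
	fmt.Println(compositeMD5(root, sub) == compositeMD5(sub, root))
}
```

Because the result is order-dependent, the caller has to feed the Etags in a stable order, for example the order in which they appear in the final response.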
Will the md5-hashed Etag be set in the header, while the raw MD5s are made available as _etag in each resource? Will the original Etag be available as a different header? How does the system respond if a resource is missing an Etag, e.g. as discussed in rs/rest-layer-mongo#20?
One case to be aware of:
1. A user fetches a resource with some other resources embedded:
GET http://192.168.0.103:3001/api/clients/59a85126eaba9ebd0180ac55?fields=*,services,visit
2. The user tries to update the root resource, provided it has not changed:
PUT http://192.168.0.103:3001/api/clients/59a85126eaba9ebd0180ac55
If-Match: X (X retrieved in step 1)
One solution would be to have a composite Etag with a separator like <root resource etag>-<composite projection etags>. The full Etag would be used by browser for caching, but we could strip the second part when the item is edited.
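The separator idea could be sketched like this. `joinETag` and `rootETag` are hypothetical helpers working on unquoted values, and the sketch assumes that neither part contains the "-" separator, which holds for hex-encoded MD5 strings.

```go
package main

import (
	"fmt"
	"strings"
)

// etagSeparator divides the root resource Etag from the projection part.
const etagSeparator = "-"

// joinETag builds "<root>-<projection>", or just "<root>" when no
// sub-resources were projected.
func joinETag(root, projection string) string {
	if projection == "" {
		return root
	}
	return root + etagSeparator + projection
}

// rootETag strips the projection part again, so the value can be
// compared against the stored Etag of the root resource on PUT/PATCH.
func rootETag(etag string) string {
	return strings.SplitN(etag, etagSeparator, 2)[0]
}

func main() {
	full := joinETag("11b22a28b07ddf482f8269050e9691d9", "0123456789abcdef0123456789abcdef")
	fmt.Println(full)           // value sent in the ETag header for caching
	fmt.Println(rootETag(full)) // value compared on If-Match for edits
}
```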
I think it is best if each resource brings its own _etag field just like in the itemList case.
Then we will have ETags to update freely the root resource or each of the projected resources.
I guess we will need to supply _updated as well.
Also, when PUT/POST-ing a resource, we can look for the If-Match HTTP header; if there is none, we can fall back to looking for an _etag field within the resource and using that. This will not help browser caching, but it will prevent concurrent resource updates, which is more important.
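The header-first, payload-second lookup proposed here might be sketched as follows. `expectedETag` is a hypothetical helper, not an existing rest-layer function, and it assumes the payload has already been decoded into a map.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedETag picks the Etag a write should be conditioned on. The
// If-Match header wins; if it is absent, the "_etag" field of the
// payload is used. The boolean is false when neither is present.
func expectedETag(ifMatch string, payload map[string]interface{}) (string, bool) {
	if v := strings.Trim(strings.TrimSpace(ifMatch), `"`); v != "" {
		return v, true
	}
	if v, ok := payload["_etag"].(string); ok && v != "" {
		return v, true
	}
	return "", false
}

func main() {
	payload := map[string]interface{}{"name": "x", "_etag": "0123456789abcdef0123456789abcdef"}
	fmt.Println(expectedETag(`"11b22a28b07ddf482f8269050e9691d9"`, payload)) // header wins
	fmt.Println(expectedETag("", payload))                                   // falls back to _etag
	fmt.Println(expectedETag("", nil))                                       // unconditional write
}
```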
I’m not a fan of polluting the payload with metadata.
Okay, but we are polluting the payload in the itemList case, aren't we? Adding _etag to the referenced resources as well would serve the same purpose. Pollution of any kind is unwanted, but can't we make this configurable, or better, supply enough of the data so that a custom ResponseFormatter can do the job?
What’s wrong with the composite etag described above?
There is one particular use case, where you get the root resource and one or more referenced resource items. Then you want to PUT/PATCH a referenced resource item. In your proposal we can't do that, because we don't have the ETag for this individual item, only the ETag of the root resource and the combined sum of the ETags of all projected resource items.
If we want to update a referenced resource item, we will need to GET it again and then PUT. Not needing to do this is a nice optimization, and it would be in line with the itemList case, where we have an array of items, each with an _etag field. In the root resource we also have an array of referenced resource items, but without the _etag field.
I understand that all this is optional, so maybe the best place to do this formatting is in the ResponseFormatter so the user can overwrite it.
I suppose we have three or four cases for the composite e-tag:
- list view : only composite etag ("root item?")
- list view with embeds : another composite etag
- detail view : only root item etag
- detail view with embeds : root item and composite e-tag
I suppose this will let you update the root-resource without a data race, but not an embedded sub-resource, e.g. if you want to fetch all resources through one big request with embeds, and update individual resources later without a re-fetch.
I’m not a fan of polluting the payload with metadata.
Neither am I. However, as @Dragomir-Ivanov points out, the rest package already includes an "_etag" field in some cases. Why not do so consistently, and let that field be used for updates, and the E-tag header value for GET?
Btw, making the header values e-tag of format <root resource etag>-<composite projection etags> when applicable is still nice in order to make it less confusing to the user why the E-tag in the header and payload sometimes do not match; at least he can see that the header E-tag actually includes the resource E-tag, and that it's of a longer format.
@rs Any thoughts on "polluting" the sub-resource payload with _etag?
I have no better solution to propose so let’s do that :/
I was tinkering with the Etag for the composite resource, and it turned out that during the resource and sub-resource projection evaluation, different parts of the sub-resource projections can come from the DB at different times, because some sub-resources are obtained with goroutines. This is a problem because for the same request, sometimes sub-resources can come in A,B,C order, but for other times they can come in B,C,A order, so hashing them with MD5 in arrival order produces a different composite resource hash/ETag for exactly the same result. @rs Can you rethink your decision on using XOR of all ETags instead. It will produce the same result no matter the order.
Can you rethink your decision on using XOR of all ETags instead.
I suppose that means that rs/rest-layer-mongo#20 would need to change to a compatible format?
sometimes sub-resources can come in A,B,C order, but for other times they can come in B,C,A order,
Have you actually observed this? From me just skimming the code, it looks like the order is pretty much guaranteed (unless they come in a random order from the Storer layer).
Expanded references are inserted at a given map field location, so evaluation order should not matter.
- https://github.com/rs/rest-layer/blob/2c4c847a7e4d13e1245c93990665aaeb1a35cfe2/schema/query/projection_evaluator.go#L95
Connections are inserted at a given slice index:
- https://github.com/rs/rest-layer/blob/2c4c847a7e4d13e1245c93990665aaeb1a35cfe2/schema/query/projection_evaluator.go#L121
I suppose it is true that they can come from the DB at different times, but does that matter?
Maybe I do not understand the code well enough, but for the composite resource's ETag we will need to hash with MD5: root resource ETag, sub-resource A ETag, sub-resource B ETag, etc. We will need to apply the MD5 hash in the same order every time to get the same composite-resource ETag. We will need to either store and compute this hash after all sub-resources have been obtained from storage, or use an algorithm where ordering doesn't matter (XOR). @smyrman The snippets above seem to be executed by separate goroutines, so there are no guarantees about their scheduling and which finishes when. I may be wrong, but that is my understanding for now.
We will need to either store and compute this hash after all sub-resources have been obtained from storage, or use an algorithm where ordering doesn't matter (XOR).
Ok, I follow you now. I was assuming you would do the first, compute the hash after all sub-resources have been fetched. If you want to do the incremental hash while you are fetching resources, then yes, you would need to use another algorithm than MD5.
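To illustrate the incremental variant, here is a sketch of an order-independent accumulator that goroutines can feed as sub-resources arrive. `xorAccumulator` and `combineConcurrently` are hypothetical names, and the sketch assumes every ETag is a 32-character hex string, as rest-layer's MD5-based ETags currently are.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"sync"
)

// xorAccumulator folds 16-byte digests together with XOR. XOR is
// commutative and associative, so the result does not depend on the
// order in which Add is called.
type xorAccumulator struct {
	mu  sync.Mutex
	sum [16]byte
}

// Add decodes a 32-character hex ETag and XORs it into the running value.
func (a *xorAccumulator) Add(etag string) error {
	raw, err := hex.DecodeString(etag)
	if err != nil {
		return err
	}
	if len(raw) != len(a.sum) {
		return fmt.Errorf("etag %q is not %d hex characters", etag, 2*len(a.sum))
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	for i, b := range raw {
		a.sum[i] ^= b
	}
	return nil
}

// String returns the current composite value as hex.
func (a *xorAccumulator) String() string {
	a.mu.Lock()
	defer a.mu.Unlock()
	return hex.EncodeToString(a.sum[:])
}

// combineConcurrently adds every ETag from its own goroutine, mimicking
// sub-resources that arrive from storage in an unpredictable order.
func combineConcurrently(etags []string) (string, error) {
	var (
		acc      xorAccumulator
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, e := range etags {
		wg.Add(1)
		go func(e string) {
			defer wg.Done()
			if err := acc.Add(e); err != nil {
				once.Do(func() { firstErr = err })
			}
		}(e)
	}
	wg.Wait()
	return acc.String(), firstErr
}

func main() {
	sum, err := combineConcurrently([]string{
		"11b22a28b07ddf482f8269050e9691d9",
		"00000000000000000000000000000001",
		"ffffffffffffffffffffffffffffffff",
	})
	fmt.Println(sum, err)
}
```

One property to keep in mind with plain XOR: two identical ETags cancel each other out, so a response that embeds the same sub-resource twice contributes nothing for it to the composite value.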
From me just skimming the code, it looks like the order is pretty much guaranteed (unless they come in a random order from the Storer layer).
I was thinking about the order being consistent in the final JSON response, not in terms of evaluation order, which is what you ask for -- sorry for the confusion.
In the light of this, xor is the way to go. We can handle different sizes using padding when necessary.
We can handle different sizes using padding when necessary.
That only works if the non-conformant ID has a length shorter than the typical MD5 sum. Maybe better to just do an MD5 sum of any non-conformant IDs (length != 32 chars) right before the XOR?
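The "MD5 first, then XOR" suggestion could be sketched as below. `normalize` and `xorETags` are hypothetical helpers; treating a 32-character value that fails hex decoding as non-conformant is an extra assumption made here on top of the length check.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// normalize maps any ETag to 16 bytes. Values that already look like a
// hex-encoded MD5 sum are decoded as-is; anything else is hashed with
// MD5 first so that it has the right size for the XOR step.
func normalize(etag string) [md5.Size]byte {
	var out [md5.Size]byte
	if len(etag) == 2*md5.Size {
		if raw, err := hex.DecodeString(etag); err == nil {
			copy(out[:], raw)
			return out
		}
	}
	return md5.Sum([]byte(etag))
}

// xorETags combines the normalized ETags with XOR and returns the
// result as a 32-character hex string.
func xorETags(etags ...string) string {
	var sum [md5.Size]byte
	for _, e := range etags {
		n := normalize(e)
		for i := range sum {
			sum[i] ^= n[i]
		}
	}
	return hex.EncodeToString(sum[:])
}

func main() {
	// A conforming ETag mixed with a shorter, pre-existing one.
	fmt.Println(xorETags("11b22a28b07ddf482f8269050e9691d9", "legacy-etag-7"))
}
```

Hashing instead of padding also covers values longer than 32 characters, which padding alone cannot handle.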