sync_gateway Refactor document code, and avoid [un]marshaling doc bodies

Refactor document code, and avoid [un]marshaling doc bodies

Open snej opened this issue 1 year ago • 4 comments

CBG-2951

Moved most of the Document, Revision and RevTree code, and a lot of Attachment stuff, out of the db package into a new document package. (I'm not wild about that name, but doc would conflict with the very common parameter name doc, and docs is already taken as a directory.) Two reasons for this:

Creates a privacy boundary and lets these structs, esp. Document, protect their private fields from being groped by other code in the db package. There's a fair bit of this going on with the _body and _rawBody fields.)
The db package is really big and it would be nice to have smaller packages, IMHO, just for readability and code navigation.

To reduce the number of diffs, I created a file db/doc_model.go that creates aliases for many of the types and constants that were moved. I also added an alias for the function ParseRevID because there are so damn many calls to it.

To cope with the code in db that touched Document's _body and _rawBody fields, I added methods PeekBody, PokeBody, PeekRawBody, PokeRawBody. These can be removed after the code that calls them is fixed.

There are no functional changes in this PR; everything should behave exactly as before.

Pre-review checklist

[x] Removed debug logging (fmt.Print, log.Print, ...)
[x] Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
[x] Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Dependencies (if applicable)

[x] Link upstream PRs
[x] Update Go module dependencies when merged

Integration Tests

[ ] GSI=true,xattrs=true https://jenkins.sgwdev.com/job/SyncGateway-Integration/000/

Feb 22 '23 23:02 snej

@snej You indicate that there are no functional changes in this PR - how does that apply to the changes around unmarshalling document bodies? Can you summarize the changes being made to that functionality?

May 16 '23 22:05 adamcfraser

@adamcfraser In this PR the document bodies are still unmarshaled when passed to JS (Otto). That will change with the V8 PR, at least when V8 is enabled; there we pass the JSON as a string and unmarshal it in JavaScript which is more efficient.

Otherwise, incoming doc bodies are still validated to make sure they don't contain illegal/reserved properties, but that's done without unmarshaling. The JSON does get parsed, but without converting it to Go objects which is the expensive part.

May 16 '23 22:05 snej

Created a perf testing task CBPS-1176 for this.

Jun 05 '23 17:06 snej

Perf results show a decent ~5% improvement in throughput (and accompanying reduction in CPU) - we should go ahead and review/merge based on this.

Jul 11 '23 17:07 adamcfraser

sync_gateway sync_gateway copied to clipboard

Refactor document code, and avoid [un]marshaling doc bodies

Pre-review checklist

Dependencies (if applicable)

Integration Tests

sync_gateway
sync_gateway copied to clipboard