hamilton icon indicating copy to clipboard operation
hamilton copied to clipboard

Improve performance of GET requests

Open moio opened this issue 1 year ago • 1 comments

👋 hamilton maintainers!

I am opening this to ask whether it would make sense to include some new code, or patch existing code, in order to improve resource consumption of the Get method - particularly memory usage but also CPU.

I came to this problem as part of my day work at SUSE for Rancher, which uses hamilton to implement Entra ID (formerly Azure AD) integration. In some high-scale use tests we have users with principals in the thousands, groups in the millions and group assignments per principal in the hundreds - and enough traffic to need multiple (tens of) goroutines to periodically refresh all these objects in parallel. In extreme cases we have seen hundreds of MBs to GiBs worth of heap objects, which tend to cause out-of-memory situations.

Here is an example of a similar flame graph: image

Of course our usage pattern isn't typical, and we are working at ways to change the way we interact with the library. Nevertheless according to my analysis Get (and code called by it) does quite a bit of needless computation:

  • OData json.Unmarshal calls are amplified by a factor of 2 due to the error handling in OData.UnmarshalJSON
  • FromResponse calls, which drive OData.UnmarshalJSON calls, are amplified by a factor of 2 by performRequest
  • Get needlessly re-unmarshal responses once after FasterResponse (again with the amplification factors above), and especially
  • Get un-marshals and subsequently re-marshals responses if they are paginated. This unmarshaling-marshaling extra round is repeated by the number of pages

This PR cuts down unmarshaling to one pass only (divided in sub-passes via json.RawMessage) and eliminates remarshaling altogether.

I could not find ways to achieve this without altering the OData struct and structs/methods which use it (ConsistencyFailureFunc, ValidStatusFunc, GetHttpRequestInput), therefore, I opted for a completely new code path in fasterclient.go. The approach can be changed if API changes are acceptable.

For now and for the sake of discussion, I only patched the one endpoint that really hurts my use case: ListGroupMemberships. Changes are minimal other than using the FasterGet method in place of Get - ideally this could be extended to all other (internal) usages of Get.

I am also contributing code to exercise ListGroupMemberships more thoroughly so that the effects of the patch are evident. Furthermore I added a utility method that will output CPU and memory usage during the test run.

My results are as follows:

main faster_get
Run 1 610 ms 273 ms
Run 2 817 ms 122 ms
Run 3 643 ms 353 ms
Average 690 ms 249 ms

CPU load is ~36.14% of the original

main faster_get
Run 1 21 MiB 3 MiB
Run 2 22 MiB 3 MiB
Run 3 23 MiB 3 MiB
Average 22 MiB 3 MiB

Memory consumption is ~13.64% of the original

Is there any interest to merge such an improvement? What would be needed to have a similar PR accepted?

Thanks in advance, I hope this helps

moio avatar Jan 22 '24 12:01 moio

@manicminer do you have any impressions to share about this work?

moio avatar Feb 23 '24 09:02 moio