grpc-go
grpc-go copied to clipboard
If a priority contains multiple localities with pick_first, load is reported incorrectly
This should resolve when dual stack is fully implemented (#6472). Until then, a partial implementation should be possible without requiring much throwaway work.
- [x] Create a reproduction test case for this scenario and confirm that load is not reported correctly.
- [ ] Change pick_first to have one subchannel per address. Probably skip implementing happy eyeballs for now.
- [ ] Set an attribute on every subchannel creation operation (for outlier detection to read).
- [ ] Disable outlier detection if pick_first is used (detected via address attribute). (See C++ version of this change here: https://github.com/grpc/grpc/pull/33336)
That should be all we need to correct this issue.
@apolcyn noticed that if we simply changed the locality ID to use the first address instead of the last, we'd most likely improve the accuracy (since the first address is more likely to be connected to than the last):
http://google3/third_party/golang/grpc/xds/internal/balancer/clusterimpl/clusterimpl.go;l=364;rcl=634425291
Please do not change the locality ID to always use the first address. Reported load needs to be 100% accurate, especially in cases where we're failing over to different backends.
The above comment was suggested as a short-term workaround that would take a few minutes to accomplish while we await the full fix.
Another idea for a short-term fix that will actually get us to 100% correctness but not take as long to implement (hopefully):
- Add an unexported field to
balancer.SubConnStatethat, when theConnectivityStateisREADY, contains the connected address. - Add accessor function
vars for this field ininternalwhich thebalancerpackage assigns to local functions that perform the actions. - Set this in the channel around here, indirectly.
- Read this in the
StateListener(here) and callupdateLocalityID()based on the connected address's locality. - Add a test to verify.
This works better than adding an unexported method to the SubConn, because the subconn wrapping requires propagating the method everywhere.