chore: increase async rpc performance
While developing the bigtable client, we found that async gapic clients are very slow compared to raw grpc calls. Looking at the code, I found a couple of low-hanging-fruit optimizations, some of which only needed to be ported over from the synchronous client code:
- use cached wrapped functions, instead of wrapping the underlying grpc call before each rpc call
- check if the input is the expected proto type, and avoid creating a new copy if so
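
For anyone skimming, here is a minimal sketch of the two patterns, not the actual generated client code. `EchoRequest`, `FakeTransport`, `PerCallClient`, `CachedClient`, and this simplified `wrap_method` are all hypothetical stand-ins I made up for illustration; a plain dataclass plays the role of the proto message.

```python
import asyncio
import functools
from dataclasses import dataclass

# Hypothetical counter so the cost of re-wrapping is visible.
WRAP_CALLS = {"count": 0}


def wrap_method(func, default_timeout=None):
    """Hypothetical stand-in for a wrapper that adds a default timeout."""
    WRAP_CALLS["count"] += 1

    @functools.wraps(func)
    async def wrapped(request, timeout=default_timeout):
        return await asyncio.wait_for(func(request), timeout)

    return wrapped


@dataclass
class EchoRequest:
    """Hypothetical request type standing in for a proto message."""
    text: str = ""


class FakeTransport:
    """Mocked-out transport; records every request object it receives."""

    def __init__(self):
        self.seen = []

    async def echo(self, request):
        self.seen.append(request)
        return request.text


class PerCallClient:
    """Slow pattern: re-wrap the call and copy the request on every rpc."""

    def __init__(self, transport):
        self._transport = transport

    async def echo(self, request):
        fields = vars(request) if isinstance(request, EchoRequest) else request
        request = EchoRequest(**fields)  # always builds a new object
        rpc = wrap_method(self._transport.echo, default_timeout=5.0)
        return await rpc(request)


class CachedClient:
    """Faster pattern: wrap once up front, reuse requests of the right type."""

    def __init__(self, transport):
        # Wrapped a single time and cached on the instance.
        self._wrapped_echo = wrap_method(transport.echo, default_timeout=5.0)

    async def echo(self, request):
        # Only construct a new request when the input is not already one.
        if not isinstance(request, EchoRequest):
            request = EchoRequest(**request)
        return await self._wrapped_echo(request)
```

With this sketch, `await CachedClient(FakeTransport()).echo(EchoRequest(text="hi"))` calls `wrap_method` once at construction no matter how many rpcs follow, while `PerCallClient` calls it once per rpc and also allocates a fresh request each time.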
These changes give a ~5x speedup on a quick benchmark I threw together (10,000 rpc calls with the network mocked out: 1.99s -> 0.428s for unary, 1.839s -> 0.335s for streams), and bring async rpc performance more in line with the sync clients.
Ok, I added some tests that make sure the wrapped method is cached, instead of wrap_method being called on each rpc call. Let me know if that works.
LGTM, but please wait for review from @vchudnov-g
Ok, I'm going to be out of office for the next couple of weeks, but feel free to merge it while I'm gone if it looks good.