gcp-gradle-build-cache
Use piped streams to avoid loading cache entry bytes into memory
Currently, the build cache implementations load each build cache entry fully into memory as a ByteArray:
- https://github.com/androidx/gcp-gradle-build-cache/blob/6ff65738c57f787299fbcba2051b831cfc4b5bac/gcpbuildcache/src/main/kotlin/androidx/build/gradle/gcpbuildcache/GcpBuildCacheService.kt#L70-L74
- https://github.com/androidx/gcp-gradle-build-cache/blob/6ff65738c57f787299fbcba2051b831cfc4b5bac/s3buildcache/src/main/kotlin/androidx/build/gradle/s3buildcache/S3BuildCacheService.kt#L72-L76
I believe this will negatively impact performance (although I admit I haven't done any testing, so I could be wrong!).
It can be avoided by piping the streams. For example:
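import java.io.PipedInputStream
import java.io.PipedOutputStream
import kotlin.concurrent.thread

override fun store(key: BuildCacheKey, writer: BuildCacheEntryWriter) {
    // ...
    val contents = PipedInputStream()
    val incoming = PipedOutputStream(contents) // connect the pipe before anything is written
    // Write on a separate thread: writing and reading on the same thread deadlocks once the pipe buffer fills.
    thread { incoming.use { writer.writeTo(it) } }
    storageService.store(cacheKey, contents, writer.size) // must manually pass the size down
}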
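To make the threading requirement concrete, here is a minimal standalone sketch of the same idea that runs without Gradle or a cloud SDK. `EntryWriter`, `upload`, and `storeViaPipe` are hypothetical names made up for this example: `EntryWriter` stands in for Gradle's `BuildCacheEntryWriter`, and `upload` stands in for whatever the storage service does with the stream.

```kotlin
import java.io.InputStream
import java.io.OutputStream
import java.io.PipedInputStream
import java.io.PipedOutputStream
import kotlin.concurrent.thread

// Hypothetical stand-in for Gradle's BuildCacheEntryWriter, so the sketch runs on its own.
fun interface EntryWriter {
    fun writeTo(output: OutputStream)
}

// Hypothetical upload step: drains the stream in small chunks and returns the byte count.
fun upload(contents: InputStream): Long {
    val chunk = ByteArray(8 * 1024)
    var total = 0L
    while (true) {
        val read = contents.read(chunk)
        if (read < 0) break
        total += read
    }
    return total
}

fun storeViaPipe(writer: EntryWriter): Long {
    // The pipe buffer (64 KiB here, an arbitrary choice) bounds how much data is held in memory.
    val contents = PipedInputStream(64 * 1024)
    val incoming = PipedOutputStream(contents)
    // The producer runs on its own thread and blocks whenever the pipe buffer is full.
    val producer = thread(name = "cache-entry-writer") {
        incoming.use { writer.writeTo(it) } // closing the output signals end-of-stream to the reader
    }
    val uploaded = contents.use { upload(it) }
    producer.join()
    return uploaded
}

fun main() {
    val payload = ByteArray(1_000_000) { (it % 251).toByte() }
    val uploaded = storeViaPipe { out -> out.write(payload) }
    println("uploaded $uploaded bytes")
}
```

One caveat with this shape: an exception thrown by the writer stays on the producer thread, so a real implementation would need to capture it and rethrow it on the calling thread.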
I'd be happy to contribute a PR.
If you have some benchmarks showing this helps, I'm happy to take a PR.
I've done some experimenting (see https://github.com/aSemy/gcp-gradle-build-cache/tree/experiments/input-streams), and used JMH on the S3 bucket, testing three options:
- Using ByteArray (the current version)
- Using PipedInputStream/PipedOutputStream (which requires an additional thread)
- Using an Okio Buffer
I only measured throughput (operations per second), not memory usage. I'd like to measure the memory usage, but kotlinx-benchmark doesn't support profiler arguments yet.
tl;dr: Buffer is slowest. Piped streams are on average faster, but the speed is inconsistent. ByteArray is slower than Piped streams, but more consistent.
Benchmark                     (mode)      Mode   Cnt   Score     Error  Units
AwsBenchmark.storeRandomData  byte-array  thrpt   10  50,267 ±  29,572  ops/s
AwsBenchmark.storeRandomData  piped       thrpt   10  64,742 ± 158,550  ops/s
AwsBenchmark.storeRandomData  buffer      thrpt   10  34,091 ±  19,698  ops/s
2024-03-21T13.01.18.356566 main.json
Based on this, I'd probably not move to Piped streams. However, it might be worth investigating concurrency.
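As a rough check on how far these numbers can be trusted, the sketch below turns each Score ± Error pair from the table into an interval and tests whether the intervals overlap. It assumes the commas are decimal separators from a non-English locale (so 50,267 means 50.267 ops/s); the overlap result is the same if they are thousands separators. `Result` is a helper type made up for this example.

```kotlin
// Score and Error columns copied from the JMH table above.
// Assumption: commas are decimal separators (50,267 = 50.267 ops/s).
data class Result(val mode: String, val score: Double, val error: Double) {
    val low: Double get() = score - error
    val high: Double get() = score + error

    // Two intervals overlap when neither one lies entirely above the other.
    fun overlaps(other: Result): Boolean = low <= other.high && other.low <= high
}

val results = listOf(
    Result("byte-array", 50.267, 29.572),
    Result("piped", 64.742, 158.550),
    Result("buffer", 34.091, 19.698),
)

fun main() {
    for (r in results) println("${r.mode}: [${r.low}, ${r.high}]")
    val allOverlap = results.all { a -> results.all { b -> a.overlaps(b) } }
    println("all intervals overlap: $allOverlap")
}
```

All three intervals overlap, and the piped error is larger than its score, so this run does not establish a reliable ranking between the three options.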