htsget-rs
Consider caching expensive functions
We should consider caching expensive computations, or functions that download a lot of data. For example, we could cache S3 GetObject requests, or computations that search through entire index files. This could be done using Rust's cached crate, and may improve performance for successive queries with similar parameters.
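To make the memoization idea concrete, here is a minimal sketch. The cached crate would normally wrap a function like this via an attribute. To avoid assuming its exact API, this sketch hand-rolls the same idea with a HashMap. `search_index`, `IndexSearchCache` and the `index_id` key are hypothetical stand-ins, not htsget-rs code. The "index" is simplified to a list of positions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an expensive computation, e.g. scanning an
/// entire index file for positions that fall inside a query region.
fn search_index(index: &[u64], start: u64, end: u64) -> Vec<u64> {
    index.iter().copied().filter(|&pos| pos >= start && pos < end).collect()
}

/// Memoizes `search_index` results keyed by (index id, start, end).
/// `index_id` is a hypothetical identifier for the index file.
struct IndexSearchCache {
    entries: HashMap<(String, u64, u64), Vec<u64>>,
    misses: usize,
}

impl IndexSearchCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), misses: 0 }
    }

    fn search(&mut self, index_id: &str, index: &[u64], start: u64, end: u64) -> Vec<u64> {
        let key = (index_id.to_string(), start, end);
        // Cache hit: return the stored result without rescanning the index.
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        // Cache miss: do the expensive work once and remember the result.
        self.misses += 1;
        let result = search_index(index, start, end);
        self.entries.insert(key, result.clone());
        result
    }
}

fn main() {
    let index = vec![10, 200, 3_000, 40_000];
    let mut cache = IndexSearchCache::new();
    let first = cache.search("sample.bam.bai", &index, 100, 5_000);
    let second = cache.search("sample.bam.bai", &index, 100, 5_000);
    assert_eq!(first, second);
    println!("{:?}, misses = {}", first, cache.misses);
}
```

This cache is unbounded. A real version would probably want a size limit or eviction policy, so that many distinct queries can't grow memory without bound.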
Worth reviewing https://github.com/samtools/hts-specs/pull/325 before tackling it.
This is a slightly different topic, though. I guess your focus here is more on memoization than on caching network/query response payloads?
We could definitely explore HTTP caching. Maybe some data could even be cached client-side? I think the main benefit of caching is reducing the latency between the Lambda functions and AWS S3 in the htsget-http-lambda crate, although there might be an AWS-specific solution for this.
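For the Lambda-to-S3 case, one option is an in-process time-to-live (TTL) cache. Lambda can reuse an execution environment across warm invocations, so a cache held in long-lived state may survive between requests. The sketch below is std-only and makes a few assumptions: the `fetch` closure stands in for an S3 GetObject call, and the "bucket/key" string keys and `TtlCache` type are hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// A minimal time-to-live cache, e.g. for object bytes fetched from S3.
/// Keys are hypothetical "bucket/key" strings.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<u8>)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached bytes if they are younger than `ttl`; otherwise
    /// calls `fetch` (standing in for an S3 GetObject request) and stores the result.
    fn get_or_fetch<F>(&mut self, key: &str, fetch: F) -> Vec<u8>
    where
        F: FnOnce(&str) -> Vec<u8>,
    {
        if let Some((stored_at, bytes)) = self.entries.get(key) {
            if stored_at.elapsed() < self.ttl {
                return bytes.clone();
            }
        }
        // Missing or expired: fetch again and record when we stored it.
        let bytes = fetch(key);
        self.entries.insert(key.to_string(), (Instant::now(), bytes.clone()));
        bytes
    }
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_secs(60));
    let mut calls = 0;
    for _ in 0..3 {
        cache.get_or_fetch("bucket/sample.bam", |_| {
            calls += 1;
            b"BAM\x01".to_vec()
        });
    }
    println!("fetches: {calls}");
}
```

Picking the TTL is the main design choice. Too long risks serving stale objects, and too short gives little benefit over going to S3 every time.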
The htsget-http-actix crate already has access to the file system, so it's not as big a deal there. However, both crates would benefit from memoizing expensive functions.
Before introducing either, I would definitely profile first; @victorskl can help you out with the AWS X-Ray profiling tools. Local profiling can be done with cargo-instruments, flamegraphs, et al.
Also, take into account the Cache-Control headers returned by S3. I just bumped into this: https://www.sam.today/blog/always-set-cache-control
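Honouring those headers could start with reading `max-age` from the Cache-Control value S3 returns. Below is a rough, std-only sketch; `max_age` is a hypothetical helper. It deliberately simplifies the HTTP caching rules: it treats `no-store` and `no-cache` as "don't reuse without going back to S3", and it ignores other directives such as `s-maxage`.

```rust
use std::time::Duration;

/// Extracts the `max-age` directive from a Cache-Control header value.
/// Simplification: returns `None` if `no-store` or `no-cache` is present,
/// or if there is no parseable `max-age`; other directives are ignored.
fn max_age(cache_control: &str) -> Option<Duration> {
    let mut age = None;
    for directive in cache_control.split(',') {
        // Directive names are case-insensitive, so compare in lowercase.
        let d = directive.trim().to_ascii_lowercase();
        if d == "no-store" || d == "no-cache" {
            return None;
        }
        if let Some(value) = d.strip_prefix("max-age=") {
            age = value
                .trim_matches('"')
                .parse::<u64>()
                .ok()
                .map(Duration::from_secs);
        }
    }
    age
}

fn main() {
    println!("{:?}", max_age("public, max-age=3600"));
    println!("{:?}", max_age("no-store"));
}
```

A returned duration could then serve as the per-object TTL instead of a fixed one. That way, whatever Cache-Control the bucket owner sets controls how long htsget-rs reuses an object.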