htsget-rs

Consider caching expensive functions

Open mmalenic opened this issue 3 years ago • 4 comments

We should consider caching expensive computations and functions that download lots of data. For example, we could cache S3 GetObject requests, or computations that search through entire index files. This could be done using Rust's cached crate, and may improve performance on successive queries with similar parameters.
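As a rough illustration of the memoization idea, here is a minimal sketch that uses only the standard library rather than the cached crate. The `Memo` type, its method names, and the example key shape (file id, start, end) are hypothetical and are not part of htsget-rs.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical in-memory memoizer: stores one computed value per key.
/// It is unbounded and never evicts, so a real cache would need a size or time limit.
pub struct Memo<K, V> {
    entries: HashMap<K, V>,
    misses: usize,
}

impl<K: Eq + Hash, V: Clone> Memo<K, V> {
    pub fn new() -> Self {
        Self {
            entries: HashMap::new(),
            misses: 0,
        }
    }

    /// Returns the cached value for `key`, running `compute` only on a miss.
    pub fn get_or_compute<F>(&mut self, key: K, compute: F) -> V
    where
        F: FnOnce(&K) -> V,
    {
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        // Miss: run the expensive function once and remember the result.
        self.misses += 1;
        let value = compute(&key);
        self.entries.insert(key, value.clone());
        value
    }

    /// Number of times `compute` actually ran.
    pub fn misses(&self) -> usize {
        self.misses
    }
}
```

With a key such as `(String, u32, u32)`, a second query for the same file and range would return the stored result without rescanning the index. The cached crate provides similar behaviour through macros, along with eviction policies that this sketch leaves out.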

mmalenic Aug 03 '22 02:08

Worth reviewing https://github.com/samtools/hts-specs/pull/325 before tackling it.

Slightly different topic though, as I guess your focus here is more on memoization than on network/query response payload caching?

brainstorm Aug 03 '22 06:08

We could definitely explore HTTP caching. Maybe some data could even be cached client-side? I think the main benefit of caching is to reduce delays between the Lambda functions and AWS S3 in the htsget-http-lambda crate, although there might be an AWS-specific solution for this.

The htsget-http-actix crate already has access to the file system, so it's not as big a deal there. Both crates would still benefit from memoization of expensive functions, though.

mmalenic Aug 03 '22 23:08

Before introducing those I would definitely profile first; @victorskl can help you out with the AWS X-Ray profiling tools. Local profiling can be done with cargo-instruments, flamegraphs et al.
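For a quick first look before reaching for a real profiler, a coarse wall-clock measurement can show whether a call is worth caching at all. This is only a sketch with a hypothetical helper name, `timed`, and it is no substitute for the profiling tools mentioned above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` once and returns its result together
/// with the elapsed wall-clock time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Wrapping a suspected hot path in `timed` and logging the duration for a cold call and a repeated call gives a rough idea of how much a cache could save.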

brainstorm Aug 10 '22 00:08

Also, take into account cache headers from S3; I just bumped into this: https://www.sam.today/blog/always-set-cache-control
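If a cache did sit between htsget-rs and S3, it would need to respect the Cache-Control value on each response. Here is a minimal sketch, assuming the header value is already available as a string; the function name `max_age` is hypothetical, and only the `max-age` and `no-store` directives are handled.

```rust
use std::time::Duration;

/// Hypothetical helper: reads a Cache-Control header value and returns how
/// long the response may be stored. Returns `None` if `no-store` is present
/// or if there is no valid `max-age` directive.
pub fn max_age(header: &str) -> Option<Duration> {
    let mut age = None;
    for directive in header.split(',') {
        let directive = directive.trim();
        // Directive names are case-insensitive.
        if directive.eq_ignore_ascii_case("no-store") {
            return None;
        }
        if let Some((name, value)) = directive.split_once('=') {
            if name.trim().eq_ignore_ascii_case("max-age") {
                age = value
                    .trim()
                    .trim_matches('"')
                    .parse::<u64>()
                    .ok()
                    .map(Duration::from_secs);
            }
        }
    }
    age
}
```

A caller could use the returned duration as the lifetime of a cached entry and skip caching entirely when the result is `None`.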

brainstorm Sep 27 '23 10:09