[BUG] Unexpected Cache Behavior on ASTRA_MANAGER_REPLICA_LIFESPAN_MINS Update
Describe the bug
When replicaCreationServiceConfig.replicaLifespanMins (e.g., via ASTRA_MANAGER_REPLICA_LIFESPAN_MINS) is updated, existing replicas do not pick up the new value; they appear to keep the lifespan that was in effect when they were created. I expected the cache window to follow the updated configuration.
Requirements (place an x in each of the [ ])
- [x] I've read and understood the Contributing guidelines and have done my best effort to follow them.
- [x] I've read and agree to the Code of Conduct.
- [x] I've searched for any related issues and avoided creating a duplicate issue.
To Reproduce
- Set ASTRA_MANAGER_REPLICA_LIFESPAN_MINS to a high value (e.g., 7 days).
- Keep the cluster running for 7 or more days.
- Reduce ASTRA_MANAGER_REPLICA_LIFESPAN_MINS to a lower value (e.g., 24 hours).
- Query nodes still serve data from the original 7-day window and require cache capacity to accommodate the older data.
Observations
- When a snapshot is created by the index node, the associated record in ZooKeeper reflects the value of ASTRA_MANAGER_REPLICA_LIFESPAN_MINS at the time of creation.
- Subsequent updates to this configuration do not appear to impact existing replicas.
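To make the observation concrete, here is a minimal sketch of the behavior as I understand it. It is not taken from the Astra source: `ReplicaRecord`, `create_replica`, and `is_expired` are hypothetical names, and the only assumption is that the expiry is computed once and stored with the record when the replica is created.

```python
from dataclasses import dataclass

MS_PER_MINUTE = 60_000


@dataclass(frozen=True)
class ReplicaRecord:
    """Hypothetical stand-in for the replica record stored in ZooKeeper."""
    snapshot_id: str
    created_ms: int
    expire_after_ms: int  # assumption: fixed when the record is written


def create_replica(snapshot_id: str, now_ms: int, lifespan_mins: int) -> ReplicaRecord:
    # The lifespan in effect at creation time is baked into the record.
    return ReplicaRecord(snapshot_id, now_ms, now_ms + lifespan_mins * MS_PER_MINUTE)


def is_expired(replica: ReplicaRecord, now_ms: int) -> bool:
    # Only the stored expiry is consulted; the current config is never read.
    return now_ms > replica.expire_after_ms


seven_days_mins = 7 * 24 * 60
two_days_ms = 2 * 24 * 60 * MS_PER_MINUTE

# Replica created while the lifespan was 7 days.
replica = create_replica("snap-1", now_ms=0, lifespan_mins=seven_days_mins)

# Even if the lifespan is later reduced to 24 hours, the stored record is
# unchanged, so a 2-day-old replica is still treated as live.
still_served = not is_expired(replica, two_days_ms)
```

Under this model, lowering the configured lifespan only affects replicas created after the change, which matches what we see.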
Expected behavior
If the lifespan is increased, I would expect the system to pull additional data from S3. Conversely, if it is decreased, I would expect the system to limit the data served to align with the reduced window.
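One way to picture the expected behavior is to derive the expiry from the replica's creation time and the current configuration, rather than from a value stored at creation. This is only a sketch under that assumption; `effective_expiry_ms` and `should_serve` are hypothetical helpers, not existing Astra functions.

```python
MS_PER_MINUTE = 60_000


def effective_expiry_ms(created_ms: int, current_lifespan_mins: int) -> int:
    # Expiry is recomputed from the creation time and whatever lifespan
    # is configured right now.
    return created_ms + current_lifespan_mins * MS_PER_MINUTE


def should_serve(created_ms: int, now_ms: int, current_lifespan_mins: int) -> bool:
    # A replica is served only while it falls inside the current window.
    return now_ms <= effective_expiry_ms(created_ms, current_lifespan_mins)


two_days_ms = 2 * 24 * 60 * MS_PER_MINUTE

# A 2-day-old replica falls outside a 24-hour window...
served_with_one_day = should_serve(0, two_days_ms, current_lifespan_mins=24 * 60)
# ...but inside a 7-day window.
served_with_seven_days = should_serve(0, two_days_ms, current_lifespan_mins=7 * 24 * 60)
```

With this approach, decreasing the lifespan would shrink the served window on the next evaluation, and increasing it would make older snapshots eligible to be pulled from S3 again.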
Questions and Suggestions
I understand that caching logic is undergoing changes. Will the new implementation allow for the cache window to adapt more immediately following a configuration update? This could be particularly useful for occasional scenarios where serving older data is necessary. For example:
- Normally, you might only require 3-7 days of data, but you keep segments in S3 for longer.
- By temporarily increasing ASTRA_MANAGER_REPLICA_LIFESPAN_MINS and scaling up cache capacity, you could serve older data as needed.
- Afterward, scaling down the cache and resetting the configuration would return the system to its usual state.
Currently, this flexibility does not seem possible due to the described behavior. Let me know if I can provide any additional details or run further tests to assist in diagnosing this issue.
Thank you!
Reproducible in:
Astra version: We are running a slightly older version of Astra, built off of https://github.com/airbnb/kaldb, but I don't see any PRs since then that change this behavior.
JVM version:
OS version(s):