jetstream
Filter data sources based on partitions and clustering where possible
Querying data sources only for the relevant partitions or clusters can save significant cost. For example, `main_v4` could be queried only for `nightly` data for every experiment that runs on nightly. Even more can be saved when querying `events` by making sure that only `event_category` values with relevant data get queried.
Currently, these optimizations have to be made manually in custom configs. Most users are not familiar with this, so it would be good to provide an automated or more guided way to apply them.
The cost savings are significant: costs can often be cut by up to 10x.
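To make the idea concrete, here is a minimal sketch of what an automated filter could generate. It is not jetstream's actual API: `build_filtered_query` is a hypothetical helper, and the column names `submission_date` (partition column) and `normalized_channel` (cluster column) are assumptions about the table layout.

```python
from datetime import date


def build_filtered_query(table, start, end, cluster_filters,
                         partition_column="submission_date"):
    """Build a SQL string limited to a partition range and cluster values.

    Hypothetical helper for illustration only. Values are interpolated
    directly into the string, so it must not be used with untrusted input.
    """
    # Restrict the scan to the partitions inside the date range.
    conditions = [
        f"{partition_column} BETWEEN '{start.isoformat()}' "
        f"AND '{end.isoformat()}'"
    ]
    # Add one IN (...) filter per cluster column.
    for column, values in cluster_filters.items():
        quoted = ", ".join(f"'{value}'" for value in values)
        conditions.append(f"{column} IN ({quoted})")
    return f"SELECT * FROM {table} WHERE " + " AND ".join(conditions)


# Example: only nightly data from main_v4 (column names are assumed).
query = build_filtered_query(
    "main_v4",
    date(2020, 1, 1),
    date(2020, 1, 7),
    {"normalized_channel": ["nightly"]},
)
print(query)
```

The same helper would cover the `events` case by passing something like `{"event_category": [...]}` with the categories an experiment actually needs.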