[Feature] Retention policy based on time range

Open hanahmily opened this issue 2 years ago • 5 comments

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

Description

The retention-related fields are supported in the API, but the kernel doesn't support them yet. Integration and stress testing produced the requirements below:

  • A stream group needs exactly one rule. The minimal size should be a block's interval.
  • A measure group supports several rules. To control the number of files on the server, SW's xx_minute should belong to one rule, while xx_day and xx_month are routed to another. Each rule generates a tsdb instance. Generally, a measure group will create two tsdb instances: a fine-grained one and a down-sampled one. As with streams, the minimal size should be a block's interval.
  • Property doesn't support a retention policy.
  • TopNAggregation (WIP) will derive rules from its source measure. FYI @lujiajing1126

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

hanahmily avatar Aug 10 '22 06:08 hanahmily

@lujiajing1126 Could the Java client and the OAP BanyanDB storage module support grouping measures by suffix (minute, day, and month)? We have to put measures into dedicated groups:

  • Group named "measure-default" contains "xxx_minute"
  • Group named "measure-downsampling" contains "xxx_day" and "xxx_month"

hanahmily avatar Sep 20 '22 09:09 hanahmily

Do you mean at the query stage?

wu-sheng avatar Sep 20 '22 09:09 wu-sheng

I think it would be a very special case if the grouping rule were based on a string suffix.

Or could we set the rule based on the measure interval as a default convention?

lujiajing1126 avatar Sep 20 '22 11:09 lujiajing1126

> Do you mean at the query stage?

It affects every stage: creating, writing, and querying.

> we may set the rule based on measure interval as a default convention

Makes sense. We could leverage the Downsampling enum to define the rules.

  • Minute is routed to measure-default, as in the current implementation.
  • Hour and Day go to measure-downsampling.
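
The routing convention above could be sketched as follows. This is only an illustration in Python, not the OAP or Java client code: the `Downsampling` class here is a hypothetical stand-in for the real enum, modeling only the levels named in this thread, and `group_for` is an assumed helper name.

```python
from enum import Enum


class Downsampling(Enum):
    # Hypothetical stand-in for the OAP Downsampling enum; only the
    # levels mentioned in this thread are modeled.
    MINUTE = "minute"
    HOUR = "hour"
    DAY = "day"


# Group names as proposed in this thread.
GROUP_DEFAULT = "measure-default"
GROUP_DOWNSAMPLING = "measure-downsampling"


def group_for(downsampling: Downsampling) -> str:
    """Return the group a measure is routed to, based on its downsampling level."""
    if downsampling is Downsampling.MINUTE:
        # Fine-grained data stays in the default group.
        return GROUP_DEFAULT
    # Hour and Day share the down-sampled group.
    return GROUP_DOWNSAMPLING
```

Because the group is derived from the downsampling level alone, the same lookup would apply when creating, writing, and querying a measure.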

I will add a new field, segment_interval, to ResourceOpts:

message ResourceOpts {
    // shard_num is the number of shards
    uint32 shard_num = 1;
    // block_num specifies how many blocks a segment contains
    uint32 block_num = 2;
    // ttl indicates time to live, how long the data will be cached
    string ttl = 3;
    // segment_interval indicates the length of a segment
    Duration segment_interval = 4;
}

measure-default could use the settings below, which create a kv every 2 hours (a 1-day segment divided into 12 blocks):

segment_interval: 1 Day
block_num: 12
ttl: 7 Days

measure-downsampling uses the settings below to create a single kv per segment:

segment_interval: 7 Days
block_num: 1
ttl: 7 Days
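
To check the arithmetic behind these two configurations, here is a minimal sketch in Python. It assumes the block interval is simply segment_interval divided by block_num, which is how the "every 2 hours" figure above reads. The `ResourceOpts` dataclass is an illustrative stand-in rather than BanyanDB code, it omits shard_num, and it represents the durations as `timedelta` values.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ResourceOpts:
    # Illustrative mirror of the proposed fields; shard_num is omitted
    # because it does not affect the interval arithmetic.
    segment_interval: timedelta
    block_num: int
    ttl: timedelta

    def block_interval(self) -> timedelta:
        """Length of one block, assuming blocks split a segment evenly."""
        if self.block_num < 1:
            raise ValueError("block_num must be at least 1")
        return self.segment_interval / self.block_num


# The two configurations proposed in this thread.
measure_default = ResourceOpts(
    segment_interval=timedelta(days=1), block_num=12, ttl=timedelta(days=7)
)
measure_downsampling = ResourceOpts(
    segment_interval=timedelta(days=7), block_num=1, ttl=timedelta(days=7)
)
```

Under that assumption, `measure_default.block_interval()` is 2 hours, and `measure_downsampling.block_interval()` equals the whole 7-day segment.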

hanahmily avatar Sep 20 '22 14:09 hanahmily

Yes, downsampling should carry enough information when creating and writing, and querying has a step to apply the correct parameter.

wu-sheng avatar Sep 20 '22 14:09 wu-sheng