[Bug] Incorrect Timestamp used in BanyanDB
Search before asking
- [X] I had searched in the issues and found no similar issues.
Apache SkyWalking Component
BanyanDB (apache/skywalking-banyandb)
What happened
BanyanDB uses the measure/stream receiving timestamp as the record's timestamp. At query time, this leads to several cases where you can't read the data correctly:
- OAP has a 20-30s delay before it flushes bulk data, so a T1-T10 metrics query never returns T10 correctly, because the latest T10 value is only written at T10 + N seconds (see the sketch after this list).
- Similar to <1>, traces and logs are persisted with a delay, so you never get the expected result. More importantly, the UI shows the start time of the traces/logs, so you would see traces and logs that fall outside the query condition. This would be treated as a bug.
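To make the first case concrete, here is a minimal Go sketch (illustrative only; the types and names are made up and are not BanyanDB or OAP code). A value describing the T10 minute bucket is flushed ~25s late; keyed by the receive time it no longer matches a query that asks for the data point at T10, while keying it by its own bucket keeps it visible.

```go
package main

import (
	"fmt"
	"time"
)

// storedPoint models one metrics value together with the timestamp it is keyed by.
// These types are made up for illustration; they are not BanyanDB or OAP structures.
type storedPoint struct {
	keyedBy time.Time // the timestamp BanyanDB indexes the value under
	value   int64
}

func main() {
	// The value describes the T10 minute bucket...
	t10 := time.Date(2023, 5, 1, 10, 10, 0, 0, time.UTC)
	// ...but OAP only flushes it 20-30s later, so the receive time is T10+25s.
	receiveTime := t10.Add(25 * time.Second)

	byReceiveTime := storedPoint{keyedBy: receiveTime, value: 42} // current behaviour
	byBucket := storedPoint{keyedBy: t10, value: 42}              // expected behaviour

	// A T1-T10 query asks for the data point timestamped exactly at T10.
	fmt.Println("found with receive-time key:", byReceiveTime.keyedBy.Equal(t10)) // false
	fmt.Println("found with bucket-time key:", byBucket.keyedBy.Equal(t10))       // true
}
```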
What you expected to happen
The OAP persistence kernel should assign the timestamp used for persistence, rather than BanyanDB falling back to the receiving timestamp by default.
How to reproduce
Generate a trace/metric/log, put it into BanyanDB several minutes later, and then query within the trace's running duration.
Anything else
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
BanyanDB relies on the timestamp fields below, defined in the writing spec, to identify the data.
- Metrics use the "time bucket" to set this field (see the sketch after this list).
- The trace segment's timestamp should be picked up.
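For the metrics case, the time bucket already encodes the moment the value describes, so it can be turned into the timestamp carried in the write instead of the receive time. A rough Go sketch, assuming the usual yyyyMMddHHmm minute-bucket layout and ignoring timezone handling (the helper name is hypothetical, not code from either project):

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// minuteBucketToMillis converts a minute-precision time bucket such as
// 202305011010 (yyyyMMddHHmm) into epoch milliseconds, i.e. the value
// that could populate the write spec's timestamp field instead of the
// receive time. Illustrative helper only.
func minuteBucketToMillis(bucket int64) (int64, error) {
	t, err := time.Parse("200601021504", strconv.FormatInt(bucket, 10))
	if err != nil {
		return 0, err
	}
	return t.UnixMilli(), nil
}

func main() {
	ms, err := minuteBucketToMillis(202305011010)
	if err != nil {
		panic(err)
	}
	fmt.Println(ms) // 2023-05-01 10:10:00 UTC in epoch milliseconds
}
```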
We need more investigation into this issue.
There are at least two conclusions based on today's discussion:
- Timestamp should be added as a field of Record in the OAP. All record implementations should provide an accurate timestamp. @wankai123
- The BanyanDB server should maintain a snowflake-style ID generator. A record ID should consist of record name + timestamp (not time bucket) + a server-generated suffix. The server-generated suffix should be held per sharding ID for every several seconds/minutes (time window).
The reason for the rolling time window is to avoid a conflict when a new record carries an old timestamp while the suffix ID has just rolled back to 0. For example:
- Trace ID-1, Service A, Timestamp-1, suffix-0
- Trace ID-2, Service A, Timestamp-2, suffix-1
- Trace ID-3, Service A, Timestamp-3, suffix-2 ...
- Trace ID-1000, Service A, Timestamp-1000, suffix-1000
- Trace ID-1001, Service A, Timestamp-1, suffix-0
Trace ID-1 would be overridden by Trace ID-1001 unexpectedly. The suffix generator must make sure it does not roll back to zero within the given time window. Also, the time window should not be too small, otherwise there would be too many generators to hold.
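A minimal sketch of what such a generator could look like, assuming millisecond timestamps and an in-memory counter per time window (all names are hypothetical; this is not the BanyanDB implementation, and the per-sharding-ID dimension is omitted for brevity):

```go
package main

import (
	"fmt"
	"sync"
)

// suffixGenerator hands out a monotonically increasing suffix per time window,
// so the suffix can never roll back to 0 inside a window, and a record carrying
// an old timestamp draws from that window's own counter instead of colliding
// with suffix-0 of a fresh one. Illustrative only; a real version would also
// keep one counter per sharding ID and expire counters for past windows.
type suffixGenerator struct {
	mu           sync.Mutex
	windowMillis int64
	counters     map[int64]uint64 // window start (ms) -> next suffix
}

func newSuffixGenerator(windowMillis int64) *suffixGenerator {
	return &suffixGenerator{windowMillis: windowMillis, counters: make(map[int64]uint64)}
}

// recordID builds "record name + timestamp (not time bucket) + server-generated suffix".
func (g *suffixGenerator) recordID(recordName string, timestampMillis int64) string {
	g.mu.Lock()
	defer g.mu.Unlock()
	windowStart := timestampMillis - timestampMillis%g.windowMillis
	suffix := g.counters[windowStart]
	g.counters[windowStart] = suffix + 1
	return fmt.Sprintf("%s-%d-%d", recordName, timestampMillis, suffix)
}

func main() {
	gen := newSuffixGenerator(60_000) // one-minute windows

	fmt.Println(gen.recordID("segment", 1683021000123)) // suffix 0
	fmt.Println(gen.recordID("segment", 1683021000123)) // same timestamp, suffix 1
	fmt.Println(gen.recordID("segment", 1682021000456)) // older timestamp, its own window's counter
}
```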
Now, the column-based engine has fixed the problem mentioned above, in a different way:
The Measure engine now assigns a version number to each part, so every bulk flush from OAP is identified by this version. The version number is larger for T10 than for T1, so in case of duplicate data points, T10 takes precedence and T1 is ignored. The deduplication takes place during the query-time merging process.
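Conceptually, the query-time deduplication can be thought of as: while merging parts, for each (series, timestamp) pair keep only the value coming from the part with the highest version. A rough Go sketch with made-up structures (not the actual merge code):

```go
package main

import (
	"fmt"
	"sort"
)

// mergedPoint is one measure value as seen while merging parts at query time.
// version is the part's version; a later OAP flush gets a larger version.
// These structures are made up for illustration.
type mergedPoint struct {
	seriesID  string
	timestamp int64
	version   uint64
	value     int64
}

type pointKey struct {
	seriesID  string
	timestamp int64
}

// dedupByVersion keeps, for every (series, timestamp), the point from the part
// with the highest version, mirroring "T10 takes precedence and T1 is ignored".
func dedupByVersion(points []mergedPoint) []mergedPoint {
	latest := make(map[pointKey]mergedPoint)
	for _, p := range points {
		k := pointKey{p.seriesID, p.timestamp}
		if cur, ok := latest[k]; !ok || p.version > cur.version {
			latest[k] = p
		}
	}
	out := make([]mergedPoint, 0, len(latest))
	for _, p := range latest {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].timestamp < out[j].timestamp })
	return out
}

func main() {
	fmt.Println(dedupByVersion([]mergedPoint{
		{"service_resp_time", 1683021000000, 1, 10}, // written by an earlier flush
		{"service_resp_time", 1683021000000, 2, 12}, // rewritten by a later flush; wins
	}))
}
```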
The stream engine now has the capability to store duplicated data. If two elements with identical timestamps are present in the same series, they will both be saved and can be retrieved by the client. Currently, we do not support any deduplication technology (although some other database systems do offer this feature during querying). However, we may consider adding support for it in the future.
@wu-sheng Would you review the solution?
@hanahmily I think it is fine to pick the latest data for a specific data point. Is this solution majorly for measure data? The repeatable write for a certain data point should only happen for measure data, right?
Exactly. We only deduplicate measure data for now. AFAIK, repeatable data only happens when measure data is flushed within the same time bucket.
Since the column-based engine has been merged, I closed this issue.