About the forecasts

Open robbyyuu opened this issue 1 year ago • 3 comments

Thanks for your excellent work! It really helped me a lot.

Recently, I have also been focusing on issues in this area (Multi-dimensional Root Causes Analysis), and as you mentioned:

In practice, I found that the most difficult step is to get accurate forecasting values for all leaf elements. Since these are usually quite fine-grained, they don't actually have much data and any forecasts are often inaccurate. This can skew the results.

I have also found it very difficult to get forecast values for all leaves. In particular, certain combinations have only a few values or are almost always 0. Is there a forecasting method you would recommend in this case?

Or have you tried using the RiskLoc algorithm in a real industrial scenario, and if so, can you share what forecasting method you used in this case?

robbyyuu, Apr 11 '23 08:04

Hello,

Thank you very much for your interest!

Regarding the forecasting method, I still do not have a good solution for when most values are zero. The approach I suggest is to make the data more coarse-grained, either by considering longer timespans or by grouping multiple values together for some selected dimensions (preferably the dimensions with more unique values). I found this to be the most reliable way to handle forecasts that are mostly zeros.
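
A minimal sketch of both kinds of coarsening, in plain Python. The record layout (`t`, `value`, plus attribute keys), the function name `coarsen`, and the `"other"` bucket label are assumptions for illustration, not part of RiskLoc:

```python
from collections import Counter, defaultdict

def coarsen(records, dim, top_k, bucket_size):
    """Group rare values of `dim` into "other" and sum over longer time buckets.

    records: list of dicts with an integer "t" (time step), a numeric "value",
    and one key per attribute (hypothetical layout, for illustration only).
    """
    # Rank the values of the chosen dimension by total volume.
    totals = Counter()
    for r in records:
        totals[r[dim]] += r["value"]
    keep = {v for v, _ in totals.most_common(top_k)}

    out = defaultdict(float)
    for r in records:
        attrs = {k: v for k, v in r.items() if k not in ("t", "value")}
        if attrs[dim] not in keep:
            attrs[dim] = "other"  # merge long-tail values into one bucket
        # Integer division maps `bucket_size` consecutive steps to one bucket.
        key = (r["t"] // bucket_size, tuple(sorted(attrs.items())))
        out[key] += r["value"]
    return dict(out)

records = [
    {"t": 0, "region": "a", "device": "x", "value": 5.0},
    {"t": 1, "region": "a", "device": "x", "value": 3.0},
    {"t": 0, "region": "b", "device": "x", "value": 0.0},
    {"t": 1, "region": "c", "device": "x", "value": 1.0},
]
coarse = coarsen(records, dim="region", top_k=1, bucket_size=2)
```

Here the sparse regions `b` and `c` end up in a single `other` leaf, and the two time steps are summed into one bucket, so each remaining leaf has more data to forecast from.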

As long as this issue is handled, any common time series forecasting algorithm that gives reasonable predictions should be OK, even something as simple as a moving average.
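
For example, a moving-average forecast per leaf could look roughly like this. The function name, the window length, and the zero fallback for an empty history are assumptions chosen for illustration:

```python
def moving_average_forecast(history, window=7):
    """Forecast the next value as the mean of the last `window` observations."""
    if not history:
        return 0.0  # assumption: no history is treated as a forecast of zero
    recent = history[-window:]
    return sum(recent) / len(recent)

# Hypothetical per-leaf histories, keyed by the leaf's attribute combination.
leaf_history = {
    ("a", "x"): [4, 6, 5, 5],
    ("b", "x"): [0, 0, 1, 0],
}
forecasts = {
    leaf: moving_average_forecast(values, window=3)
    for leaf, values in leaf_history.items()
}
```

The window length trades off responsiveness against noise; for sparse leaves a longer window is usually the safer choice.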

shaido987, Apr 12 '23 12:04

Thanks for your patient reply. I also think aggregating and bucketing some long-tail features is a good solution; I'll try it on my task.

In addition, I found that the Tweedie distribution may be useful for forecasting in this scenario; it was used in the M5 time series forecasting competition.
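
The reason it fits here is that for a power parameter 1 < p < 2 the Tweedie distribution is a compound Poisson-gamma, which puts positive probability on exactly zero while staying continuous for positive values. A rough sketch of the corresponding unit deviance, which is what a model with a Tweedie objective would minimize, is below. The function name and the default `p=1.5` are assumptions, and this is only the loss, not a forecasting model:

```python
def tweedie_deviance(y, mu, p=1.5):
    """Unit Tweedie deviance for 1 < p < 2, with y >= 0 and mu > 0."""
    if not 1 < p < 2:
        raise ValueError("this sketch only covers 1 < p < 2")
    if y < 0 or mu <= 0:
        raise ValueError("requires y >= 0 and mu > 0")
    return 2 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )

# The deviance stays finite for y = 0, so zero observations need no special casing.
loss_zero = tweedie_deviance(0.0, 1.0)   # actual is zero, forecast is 1
loss_exact = tweedie_deviance(2.0, 2.0)  # perfect forecast, approximately 0
```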

I also found that RiskLoc often finds root causes in deeper layers, whereas in my tasks I usually care more about shallow root causes, such as those in layer 1 cuboids. Is there a way to make RiskLoc pay more attention to first-layer cuboids?

robbyyuu, Apr 12 '23 12:04

RiskLoc tries to find as good an explanation for the error as possible, so if there is no plausible root cause in the first layers, it will dig deeper. Something you can try is increasing the pep-threshold to filter away smaller, less important elements. You can also slightly lower the risk threshold to accept less perfect root causes (which will be in higher layers).

However, if you know that the root cause is in the first layer, then Adtributor could also be interesting to try, as it does not consider any elements in lower layers. It usually handles simple, first-layer root causes very well.
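
To give a feel for what a first-layer method looks at, here is a rough sketch of the explanatory power measure as I understand it from the Adtributor paper: the share of the total deviation that each element of a single dimension accounts for. This is only one ingredient (Adtributor also uses a surprise score), and the function name and data layout are assumptions for illustration:

```python
def explanatory_power(actual, forecast):
    """Share of the total deviation explained by each element of one dimension.

    actual, forecast: dicts mapping element name -> aggregated value
    (hypothetical layout; both dicts must have the same keys).
    """
    total_dev = sum(actual.values()) - sum(forecast.values())
    if total_dev == 0:
        # No overall deviation, so there is nothing to attribute.
        return {e: 0.0 for e in actual}
    return {e: (actual[e] - forecast[e]) / total_dev for e in actual}

# Hypothetical first-layer values for one dimension.
actual = {"a": 50, "b": 100, "c": 30}
forecast = {"a": 100, "b": 100, "c": 30}
ep = explanatory_power(actual, forecast)
```

In this toy case element `a` accounts for the whole drop, which is the kind of simple first-layer root cause such a method is suited to.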

shaido987, Apr 15 '23 14:04