nixtla fix: Chronos inference in foundation ts arena

Open abdulfatir opened this issue 1 year ago • 6 comments

Thank you for evaluating Chronos again. It's great to see it performing accurately on this benchmark as well.

We found some problems with the way inference is being done for Chronos:

Excess NaN padding was being applied to short time series. This padding is not required and slows the model down significantly.
The original time series were being cast to bfloat16, which loses information and may lead to poor accuracy.
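To illustrate the two problems, here is a minimal sketch in plain Python. It is not the code from this PR: `to_bfloat16` and `left_pad_to_longest` are hypothetical helpers written for this example, the bfloat16 round trip is emulated with `struct` rather than done in PyTorch, and padding only up to the longest series in a batch is an assumption about what "no excess padding" means here.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Emulate a float32 -> bfloat16 -> float32 round trip (round to nearest even)."""
    if math.isnan(x):
        return x
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps only the top 16 bits of a float32 (8 significant bits).
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + bias) & 0xFFFFFFFF) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def left_pad_to_longest(batch):
    """Left-pad each series with NaN, but only up to the longest series in the batch."""
    target = max(len(s) for s in batch)
    return [[math.nan] * (target - len(s)) + list(s) for s in batch]


# Nearby values collapse once they pass through bfloat16.
series = [1234.5, 1236.0, 1239.25]
print([to_bfloat16(v) for v in series])

# Short series get only as much padding as the batch needs.
padded = left_pad_to_longest([[1.0, 2.0], [3.0, 4.0, 5.0, 6.0]])
print([len(p) for p in padded])
```

In this range bfloat16 can only represent multiples of 8, so the raw observations lose most of their detail before the model sees them. Keeping the input series in float32 avoids that loss.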

This PR fixes these issues. The following table shows a comparison of Chronos (Large)'s performance before (taken from the original table in this repo) and after these fixes, and also reports the performance of other variants of Chronos. These experiments were performed on a g5.4xlarge instance, as in the original study.

| Model | MASE Monthly | MASE Weekly | MASE Daily | MASE Hourly | Time Monthly (min) | Time Weekly (min) | Time Daily (min) | Time Hourly (min) |
|---|---|---|---|---|---|---|---|---|
| Chronos-Large (Before) | 0.960 | 0.709 | 0.652 | 0.735 | 38.581 | 5.081 | 7.908 | 11.662 |
| Chronos-Large | 0.950 | 0.704 | 0.652 | 0.654 | 5.402 | 5.054 | 7.882 | 11.500 |
| Chronos-Base | 0.966 | 0.709 | 0.663 | 0.646 | 1.966 | 1.712 | 2.940 | 4.714 |
| Chronos-Small | 0.982 | 0.724 | 0.669 | 0.671 | 0.689 | 0.550 | 0.986 | 1.818 |
| Chronos-Mini | 0.968 | 0.736 | 0.682 | 0.729 | 0.476 | 0.356 | 0.688 | 1.371 |
| Chronos-Tiny | 0.976 | 0.765 | 0.686 | 0.799 | 0.316 | 0.212 | 0.427 | 0.965 |

We observe:

Improvements in MASE on the Monthly (~1%) and Hourly (~11%) datasets.
A significant reduction in inference time (~38 min to ~5 min) on the Monthly subset, which has many very short time series.
Smaller Chronos models offer a quality-speed trade-off: the Base model performs almost as well as Large while being much faster, and even the Mini model performs better than most baselines in the original study.
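The percentages quoted above can be reproduced from the Chronos-Large rows of the table. The short sketch below does the arithmetic; the dictionary keys and the `relative_improvement` helper are names chosen for this example only.

```python
# Chronos-Large values copied from the table: MASE and inference time in minutes.
before = {"monthly_mase": 0.960, "hourly_mase": 0.735, "monthly_minutes": 38.581}
after = {"monthly_mase": 0.950, "hourly_mase": 0.654, "monthly_minutes": 5.402}


def relative_improvement(old: float, new: float) -> float:
    """Percentage reduction from old to new (lower MASE is better)."""
    return (old - new) / old * 100.0


monthly_gain = relative_improvement(before["monthly_mase"], after["monthly_mase"])
hourly_gain = relative_improvement(before["hourly_mase"], after["hourly_mase"])
speedup = before["monthly_minutes"] / after["monthly_minutes"]

print(f"Monthly MASE reduced by {monthly_gain:.1f}%")
print(f"Hourly MASE reduced by {hourly_gain:.1f}%")
print(f"Monthly inference is {speedup:.1f}x faster")
```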

Here is how the average MASE ranking plots look before and after the fix:

After the fix, Chronos-Large achieves the best overall rank (center plot). Chronos-Base obtains the same overall ranking as TimesFM and TimeGPT (right plot).

For the fidelity of the study, we recommend that the authors update their results and discussions accordingly, ideally after an independent verification with the latest code change (see usage below). Thank you again for your effort!

Usage

Download the data and set up the environment as described here.
Run python eval-chronos.py to re-evaluate Chronos only.

Jun 03 '24 18:06 abdulfatir
