
[ENH] Add support for TimesFM

Open benHeid opened this issue 9 months ago • 1 comments

Support for TimesFM, the time series foundation model from google research:

  • https://github.com/google-research/timesfm?tab=readme-ov-file

benHeid avatar May 11 '24 08:05 benHeid

added it to the list here: https://github.com/sktime/sktime/issues/6177

One of these days, we should perhaps create our own?

fkiraly avatar May 11 '24 09:05 fkiraly

I have a question: What is the right way to implement this interface?

  1. add the package timesfm as a dependency to sktime and create the interface on top of that?
  2. repeat the model implementation using the same libraries and code that is in the actual source? - that will put jax and related libraries as new dependencies in sktime
  3. convert the model code from jax to pytorch and build an interface on top of the pytorch adapter in sktime?

geetu040 avatar May 24 '24 19:05 geetu040

I would prefer option 1 if possible. If that is not possible, we can discuss in more detail how to proceed.
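For illustration, option 1 usually means treating timesfm as an optional (soft) dependency and checking for it before use. The following is a minimal stand-alone sketch of such a check; the helper names `is_available` and `require_soft_dependency` are hypothetical and are not sktime's own dependency utilities.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment.

    Only intended for top-level names such as "timesfm"; dotted names would
    trigger an import of the parent package.
    """
    return importlib.util.find_spec(module_name) is not None


def require_soft_dependency(module_name: str, obj_name: str) -> None:
    """Raise a readable error if the optional package is missing."""
    if not is_available(module_name):
        raise ModuleNotFoundError(
            f"{obj_name} requires the optional package '{module_name}', "
            "which is not installed in this environment."
        )


# Hypothetical usage inside an interface class, before importing timesfm:
# require_soft_dependency("timesfm", "TimesFM forecaster")
```

With this arrangement the heavy dependencies (jax and related libraries) are only needed by users who actually use the TimesFM interface.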

benHeid avatar May 24 '24 20:05 benHeid

@fkiraly @geetu040 @benHeid Just pinging to confirm whether or not this item is picked up yet.

I would like to pick this issue up and create an interface for TimesFM

julian-fong avatar Jun 01 '24 14:06 julian-fong

@julian-fong did you start working on this? I was also planning to do so. If you are busy with something else as well, I can take this up.

geetu040 avatar Jun 04 '24 19:06 geetu040

@fkiraly There are a few things with TimesFM

  • it also uses freq, should we deal with it the same way we dealt with NeuralForecast in https://github.com/sktime/sktime/pull/6039
  • it has no training interface - it can only do zero-shot forecasting on a pre-trained model, should we keep it that way?
  • user can provide device as an argument like "cpu", "gpu" or "tpu" - should we give this option as a parameter to this interface as well?

geetu040 avatar Jun 05 '24 18:06 geetu040

it also uses freq, should we deal with it the same way we dealt with NeuralForecast

You mean, as mandatory arg? Yes, that might be a good idea - if it is the same logic, it might be worth moving it to a common or an adapter module.

it has no training interface - it can only do zero-shot forecasting on a pre-trained model, should we keep it that way?

No context or fine-tuning? That is odd for an FM, but if it is so, then fit is simply empty.

user can provide device as an argument like "cpu", "gpu" or "tpu" - should we give this option as a parameter to this interface as well?

Why not?
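The three answers above can be sketched as a small class: freq is a mandatory argument, fit does no training, and device is exposed as a parameter. This is only a stand-alone sketch under assumptions: the class name is hypothetical, it does not use sktime's base classes, and predict repeats the last observed value as a placeholder where a real interface would call the pre-trained model.

```python
class ZeroShotForecasterSketch:
    """Hypothetical outline of a zero-shot forecaster interface."""

    # device names taken from the discussion above
    ALLOWED_DEVICES = ("cpu", "gpu", "tpu")

    def __init__(self, freq, device="cpu"):
        if not freq:
            raise ValueError("freq is mandatory, e.g. 'D' or 'M'")
        if device not in self.ALLOWED_DEVICES:
            raise ValueError(f"device must be one of {self.ALLOWED_DEVICES}")
        self.freq = freq
        self.device = device
        self._context = None

    def fit(self, y):
        # No training happens here: the model is pre-trained.
        # The series is only remembered so predict has something to work on.
        self._context = list(y)
        return self

    def predict(self, horizon):
        if not self._context:
            raise RuntimeError("call fit with a non-empty series first")
        # Placeholder only: a real interface would run the pre-trained model.
        return [self._context[-1]] * horizon
```

Validating device in the constructor means an unsupported value fails early, before any model weights are loaded.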

fkiraly avatar Jun 05 '24 21:06 fkiraly

Why not?

I have tried this on Linux and things work fine, but people have raised issues, and I also discussed with @julian-fong that there can be failures on Windows and macOS, especially when device is set to "gpu"

it might be worth moving it to a common or an adapter module.

sure, I'll look into that

geetu040 avatar Jun 06 '24 04:06 geetu040

@fkiraly this might be blocking - TimesFM does not work for all python versions, how do we handle that?

geetu040 avatar Jun 06 '24 09:06 geetu040

From my understanding - the package timesfm does not work on windows and mac because the required package lingvo is not available. Threads detailing the errors can be found here https://github.com/google-research/timesfm/issues/1 and https://github.com/google-research/timesfm/issues/24

julian-fong avatar Jun 06 '24 12:06 julian-fong

It works on colab, but when I try to debug locally (ubuntu) I run into one error after another


Installation gives this error if python>=3.11:

ERROR: Could not find a version that satisfies the requirement lingvo==0.12.7 (from paxml) (from versions: none)
ERROR: No matching distribution found for lingvo==0.12.7

On python<=3.10, debugging the code, or creating a completely new conda env from the .yml file and running the code, gives these different errors:

1

 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

2

2024-06-06 17:16:08.597909: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Segmentation fault (core dumped)

3

  File "/home/geetu/miniconda3/envs/310/lib/python3.10/asyncio/locks.py", line 234, in __init__
    raise ValueError("loop argument must agree with lock")
ValueError: loop argument must agree with lock

geetu040 avatar Jun 06 '24 15:06 geetu040

I am starting to wonder whether anyone anywhere has succeeded in getting this to run?

fkiraly avatar Jun 06 '24 18:06 fkiraly

well it works with no error, no warning on google colab - I have tried to mimic the library versions of jax, jaxlib and tf from colab, all in vain. It seems to really depend on the hardware rather than just the libraries

geetu040 avatar Jun 06 '24 19:06 geetu040

maybe that's just google's way to try getting everyone to use colab 😁

fkiraly avatar Jun 06 '24 20:06 fkiraly

Extending the above comment https://github.com/sktime/sktime/issues/6408#issuecomment-2152798942

Once the library is installed on python<=3.10 the stated errors are raised on this simple code snippet from the official documentation.

import timesfm

tfm = timesfm.TimesFm(
    context_len=<context>,
    horizon_len=<horizon>,
    input_patch_len=32,
    output_patch_len=128,
    num_layers=20,
    model_dims=1280,
    backend=<backend>,
)
tfm.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")

geetu040 avatar Jun 07 '24 17:06 geetu040

Link to a more extensive summary of problems: https://github.com/sktime/sktime/pull/6571#issuecomment-2158597387

If I summarize correctly, then:

  • does not work on python 3.11 or above -> can deal with this by a python_version tag setting
  • does not run on local system due to mysterious error -> currently no idea how to debug this?
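As a rough illustration of the first point, the effect of a python_version restriction can be sketched as a plain version check. This is a stand-alone sketch, not how sktime evaluates its tags; the function name is hypothetical, and the 3.11 bound is the one reported at this point in the thread.

```python
import sys


def python_supported(version_info=None, upper_exclusive=(3, 11)):
    """Return True if the (major, minor) version is below the exclusive bound."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) < tuple(upper_exclusive)


# Example: skip or fail early when the interpreter is too new.
if not python_supported():
    print("timesfm install was reported to fail on this Python version")
```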

fkiraly avatar Jun 11 '24 14:06 fkiraly

Problems with the implementation

Here I have summarized the problems I faced implementing this algorithm in https://github.com/sktime/sktime/pull/6571. This draft PR is currently blocked and is not being worked on. If someone wants to continue on this, this summary of the existing problems might be helpful.

Official Package

There is no official package for timesfm on PyPI. It is installed from the official GitHub source, git+https://github.com/google-research/timesfm.git. There is a package named timesfm on PyPI, but it belongs to a pull request on the official code that has yet to be merged, so it is advised not to install from there.

Installing Library

The dependencies used by timesfm place very strict requirements on the python environment.

Installation gives this error if python>=3.11:

ERROR: Could not find a version that satisfies the requirement lingvo==0.12.7 (from paxml) (from versions: none)
ERROR: No matching distribution found for lingvo==0.12.7

Therefore you need to have python<3.11 to install timesfm from git+https://github.com/google-research/timesfm.git

Hardware Errors

Even if you set up this particular environment and install the library successfully, you will run into a series of errors.

I am running this code from the official documentation on Ubuntu 24.04 LTS with python==3.10.0

import timesfm

tfm = timesfm.TimesFm(
    context_len=<context>,
    horizon_len=<horizon>,
    input_patch_len=32,
    output_patch_len=128,
    num_layers=20,
    model_dims=1280,
    backend=<backend>,
)
tfm.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")

This primarily results in the following error:

  File "/home/geetu/miniconda3/envs/310/lib/python3.10/asyncio/locks.py", line 234, in __init__
    raise ValueError("loop argument must agree with lock")
ValueError: loop argument must agree with lock

If I try to fix and debug around this error, I am stuck with more errors mentioned below

1

 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

2

2024-06-06 17:16:08.597909: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Segmentation fault (core dumped)

Google Colab

However, the code works perfectly fine on google colab, with no library installation errors or runtime errors. Here is the code on google colab

geetu040 avatar Jul 05 '24 16:07 geetu040

The problem is Lingvo which is a dependency of paxml, which is used to load the weights of the pretrained model... So this is definitely a blocker. At least they are saying that they are working on a solution on GitHub:

The dependency lingvo does not support ARM architectures, and the code is not working for machines with Apple silicon. We are aware of this issue and are working on a solution. Stay tuned.

Source: https://github.com/google-research/timesfm and a related issue in their repo: https://github.com/google-research/timesfm/issues/74#issuecomment-2201180071
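A hedged sketch of how an interface could detect the affected platforms up front, using only the standard library. The function name is hypothetical, and the platform list reflects only what is reported in this thread (ARM per the upstream note, Windows and macOS per the comments above), not a verified compatibility matrix.

```python
import platform


def lingvo_platform_issue(system=None, machine=None):
    """Return a short reason string if the platform was reported as
    problematic for lingvo in this thread, otherwise None."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    # ARM machines report e.g. "arm64" (macOS), "aarch64" (Linux), "ARM64" (Windows)
    if machine.lower() in ("arm64", "aarch64"):
        return "lingvo does not support ARM architectures"
    if system in ("Windows", "Darwin"):
        return f"lingvo was reported as unavailable on {system}"
    return None


reason = lingvo_platform_issue()
if reason is not None:
    print(f"timesfm is likely not installable here: {reason}")
```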

benHeid avatar Jul 07 '24 12:07 benHeid

I am affected by this same issue.

I had it working last week, using the CPU on an Ubuntu Python image (Debian, without Conda, Python 3.10).

It was when I ran this on an Alpine Python image (everything else the same) I encountered this issue.

Thanks for everyone's efforts so far, I will keep an eye on the fixes progress.

xmpro-dk avatar Jul 09 '24 22:07 xmpro-dk

@geetu040 have you tried to install it directly via pypi as described here: https://github.com/google-research/timesfm?tab=readme-ov-file#installation

It seems that there is now an official pypi package.

benHeid avatar Jul 15 '24 18:07 benHeid

Looks like that update was made recently - will see if I can get a local copy running on my windows pc

julian-fong avatar Jul 15 '24 19:07 julian-fong

I am affected by this same issue.

I had it working last week, using the CPU on an Ubuntu Python image (Debian, without Conda, Python 3.10).

It was when I ran this on an Alpine Python image (everything else the same) I encountered this issue.

Thanks for everyone's efforts so far, I will keep an eye on the fixes progress.

I now have it running on the python:3.12-bookworm docker image, installing the pip packages directly, not targeting versions, letting pip resolve the versions it needs. For me it was an issue trying to run this on Alpine.

xmpro-dk avatar Jul 15 '24 22:07 xmpro-dk

I am running this on python: 3.10.12 and mint OS: 21.3 Cinnamon and the previously failing code now works. I will inspect this in more detail.

geetu040 avatar Jul 16 '24 06:07 geetu040

It looks like versions are pinned or have narrow ranges, and are conflicting with basically any other package?

fkiraly avatar Jul 18 '24 08:07 fkiraly

It looks like versions are pinned or have narrow ranges, and are conflicting with basically any other package?

exactly
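One way to inspect the pinning, sketched with the standard library only: read the declared requirements of an installed distribution and keep those pinned with `==`. The helper names are hypothetical, and the regular expression is a simplification that does not cover every requirement format (for example, older metadata that wraps specifiers in parentheses).

```python
import re
from importlib import metadata

# name, optional extras like [cuda12], then an exact "==" pin (not "===")
_PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:\[[^\]]*\])?\s*==(?!=)\s*([^\s;,]+)")


def exact_pins(requirements):
    """Map package name -> pinned version for requirements that use '=='."""
    pins = {}
    for req in requirements:
        match = _PIN.match(req)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def installed_pins(dist_name):
    """Return exact pins declared by an installed distribution, or {} if
    the distribution is not installed or declares no requirements."""
    try:
        requirements = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return {}
    return exact_pins(requirements)


# Example: list what an installed timesfm pins exactly (empty if not installed).
print(installed_pins("timesfm"))
```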

geetu040 avatar Jul 18 '24 08:07 geetu040

I would file this under "no working pypi release": while there is a pypi release, it is not "working" for all practical purposes. My proposed next step would be to proceed with vendoring, with the intention that this is temporary until the owners make a properly usable pypi release (I estimate this to be on the order of months).
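As a sketch of one possible arrangement for temporary vendoring, an interface could try a list of module locations in order and use the first that imports. The helper name and the vendored module path in the usage comment are hypothetical, and whether the installed package or the vendored copy should come first is a design choice not settled in this thread.

```python
import importlib


def import_first_available(candidates):
    """Import and return the first importable module among `candidates`."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # remember why this candidate failed, then try the next one
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Hypothetical usage; "mypkg._vendored.timesfm" is an illustrative path only:
# timesfm = import_first_available(["mypkg._vendored.timesfm", "timesfm"])
```

Keeping the lookup in one place would make it easy to drop the vendored entry once a usable release exists.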

fkiraly avatar Jul 18 '24 15:07 fkiraly