nixtla icon indicating copy to clipboard operation
nixtla copied to clipboard

Remove nbdev for core developments and use pytest for tests

Open JQGoh opened this issue 7 months ago • 6 comments

Rationale

  • Remove the dependency of nbdev as part of the development workflow
  • Introduce test execution using pytest and provide the test coverage of code base
  • Note that we do not remove nbdev completely at this stage as it could still be useful to support the generation of notebook-based documentations.
  • Note that a number of Python packages need to be added because jupyter nbconvert --to notebook --execute --inplace "$nb" does indeed execute all the notebook cells, this behavior differs from the nbdev_test execution whereby it is suspected that it only executes selected cells (marked by nbdev directive)
    • See this job, it failed because lightgbm package was not installed during the notebook run.

Suggestions

  • run-notebooks-test can be flaky and currently there is no convenient option like pytest-rerunfailures to rerun the test when encountered failure like ConnectError: [Errno 104] Connection reset by peer
  • Nixtla client has retry_strategy and we could consider adding the following as part of the allowable retry scenarios (if rate limiting policy allows delayed retry)
    • urllib.error.HTTPError
    • httpx.ConnectError

Changes

  • Migrate tests from notebook cells to pytest-styled test cases. The migrated test cases are those mainly registered in.

    • nbs/src/date_features.ipynb
    • nbs/src/nixtla_client.ipynb
    • nbs/src/utils.ipynb
  • Migrated test cases are saved into nixtla_tests folder.

    • Move commonly used test fixtures into nixtla_tests/conftest.py file
    • Revise the ci workflow and contributing guides to omit the dependency on nbdev.
    • As nbdev_test also includes execution for notebook in nbs/docs folder such as nbs/docs/tutorials/15_missing_values.ipynb file, we also need to execute and run this. For notebooks saved in the nbs/docs, we shall replace with another test job=run-notebooks-test
    • New set of ci jobs
      • run-notebooks-test: It executes all the .ipynb files found within nbs/docs folder on ubuntu platform.
      • run-cross-platform-tests: It executes all the pytest cases, both local and distributed runs, on both mac and ubuntu platforms.
        • Caveat: For windows platform run, we currently disable the test runs for Spark because of error as shown below, which requires a revised system setup (per-my understanding, this requires Nixtla team to setup internally).
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 52.0      

Upgraded Packages

  • ray upgraded to 2.20.0
    • Required because windows platform could not resolve dependencies during the installation, see job

Added Packages

  • pytest
  • pytest-cov
  • pytest-rerunfailures [This is to support pytest re-running when failure like httpx.ConnectError, urllib.error.HTTPError happen, avoid rerun whole set of failed job when it failed occasionally]
    • httpx.ConnectError: [Errno 104] Connection reset by peer, see failed job
    • urllib.error.HTTPError: HTTP Error 429: Too Many Requests, see failed job
  • shap [required as part of the execution of notebook doc]
  • pandas_market_calendars [required as part of execution of notebook doc]
  • mlforecast [required as part of the execution of notebook doc]
  • lightgbm [required as part of the execution of notebook doc]

JQGoh avatar Jun 04 '25 14:06 JQGoh

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Experiment Results

Experiment 1: air-passengers

Description:

variable experiment
h 12
season_length 12
freq MS
level None
n_windows 1

Results:

metric timegpt-1 timegpt-1-long-horizon SeasonalNaive Naive
mae 12.6793 11.0623 47.8333 76
mape 0.027 0.0232 0.0999 0.1425
mse 213.936 199.132 2571.33 10604.2
total_time 14.7789 1.1647 0.0046 0.0037

Plot:

Experiment 2: air-passengers

Description:

variable experiment
h 24
season_length 12
freq MS
level None
n_windows 1

Results:

metric timegpt-1 timegpt-1-long-horizon SeasonalNaive Naive
mae 58.1031 58.4587 71.25 115.25
mape 0.1257 0.1267 0.1552 0.2358
mse 4040.21 4110.79 5928.17 18859.2
total_time 1.3724 0.798 0.0038 0.0036

Plot:

Experiment 3: electricity-multiple-series

Description:

variable experiment
h 24
season_length 24
freq H
level None
n_windows 1

Results:

metric timegpt-1 timegpt-1-long-horizon SeasonalNaive Naive
mae 178.293 268.121 269.23 1331.02
mape 0.0234 0.0311 0.0304 0.1692
mse 121588 219457 213677 4.68961e+06
total_time 1.4855 1.5092 0.0048 0.0046

Plot:

Experiment 4: electricity-multiple-series

Description:

variable experiment
h 168
season_length 24
freq H
level None
n_windows 1

Results:

metric timegpt-1 timegpt-1-long-horizon SeasonalNaive Naive
mae 465.532 346.984 398.956 1119.26
mape 0.062 0.0437 0.0512 0.1583
mse 835120 403787 656723 3.17316e+06
total_time 2.3848 0.8562 0.0052 0.0048

Plot:

Experiment 5: electricity-multiple-series

Description:

variable experiment
h 336
season_length 24
freq H
level None
n_windows 1

Results:

metric timegpt-1 timegpt-1-long-horizon SeasonalNaive Naive
mae 558.649 459.769 602.926 1340.95
mape 0.0697 0.0566 0.0787 0.17
mse 1.22721e+06 739135 1.61572e+06 6.04619e+06
total_time 1.4571 1.6061 0.0052 0.0048

Plot:

github-actions[bot] avatar Jun 04 '25 14:06 github-actions[bot]

@goodwanghan This PR is ready for review. I have included my suggestion and things to take note in the PR. Specifically,

  • windows-platform tests disable running on spark related tests (JNI error)
  • run-notebooks-test could be flaky due to ConnectError: [Errno 104] Connection reset by peer

cc: @mergenthaler

JQGoh avatar Jun 09 '25 00:06 JQGoh

@goodwanghan This PR is ready for review. I have included my suggestion and things to take note in the PR. Specifically,

  • windows-platform tests disable running on spark related tests (JNI error)
  • run-notebooks-test could be flaky due to ConnectError: [Errno 104] Connection reset by peer

cc: @mergenthaler

Amazing work! @JQGoh I will review it tomorrow.

goodwanghan avatar Jun 09 '25 06:06 goodwanghan

@JQGoh I want to say, this is a masterpiece, and it is worth every engineer in Nixtla to learn!

So I can see there are a few other things that we must change in order to release. For example doc gen is going to break with this change, and it may not be trivial to make that change as an external collaborator.

I think you have accomplished this task, I or some other Nixtla member can continue working on this PR to finish the docs piece.

Great job! I am deeply impressed.

goodwanghan avatar Jun 11 '25 06:06 goodwanghan

@goodwanghan Thank you.

  • Regarding doc gen, I believe specifically you are referring to scripts in docs-scripts/ such as update-quarto.py. I made a dummy change to documentation and found that the build-docs job was still successful. Since I did not remove nbdev dependency, I suppose the doc gen workflow will still work (assuming the success of build-docs job implies that doc gen is working?)
  • Before merging this PR, I think the stability still require improvement. What do you think about the suggestion of adding httpx.ConnectError, urllib.error.HTTPError into the retry strategy of Nixtla client?
  • It has been a few days since the last successful job run. Today I realize that windows-platform job consistently failed with Windows fatal exception: access violation and I suspect the failure happened when we execute ray package related run. This needs further investigation.
  **File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\ray\_private\worker.py", line 898 in print_logs**
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 917 in run
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 980 in _bootstrap_inner
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 937 in _bootstrap
  • This failure of job also worths the attention as there was an internal error
E           nixtla.nixtla_client.ApiError: status_code: 500, body: {'detail': 'Internal server error, please contact us at [email protected]', 'request_id': 'F823W9RYWQ'}

JQGoh avatar Jun 17 '25 22:06 JQGoh

@goodwanghan Thank you.

  • Regarding doc gen, I believe specifically you are referring to scripts in docs-scripts/ such as update-quarto.py. I made a dummy change to documentation and found that the build-docs job was still successful. Since I did not remove nbdev dependency, I suppose the doc gen workflow will still work (assuming the success of build-docs job implies that doc gen is working?)
  • Before merging this PR, I think the stability still require improvement. What do you think about the suggestion of adding httpx.ConnectError, urllib.error.HTTPError into the retry strategy of Nixtla client?
  • It has been a few days since the last successful job run. Today I realize that windows-platform job consistently failed with Windows fatal exception: access violation and I suspect the failure happened when we execute ray package related run. This needs further investigation.
  **File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\ray\_private\worker.py", line 898 in print_logs**
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 917 in run
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 980 in _bootstrap_inner
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 937 in _bootstrap
  • This failure of job also worths the attention as there was an internal error
E           nixtla.nixtla_client.ApiError: status_code: 500, body: {'detail': 'Internal server error, please contact us at [email protected]', 'request_id': 'F823W9RYWQ'}

Hi @JQGoh my sincere apology for the big delay. I am still travelling and also got sick...

I agree we need to improve the stability of the tests.

Did you mean adding httpx.ConnectError, urllib.error.HTTPError into the retry strategy of unit tests or the NixtlaClient? In practice, we rarely see these errors, so I would suggest we just fix the tests for now.

Regarding the failures of Ray, maybe we can disable the ray tests first to unblock. We can add back ray tests in a separate PR.

I talked with Max, although I am not totally sure if the doc gen really works, we should be able to merge. In case there is any issue, rolling back is really straightforward, and as long as you merge before Friday, it should be safe.

goodwanghan avatar Jul 08 '25 15:07 goodwanghan

@goodwanghan Thank you.

  • Regarding doc gen, I believe specifically you are referring to scripts in docs-scripts/ such as update-quarto.py. I made a dummy change to documentation and found that the build-docs job was still successful. Since I did not remove nbdev dependency, I suppose the doc gen workflow will still work (assuming the success of build-docs job implies that doc gen is working?)
  • Before merging this PR, I think the stability still require improvement. What do you think about the suggestion of adding httpx.ConnectError, urllib.error.HTTPError into the retry strategy of Nixtla client?
  • It has been a few days since the last successful job run. Today I realize that windows-platform job consistently failed with Windows fatal exception: access violation and I suspect the failure happened when we execute ray package related run. This needs further investigation.
  **File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\ray\_private\worker.py", line 898 in print_logs**
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 917 in run
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 980 in _bootstrap_inner
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 937 in _bootstrap
  • This failure of job also worths the attention as there was an internal error
E           nixtla.nixtla_client.ApiError: status_code: 500, body: {'detail': 'Internal server error, please contact us at [email protected]', 'request_id': 'F823W9RYWQ'}

Hi @JQGoh my sincere apology for the big delay. I am still travelling and also got sick...

I agree we need to improve the stability of the tests.

Did you mean adding httpx.ConnectError, urllib.error.HTTPError into the retry strategy of unit tests or the NixtlaClient? In practice, we rarely see these errors, so I would suggest we just fix the tests for now.

Regarding the failures of Ray, maybe we can disable the ray tests first to unblock. We can add back ray tests in a separate PR.

I talked with Max, although I am not totally sure if the doc gen really works, we should be able to merge. In case there is any issue, rolling back is really straightforward, and as long as you merge before Friday, it should be safe.

@goodwanghan Let me summarize the latest changes with this PR since your last review

  1. httpx.ConnectError, urllib.error.HTTPError into the retry strategy

  • I meant to add these into the retry strategy of NixtlaClient, not the unit tests (pytest-rerunfailures plugin can cover unit tests).
  • The main motivation was that run-notebook-test failures can not be rerun with the help of pytest-rerunfailures plugin
  • Nevertheless, my guess is that previously I have run-cross-platform-tests and run-notebook-test jobs executed concurrently, this likely to result in ConnectError and etc. Now we revise this such that run-notebook-test will only be triggered after the completion of pytest.
  • We could also save the cost of executing notebook tests if there is already a breaking change in unittests.
  1. I disabled all the distributed related runs such as dask, ray, spark framework related tests for Windows-platform. Despite my previous attempt of disabling ray and spark tests, I could not get the Windows-platform tests passed.. Leave this for future work.

  2. Discovered the test_dask_finetune_model test run can be flaky, failed with Assertion check. Mark this test as flaky explicitly such that it can be automatically rerun

Lastly, need your help to merge this PR, because the expected but outdated run-local-tests related checks need to be passed, before I can merge this PR. image

JQGoh avatar Jul 10 '25 15:07 JQGoh