Remove nbdev for core developments and use pytest for tests
Rationale
- Remove the dependency of
nbdevas part of the development workflow - Introduce test execution using pytest and provide the test coverage of code base
- Note that we do not remove
nbdevcompletely at this stage as it could still be useful to support the generation of notebook-based documentations. - Note that a number of Python packages need to be added because
jupyter nbconvert --to notebook --execute --inplace "$nb"does indeed execute all the notebook cells, this behavior differs from thenbdev_testexecution whereby it is suspected that it only executes selected cells (marked by nbdev directive)- See this job, it failed because
lightgbmpackage was not installed during the notebook run.
- See this job, it failed because
Suggestions
-
run-notebooks-testcan be flaky and currently there is no convenient option likepytest-rerunfailuresto rerun the test when encountered failure likeConnectError: [Errno 104] Connection reset by peer - Nixtla client has retry_strategy and we could consider adding the following as part of the allowable retry scenarios (if rate limiting policy allows delayed retry)
- urllib.error.HTTPError
- httpx.ConnectError
Changes
-
Migrate tests from notebook cells to pytest-styled test cases. The migrated test cases are those mainly registered in.
- nbs/src/date_features.ipynb
- nbs/src/nixtla_client.ipynb
- nbs/src/utils.ipynb
-
Migrated test cases are saved into
nixtla_testsfolder.- Move commonly used test fixtures into
nixtla_tests/conftest.pyfile - Revise the ci workflow and contributing guides to omit the dependency on nbdev.
- As
nbdev_testalso includes execution for notebook innbs/docsfolder such asnbs/docs/tutorials/15_missing_values.ipynbfile, we also need to execute and run this. For notebooks saved in thenbs/docs, we shall replace with another test job=run-notebooks-test
- New set of ci jobs
-
run-notebooks-test: It executes all the .ipynb files found withinnbs/docsfolder on ubuntu platform. -
run-cross-platform-tests: It executes all the pytest cases, both local and distributed runs, on both mac and ubuntu platforms.- Caveat: For
windowsplatform run, we currently disable the test runs for Spark because of error as shown below, which requires a revised system setup (per-my understanding, this requires Nixtla team to setup internally).
- Caveat: For
-
- Move commonly used test fixtures into
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 52.0
Upgraded Packages
-
ray upgraded to 2.20.0
- Required because windows platform could not resolve dependencies during the installation, see job
Added Packages
- pytest
- pytest-cov
- pytest-rerunfailures [This is to support pytest re-running when failure like
httpx.ConnectError,urllib.error.HTTPErrorhappen, avoid rerun whole set of failed job when it failed occasionally]-
httpx.ConnectError: [Errno 104] Connection reset by peer, see failed job -
urllib.error.HTTPError: HTTP Error 429: Too Many Requests, see failed job
-
- shap [required as part of the execution of notebook doc]
- pandas_market_calendars [required as part of execution of notebook doc]
- mlforecast [required as part of the execution of notebook doc]
- lightgbm [required as part of the execution of notebook doc]
Check out this pull request on ![]()
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
Experiment Results
Experiment 1: air-passengers
Description:
| variable | experiment |
|---|---|
| h | 12 |
| season_length | 12 |
| freq | MS |
| level | None |
| n_windows | 1 |
Results:
| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
|---|---|---|---|---|
| mae | 12.6793 | 11.0623 | 47.8333 | 76 |
| mape | 0.027 | 0.0232 | 0.0999 | 0.1425 |
| mse | 213.936 | 199.132 | 2571.33 | 10604.2 |
| total_time | 14.7789 | 1.1647 | 0.0046 | 0.0037 |
Plot:

Experiment 2: air-passengers
Description:
| variable | experiment |
|---|---|
| h | 24 |
| season_length | 12 |
| freq | MS |
| level | None |
| n_windows | 1 |
Results:
| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
|---|---|---|---|---|
| mae | 58.1031 | 58.4587 | 71.25 | 115.25 |
| mape | 0.1257 | 0.1267 | 0.1552 | 0.2358 |
| mse | 4040.21 | 4110.79 | 5928.17 | 18859.2 |
| total_time | 1.3724 | 0.798 | 0.0038 | 0.0036 |
Plot:

Experiment 3: electricity-multiple-series
Description:
| variable | experiment |
|---|---|
| h | 24 |
| season_length | 24 |
| freq | H |
| level | None |
| n_windows | 1 |
Results:
| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
|---|---|---|---|---|
| mae | 178.293 | 268.121 | 269.23 | 1331.02 |
| mape | 0.0234 | 0.0311 | 0.0304 | 0.1692 |
| mse | 121588 | 219457 | 213677 | 4.68961e+06 |
| total_time | 1.4855 | 1.5092 | 0.0048 | 0.0046 |
Plot:

Experiment 4: electricity-multiple-series
Description:
| variable | experiment |
|---|---|
| h | 168 |
| season_length | 24 |
| freq | H |
| level | None |
| n_windows | 1 |
Results:
| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
|---|---|---|---|---|
| mae | 465.532 | 346.984 | 398.956 | 1119.26 |
| mape | 0.062 | 0.0437 | 0.0512 | 0.1583 |
| mse | 835120 | 403787 | 656723 | 3.17316e+06 |
| total_time | 2.3848 | 0.8562 | 0.0052 | 0.0048 |
Plot:

Experiment 5: electricity-multiple-series
Description:
| variable | experiment |
|---|---|
| h | 336 |
| season_length | 24 |
| freq | H |
| level | None |
| n_windows | 1 |
Results:
| metric | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive |
|---|---|---|---|---|
| mae | 558.649 | 459.769 | 602.926 | 1340.95 |
| mape | 0.0697 | 0.0566 | 0.0787 | 0.17 |
| mse | 1.22721e+06 | 739135 | 1.61572e+06 | 6.04619e+06 |
| total_time | 1.4571 | 1.6061 | 0.0052 | 0.0048 |
Plot:

@goodwanghan This PR is ready for review. I have included my suggestion and things to take note in the PR. Specifically,
- windows-platform tests disable running on spark related tests (JNI error)
-
run-notebooks-testcould be flaky due toConnectError: [Errno 104] Connection reset by peer
cc: @mergenthaler
@goodwanghan This PR is ready for review. I have included my suggestion and things to take note in the PR. Specifically,
- windows-platform tests disable running on spark related tests (JNI error)
run-notebooks-testcould be flaky due toConnectError: [Errno 104] Connection reset by peercc: @mergenthaler
Amazing work! @JQGoh I will review it tomorrow.
@JQGoh I want to say, this is a masterpiece, and it is worth every engineer in Nixtla to learn!
So I can see there are a few other things that we must change in order to release. For example doc gen is going to break with this change, and it may not be trivial to make that change as an external collaborator.
I think you have accomplished this task, I or some other Nixtla member can continue working on this PR to finish the docs piece.
Great job! I am deeply impressed.
@goodwanghan Thank you.
- Regarding doc gen, I believe specifically you are referring to scripts in
docs-scripts/such asupdate-quarto.py. I made a dummy change to documentation and found that the build-docs job was still successful. Since I did not removenbdevdependency, I suppose the doc gen workflow will still work (assuming the success of build-docs job implies that doc gen is working?) - Before merging this PR, I think the stability still require improvement. What do you think about the suggestion of adding
httpx.ConnectError,urllib.error.HTTPErrorinto the retry strategy of Nixtla client? - It has been a few days since the last successful job run. Today I realize that windows-platform job consistently failed with
Windows fatal exception: access violationand I suspect the failure happened when we executeraypackage related run. This needs further investigation.
**File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\ray\_private\worker.py", line 898 in print_logs**
File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 917 in run
File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 980 in _bootstrap_inner
File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 937 in _bootstrap
- This failure of job also worths the attention as there was an internal error
E nixtla.nixtla_client.ApiError: status_code: 500, body: {'detail': 'Internal server error, please contact us at [email protected]', 'request_id': 'F823W9RYWQ'}
@goodwanghan Thank you.
- Regarding doc gen, I believe specifically you are referring to scripts in
docs-scripts/such asupdate-quarto.py. I made a dummy change to documentation and found that the build-docs job was still successful. Since I did not removenbdevdependency, I suppose the doc gen workflow will still work (assuming the success of build-docs job implies that doc gen is working?)- Before merging this PR, I think the stability still require improvement. What do you think about the suggestion of adding
httpx.ConnectError,urllib.error.HTTPErrorinto the retry strategy of Nixtla client?- It has been a few days since the last successful job run. Today I realize that windows-platform job consistently failed with
Windows fatal exception: access violationand I suspect the failure happened when we executeraypackage related run. This needs further investigation.**File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\ray\_private\worker.py", line 898 in print_logs** File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 917 in run File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 980 in _bootstrap_inner File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 937 in _bootstrap
- This failure of job also worths the attention as there was an internal error
E nixtla.nixtla_client.ApiError: status_code: 500, body: {'detail': 'Internal server error, please contact us at [email protected]', 'request_id': 'F823W9RYWQ'}
Hi @JQGoh my sincere apology for the big delay. I am still travelling and also got sick...
I agree we need to improve the stability of the tests.
Did you mean adding httpx.ConnectError, urllib.error.HTTPError into the retry strategy of unit tests or the NixtlaClient? In practice, we rarely see these errors, so I would suggest we just fix the tests for now.
Regarding the failures of Ray, maybe we can disable the ray tests first to unblock. We can add back ray tests in a separate PR.
I talked with Max, although I am not totally sure if the doc gen really works, we should be able to merge. In case there is any issue, rolling back is really straightforward, and as long as you merge before Friday, it should be safe.
@goodwanghan Thank you.
- Regarding doc gen, I believe specifically you are referring to scripts in
docs-scripts/such asupdate-quarto.py. I made a dummy change to documentation and found that the build-docs job was still successful. Since I did not removenbdevdependency, I suppose the doc gen workflow will still work (assuming the success of build-docs job implies that doc gen is working?)- Before merging this PR, I think the stability still require improvement. What do you think about the suggestion of adding
httpx.ConnectError,urllib.error.HTTPErrorinto the retry strategy of Nixtla client?- It has been a few days since the last successful job run. Today I realize that windows-platform job consistently failed with
Windows fatal exception: access violationand I suspect the failure happened when we executeraypackage related run. This needs further investigation.**File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\ray\_private\worker.py", line 898 in print_logs** File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 917 in run File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 980 in _bootstrap_inner File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\threading.py", line 937 in _bootstrap
- This failure of job also worths the attention as there was an internal error
E nixtla.nixtla_client.ApiError: status_code: 500, body: {'detail': 'Internal server error, please contact us at [email protected]', 'request_id': 'F823W9RYWQ'}Hi @JQGoh my sincere apology for the big delay. I am still travelling and also got sick...
I agree we need to improve the stability of the tests.
Did you mean adding
httpx.ConnectError,urllib.error.HTTPErrorinto the retry strategy of unit tests or the NixtlaClient? In practice, we rarely see these errors, so I would suggest we just fix the tests for now.Regarding the failures of Ray, maybe we can disable the ray tests first to unblock. We can add back ray tests in a separate PR.
I talked with Max, although I am not totally sure if the doc gen really works, we should be able to merge. In case there is any issue, rolling back is really straightforward, and as long as you merge before Friday, it should be safe.
@goodwanghan Let me summarize the latest changes with this PR since your last review
-
httpx.ConnectError,urllib.error.HTTPErrorinto the retry strategy
- I meant to add these into the retry strategy of
NixtlaClient, not the unit tests (pytest-rerunfailures plugin can cover unit tests). - The main motivation was that run-notebook-test failures can not be rerun with the help of
pytest-rerunfailures plugin - Nevertheless, my guess is that previously I have
run-cross-platform-testsandrun-notebook-testjobs executed concurrently, this likely to result inConnectErrorand etc. Now we revise this such that run-notebook-test will only be triggered after the completion of pytest. - We could also save the cost of executing notebook tests if there is already a breaking change in unittests.
-
I disabled all the distributed related runs such as dask, ray, spark framework related tests for
Windows-platform. Despite my previous attempt of disabling ray and spark tests, I could not get theWindows-platformtests passed.. Leave this for future work. -
Discovered the test_dask_finetune_model test run can be flaky, failed with Assertion check. Mark this test as flaky explicitly such that it can be automatically rerun
Lastly, need your help to merge this PR, because the expected but outdated run-local-tests related checks need to be passed, before I can merge this PR.