transformers
Test summary with previous PyTorch/TensorFlow versions
Initiated by @LysandreJik, we ran the tests against previous PyTorch/TensorFlow versions. The goal is to determine whether we should drop (some of the) earlier PyTorch/TensorFlow versions.
- This is not exactly the same as the scheduled daily CI (`torch-scatter` and `accelerate` are not installed, etc.)
- Currently we only have the global summary (i.e. there is no count of test failures per model)
Here are the results (run around June 20, 2022):
- PyTorch testing has ~27100 tests
- TensorFlow testing has ~15700 tests
| Framework | No. Failures |
|---|---|
| PyTorch 1.10 | 50 |
| PyTorch 1.9 | 710 |
| PyTorch 1.8 | 1301 |
| PyTorch 1.7 | 1567 |
| PyTorch 1.6 | 2342 |
| PyTorch 1.5 | 3315 |
| PyTorch 1.4 | 3949 |
| TensorFlow 2.8 | 118 |
| TensorFlow 2.7 | 122 |
| TensorFlow 2.6 | 122 |
| TensorFlow 2.5 | 128 |
| TensorFlow 2.4 | 167 |
It looks like the number of failures in TensorFlow testing doesn't increase much.
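To put the raw counts in perspective, here is a quick sketch (plain Python, using the approximate totals above) that converts the failure counts into failure rates:

```python
# Failure counts from the table above, against ~27,100 PyTorch tests
# and ~15,700 TensorFlow tests.
PT_TOTAL, TF_TOTAL = 27_100, 15_700
pt_failures = {"1.10": 50, "1.9": 710, "1.8": 1301, "1.7": 1567,
               "1.6": 2342, "1.5": 3315, "1.4": 3949}
tf_failures = {"2.8": 118, "2.7": 122, "2.6": 122, "2.5": 128, "2.4": 167}

for version, n in pt_failures.items():
    print(f"PyTorch {version}: {n / PT_TOTAL:.1%} failing")
for version, n in tf_failures.items():
    print(f"TensorFlow {version}: {n / TF_TOTAL:.1%} failing")
```

Even the oldest TensorFlow tested (2.4) fails only ~1.1% of tests, while PyTorch 1.4 fails ~14.6%.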
So far my thoughts:
- All TF >= 2.4 should be (still) kept in the list of supported versions
Questions
- What's your opinion on which versions we should drop support for?
- Would you like to see the number of test failures per model?
- TensorFlow 2.3 needs CUDA 10.1 and requires building a special Docker image. Do you think we should make the effort to get the results for TF 2.3?
cc @LysandreJik @sgugger @patrickvonplaten @Rocketknight1 @gante @anton-l @NielsRogge @amyeroberts @alaradirik @stas00 @hollance to have your comments
TF 2.3 is quite old by now, and I wouldn't make a special effort to support it. Several nice TF features (like the Numpy-like API) only arrived in TF 2.4, and we're likely to use those a lot in future.
Hey @ydshieh, would you have a summary of the failing tests handy? I'm curious to see the reason why there are so many failures for PyTorch as soon as we leave the latest version. I'm quite confident that it's an issue in our tests rather than in our internal code, so seeing the failures would help. Thanks!
@LysandreJik I will re-run it. The previous run(s) have huge tables in the reports, and sending to Slack failed (3001 character limit). I finally ran it by disabling those blocks.
Before re-running it, I need an approval for #17921.
I ran the past CI again, which returns more information. Looking quickly at the report for PyTorch 1.4, here are some observations:
There is one error occurring in almost all models:
- `from_pretrained`: `OSError: Unable to load weights from pytorch checkpoint file`, caused by `torch.load`: `Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old.`
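For context, PyTorch >= 1.6 serializes checkpoints as ZIP archives, which `torch.load` in older releases cannot read; that is the error above. A minimal, torch-free sketch of detecting the format (assuming `path` points at a saved checkpoint file):

```python
import zipfile

def is_zipfile_checkpoint(path):
    # PyTorch >= 1.6 saves checkpoints as ZIP archives; torch.load from
    # PyTorch < 1.6 cannot read them and raises the "maximum supported
    # version for reading is 2" error quoted above.
    return zipfile.is_zipfile(path)
```

When a checkpoint must stay loadable by pre-1.6 PyTorch, `torch.save(obj, path, _use_new_zipfile_serialization=False)` writes the legacy format instead.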
Another error also occurs a lot (in the TorchScript tests):
- (line 625) AttributeError: module 'torch.jit' has no attribute '_state'
An error occurs specifically in vision models (probably due to the convolution layers):
- (line 97) RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
BART has 108/106 failures:
- (line 240) RuntimeError: CUDA error: device-side assert triggered
- Don't know what's wrong here yet
Other `AttributeError`s (not exhaustive):
- AttributeError: module 'torch' has no attribute 'minimum'
- AttributeError: 'builtin_function_or_method' object has no attribute 'fftn'
- AttributeError: module 'torch' has no attribute 'square'
- AttributeError: module 'torch.nn' has no attribute 'Hardswish'
- AttributeError: module 'torch' has no attribute 'logical_and'
- AttributeError: module 'torch' has no attribute 'pi'
- AttributeError: module 'torch' has no attribute 'multiply'
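Most of these `AttributeError`s are simply APIs that landed after PyTorch 1.4 (`torch.minimum`, `torch.square`, `nn.Hardswish`, ...). One common way to guard tests against this is a minimum-version gate; a hypothetical, torch-free sketch (these helper names are illustrative, not the actual transformers utilities):

```python
def parse_version(v):
    # "1.6" or "1.6.0" -> (1, 6, 0); good enough for simple gates,
    # assuming plain numeric versions (no rc/dev suffixes).
    parts = (v.split(".") + ["0", "0"])[:3]
    return tuple(int(p) for p in parts)

def meets_minimum(installed, required):
    # e.g. skip a test that uses torch.minimum unless the installed
    # version is recent enough to have it.
    return parse_version(installed) >= parse_version(required)
```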
Thanks for the report! Taking a look at the PyTorch versions, here are the dates at which they were released:
- 1.4: Jan 16, 2020
- 1.5: Apr 21, 2020
- 1.6: Jul 28, 2020
- 1.7: Oct 27, 2020
- 1.8: Mar 4, 2021
- 1.9: Jun 15, 2021
- 1.10: Oct 21, 2021
- 1.11: Mar 10, 2022
Most of the errors in from_pretrained seem to come from the zipfile format introduced by PyTorch 1.6. I think this is the most annoying one to patch by far.
From a first look, I'd suggest dropping support for all PyTorch versions < 1.6, as these were released more than two years ago.
Do you have a link to a job containing all these failures? I'd be interested in seeing if the 2342 errors in PyTorch 1.6 are solvable simply or if they will require a significant refactor.
The link is here. But since it contains too many jobs (all models x all versions ~= 3200 jobs), it just shows [Unicorn!] This page is taking too long to load.
I can re-run specifically for PyTorch 1.6 only, and will post a link later.
From a first look, I'd suggest dropping support for all PyTorch versions < 1.6, as these were released more than two years ago.
I second that.
While we are at it, do we want to establish an official sliding window for how far back we support PyTorch versions? As in: at minimum, we support at least 2 years of PyTorch releases? If it's easy to support longer we would, but it'd be easy to cut off if need be.
Users can always pin an older transformers version if they really need support for a very old PyTorch.
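The sliding-window policy above can be sketched as a simple date check (a sketch under the proposed 2-year assumption, using a 365-day-year approximation for the cutoff):

```python
from datetime import date, timedelta

def still_supported(release_date, today, window_years=2):
    # A framework release stays supported while it is less than
    # ~window_years old at the time of the check.
    return (today - release_date) <= timedelta(days=365 * window_years)
```

As of the test run (~June 20, 2022), this keeps PyTorch 1.6 (released July 28, 2020) and drops 1.5 (April 21, 2020), matching the proposal above.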
Yes, that would work fine with me. If I understand correctly, that's how libraries in the PyData ecosystem (scikit-learn, NumPy) manage the support of Python versions: they drop support for versions older than 2 years (https://github.com/scikit-learn/scikit-learn/issues/20965, https://github.com/scikit-learn/scikit-learn/issues/20084, the SciPy toolchain: https://github.com/scipy/scipy/pull/14655).
Dropping support for PyTorch/Flax/TensorFlow versions that have been released more than two years ago sounds good to me. That is somewhat already the case (see failing tests), but we're just not aware.
Hi, I am wondering what it means for a PyTorch/TensorFlow/Flax version to be supported. I guess it doesn't imply that all models work under those framework versions, but I would like to know if there is a more explicit definition (for transformers, or more generally, in open source projects).
Ideally it should mean that all models work and all tests pass, apart from functionality that explicitly has version-specific tests (like CUDA bfloat16, or torch FX, where we test against a specific PyTorch version).