pydicom icon indicating copy to clipboard operation
pydicom copied to clipboard

`TypeError: dcmread: Expected a file ... but got NoneType` test failures on `=pydicom-2.3.1`

Open TheChymera opened this issue 2 years ago • 15 comments

Full build log: https://ppb.chymera.eu/084827.log

Any ideas what's going on?

TheChymera avatar May 25 '23 05:05 TheChymera

At a first glance it looks like the test data cannot be accessed - maybe this warning:

_Warning: Package 'pydicom.data.test_files' is absent from the `packages` configuration.

has something to do with it. Are you packaging a distribution?

mrbean-bremen avatar May 25 '23 05:05 mrbean-bremen

@mrbean-bremen yes, I'm packaging it for Gentoo Linux. Is there perhaps a static testdata archive somewhere? The log above is using the PyPI tarball, I also tried it with the GH one, but I get the same errors.

TheChymera avatar May 25 '23 05:05 TheChymera

Yes, there is both testdata in the package (the one mentioned in the warning), and a static test data archive in a separate repository (pydicom/pydicom-data). I thought that the tests that use the external data are excluded if it cannot be accessed, but I may be wrong. I may have a closer look tonight (at work right now), or @darcymason may check this if he finds the time.

mrbean-bremen avatar May 25 '23 06:05 mrbean-bremen

Instead of trying to completely understand what is going on here, I have been wondering anyway about being more explicit for optional packages in pydicom's new pyproject.toml, which might help solve this.

Specifically, I'm thinking to options so one could pip install pydicom[test_files], or pip install pydicom[all], etc. although I'm not clear what the script does here. If it starts with the source tarball, then maybe that is not possible. Alternatively, can your packaging be done without running pydicom's tests?

I'm hoping to do a pydicom 2.4 release in the next week or so. If you could wait for that, we could ensure that it is configured correctly to work for your build.

I thought that the tests that use the external data are excluded if it cannot be accessed, but I may be wrong.

I'll try to have a look at that too later today if @mrbean-bremen doesn't get to it first

darcymason avatar May 25 '23 13:05 darcymason

@darcymason yes, I can temporarily disable the tests. Can you ping me on this with the new release?

TheChymera avatar May 25 '23 16:05 TheChymera

I thought that the tests that use the external data are excluded if it cannot be accessed, but I may be wrong.

I've had a look at this and there is are classes of tests of the data manager code itself that skip if EXT_PYDICOM (whether pydicom-data installed) is True and another class if it is False, but no such check in the other tests AFAICT. I haven't followed it through completely, but I believe the files will just be downloaded as needed by each test.

darcymason avatar May 25 '23 17:05 darcymason

Can you ping me on this with the new release?

@TheChymera, sure, we will likely enlist your help pre-release, to make sure it is okay before publishing.

darcymason avatar May 25 '23 17:05 darcymason

Thanks @darcymason - I got distracted with other not pydicom-related stuff and didn't get to it.

I've had a look at this and there is are classes of tests of the data manager code itself that skip if EXT_PYDICOM (whether pydicom-data installed) is True and another class if it is False, but no such check in the other tests AFAICT.

Yes, you are right. It is probably the best to also skip all tests relying on this data, maybe using an auto-use fixture that detects if the files are available. Speaking of fixtures - I also noticed a whole lot of pytest warnings about the obsolete setup method. We may also want to fix this. I'm not sure if I find the time before the release, but these should be easy things to do.

mrbean-bremen avatar May 25 '23 19:05 mrbean-bremen

@darcymason - I've started to try this out locally. Depending on an environment variable, get_test_data will skip the test if the data is not available and external data shall not be fetched (probably making this the default, so that for packaging it will never be attempted). Unfortunately, there are a lot of get_test_data used globally in the test modules, which will not work for skipping single tests, so this will take a bit to change. Nevertheless, I think it is better to do this in a general way, so we don't have to mark each single test that uses external data. What do you think?

I will try to put something together over the weekend.

mrbean-bremen avatar May 27 '23 07:05 mrbean-bremen

What do you think?

I will try to put something together over the weekend.

Sure, if you are willing to come up with something, that would be great.

skip the test if the data is not available and external data shall not be fetched (probably making this the default, so that for packaging it will never be attempted).

I'm not sure how this should work -- it seems making the default skip a large number of tests might be awkward. How do we set that for our CI, etc.? Does that mean adding an environment variable change to all the workflows? How does that work for a user who just installed and wants to run the tests? -- in partial answer to my own question, I think we could have a [dev] extras option in the install, or [test], and we could add that to the contributing guidelines (and/or installation documentation)

darcymason avatar May 27 '23 14:05 darcymason

Unfortunately, there are a lot of get_test_data used globally in the test modules,

Forget to mention this in my previous comment. I did wonder whether we should simplify that anyway. I don't see the point of setting a global variable, e.g. rtplan_name = get_testdata_file("rtplan.dcm") when each test using it could just call the get_testdata_file directly. I doubt there is a significant time cost in getting the full filepath again for files that are used in multiple tests, and if even if there is, some kind of caching could be done to avoid that penalty.

darcymason avatar May 27 '23 15:05 darcymason

So, another thought...what if the tests simply required pydicom-data to be installed and no tests are run if it is not, or the user has to specify it is okay to install pydicom-data. I think all or almost all files are used in the tests, so it would be more efficient anyway to just get them all at once.

I'm not sure if that can work in this issue's build from a source tarball scenario ... but if there were a way to do something parallel to pip install[tests] or pip install[all] kind of behavior (maybe produce two different source builds - one with pydicom-data, one without), then perhaps it could work out.

darcymason avatar May 27 '23 18:05 darcymason

I'm not sure how this should work -- it seems making the default skip a large number of tests might be awkward.

Agreed.

I doubt there is a significant time cost in getting the full filepath again for files that are used in multiple tests, and if even if there is, some kind of caching could be done to avoid that penalty.

Yes, I ended up doing exactly this for many cases (specifically the pixel handler tests, as they use the names to parametrize the tests). I started by adding fixtures for the common cases, but I'm now not so sure if this is even needed.

So, another thought...what if the tests simply required pydicom-data to be installed and no tests are run if it is not, or the user has to specify it is okay to install pydicom-data.

You mean no tests at all are run? This would make some sense, as with my current implementation, only the tests with external data are skipped, which is a rather arbirary collection of tests. I'm inclined to throw away my local changes in favour of an easier way that will just skip all tests if pydicom-data is not available. I'm not sure about the best way yet...

mrbean-bremen avatar May 27 '23 19:05 mrbean-bremen

You mean no tests at all are run?

Yes, that is the suggestion. Throw an error message stating that testing requires pydicom-data to be installed. But as I said, I'm not sure how that can work in a source distribution as in this issue.

darcymason avatar May 27 '23 20:05 darcymason

@TheChymera, I'm attaching a source tarball from my latest attempts with the prep2.4 branch. Do you mind giving it a try and making sure it works in your environment? It should be the same as a normal sdist, just without the tests folder.

pydicom-2.4.0.tar.gz

darcymason avatar Jun 02 '23 20:06 darcymason