Validate checksum after successful import
Use the new 'verify_checksum' hash value in the yaml files to verify the image integrity after it has been successfully imported. Show a warning, if either the hash algorithm or the hash value does not match the expected fields.
- [x] Deactivate broken image / deactivate cycle process
- [x] unit test
The SHA512 values used here are computed by the CI pipeline if and only if the image is updated.
Fixes #340
With the use of the web-download mechanism of Glance images are usually not downloaded by the image-manager but by Glance itself. I'd refrain from adding any code that would download images and requiring enough bandwidth or even disk space to hold them temporarily.
@gndrmnn did you see my comment at https://github.com/osism/openstack-image-manager/issues/340#issuecomment-1471721379 ? in short: Glance does already produce hash values, it's just that this is done with a single fixed hash algo for all images and it cannot be chosen per image.
I suggest to try and extend the import call of the image API (https://docs.openstack.org/api-ref/image/v2/index.html?expanded=import-an-image-detail#import-an-image) to optionally give it a hash algo to use for the checksum. Then you get the image downloaded by Glance and the hash created by Glance and can simply compare it by fetching the image details.
@frittentheke Hi,
With the use of the web-download mechanism of Glance images are usually not downloaded by the image-manager but by Glance itself. I'd refrain from adding any code that would download images and requiring enough bandwidth or even disk space to hold them temporarily.
As far as I understand, the update.py and the corresponding call will only ever be run by a workflow here on GitHub. The change in that file will result in all ymls containing a sha-512 going forward, but they will be "pre-computed" for the average user.
@gndrmnn did you see my comment at #340 (comment) ? in short: Glance does already produce hash values, it's just that this is done with a single fixed hash algo for all images and it cannot be chosen per image.
Yes, I read this. In fact, we compare the (in the future existing) SHA-512 contained in yaml files with the hash that glance computes when importing the image (i think that is how it works?). The imported_image in this call is already the representation of the Image object in openstack, which contains the hash that glance presumably calculated.
I suggest to try and extend the import call of the image API (https://docs.openstack.org/api-ref/image/v2/index.html?expanded=import-an-image-detail#import-an-image) to optionally give it a hash algo to use for the checksum. Then you get the image downloaded by Glance and the hash created by Glance and can simply compare it by fetching the image details.
Yes, that would also be possible. It was discussed with @berendt that we simply recompute the hash values for all image yml files at "image update time". But let's keep the PR on hold until @berendt is back from vacation and gives some input.
@frittentheke This was discussed with @berendt and it was decided that we implement this PR as is. The hash values in the yaml files will be pre-computed by the GitHub/Zuul Pipeline and not on the user side. The main advantage of this method is that we can implement it right now.
However, your suggestion to enhance the upstream API can still be applied later as a separate feature. That would allow Glance to throw away any corrupt image before it even "reaches" the backend. I will open a separate issue for that.
See #643
I'm wondering whether the file should be downloaded AFTER the filename is determined (after the whole HEREBE DRAGONS stuff). Or am I missing something? Also, I would actually keep track of the original hash as well, so that the file need not be downloaded every time the Github action runs. Naturally, I'm volunteering to make the changes I propose, and also to rebase to the outcome of https://github.com/osism/openstack-image-manager/pull/647 :)
I'm wondering whether the file should be downloaded AFTER the filename is determined (after the whole HEREBE DRAGONS stuff). Or am I missing something?
If I understand you correctly, it should make no meaningful difference at which of those two points in time the file is downloaded and hashed.
Also, I would actually keep track of the original hash as well, so that the file need not be downloaded every time the Github action runs
Right, that - or some other mechanism - would make sense to safe some bandwidth. We will consider how to best implement that.
Naturally, I'm volunteering to make the changes I propose, and also to rebase to the outcome of #647 :)
No, it's fine we got this. We might come back to ask some questions about #647 if need be later on though :)
@gndrmnn
If I understand you correctly, it should make no meaningful difference at which of those two points in time the file is downloaded and hashed.
Maybe I got it wrong myself, but in my understanding, the filename is not even known in advance for certain distros.
The latest_url for CentOS Stream 8, for instance, is given as https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-HEREBE\d+\.\dDRAGONS.x86_64.qcow2. When I try do download that, I get a HTTP response code 404. The actual filename has to be determined using the checksums file first to arrive at https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-20230904.0.x86_64.qcow2, which can then be downloaded.
@gndrmnn
If I understand you correctly, it should make no meaningful difference at which of those two points in time the file is downloaded and hashed.
Maybe I got it wrong myself, but in my understanding, the filename is not even known in advance for certain distros.
The
latest_urlfor CentOS Stream 8, for instance, is given as https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-HEREBE\d+.\dDRAGONS.x86_64.qcow2. When I try do download that, I get a HTTP response code 404. The actual filename has to be determined using the checksums file first to arrive at https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-20230904.0.x86_64.qcow2, which can then be downloaded.
Ah right, yes. That indeed needs to be taken care of. Thanks for pointing that out.
@mbuechse I would kindly ask you to review this Merge Request as you are most familiar with the refactored update.py :)
The current iteration of this MR adds an additional field to the yaml schema which contains the sha512 of the actual image. Therefore, we can use the other existing hash to check if a new version was released. Also the filenames are resolved first now. I'd say it makes sense to review the commits individually.
@gndrmnn I've been ill. I will certainly have a look one of the next few days.
Why on earth did the PEP8 issues not trigger the Flake8 test?? :thinking:
Why on earth did the PEP8 issues not trigger the Flake8 test?? 🤔
Maybe your fork somehow didn't include the CI triggers?
Why on earth did the PEP8 issues not trigger the Flake8 test?? 🤔
Maybe your fork somehow didn't include the CI triggers?
Nope, the flake8 test in osism/zuul-jobs did not have the recurse flag enabled for the "Find .py files" job. Which means that in all osism repos only py files located in the repo root are being tested with flake8 :D
Working on a fix fo osism/zuul-jobs now...
Thanks @mbuechse and thanks @gndrmnn for following up to find why this was not caught by CI. With that result this is a very useful finding.
recheck