Add checksum verification for "file:" protocol in Get-ChocolateyWebFile
Checklist
- [X] I have verified this is the correct repository for opening this issue.
- [X] I have verified no other issues exist related to my request.
Is Your Feature Request Related To A Problem? Please describe.
Installers stored on local file systems or shares typically copy quite fast, so it's not much of an issue to run them directly from their current location, as Install-ChocolateyInstallPackage does. But increasingly people are working over VPNs or may have slow network connections to "local" shares (hosted on OneDrive, for example). It would be desirable for Chocolatey to check whether a previously downloaded installer is the same as the one it is trying to run and, if so, use the copy it already has instead of downloading it again.
Describe The Solution. Why is it needed?
The Install-ChocolateyPackage function calls the Get-ChocolateyWebFile function to download the file. When it downloads over the http/s protocols, it checks the checksum of any previously downloaded copy before downloading, which saves time if the file has already been downloaded.
However, when it downloads from a file: protocol, it does not do this.
I propose that the checksum checking before download is also added to the file:/// protocol so that the benefit gained when it is used over the web is also gained for "local" files.
Additional Context
This request is distinct from simply running a hash check on the file once it's downloaded. That can already be done by using Install-ChocolateyPackage instead of Install-ChocolateyInstallPackage and using the file:/// prefix in the url parameters. I want it to check the hash before the file is downloaded; at the moment this only occurs for http/s locations.
Presently this issue has a work-around by using Get-ChecksumValid or the standard PowerShell Get-FileHash function and some extra lines in the chocolateyInstall.ps1 file, but it would be nice if it was integrated into the install functions.
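For anyone who needs the work-around in the meantime, here is a minimal sketch of what those extra lines could look like. It assumes a SHA256 checksum and uses only the standard Get-FileHash, Test-Path and Copy-Item cmdlets. The function name `Get-CachedOrCopiedFile`, its parameters and the example paths are hypothetical, not part of Chocolatey.

```powershell
# Hypothetical helper, not a Chocolatey function: reuse a previously copied
# installer when its hash matches, otherwise copy it from the (slow) source.
function Get-CachedOrCopiedFile {
    param(
        [Parameter(Mandatory)] [string] $SourcePath,  # e.g. a file on a slow share
        [Parameter(Mandatory)] [string] $CachePath,   # local copy to reuse
        [Parameter(Mandatory)] [string] $Checksum,    # expected hash as a hex string
        [string] $Algorithm = 'SHA256'
    )

    # Check the hash BEFORE copying: if a matching local copy exists, skip the copy.
    if (Test-Path -LiteralPath $CachePath -PathType Leaf) {
        $actual = (Get-FileHash -LiteralPath $CachePath -Algorithm $Algorithm).Hash
        if ($actual -eq $Checksum) {  # -eq on strings is case-insensitive
            return $CachePath
        }
    }

    # No usable local copy: make sure the target folder exists, then copy.
    $cacheDir = Split-Path -Parent $CachePath
    if ($cacheDir -and -not (Test-Path -LiteralPath $cacheDir)) {
        New-Item -ItemType Directory -Path $cacheDir -Force | Out-Null
    }
    Copy-Item -LiteralPath $SourcePath -Destination $CachePath -Force -ErrorAction Stop

    # Verify the fresh copy as well, so a corrupt transfer is not run.
    $actual = (Get-FileHash -LiteralPath $CachePath -Algorithm $Algorithm).Hash
    if ($actual -ne $Checksum) {
        throw "Checksum mismatch for '$CachePath': expected $Checksum, got $actual"
    }
    return $CachePath
}

# Example call (paths and checksum are placeholders):
# $file = Get-CachedOrCopiedFile -SourcePath '\\server\share\setup.exe' `
#             -CachePath (Join-Path $env:TEMP 'example\setup.exe') `
#             -Checksum '<expected sha256>'
```

In a chocolateyInstall.ps1, the returned path could then be passed to Install-ChocolateyInstallPackage through its -File parameter. This is the check I'd like to see built into the install functions.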
Related Issues
https://github.com/chocolatey/choco/issues/1635
The description of Get-ChocolateyWebFile is:
This will download a file from a url, tracking with a progress bar. It returns the filepath to the downloaded file when it is complete.
It's not intended to be used for anything except http/https files.
A solution for that would be to change all instances of web or url to network in the filename and description. I realise that's probably disagreeable and maybe unrealistic, but your answer implies that nothing should ever be improved or modified outside of its original purpose or description. There are already ftp: and file: sections in that function, so at some point someone thought it might be used for things other than URLs.
The chocolatey website itself preaches the change or die message: "Okay, time for some serious talk. If you are not learning better automation now, your peers are ... we need to continue to provide value for the organizations that employ us.". I'm a new user, but I'm assuming Chocolatey would not be what it is today if none of its functions were ever modified to do anything other than their original purpose.
Why is there an arbitrary criterion for which source network locations should have checksum checks? Why should a large company with a 20Gb Internet connection get the benefit of the pre-download checksum check, but a small company with a 1Gb internal network connection not? The smaller company would clearly benefit much more in this scenario, while the large company barely even needs it.
Why shouldn't we treat all network downloads the same? They all essentially have the same risks/benefits that the checksum checks are trying to mitigate/improve.
but your answer implies that nothing should ever be improved or modified outside of its original purpose or description.
That's quite a leap from what I said to this. That isn't what I was implying. We were talking about the Get-ChocolateyWebFile function. If you knew nothing about the function and I asked you to tell me what you thought it did, I'm going to suggest you'll say 'it gets files from the web'. If I told you that it actually gets files from any location whether it's web, FTP, file shares, local files, removable drives, etc. you'd likely suggest the name is wrong.
The purpose of Get-ChocolateyWebFile is what I described above. We have millions of users who have expectations of how that function will work, both those who have used it for years and new users coming in and trying to understand how things work. That is not a scale I think you work at, so I appreciate it's not something that occurs to you.
A function should do one thing. And based on the naming, this function "gets web files using http / https".
There are already ftp: and file: sections in that function, so at some point someone thought it might be used for things other than URLs.
You had already said:
but your answer implies that nothing should ever be improved or modified outside of its original purpose or description.
And in this case, your answer implies that nothing should ever be improved or modified outside of its original purpose or description. I don't know what the original purpose of this was when it was written. Perhaps it was intended to be a catch-all for everything network related. Things change. We get better. We define things differently / more strictly / with a different scope.
I'm a new user, but I'm assuming Chocolatey would not be what it is today if none of its functions were ever modified to do anything other than their original purpose.
The Chocolatey functions, as they are today, are mainly the ones that have been there for a very long time. So they have stood the test of time.
The Chocolatey functions cover what we have determined to be the fundamental use cases that most people will need. They are not intended to cover every single scenario that anybody will ever need. That's an impossible task.
Why is there an arbitrary criterion for which source network locations should have checksum checks?
As I mentioned above, we cover the fundamental use cases most people will need. The majority of people do not have the situation you describe. Most people simply reference a file on a share and download it, allowing their VPN to do the throttling or chunking of the file. I'd suggest your situation is niche.
Why should a large company with a 20Gb Internet connection get the benefit of the pre-download checksum check, but a small company with a 1Gb internal network connection not?
If you have a situation that requires you to do something different from what the open-source version of the tool provides, then you need to work around that. The open-source version has an expectation that you will get your hands dirty and build what you require for your niche use case.
Chocolatey for Business has throttling, allowing you to work in low bandwidth environments.
Why shouldn't we treat all network downloads the same? They all essentially have the same risks/benefits that the checksum checks are trying to mitigate/improve.
I go back to what I said earlier. Your situation is a niche one. We cover the fundamentals that most people use. If you are using open-source, get your hands dirty and write it. If you are using Chocolatey for Business, use package throttling.
After reading all of this, I think you're missing a big piece of Chocolatey CLI that is really critical.
Chocolatey CLI is extensible. It's been designed and built that way from the beginning. Because we cannot cover every single scenario, we encourage people to write their own functions, that they can reuse for themselves, their organisation or for the wider community.
You can either write that code and add it into each package (dot sourcing the script in, for example) or you can write a Chocolatey extension (as a side note to this, we will be doing a livestream on Chocolatey extensions on 3 October). There are examples of Community maintained extensions on GitHub.
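As a rough illustration of the dot-sourcing option, the sketch below writes a small helper script next to the install script and loads it with the `.` operator. The file name `helpers.ps1`, the function `Test-FileChecksum` and the temporary folder standing in for the package's tools directory are all assumptions made for the example; none of them are part of Chocolatey.

```powershell
# Stand-in for the package's tools directory so the sketch runs on its own.
# In a real chocolateyInstall.ps1 this is usually the folder containing the script.
$toolsDir = Join-Path ([System.IO.Path]::GetTempPath()) 'example-package-tools'
New-Item -ItemType Directory -Path $toolsDir -Force | Out-Null

# helpers.ps1 (hypothetical name) would normally ship inside the package;
# it is written out here only to keep the example self-contained.
$helperPath = Join-Path $toolsDir 'helpers.ps1'
Set-Content -LiteralPath $helperPath -Value @'
function Test-FileChecksum {
    param([string] $Path, [string] $Checksum, [string] $Algorithm = 'SHA256')
    # A missing file can never match.
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) { return $false }
    return (Get-FileHash -LiteralPath $Path -Algorithm $Algorithm).Hash -eq $Checksum
}
'@

# Dot sourcing: run the helper script in the current scope so that its
# functions become available to the rest of the install script.
. $helperPath
```

After the dot-source line, the install script can call `Test-FileChecksum` like any built-in function. An extension package shares the same kind of code across many packages instead of copying the helper file into each one.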
If you choose to write an extension, you can share it with the wider community by pushing it in the same way you would push any other package. That extension can then be added as a dependency in other packages, allowing the package to take advantage of the functionality you've added in your extension.
I appreciate there are many words here, but I wanted to be able to answer your questions because I believe there is a misunderstanding in all of this.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue will be closed in 14 days if it continues to be inactive.
Dear contributor,
As this issue seems to have been inactive for quite some time now, it has been automatically closed. If you feel this is a valid issue, please feel free to re-open the issue if / when a pull request has been added. Thank you for your contribution.