Problems creating a package from a SourceForge download URL
Perhaps this is a user "pilot" error, but when I create a Package in DejaCode from a SourceForge download URL, I get strange results. A recent Add Package using
https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
resulted in a Package with a filename of download rather than scribus-1.6.0.tar.gz.
It also resulted in the rather verbose PURL value of
pkg:generic/download?download_url=https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
I scanned the package, using the same download URL, directly in SCIO v32.0.8, and it returned a PURL value of
pkg:autotools/scribus-1.6.0
in the key_files_packages section
So it appears that the rather eccentric download conventions of SourceForge are messing things up a bit.
- Can we improve DejaCode to interpret the results of such a scan differently?
- Does such an improvement rather belong in SCIO?
- Or should we prompt the DejaCode user with instructions on how to provide a different, better, less eccentric download URL when processing a SourceForge package?
The problem stems from the fact that https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download is not the actual direct download URL; it is followed by several URL redirects that end up at a mirror.
The final destination looks something like this, where the first segment of the hostname changes from mirror to mirror: https://kumisystems.dl.sourceforge.net/project/scribus/scribus/1.6.0/scribus-1.6.0.tar.gz
The stable final URL would be https://master.dl.sourceforge.net/project/scribus/scribus/1.6.0/scribus-1.6.0.tar.gz
None of these is practically visible or accessible to a user. Therefore, IMHO, we should do the following:
- [ ] Convert SourceForge download URLs to PURLs. Update the code to properly translate a SourceForge URL to a PURL, either here or in the Python packageurl library, or in both places.
- [ ] Consider updating "legacy" SourceForge URLs to a canonical URL. This should be the one that is visible when browsing, ignoring redirections: https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
- [ ] Update the MineCode SourceForge miners to handle and store download URLs correctly.
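As a rough illustration of the URL relationship described above, here is a minimal sketch that maps the browsable download URL to the master.dl.sourceforge.net form without following any redirects. The function name `to_master_url` is hypothetical, and the sketch assumes that the path after `/files/` maps one-to-one onto the path after `/project/<name>/`, as it does in the scribus example; this may not hold for every SourceForge project.

```python
from typing import Optional
from urllib.parse import urlsplit


def to_master_url(download_url: str) -> Optional[str]:
    """Map a browsable SourceForge download URL to its master mirror URL.

    Hypothetical helper: returns None when the URL does not look like
    https://sourceforge.net/projects/<project>/files/<path>/download
    """
    parts = urlsplit(download_url)
    if parts.netloc != "sourceforge.net":
        return None

    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]

    # Shortest valid form: projects/<project>/files/<filename>/download
    if (
        len(segments) < 5
        or segments[0] != "projects"
        or segments[2] != "files"
        or segments[-1] != "download"
    ):
        return None

    project = segments[1]
    # Everything between "files" and "download" is the file path.
    file_path = "/".join(segments[3:-1])
    return f"https://master.dl.sourceforge.net/project/{project}/{file_path}"


if __name__ == "__main__":
    print(to_master_url(
        "https://sourceforge.net/projects/scribus/files/scribus/1.6.0/"
        "scribus-1.6.0.tar.gz/download"
    ))
```

This is string manipulation only, so it cannot confirm that the resulting URL actually resolves; following the real redirect chain remains the authoritative check.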
thanks @pombredanne your proposed solution looks good to me!
Note that we have support for the https://*.sourceforge.net/project/scribus/scribus/1.6.0/scribus-1.6.0.tar.gz URLs in the packageurl library, returning pkg:sourceforge/scribus/[email protected]
We simply have to add support for this URL syntax: https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
@DennisClark I've added support for that type of URL in the purl library, see https://github.com/package-url/packageurl-python/issues/139. Also, as @pombredanne suggested, we are now using the final redirect URL to extract the proper filename.
With those changes, we now generate a proper PURL and filename:
Hi @tdruez, I'm getting mixed results in Staging. My original scribus case went just fine, but I then tried another package from SourceForge, turbovnc-3.1.tar.gz, on staging with a download URL of
https://sourceforge.net/projects/turbovnc/files/3.1/turbovnc-3.1.tar.gz/download
and it all went fine, including a scan, except that it did not assign any PURL values. See attached.
@DennisClark I've added support for the following URL formats:
- https://sourceforge.net/projects/turbovnc/files/3.1/turbovnc-3.1.tar.gz/download
- https://sourceforge.net/projects/ventoy/files/v1.0.96/Ventoy%201.0.96%20release%20source%20code.tar.gz/download
- https://sourceforge.net/projects/geoserver/files/GeoServer/2.23.4/geoserver-2.23.4-war.zip/download
You can give it another try.
@tdruez I tested the 3 you identified in your comment, plus the scribus package, and they all look rather good, with one small issue.
When I simply click on the download link for the ventoy package, it downloads a file named Ventoy 1.0.96 release source code.tar.gz, which I think is correct and is what they call it on the web site, but in DejaCode the filename is shown as Ventoy%201.0.96%20release%20source%20code.tar.gz with all the escape characters for the spaces. If we simply don't allow spaces in the DejaCode filename field, I guess that's ok, but it does look kind of strange. See attached.
@tdruez one other observation, which is not directly related to this issue, but something that is somewhat perplexing. DejaCode found the existing scans that I created yesterday for the 4 packages (good) and apparently they did not get re-scanned (fine I think) but it did not perform any of the auto-updates to fields on the package (not so good), such as the license-expression, even though 3 of the 4 scans have a declared license. See attached.
In the example above, the geoserver does not have a detected license anyway, so that's not a big deal, but the other 3 all have declared licenses.
@tdruez Sorry I did not catch this one yesterday, but the results from creating a package with
https://sourceforge.net/projects/spacesniffer/files/spacesniffer_1_3_0_2.zip/download
do not look so great. See attached.
It appears that there is an unknown number of (arbitrary) variations in the SourceForge download URLs, which suggests we really do not have a satisfactory way to determine whether we have covered them all. I'm sure you would like to finish this one, but it is possibly an unmanageable task. I'm ok if we go with "good enough" once we have fixed the ones we have actually discovered.
@DennisClark changes available for review:
- Ventoy%201.0.96%20release%20source%20code.tar.gz is now properly unquoted
- Added support for https://sourceforge.net/projects/spacesniffer/files/spacesniffer_1_3_0_2.zip/download
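For illustration only, the unquoting can be sketched with the standard library. This is not the DejaCode implementation (which, per the earlier comment, derives the filename from the final redirect URL); the helper name `filename_from_url` is hypothetical, and it assumes the filename is the last path segment before a trailing `download` segment.

```python
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str) -> str:
    """Return the percent-decoded filename from a download URL.

    Hypothetical helper: a trailing "download" segment, as used by
    SourceForge, is skipped. A file literally named "download" would be
    misread by this simple rule.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if segments and segments[-1] == "download":
        segments = segments[:-1]
    if not segments:
        return ""
    # unquote turns %20 back into a space, and so on.
    return unquote(segments[-1])


if __name__ == "__main__":
    print(filename_from_url(
        "https://sourceforge.net/projects/ventoy/files/v1.0.96/"
        "Ventoy%201.0.96%20release%20source%20code.tar.gz/download"
    ))
```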
> one other observation, which is not directly related to this issue, but something that is somewhat perplexing. DejaCode found the existing scans that I created yesterday for the 4 packages (good) and apparently they did not get re-scanned (fine I think) but it did not perform any of the auto-updates to fields on the package (not so good), such as the license-expression, even though 3 of the 4 scans have a declared license. See attached.
Entered as https://github.com/nexB/dejacode/issues/30
@tdruez The spacesniffer package creation works great now. The Ventoy package creation issue is fixed, although it was very slow to complete the Add Package step, with the cursor spinning for more than 2 minutes; I tested it with a different Ventoy version and had the same slow response. So it all appears to be working fine, but you might want to check on the performance problem.