termux-packages icon indicating copy to clipboard operation
termux-packages copied to clipboard

chore(scripts/termux_step_install_license): total refactor

Open TomJo2000 opened this issue 1 year ago • 2 comments

Part of:

  • #20858

This PR is a total refactor of scripts/build/termux_step_install_license.sh. The goal was to eliminate shellcheck warnings and reduce jank.

TomJo2000 avatar Jul 15 '24 20:07 TomJo2000

I'd like some clarification on this check in the current version. https://github.com/termux/termux-packages/blob/eeca771cecf05365533399c1bfeb264321548d6a/scripts/build/termux_step_install_license.sh#L81-L85

I can't see what it is meant to accomplish. It seems pretty safe to say that it is intended to loop over all licenses installed for the package. But the if condition doesn't make any sense.

  • 1.) The * is quoted, and thus not a wildcard.
  • 2.) The actually check is if $LICENSE is equal to the path where the package licenses are stored. Which would always be true, if it weren't for the mistakenly quoted *, making it never true unless there is a license called LICENSE* which will never be the case.

Since I don't see any way for that check to ever fail, I removed it. If we do wanna account for all of the license files, I can rewrite the check. Though that accounting is already done by the rest of the script above.

TomJo2000 avatar Jul 15 '24 20:07 TomJo2000

All initial smoke tests seem favorable. I've rebuilt most of the root repo locally with the refactored license installation step changes.

TomJo2000 avatar Jul 15 '24 22:07 TomJo2000

The check is to verify so that there are files in $TERMUX_PREFIX/share/doc/$TERMUX_PKG_NAME. If no licenses were installed for some reason, for example if TERMUX_PKG_LICENSE is empty, then LICENSE* will not expand and remain LICENSE*, and the check catches that.

We can probably check this in a more intuitive way, maybe with find.

Grimler91 avatar Jul 16 '24 06:07 Grimler91

find would definitely be more intuitive.

TomJo2000 avatar Jul 16 '24 06:07 TomJo2000

Looks fine otherwise from a quick look.

However, note that debian does not use LICENSE as file names, it uses (a single) copyright, so would be better to shift to that.

  • https://www.debian.org/releases/stable/i386/ch01s08.en.html

Additionally, lot of packages use the machine-readable debian/copyright file on debian, including ruby, zsh, etc. It puts all the license texts under a single file. You can find those packages on the system with grep -r -l -F "https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/" /usr/share/doc | sort. I personally don't like the part about putting all the license texts under the same file as that's too verbose and harder to understand, so I break the spec and use License: [<license_name>](<path_to_license_text_file>) instead of the License: <license_name>\n<license_text>, hopefully a future spec version adds support for that.

  • https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
  • https://lists.debian.org/debian-project/2010/08/msg00269.html
  • https://wiki.debian.org/Proposals/CopyrightFormat
  • http://ftp.debian.org/debian/pool/main/
  • https://askubuntu.com/a/620069

agnostic-apollo avatar Jul 16 '24 06:07 agnostic-apollo

Well if you're in the mood to break APIs. I did briefly think about replacing the $COUNTER suffix with the $LICENSE identifier. So it would be LICENSE.MIT and LICENSE.HPND, but I dropped that idea because of some of the less filename friendly license idenifiers. And maintaining API compatibility with the previous version of the license install step.

TomJo2000 avatar Jul 16 '24 06:07 TomJo2000

Looks fine otherwise from a quick look.

However, note that debian does not use LICENSE as file names, it uses (a single) copyright, so would be better to shift to that.

  • https://www.debian.org/releases/stable/i386/ch01s08.en.html

Additionally, lot of packages use the machine-readable debian/copyright file on debian, including ruby, zsh, etc. It puts all the license texts under a single file. You can find those packages on the system with grep -r -l -F "https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/" /usr/share/doc | sort. I personally don't like the part about putting all the license texts under the same file as that's too verbose and harder to understand, so I break the spec and use License: [<license_name>](<path_to_license_text_file>) instead of the License: <license_name>\n<license_text>, hopefully a future spec version adds support for that.

  • https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
  • https://lists.debian.org/debian-project/2010/08/msg00269.html
  • https://wiki.debian.org/Proposals/CopyrightFormat
  • http://ftp.debian.org/debian/pool/main/
  • https://askubuntu.com/a/620069

I've had a read through the linked resources. And while I think we might wanna consider that, I think it's out of scope for this PR. I'm also fairly sure it would necessitate rebuilds of all packages, no?

TomJo2000 avatar Jul 16 '24 06:07 TomJo2000

For checking if license file was added.

license_files="$(find "$TERMUX_PREFIX/share/doc/$TERMUX_PKG_NAME" -maxdepth 1 -type f -name "LICENSE*")"
if [ -z "$license_files" ]; then
    # error out
fi

For mulitple filenames, you can use something like \( -name "LICENSE*" -o -name "copyright*" \).

agnostic-apollo avatar Jul 16 '24 06:07 agnostic-apollo

For checking if license file was added.

license_files="$(find "$TERMUX_PREFIX/share/doc/$TERMUX_PKG_NAME" -maxdepth 1 -type f -name "LICENSE*")"
if [ -z "$license_files" ]; then
    # error out
fi

For mulitple filenames, you can use something like \( -name "LICENSE*" -o -name "copyright*" \).

Should that be a find -L? I guess we don't really care if its a symlink or not, so no?

TomJo2000 avatar Jul 16 '24 07:07 TomJo2000

I don't suggest using the machine-readable debian/copyright format right now, I just wanted to notify that the standard exists, and that it solves quite a few issues, especially in relation to which source files are under what license and the format being machine readable.

However, I do suggest that we use copyright and copyright.<counter> format right now as that is the standard which wouldn't be hard to follow and would provide consistency with tools processing package license files. Packages can get rebuilt whenever they are.

agnostic-apollo avatar Jul 16 '24 07:07 agnostic-apollo

However, I do suggest that we use copyright and copyright.<counter> format right now as that is the standard which wouldn't be hard to follow and would provide consistency with tools processing package license files. Packages can get rebuilt whenever they are.

Yep, that's definitely far easier than implementing the whole standard right away. Done locally now.

Packages can get rebuilt whenever they are.

Eventual consistency, eh? :P

TomJo2000 avatar Jul 16 '24 07:07 TomJo2000

Should that be a find -L?

Yeah, best use \( -type f -o -type l \) instead.

agnostic-apollo avatar Jul 16 '24 07:07 agnostic-apollo

Thanks.

agnostic-apollo avatar Jul 16 '24 07:07 agnostic-apollo

Eventual consistency, eh? :P

Well, if you wanna rebuild all 2000+ packages, who am I to say no, have fun with failed builds and missing source urls :p

agnostic-apollo avatar Jul 16 '24 07:07 agnostic-apollo

Well, if you wanna rebuild all 2000+ packages, who am I to say no, have fun with failed builds and missing source urls :p

CI's been flakey lately anyway. I sure as hell am not poking it with the "full repo rebuild" stick.

TomJo2000 avatar Jul 16 '24 07:07 TomJo2000

Alright, that's:

  • copyright.${counter} as license name
  • adding copyright to the search list
  • turning the license filepath into a variable
  • accursed hashmap magic

If you still want the find check for making sure all the licenses are present and accounted for I'd appreciate it if you could provide me a cleaned up version of how you want that to work.

All other threads should be resolved now. So the find check is the last blocker here.

TomJo2000 avatar Jul 16 '24 08:07 TomJo2000

# Will match regular files and symlinks to regular files
$ find -L . -maxdepth 1 -type f

# Will match regular files, symlinks to regular files and broken symlinks
$ find -L . -maxdepth 1 \( -type f -o -type l \)

# Will match regular files, symlinks to regular files, symlinks to directories and broken symlinks
$ find . -maxdepth 1 \( -type f -o -type l \)

Based on above, best use following. It will not match broken symlinks to license files, assuming that's what we want.

license_files="$(find -L "$TERMUX_PREFIX/share/doc/$TERMUX_PKG_NAME" -maxdepth 1 -type f -name "copyright*")"
if [ -z "$license_files" ]; then
    # error out
fi

agnostic-apollo avatar Jul 16 '24 09:07 agnostic-apollo

Based on above, best use following. It will not match broken symlinks to license files, assuming that's what we want.

license_files="$(find -L "$TERMUX_PREFIX/share/doc/$TERMUX_PKG_NAME" -maxdepth 1 -type f -name "copyright*")"
if [ -z "$license_files" ]; then
    # error out
fi

Yep I think that tracks with what we are actually trying to check.

TomJo2000 avatar Jul 16 '24 09:07 TomJo2000

I think license file symlinks may not be valid during build time, let me check.

agnostic-apollo avatar Jul 16 '24 09:07 agnostic-apollo

Yup, they wouldn't exist, use following with \( -type f -o -type l \).

license_files="$(find -L "$TERMUX_PREFIX/share/doc/$TERMUX_PKG_NAME" -maxdepth 1 \( -type f -o -type l \) -name "copyright*")"
if [ -z "$license_files" ]; then
    # error out
fi

agnostic-apollo avatar Jul 16 '24 09:07 agnostic-apollo

By symlinks not being valid, I meant that termux-licenses wouldn't exist at ../../../../share/LICENSES/${LICENSE}.txt at build time and only in $TERMUX_SCRIPTDIR/packages/termux-licenses/LICENSES/${LICENSE}.txt. They would be valid on the device though.

agnostic-apollo avatar Jul 16 '24 09:07 agnostic-apollo

Going with

local license_files
license_files="$(find -L "$TERMUX_PREFIX/share/doc/$TERMUX_PKG_NAME" -maxdepth 1 \( -type f -o -type l \) -name "copyright*")"
[[ -n "$license_files" ]] || {
	termux_error_exit "No LICENSE file was installed for $TERMUX_PKG_NAME"
}

which is semantically identical.

TomJo2000 avatar Jul 16 '24 09:07 TomJo2000

which is semantically identical.

How dare you!

agnostic-apollo avatar Jul 16 '24 09:07 agnostic-apollo

How dare you!

I think the || more clearly identifies this as an exception path.

TomJo2000 avatar Jul 16 '24 09:07 TomJo2000

Yeah, yeah, it's fine.

agnostic-apollo avatar Jul 16 '24 09:07 agnostic-apollo

This would only catch all license files missing. Am I correct in that assumption?

Just wondering if we wanna do something about individual missing licenses.

We can probably co-opt the $COUNTER for checking if we have as many licenses as expected. Although, the only way I can see that happening is if a specified license file is no longer present in the source repo. In which case the cp of the file would fail. Thus catching the issue earlier. Though I think that would silently error.

The cp would error with a message along the lines of:

cp: cannot stat 'License.no.such.file': No such file or directory

Which works just fine.

All good from me then.

TomJo2000 avatar Jul 16 '24 09:07 TomJo2000

Yes, check would pass if at least one license file exists.

If you wanna ensure no missing licenses, in case, a specific loop never ran to fail for cp/ln, etc, then use

local license_list
if [[ -n "${TERMUX_PKG_LICENSE_FILE}" ]]; then
    license_list="$TERMUX_PKG_LICENSE_FILE"
else
    license_list="$TERMUX_PKG_LICENSE"
fi

...

license_files="$(find -L "$TERMUX_PREFIX/share/doc/$TERMUX_PKG_NAME" -maxdepth 1 \( -type f -o -type l \) -name "copyright*")"
if [ -z "$license_files" ]; then
    # error out
fi

if [ "$(echo "$license_files" | wc -l)" != "$(echo "$license_list" | sed -E -e 's/[ \t]//g' -e 's/,/\n/g' | wc -l)" ]; then
    # error out
fi

agnostic-apollo avatar Jul 16 '24 09:07 agnostic-apollo

I think I'll just be satisfied with cp's error message if it can't find the license file in the source repo.

And ln is only invoked for "known generic licenses", so it should be safe to assume those will not be missing.

TomJo2000 avatar Jul 16 '24 09:07 TomJo2000