Kubler download_portage_snapshot(): $dl_name based on $_TODAY can differ from the upstream snapshot name due to timezone differences
The distfiles.gentoo.org mirror hosting the portage snapshots provides portage-latest.tar.xz and portage-YYYYMMDD.tar.xz (plus .bz2 equivalents). portage-latest.tar.xz is identical to the most recent portage-YYYYMMDD.tar.xz.
The function download_portage_snapshot() downloads the portage snapshot, with $PORTAGE_DATE defaulting to latest. It finds portage-latest.tar.xz and saves it under $dl_name, which is derived from $_TODAY. Because of timezone differences, the file can end up named portage-20220914.tar.bz2 locally when the equivalent file on the server is portage-20220913.tar.bz2.
If upstream later releases a real portage-20220914.tar.bz2 and $PORTAGE_DATE is set to 20220914 locally, Kubler will not download the new snapshot, because a file with that name (the "wrongly" named one) already exists, even though the two files differ.
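To make the mismatch concrete, here is a minimal sketch of how a name derived from the local date drifts from the mirror's date. It assumes GNU date (for `-d`) and that the local name comes from something like `date +%Y%m%d`; `local_snapshot_name` is a hypothetical helper for illustration, not part of Kubler.

```shell
#!/usr/bin/env bash
# Sketch only: mimics naming a download after the local calendar date.
# Assumes GNU date; local_snapshot_name is a made-up helper, not Kubler code.

# Print the snapshot name a machine in time zone $1 would pick at instant $2.
local_snapshot_name() {
    local tz="$1" instant="$2"
    printf 'portage-%s.tar.bz2\n' "$(TZ="$tz" date -d "$instant" +%Y%m%d)"
}

instant='2022-09-13 20:00:00 UTC'
local_snapshot_name 'UTC0' "$instant"      # the day as seen in UTC
local_snapshot_name 'NZST-12' "$instant"   # a runner 12 hours ahead of UTC
```

The same instant yields portage-20220913.tar.bz2 in UTC and portage-20220914.tar.bz2 on the runner that is 12 hours ahead, which is the naming clash described above.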
I'm working on running Kubler in CI/CD, caching downloads and other files to speed things up, and I want consistent behaviour between runs. If I run one build before midnight and another after midnight, with no changes on the distfiles mirror in between, the second Kubler run downloads the same portage-latest.tar.xz file but names it differently.
In CI/CD I want consistency, and I generally want things up to date. I like the default of latest, but I want the local name to match the remote name.
I wrote a kubler cmd to get the latest portage filename.
#!/usr/bin/env bash
# Based off lib/core.sh `fetch_stage3_archive_name()`
# Fetch latest portage snapshot archive name/type, returns exit signal 3 if no archive could be found
function fetch_portage_archive_name() {
    __fetch_portage_archive_name=
    local portage_url portage_regex remote_files remote_line remote_date remote_file_type max_cap
    portage_url="http://distfiles.gentoo.org/snapshots/"
    readarray -t remote_files <<< "$(wget -qO- "${portage_url}")"
    remote_date=0
    get_stage3_archive_regex "portage"
    # shellcheck disable=SC2154
    portage_regex="$__get_stage3_archive_regex"
    for remote_line in "${remote_files[@]}"; do
        if [[ "${remote_line}" =~ href=\"${portage_regex}\" ]]; then
            max_cap="${#BASH_REMATCH[@]}"
            is_newer_stage3_date "${remote_date}" "${BASH_REMATCH[$((max_cap-3))]}${BASH_REMATCH[$((max_cap-2))]}" \
                && { remote_date="${BASH_REMATCH[$((max_cap-3))]}${BASH_REMATCH[$((max_cap-2))]}";
                     remote_file_type="${BASH_REMATCH[$((max_cap-1))]}"; }
            # We keep going to find the latest rather than the first
        fi
    done
    [[ "${remote_date//[!0-9]/}" -eq 0 ]] && return 3
    __fetch_portage_archive_name="portage-${remote_date}.tar.${remote_file_type}"
}
function main() {
    #echo "kubler dir: ${_KUBLER_DIR}"
    #echo "current namespace: ${_NAMESPACE_DIR}"
    #echo "Finding latest portage"

    # We are abusing `fetch_stage3_archive_name()`
    ## shellcheck disable=SC2034
    #STAGE3_BASE="portage"
    ## shellcheck disable=SC2034
    #ARCH_URL="http://distfiles.gentoo.org/snapshots/"
    ## This will find the first
    #fetch_stage3_archive_name
    ## shellcheck disable=SC2154
    #echo "$__fetch_stage3_archive_name"

    # This will find the latest
    fetch_portage_archive_name
    echo "$__fetch_portage_archive_name"
}
main "$@"
This works, and I could use it to set $PORTAGE_DATE and get consistent behaviour.
$ kubler portage
portage-20220907.tar.bz2 <-- fetch_stage3_archive_name abuse
portage-20220914.tar.bz2 <-- fetch_portage_archive_name variant
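As a rough sketch of that last step, the date can be pulled out of the reported file name with plain parameter expansion. `snapshot_date_from_name` is a hypothetical helper, and how the resulting value is fed to Kubler (for example through the config or the environment) is left as an assumption here.

```shell
#!/usr/bin/env bash
# Sketch: derive a PORTAGE_DATE value from a snapshot file name.
# snapshot_date_from_name is a made-up helper, not part of Kubler.
snapshot_date_from_name() {
    local name="$1" date
    date="${name#portage-}"   # strip the "portage-" prefix
    date="${date%%.*}"        # strip the ".tar.bz2" / ".tar.xz" suffix
    # Only accept an 8-digit YYYYMMDD value, so "latest" is rejected
    [[ "$date" =~ ^[0-9]{8}$ ]] || return 1
    printf '%s\n' "$date"
}

# Example using the name reported above
PORTAGE_DATE="$(snapshot_date_from_name 'portage-20220914.tar.bz2')"
echo "PORTAGE_DATE=${PORTAGE_DATE}"
```

With the date pinned this way, a run before midnight and a run after midnight would both refer to the same upstream file name as long as the mirror has not changed.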
I think it would be good to change Kubler's behaviour to download the latest YYYYMMDD portage snapshot rather than downloading and renaming portage-latest.
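A standalone sketch of what "pick the latest dated snapshot" could look like, without relying on Kubler's internal helpers. It assumes the mirror index has at most one `href` per line and that names follow portage-YYYYMMDD.tar.(bz2|xz); `latest_dated_snapshot` and the sample listing are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: read an HTML directory index on stdin and print the newest
# portage-YYYYMMDD archive name. Returns 3 if no dated archive is found.
# latest_dated_snapshot is a hypothetical helper, not Kubler code.
latest_dated_snapshot() {
    local line best_date=0 best_name=''
    local re='href="(portage-([0-9]{8})\.tar\.(bz2|xz))"'
    while IFS= read -r line; do
        # Keep scanning to find the newest date rather than the first match;
        # portage-latest.* never matches because it has no 8-digit date.
        if [[ "$line" =~ $re ]] && (( 10#${BASH_REMATCH[2]} > 10#$best_date )); then
            best_date="${BASH_REMATCH[2]}"
            best_name="${BASH_REMATCH[1]}"
        fi
    done
    [[ -n "$best_name" ]] || return 3
    printf '%s\n' "$best_name"
}

# Made-up sample listing; against the real mirror this could be fed with
#   wget -qO- http://distfiles.gentoo.org/snapshots/ | latest_dated_snapshot
latest_dated_snapshot <<'EOF'
<a href="portage-20220912.tar.bz2">portage-20220912.tar.bz2</a>
<a href="portage-20220914.tar.bz2">portage-20220914.tar.bz2</a>
<a href="portage-20220913.tar.bz2">portage-20220913.tar.bz2</a>
<a href="portage-latest.tar.bz2">portage-latest.tar.bz2</a>
EOF
```

Downloading the file under the name this returns keeps the local cache aligned with the mirror regardless of the runner's time zone.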
It might be worth refactoring fetch_stage3_archive_name() into a generic version that optionally exits on the first match (current behaviour) or continues to the latest match (needed for portage snapshots), and generalising the name of get_stage3_archive_regex().
I would also like an option to prefer one archive type over the other (bz2 vs xz).
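One possible shape for that preference, as a sketch: given the preferred type and the types actually available for a date, use the preferred one if present and otherwise fall back to the first available. `pick_archive_type` is a hypothetical helper; no such option exists in Kubler as far as this issue describes.

```shell
#!/usr/bin/env bash
# Sketch: choose an archive type with a preference and a fallback.
# Usage: pick_archive_type <preferred> <available>...
# pick_archive_type is a made-up helper, not part of Kubler.
pick_archive_type() {
    local preferred="$1" type
    shift
    for type in "$@"; do
        if [[ "$type" == "$preferred" ]]; then
            printf '%s\n' "$type"
            return 0
        fi
    done
    # Preferred type not offered: fall back to the first available one
    [[ $# -gt 0 ]] || return 1
    printf '%s\n' "$1"
}

pick_archive_type xz bz2 xz   # preferred type is available
pick_archive_type xz bz2      # falls back to the only available type
```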