resticprofile
Question: How to calculate a value in `run-before` that is available afterwards for the restic commands
I am having a hard time getting my head around how the environment is made available during resticprofile runs, and how I can manipulate it to inject values into variables that I need during restic execution (or at parsing time).
Ultimately, the problem to solve is that I want a check profile that runs read-data-subset: n/m, where m can be a static value but n should count upwards based on the day of the year and reset to 1 after it reaches m.
I first built a complicated setup with multiple profiles that use {{ (.Now.AddDate 0 0 -(m-1)).YearDay }}, scheduled with multiple OnCalendar directives (which kind of worked). But that requires a lot of code duplication, which I am apparently not able (given my current knowledge) to deduplicate using the template language, and it ends up with, for example, at least 7 profile entries if m happens to be 50. Dealing with leap years is also really complicated and makes everything even messier.
I can easily calculate the value of n externally using a script or binary. I only need that value to be available in the read-data-subset: setting.
I am not even able to simulate it by hardcoding a variable under env:. On the actual check command line, it is always empty.
I am a bit lost at this point. Any ideas how to proceed?
P.S.:
This was the complicated approach described above, which kind of works:
read-data-daily-1:
  inherit: dummy
  run-before: "curl -m 10 --retry 5 {{ $healthchecks_ping_url }}/slug/start?create=1"
  run-finally: "if [ -z $ERROR_EXIT_CODE ];then ERROR_EXIT_CODE=\"0\";fi;curl -m 10 --retry 5 \"{{ $healthchecks_ping_url }}/slug/$ERROR_EXIT_CODE\""
  check:
    read-data-subset: {{ .Now.YearDay }}/50
    # first 50 days of the year
    schedule:
      - '*-1-* 02:00:00'
      - '*-2-1..19 02:00:00'
    schedule-permission: user
    schedule-priority: background
    schedule-lock-mode: default
    schedule-lock-wait: 1h
And this was an approach where I experimented with a hardcoded env: entry, which was not expanded when the actual check command was executed:
read-data-daily:
  inherit: read-data-daily-1
  env:
    EXAMPLE_PORTION: "100"
  run-before:
    # Is correctly expanded at runtime
    - "echo \"Portion: $EXAMPLE_PORTION\""
  check:
    # is empty
    read-data-subset: ${EXAMPLE_PORTION}/255
    schedule:
      - 'daily'
After more research, I doubt that anything in the configuration can be calculated dynamically unless the built-in template mechanisms support it.
So I came up with a workaround: I use run-before to compute the parameters for read-data-subset and execute the restic command manually there. That way, scheduling and using resticprofile as the wrapper still work. However, resticprofile still executes the check command without the subset argument after the run-before section has finished. Since that plain check takes far less time, the advantages outweigh the disadvantages in my opinion. To speed it up even more, I explicitly set with-cache: true for it.
In case anyone else is interested, here is how I implemented the workaround:
# profile to perform a full read-data consistency check on <repo>. Data is read in subsets. We split the data so that n/m of the whole set is read each day.
# That means after m passed days, we visited most of the repo during that interval (of course, changes might be re-checked only every other m days when data has been added)
# Since there seems to be no way to natively use a dynamic value for the read-data-subset parameter as of now (2024-01-16), we have to do a hack where we run the entire restic
# command manually in `run-before` with dynamically calculated values. Resticprofile will then additionally call a regular check afterwards. This is unavoidable when we want to use
# the resticprofile api as is. We can try to keep the execution time of that quick check as low as possible by using the `--with-cache` option for this. We should switch to a more
# native approach as soon as a possibility to add dynamic values for the `read-data-subset` parameter will be available. See https://github.com/creativeprojects/resticprofile/issues/299
read-data-daily:
  inherit: dummy
  env:
    MAX_DATA_PORTIONS: 30
    BIN_DIR: {{ .CurrentDir }}/../../../bin
    RESTIC_HOST_DIR: {{ .CurrentDir }}/../../../hosts/<host>
  run-before:
    - curl -m 10 --retry 5 {{ $healthchecks_ping_url }}/slug/start?create=1
    - echo "Starting manual restic check";
      echo "";
      source $RESTIC_HOST_DIR/resticenv.sh;
      PORTION_OF_THE_DAY=$($BIN_DIR/calculate-portion.sh $MAX_DATA_PORTIONS);
      restic check --read-data-subset=$PORTION_OF_THE_DAY/$MAX_DATA_PORTIONS;
      unset RESTIC_REPOSITORY;
      unset RESTIC_PASSWORD;
      echo "proceeding with resticprofile check command";
      echo ""
  run-finally: "if [ -z $ERROR_EXIT_CODE ];then ERROR_EXIT_CODE=\"0\";fi;curl -m 10 --retry 5 \"{{ $healthchecks_ping_url }}/slug/$ERROR_EXIT_CODE\""
  check:
    with-cache: true
    schedule: '*-*-* 01:00:00'
    schedule-permission: user
    schedule-priority: background
    schedule-lock-mode: default
    schedule-lock-wait: 1h
And here is the script calculate-portion.sh, which calculates n for a given m:
#!/bin/bash
# Print the usage text and exit with the status code passed as the first argument
print_usage_and_exit(){
    echo "Usage: $0 <m> [<o>] [<YYYY-MM-DD>]"
    echo "This script calculates a specific value <n> on each different day it is called."
    echo "<n> will always be a positive integer in the range [1,<m>]."
    echo "<m> is passed as a mandatory argument. It has to be a positive integer."
    echo "<o> is an optional, non-positional argument which is an offset in days to the day the calculation is done on."
    echo "<YYYY-MM-DD> is an optional, non-positional date argument for which <n> should be calculated. If not given, the script assumes the current date for the calculation."
    echo "For each call of the script on subsequent <YYYY-MM-DD>, as long as the offset and the value of <m> are not changed, the resulting value of <n> is"
    echo "incremented by one. So that <n> cycles from 1 to <m> on each call on subsequent days. It will start over with 1 the day after the day on which it equals <m>"
    exit "$1"
}
# echo "Your input for <o> was \"$1\". <o> must be an integral value!"
# Succeeds if the argument is an integer (optionally negative)
validate_offset(){
    if ! [[ $1 =~ ^-?[0-9]+$ ]]; then
        return 1
    fi
    return 0
}
# Succeeds if the argument has the form YYYY-MM-DD and is accepted by date -d (GNU date)
validate_date(){
    if [[ ! "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        return 1
    fi
    if ! date -d "$1" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
# Check if at least one argument <m> is provided
if [ $# -lt 1 ]; then
    echo "At least one argument for <m> has to be given"
    echo ""
    print_usage_and_exit 1
fi
# Check if at most 3 arguments are provided
if [[ $# -gt 3 ]]; then
    echo "Too many arguments given. Expected 3 at most, got $#"
    echo ""
    print_usage_and_exit 1
fi
# Extract the upper limit "m" from the arguments and test it for being a positive integer.
# Zero is rejected (it would cause a division by zero below), as are leading zeros
# (bash arithmetic would interpret them as octal).
m=$1
if ! [[ $m =~ ^[1-9][0-9]*$ ]]; then
    echo "Your input for <m> was \"$m\". <m> must be a positive integral value!"
    echo ""
    print_usage_and_exit 1
fi
# Defaults
offset_days=0
input_date=$(date +%Y-%m-%d)
# If a second argument is provided, check whether it is an offset or a date
if [ $# -ge 2 ]; then
    if validate_offset "$2"; then
        offset_days=$2
    elif validate_date "$2"; then
        input_date=$2
    else
        echo "Could neither get a valid offset value nor a valid date from argument \"$2\". Offsets have to be integers and dates have to be passed in the form of \"YYYY-MM-DD\""
        echo ""
        print_usage_and_exit 1
    fi
fi
# If a third argument is provided, check whether it is an offset or a date
if [ $# -eq 3 ]; then
    if ! validate_offset "$3" && ! validate_date "$3"; then
        echo "Could neither get a valid offset value nor a valid date from argument \"$3\". Offsets have to be integers and dates have to be passed in the form of \"YYYY-MM-DD\""
        echo ""
        print_usage_and_exit 1
    fi
    if validate_offset "$3" && validate_date "$2"; then
        offset_days=$3
    elif validate_date "$3" && validate_offset "$2"; then
        input_date=$3
    else
        echo "It looks like you either provided two date arguments or two offset arguments. Please provide either type once. Offsets have to be integers and dates have to be passed in the form of \"YYYY-MM-DD\""
        echo ""
        print_usage_and_exit 1
    fi
fi
# Calculate days since the Unix epoch.
# Both timestamps are taken in UTC (-u) so that a daylight-saving offset in the
# local time zone cannot shift the result by one day.
input_timestamp=$(date -u -d "$input_date" +%s)
seconds_since_epoch=$(( input_timestamp - $(date -u -d "1970-01-01" +%s) ))
days_since_epoch=$(( seconds_since_epoch / 86400 ))
days_since_epoch_with_offset=$(( days_since_epoch + offset_days ))
# Only non-negative results are allowed for the calculated days since the epoch when the offset is applied
if [[ $days_since_epoch_with_offset -lt 0 ]]; then
    echo "Invalid date or offset given. Use dates after and including 1970-01-01 + days offset value: $offset_days"
    print_usage_and_exit 1
fi
# Calculate the cyclic value between 1 and m
cyclic_value=$(( 1 + (days_since_epoch_with_offset % m) ))
#echo "Days since the epoch for $input_date and an offset of $offset_days days: $days_since_epoch_with_offset"
#echo "Cyclic value (1 to $m) since the Unix epoch for $input_date: $cyclic_value"
echo "$cyclic_value"
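To see why counting days since the epoch avoids the year-boundary and leap-year problems of the YearDay approach, here is a minimal sketch of the same arithmetic. The function name `portion_for_date` is hypothetical and not part of the script above; it assumes GNU date (for `-d`) and works in UTC.

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's arithmetic:
#   n = 1 + (days since 1970-01-01 mod m)
# Assumes GNU date, which supports the -d option.
portion_for_date() {
    local day=$1 m=$2 days
    # Whole days since the epoch, computed in UTC
    days=$(( $(date -u -d "$day" +%s) / 86400 ))
    echo $(( 1 + days % m ))
}

portion_for_date 2024-12-31 50   # prints 39
portion_for_date 2025-01-01 50   # prints 40 (the count continues across the year boundary)
portion_for_date 2025-01-11 50   # prints 50
portion_for_date 2025-01-12 50   # prints 1 (wraps around after reaching m)
```

Because the counter never looks at the calendar year, the length of the year (365 or 366 days) has no effect on the cycle.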
I do like this idea of using an environment variable inside any configuration value:
read-data-daily:
  inherit: read-data-daily-1
  env:
    EXAMPLE_PORTION: "100"
  run-before:
    # Is correctly expanded at runtime
    - "echo \"Portion: $EXAMPLE_PORTION\""
  check:
    # introducing new feature to make it work
    read-data-subset: ${EXAMPLE_PORTION}/255
    schedule:
      - 'daily'
That would take away a lot of the pain of using template variables, which can only be referenced while the configuration is being compiled anyway.
Now we need to extract the environment variables that were set during a command. I had an idea about that:
What if we wrap up any command inside a tiny shell script, like:
[command goes here, any `run-before`, `run-after` or calling `restic`]
# grab the exit code
exitCode=$?
# save environment variables
env > output_env.tmp
# return the exit code from the command
exit $exitCode
This way we can save the environment variables that have been set by the command.
It should be possible to use set for cmd.exe and Get-ChildItem ENV: on PowerShell.
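As a rough illustration of the wrapper idea for a POSIX-like shell, the sketch below runs a command string in a child bash process, dumps the environment afterwards, and prints only the variables that are new or changed. The function name `capture_env_changes` is hypothetical, this is not how resticprofile implements anything, and the sketch only sees exported variables and does not handle values containing newlines.

```shell
#!/bin/bash
# Hypothetical helper: run a command string in a child shell and print the
# environment variables it added or changed, preserving its exit code.
capture_env_changes() {
    local cmd=$1 before after status=0
    before=$(mktemp) after=$(mktemp)
    env | sort > "$before"
    # The child shell runs the command, grabs its exit code, saves its
    # environment to the file passed as $0, and exits with that code.
    bash -c "$cmd"$'\nstatus=$?; env | sort > "$0"; exit $status' "$after" || status=$?
    # Keep lines that only exist after the command; ignore shell bookkeeping variables.
    comm -13 "$before" "$after" | grep -Ev '^(SHLVL|_)='
    rm -f "$before" "$after"
    return "$status"
}
```

For example, `capture_env_changes 'export PORTION_OF_THE_DAY=7'` would list `PORTION_OF_THE_DAY=7`, while a variable that is assigned but not exported would not appear.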
What do you think @jkellerer ?
@creativeprojects I tried the new version, but to no avail. I cannot access those variables later. I have a feeling that this is not yet solved.
Or I am doing something entirely wrong.
I am still using my approach from above, which works well by bypassing the actual check command and executing it inside `run-before`.
I have now tried to echo the variables' values to the env file and use them later on; however, the values are not accessible. The show command reads "no value" when I use the variables with read-data-subset. The env file is correctly populated with the values.
Here is my example code:
target-read-data-daily:
  inherit: target
  env:
    MAX_DATA_PORTIONS: 50
    BIN_DIR: "{{ .ConfigDir }}/../../../bin"
    RESTIC_HOST_DIR: "{{ .ConfigDir }}/../../../hosts/hostname"
    HEALTHCHECKS_SLUG: "hostname-backup-check-target"
  env-file: ".env"
  run-before:
    - $CURL_COMMAND $HEALTCHECKS_PING_URL/$HEALTHCHECKS_SLUG/start?create=1
    - set -e;
      echo "Calculating portion of the day";
      echo "";
      PORTION_OF_THE_DAY=$($BIN_DIR/calculate-portion.sh $MAX_DATA_PORTIONS);
      echo "PORTION_OF_THE_DAY=$PORTION_OF_THE_DAY" > ".env";
      echo "MAX_DATA_PORTIONS=$MAX_DATA_PORTIONS" >> ".env";
      echo "proceeding with resticprofile check command";
      echo ""
  run-finally: 'if [ -z $ERROR_EXIT_CODE ];then ERROR_EXIT_CODE="0";fi;$CURL_COMMAND "$HEALTCHECKS_PING_URL/$HEALTHCHECKS_SLUG/$ERROR_EXIT_CODE"'
  check:
    with-cache: false
    read-data-subset: "$PORTION_OF_THE_DAY/$MAX_DATA_PORTIONS"
    schedule: "*-*-* 06:30:00"
Am I doing it wrong?
Hey!
I think we need to use the internal env file as referenced with {{ env }}.
I haven't tried this configuration but it should work like that:
target-read-data-daily:
  inherit: target
  env:
    MAX_DATA_PORTIONS: 50
    BIN_DIR: "{{ .ConfigDir }}/../../../bin"
    RESTIC_HOST_DIR: "{{ .ConfigDir }}/../../../hosts/hostname"
    HEALTHCHECKS_SLUG: "hostname-backup-check-target"
  env-file: ".env"
  run-before:
    - $CURL_COMMAND $HEALTCHECKS_PING_URL/$HEALTHCHECKS_SLUG/start?create=1
    - set -e;
      echo "Calculating portion of the day";
      echo "";
      PORTION_OF_THE_DAY=$($BIN_DIR/calculate-portion.sh $MAX_DATA_PORTIONS);
      echo "PORTION_OF_THE_DAY=$PORTION_OF_THE_DAY" > {{ env }};
      echo "MAX_DATA_PORTIONS=$MAX_DATA_PORTIONS" >> {{ env }};
      echo "proceeding with resticprofile check command";
      echo ""
  run-finally: 'if [ -z $ERROR_EXIT_CODE ];then ERROR_EXIT_CODE="0";fi;$CURL_COMMAND "$HEALTCHECKS_PING_URL/$HEALTHCHECKS_SLUG/$ERROR_EXIT_CODE"'
  check:
    with-cache: false
    read-data-subset: "$PORTION_OF_THE_DAY/$MAX_DATA_PORTIONS"
    schedule: "*-*-* 06:30:00"
That was the first thing I tried yesterday, but that did not work either.
Oh, I get it. Sorry, I missed that part of the configuration.
As it turns out, you cannot use an environment variable inside any flag (like read-data-subset).
We need to do a big refactoring to allow that. It's on our list of things to do, but it's a lot more complicated than it looks 😞
In a nutshell, the configuration is fully resolved before the profile starts. In your case, we would need to re-scan the configuration and update the values while the profile is executing.
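The effect can be pictured with a plain shell analogy. This is only an illustration of "resolve once, then run", not resticprofile's actual code; the variable `resolved_flag` is made up for the example.

```shell
#!/bin/bash
# Analogy only: the flag is built while the variable is still empty,
# just like a configuration that is resolved before any run-before command runs.
PORTION_OF_THE_DAY=""
resolved_flag="--read-data-subset=${PORTION_OF_THE_DAY}/50"

# Later, a "run-before" step computes the value...
PORTION_OF_THE_DAY=7

# ...but the already resolved flag does not change.
echo "$resolved_flag"                                 # prints --read-data-subset=/50

# Only a second resolution after the step picks up the new value.
echo "--read-data-subset=${PORTION_OF_THE_DAY}/50"    # prints --read-data-subset=7/50
```

Supporting the use case therefore means resolving flag values again after the `run-before` commands have finished, which is the re-scan mentioned above.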