adblock-lean Improve processing logic

Improve and unify logic of downloaded allowlist and blocklist processing
Improve console output with highlight colors and spacers
Implement config validation and parsing
Change config format
Implement config migration
Improve run state checks and reporting
Remove support for running custom commands specified in config with eval
Support specifying in config a custom script to source and run functions report_sucess(), report_failure() from
Support custom dnsmasq directory when specified in UCI
Append a custom entry to the blocklist and use it to verify that the blocklist was successfully loaded

(I may have forgotten something)

Aug 12 '24 17:08 friendly-bits

@Wizballs just tagging you so that you see this impressive looking PR.

Aug 12 '24 17:08 lynxthecat

Gave everything a quick test run only and is working great. I'll run more detailed tests when I can, but this is looking really good. So many improvements and added functions!

Aug 12 '24 23:08 Wizballs

^

parse_config(): detect and prevent previously undetected invalid entries
parse_config(): support reading and comparing current and default config format versions
print_def_config(): add config format version
write_config(): validate new config before moving to permanent storage
load_config(): don't validate fixed config (now validated by write_config)
gen_config(): don't validate new config (now validated by write_config)
load_config(), parse_config(), gen_config(): simplify communication between functions

Aug 13 '24 09:08 friendly-bits

@lynxthecat Would you prefer the commits squashed into one?

Aug 13 '24 10:08 friendly-bits

No please keep them all! But could you maybe capitalise the one beginning “fix.. “ to make it “Fix..” for consistency?

Aug 13 '24 10:08 lynxthecat

^ Fixed the 'fix'

Aug 13 '24 18:08 friendly-bits

^

parse_config(): compact some code
parse_config(): don't print legacy keys
load_config(): save old config before fixing it
update(): call load_config() upon success

Aug 13 '24 20:08 friendly-bits

@Wizballs I’ll also run a couple of tests and if ours both passes then we can merge in. So far so good for my ordinary day to day runs.

Aug 14 '24 06:08 lynxthecat

I'm on now also. Testing and will report back...

Aug 14 '24 07:08 Wizballs

Surprised (good way) by the number of commits every time I look ;)

Working:

start/stop/pause/resume
rouge check (block and allow)

Unsure:

gen_config is just overwriting the config file to defaults. It's not reusing existing valid config entries. If this is intended all good.

And two items that are for later maybe (one discussed previously):

Discussed previously, but will the custom script have a default path, and have a config entry disable by default? This would save editing the path on a gen_config, could just switch to a YES/1 or whatever the config entry would be.
If a cron job is called while an invalid config exists, it correctly exists. Should it report_failure so people can receive a message?

user.info Fix your config file '/root/adblock-lean/config' or generate default config using 'service adblock-lean gen_config'.
user.err adblock-lean: Error: Failed to load config.

Aug 14 '24 07:08 Wizballs

gen_config is just overwriting the config file to defaults. It's not reusing existing valid config entries

"It's not a bug, it's a feature". In what scenario would you want gen_config() to reuse existing values? My logic is this: if user's config is technically compliant to the format then there is no need to fix it. If it's not compliant then load_config() will pick this up and suggest fixes. If the user is unhappy with their config for any reason, or if the config file is missing, or if the config file has issues which can not be automatically fixed, then they can discard the old config and have it replaced with the default one - this is what gen_config() is for.

Aug 14 '24 08:08 friendly-bits

, it's a feature

Perfect got it. I'll check the load_config part....

Aug 14 '24 08:08 Wizballs

Alright I got it now. Intentionally deleted lines and messed up my config. And it recommended the missing lines and highlighted entries that shouldn't be there. Worked great on multiple different test.

Very smooth.

Aug 14 '24 08:08 Wizballs

will the custom script have a default path, and have a config entry disable by default? This would save editing the path on a gen_config, could just switch to a YES/1 or whatever the config entry would be.

This makes sense. Pro's: lower chance of user error when copy-pasting the path, slightly easier process for the user. Con's: adds another config option. I don't mind either way.

Aug 14 '24 08:08 friendly-bits

If a cron job is called while an invalid config exists, it correctly exists. Should it report_failure so people can receive a message?

Checking on this... Will comment a bit later.

Aug 14 '24 08:08 friendly-bits

I think, leave this as is, keeps config options lower as you highlighted. Anyone using a custom script should be able to manage this. Also @Ree9 the default recommended path for this in the config file from antonk is # Recommended path is '/usr/libexec/abl_custom-script.sh' which the luci app has permission to access Mabye worth changing from your Luci default /root/adblock-lean/custom-script location?

Aug 14 '24 08:08 Wizballs

If a cron job is called while an invalid config exists, it correctly exists. Should it report_failure so people can receive a message?

So the challenge here is this: if the config is invalid, can we trust it enough to source the custom script and call the report_failure() function from it? Current approach relies on valid config as 'root-of-trust' for the matter of safely using any config entries. Technically we could implement an exclusion for the custom_script option but I'm not sure if this is a good idea. Not saying that it's not. Just not sure.

Aug 14 '24 08:08 friendly-bits

"It's not a bug, it's a feature". In what scenario would you want gen_config() to reuse existing values? My logic is this: if user's config is technically compliant to the format then there is no need to fix it. If it's not compliant then load_config() will pick this up and suggest fixes. If the user is unhappy with their config for any reason, or if the config file is missing, or if the config file has issues which can not be automatically fixed, then they can discard the old config and have it replaced with the default one - this is what gen_config() is for.

Just so I understand, valid old entries will be retained, but invalid old entries will be deleted and any needed replacements adds as defaults, right? So a custom config element without double quotes will get lost, but a custom config element with double quotes will be retained? I think that's probably just fine.

Aug 14 '24 08:08 lynxthecat

can we trust it enough to source the custom script and call the report_failure()

Looking at this another way also, if a custom notification does not come through, that should be reason alone to investigate. Leave as, I actually don't think it's overly important or worth additional lines of code. Thanks for bouncing idea's back and forth, sometimes it just helps to put a reason behind it.

Aug 14 '24 08:08 Wizballs

Seems super daring and ambitious to have config elements automatically changed, but it does seem to be working. Nice one.

Aug 14 '24 08:08 lynxthecat

config elements automatically changed, but it does seem to be working. Nice one

It works really well. I messed my config up various different ways, and it always fixed missing elements, or reported "rogue" entries ;)

Aug 14 '24 09:08 Wizballs

While we're at it, maybe now it's a good time to stop using /root and move stuff to where it should normally be? I.e. probably to /etc/adblock-lean/.

Aug 14 '24 09:08 friendly-bits

Seems super daring and ambitious to have config elements automatically changed

It looks more impressive than it actually is. The logic is very simple. We check for critical issues which would deem the config unusable - if found, we throw an error and quit. If not found then we load the values for expected keys from the existing config, while registering missing and unexpected keys, or entries which don't have double quotes around the values. Then we report of any found issues to the user and ask their permission to fix them. The fix is ultra-simple: we take the default config file and replace the values in it with values of the known-good entries in the current config. Then we validate the result. If it passes validation, we write it out. That's it.

Aug 14 '24 09:08 friendly-bits

Well I've thrown everything at this, and it's come up spot on. Handles every situation / config as it should!

It's really awesome what you've done/doing with this project.

Aug 14 '24 09:08 Wizballs

The only shortcoming of this method (which I'm aware of) is that custom comments are not preserved after fixing the config. You essentially get a vanilla config with your existing values for valid keys. Moving the custom comments would be really challenging, because the comments are probably attached to entries in some way which a simple algorithm has no means to figure out.

Aug 14 '24 10:08 friendly-bits

The only shortcoming of this method

Such a minor thing in the grand scheme of things.

Aug 14 '24 10:08 Wizballs

Just so I understand, valid old entries will be retained, but invalid old entries will be deleted and any needed replacements adds as defaults, right? So a custom config element without double quotes will get lost, but a custom config element with double quotes will be retained? I think that's probably just fine.

Oops, just noticed this comment.

valid old entries will be retained

Yes

invalid old entries will be deleted

The code makes a distinction between "invalid" and "unexpected" entries. Invalid = bad format (missing '=', missing key, the key has invalid characters, or the value is not enclosed in double-quotes while also including the # character). Unexpected = entry has a valid format but the key is not found in the new default config. When an invalid entry is encountered, load_config() throws an error and exits. When an unexpected key is encountered, removing it will be suggested.

any needed replacements adds as defaults, right?

The fix-related code essentially replaces default values with existing values extracted from entries which are both valid and expected in the existing config (including values extracted from entries in "legacy format", i.e. missing double-quotes). Missing entries are not replaced, i.e. they are copied over from the default config to the fixed config as-is.

a custom config element without double quotes will get lost

An entry which has its value not enclosed in double-quotes is considered "legacy format". Provided that the key is valid and expected (i.e. included in the new default config), the value is still retained when the config is fixed but if any legacy entries are detected, the config must be fixed before it can be used.

An exception to this are entries which both have their value not enclosed in double quotes, and the value contains the # character, because in this situation we don't have a reliable way to determine whether # is a beginning of a comment or a part of a string (i.e. URL). Such an entry would be considered invalid and it doesn't have an automatic fix, so if detected, load_config() will error out and exit.

An exception to the exception (I know, it's a bit confusing) is a special case of 2 entries which exist in our older default config, don't have double-quotes and have in-line comments. Specifically:

compress_blocklist=1 # enable (1) or disable (0) blocklist compression
initial_dnsmasq_restart=0 # enable (1) or disable (0) initial dnsmasq restart

The following code in parse_entry() specifically addresses these entries so they are flagged as legacy rather than as invalid:

# Following 'case' is a temporarily solution to allow easy config migration - remove a few months from now (Aug 2024)
case "${entry}" in
	"compress_blocklist="?" #"*|"initial_dnsmasq_restart="?" #"*)
		legacy_entries="${legacy_entries}${entry}"$'\n'; legacy_keys="${legacy_keys}${key} "
		test_keys="${test_keys%%"${key}|"*}${test_keys#*"${key}|"}"
		val=${val%% *}
		return 0
esac

Aug 14 '24 10:08 friendly-bits

^

parse_config(): set ${luci_conf_needs_fix}
parse_config(): fix a couple typos
parse_config(): remove ${legacy_keys) (unused variable)
parse_config(): catch another previously uncaught case of invalid entries
fix_config(): move config-fixing code to a separate function
load_config(): accept the '-f' argument to force fixing config (bypassing console dialog and /dev/tty check)
load_config(): call fix_config()

The main change here is exposing the list of required fixes to the luci RPC script via ${luci_conf_needs_fix} variable, and added functionality of forced config fix via a call to load_config() with the argument -f. Both are intended to allow the luci app to have a similar dialog with the user as currently implemented via console, and initiate the fixes. The idea is something like:

call parse_config() (needs path to the config file in 1st argument)
check return code - if it's 2 then fixes are needed
pull suggested fixes from the ${luci_conf_needs_fix} variable, report to the user
to initiate config fix, call load_config -f
check the return code of load_config() and report success of failure

Not sure if @rickparrish would want to use this but I thought it could be a good idea to make the functionality available.

Aug 14 '24 12:08 friendly-bits

Not sure if @rickparrish would want to use this but I thought it could be a good idea to make the functionality available.

Sounds like an excellent idea, I'll definitely make use of it.

The luci app does have logic for updating a config file while keeping settings (eg remapping min_blocklist_file_part_line_count to min_blocklist_part_line_count), but now that you have the logic in adblock-lean itself it's redundant to have it in the luci app. So removing it will streamline the code a bit, and minimize the amount of testing needed for future config changes.

Aug 14 '24 14:08 rickparrish

I now have a Reset button appearing when parse_status returns 1 that will call gen_config, and an Update button appearing when parse_status returns 2 that will call load_config "-f"

I tried several different config files, and most worked, but I did run into an issue with one. You should be able to replicate it by doing:

service adblock-lean gen_config
Edit the config file to remove the # config_format=v2 line
service adblock-lean start

This is the output I get:

root@OpenWrt:~/adblock-lean# service adblock-lean start

Warning: Config format version is unknown or invalid.

Perform following automatic changes?
1. Update config format version
y|n: y
Error: fix_config(): invalid argument.
/etc/rc.common: line 445: report_failure: not found
Fix your config file '/root/adblock-lean/config' or generate default config using 'service adblock-lean gen_config'.
Error: Failed to load config.

Aug 14 '24 16:08 rickparrish