pacaur
pacaur copied to clipboard
Improve source code readability
First of all, I really appretiate your work in the development of this great piece of software. I've moved from yaourt
to pacaur
recently.
I've been digging in the script source code moved by curiosity. I got some experience in bash scripting but I found difficult to follow the application workflow. I'd like to contribute improving readability of the code if you are interested, this could ease maintenance and future contributions to the script functionality.
Basically, I'd focus on these points:
- Add more verbose comments to the code to clarify each step
- Use some comment banners to divide each section clearly
- Break bash lines to improve readability (limit to 80 char each line)
- Unroll some conditionals to improve readability
- Use some bashisms
I've found a short and interesting bash style guide if you want to take a look.
More references:
Hi, thanks for your interest!
You are right, there are several readability issues with the current code. It lacks comments, some parts need to be refactored or put in separate functions, and in the current state pacaur can only be maintained efficiently by myself. I would however not adhere to the 80 char per line rule, although I do agree that long lines are probably an indication that the code could be better decoupled in sub-functions.
There are also others issues I'd like to solve in a neat manner:
- some part of the code aren't robust enough (see #267 for example),
- error reporting is somewhat weak (#173),
- some part could take advantage of other external tools such as pacutils (#433) for better reliability,
- etc.
I've done some initial cleaning in the pacaur50
branch, and have some work in progress to get rid of the internal json parsing code and replace it by cower calls instead. Also, the idea to move the entire code to python came to my mind, but I'm not sure if I'd like to spend much time in a complete rewrite.
Feel free to fork and improve upon the actual code, or ask any questions about the code. An external pair of eyes is always welcome.
Thanks for you fast answer!
First, I'm going to start reading your code to add some commentaries. I'd rather understand well the code before refactor any piece of it. I'll do small pull request if you don't mind in order to keep everything tidy and manageable.
I'm messing around with Python these days in my job (matplotlib, numpy, pyqt5, ...) but I am not sure if it will bring any advantage to this application. I think it is really fast and reliable right now.
Additionally, you could break script into some subscripts, then you could source them from the main script (pacaur executable).
Edit: About 80 char lines it is only a suggestion, you can always use 120 char or another random number of characters. I think it is a good practise to limit lines, it could help you to notice when to create a new function or think in another solution.
I'm messing around with Python these days in my job (matplotlib, numpy, pyqt5, ...) but I am not sure if it will bring any advantage to this application. I think it is really fast and reliable right now.
I do very much Python at work and I am also not sure it is worth the effort. pacaur is "just" a nice wrapper for pacman which creates a unified and clean CLI for pacman, makepkg, cower and friends. I'm afraid rewriting pacaur in Python gains us nothing. :/
Thanks to both of you for your feedback. Yes, python might not substantially add gains, but there would be several issues it could tremendously help with:
- easier tracking of dependency structure for error reporting, where I hit a wall with the current bash design (#173)
- ability to use a better json parsing library and
pyalpm
directly, instead of C libraries (expac, cower) that break at each pacman major release. - testing suite, to avoid regressions,
- beautiful code, because python.
For the record, I initially hesitated between bash and python when starting pacaur a few years ago, and went with bash because you couldn't be reliable without a bash parser at the time. Since .SRCINFO have now been implemented, the AUR RPC allows to directly query dependencies instead of parsing PKGBUILD so python wouldn't be an issue anymore.
But right, moving to python would probably not be worth the time. It would still be a preferred choice if someone would start a similar wrapper today.
Lastly, keeping an eye on aurutils by @AladW is worth it. It's basically a modular helper that use a different approach (it uses a local repository instead of doing the build as you would do it manually, like pacaur does) with a bunch of nice ideas. Still experimental but overall much cleaner.
The more I look at the current code, the more I want to ditch it completely. Reading the bash style links above also give a few reasons why bash isn't the proper language for a robust and readable code. Data structure would be very welcome here, as pacaur currently relies on a different arrays to mimic it. Debugging would also be easier, and long term maintenance by someone else than myself would be less of an issue.
If the code should be reworked to make it easier to read and maintain, why not doing it in python?
Looking at the existing python code, I see that there are 3 different AUR rpc interface library available:
- python3-aur, which is up-to-date but somewhat bloated (using a SQLite db, really?). It's also developed behind closed doors which I dislike.
- pkgbuilder aur lib, which was developed specifically for the helper of the same name. It's very tied in to its code, and not usable as is.
- python-aur, which is not up-to-date (still using RPCv1) but seems to give returned data a nice structure, easy to use.
However, before any possible rewrite the question of the real need of a new pacaur version arises. I might as well ditch it for one of the competitors on the longer term - in some way the cost of maintaining a popular helper is really not worth the hassle.
Among the existing helpers, only 3 or 4 are actually reliable:
- pacaur (bash), which handles complex situations pretty well, and has all the downside already mentioned in this issue.
- bauerbill (python), which basically outputs bash scripts that will execute makepkg commands. It does use a fork of makepkg, which is something I simply dislike - although I understand the reasons behind that decision. The development is again here done behind closed door which I dislike.
- pkgbuilder (python), which has a "traditional" design somewhat similar to pacaur. It has a few downside regarding split packages but is otherwise quite good imho,
- aurutils (bash), which is still experimental but tries to use a different design, with a local repository. It's also modular, but it also becomes complex quite fast as the development progress.
In a nutshell, I'm still on the fence here. At the moment I'll still maintain and fix bugs in pacaur, but I'm not sure I really want to continue to work on it in the current state - especially if a better, cleaner design emerges in the near future (aurutils? a new pkgbuilder version?).
Reading the bash style links above also give a few reasons why bash isn't the proper language for a robust and readable code. Data structure would be very welcome here, as pacaur currently relies on a different arrays to mimic it.
It really depends on how you use the language - bash isn't very good with arrays, so focusing on that isn't good for effectiveness. Bash is however good at the stdin/stdout idea: piping programs and functions together, and using the filesystem to process data.
pkgbuilder (python), which has a "traditional" design somewhat similar to pacaur. It has a few downside regarding split packages but is otherwise quite well imho,
I agree, though my gripe with it is that it has no interactive inspection of packages.
aurutils (bash), which is still experimental but tries to use a different design, with a local repository. It's also modular, but it also becomes complex quite fast as the development progress.
Sometimes I get a bit too excited, but I've cut out experimental code and split up/refactored things further. Thanks for the reminder on keeping things simple. ;)
Yes, you are right. Pacaur has never been designed with the pipe idea in mind. This is another reason why aurutils is interesting. Also, keeping things simple on the long term is always difficult... so keep up the good work here!
The lack of interactive inspection of pkgbuilder is a design decision from the maintainer. I don't understand the rational behind that missing feature, but pkgbuilder is still one of the most interesting design out there. I guess I prefer it to the more "brute force" design of bauerbill that writes bash scripts each and every time.
Since a few days ago I am wondering if it would make sense to redesign pacaur as some wrapper to aur-utils? That would create the script more modular and readable and keep the well known CLI of pacman/pacaur.
Yes, this might be a possibility. The drawbacks I see are that
- aurutils is slightly more complex to use
- it currently lacks maturity
- I have no idea how easy or how difficult the "fast workflow" concept can be applied to it, and
- I personally have no motivation to do that myself. But if someone wants to do it reusing some of pacaur code, feel free to be my guest here.
I hope the maturity is somewhat accelerated by the absurd test-cases people have since thrown at aurutils, including packages which violate PKGBUILD(5)
(depends with spaces in them, breaking tsort) and the AUR guidelines (pkgname which equals an official pkgbase, breaking the split packages code). :grimacing:
Regarding the fast workflow, pacaur precalculates provides and presents the various choices at the beginning of the build. I don't know how difficult this would be to implement with a modular helper. Or you could keep using pacaur in parallel by setting the PKGDEST and PACMAN variables.
On the flip-side, you get all build files in one window (using PAGER, or vifm when installed), rather than prompting for each PKGBUILD/.install file as with pacaur.
Either way, if someone wanted to make an aurutils wrapper, perhaps it would be best to create a separate project for it and discuss it there.
What about splitting the script into subscripts in order to improve readability? Nothing complicated, just divide the script into several files and source them from the main script. For example:
- Variable configuration
- Functions (can be divided into several files by its functionality)
- Main script
Indeed, part of the file could be split in /usr/share/pacaur/
like makepkg does. But as I feel it, there is no reason to do that unless a serious overhaul of the code takes place too.
@AladW To be honest, I consider the precalculation of provides and conflicts the ugliest part of pacaur. It's slow, terribly inefficient, hard to maintain. It's the part that required the most work to mature too. I wish I could reuse some pacman code easily instead of duplicating my own soup here.
I think that dividing pacaur
into several subscripts is the proper way to start improving source code readability. Later, a deep overhaul of the code would be easier by people who wants to contribute. I just see this as the obvious starting point to improve the script in the future.
I think makepkg
is a nice example as you said. Code structure seems coherent and clean.
Here a nice code fragment of makepkg/util/pkgbuild.sh
:
get_pkgbuild_attribute() {
# $1: package name
# $2: attribute name
# $3: multivalued
# $4: name of output var
local pkgname=$1 attrname=$2 isarray=$3 outputvar=$4
printf -v "$outputvar" %s ''
if [[ $pkgname ]]; then
extract_global_variable "$attrname" "$isarray" "$outputvar"
extract_function_variable "package_$pkgname" "$attrname" "$isarray" "$outputvar"
else
extract_global_variable "$attrname" "$isarray" "$outputvar"
fi
}
##
# usage : get_full_version()
# return : full version spec, including epoch (if necessary), pkgver, pkgrel
##
get_full_version() {
if (( epoch > 0 )); then
printf "%s\n" "$epoch:$pkgver-$pkgrel"
else
printf "%s\n" "$pkgver-$pkgrel"
fi
}
Check commenting here. I think that specifying input parameters and the output of each function is a really nice idea.
I think that specifying input parameters and the output of each function is a really nice idea.
Definitely. And you're right, better start to split it the code right away. Looking at makepkg code, I'd suggest to start the following way:
/usr/bin/pacaur
:
- gettext initialization
- variables initialization
- subroutines sourcing (pacaur specific functions + reusing makepkg functions when possible)
- getopt parser
- usage()
- version()
- signal trap
/usr/share/pacaur/*.sh
- main functions
- utils functions
- dependency functions
- VSC handling functions
- cache handling functions
- etc.
This would occur in a new branch with the sole purpose of cleaning/splitting the code, while the current master branch is used as usual for maintenance.
Adjusting code style to makepkg style could be a good idea too.
Note: I have no time for this. If someone thinks he can kickstart this rewrite/code cleaning in a branch or fork, please do.
Just to note it, I had some difficulties to properly rebase my branches when the master code changes. I think that the code clean up and division into different files would be beneficial in this aspect to people who wants to contribute.
I want to start working on this issue soon.
I'd suggest to wait a little, as I still have some ongoing changes that might break stuff and make it hard to rebase. I'll review the -Si internationalization patch shortly, and squash your changes into fewer commits anyway.
Ok, I've just rebase it. I'll just add a last commit to initialize variables in the function as local variables and wait until you review it.
@ChuckDaniels87: FYI, with some delay, all the major changes are now included. I don't expect the code to change a lot before the next update in a couple of days.
I'm quite busy lately but I'll try to find some time to work on this issue as soon as possible.
By the way, could you create a new branch to do the PRs there?
Sure, here is the future pacaur 5.0.
I'm working on splitting the code directly in several scripts but I really like the makepkg
code structure:
libmakepkg
├── integrity
│ ├── generate_checksum.sh.in
│ ├── generate_signature.sh.in
│ ├── verify_checksum.sh.in
│ └── verify_signature.sh.in
├── integrity.sh.in
├── lint_package
│ ├── build_references.sh.in
│ ├── dotfiles.sh.in
│ ├── file_names.sh.in
│ └── missing_backup.sh.in
├── lint_package.sh.in
├── lint_pkgbuild
│ ├── arch.sh.in
│ ├── backup.sh.in
│ ├── changelog.sh.in
│ ├── epoch.sh.in
│ ├── install.sh.in
│ ├── optdepends.sh.in
│ ├── options.sh.in
│ ├── package_function.sh.in
│ ├── pkgbase.sh.in
│ ├── pkglist.sh.in
│ ├── pkgname.sh.in
│ ├── pkgrel.sh.in
│ ├── pkgver.sh.in
│ ├── provides.sh.in
│ ├── source.sh.in
│ ├── util.sh.in
│ └── variable.sh.in
├── lint_pkgbuild.sh.in
├── source
│ ├── bzr.sh.in
│ ├── file.sh.in
│ ├── git.sh.in
│ ├── hg.sh.in
│ ├── local.sh.in
│ └── svn.sh.in
├── source.sh.in
├── srcinfo.sh.in
├── tidy
│ ├── docs.sh.in
│ ├── emptydirs.sh.in
│ ├── libtool.sh.in
│ ├── purge.sh.in
│ ├── staticlibs.sh.in
│ ├── strip.sh.in
│ └── zipman.sh.in
├── tidy.sh.in
├── util
│ ├── message.sh.in
│ ├── option.sh.in
│ ├── parseopts.sh.in
│ ├── pkgbuild.sh.in
│ ├── source.sh.in
│ └── util.sh.in
└── util.sh.in
As a first approach, I am only creating a *.sh.in
on the first level and sourcing it from pacaur
script. But this approach can be easily expanded to the actual makepkg
atomistic structure (include-script/directory).
Those .in files are files preprocessed by make, which makes little sense here considering pacaur has no Makefile or generally macros to expand. Otherwise you should probably first ensure pacaur functions themselves are modular, i.e. work as filters, rather than simply split them to separate files to be all sourced by a main script which is a mostly superficial change (compare yaourt which has a similar layout, yet is the most unreadable thing since the invention of scripture). libmakepkg
didn't always get this right either, e.g. lint_package/build_references
which hardcodes variables only available in makepkg.
Another good way to increase readability is remove "home brew" code such as the color functions or more controversially the json parser; much of those are available in libmakepkg or other general-purpose tools. That might in fact be the best starting point for this whole operation.
@AladW I see this step as the most reasonable before really changing the actual code workflow. Making aggressive change will surely introduce bugs. This will allow us to work in the code with notably less conflicting commits since the files are separated. I agree with you that the next step is ensuring modularity of the functions.
Sure it will break things. Isn't that why it's put on a separate branch? ;)
I'm more comfortable working with small changes and testing for bugs than making very big changes try to figure out where the bugs are introduced. Anyways, you are right this is a testing branch, let's have some fun. 😄
Another good way to increase readability is remove "home brew" code such as the color functions or more controversially the json parser; much of those are available in libmakepkg or other general-purpose tools.
There is a ticket for the color functions and such (#422), so that's definitely something to go for. If the json parser get replaced by jshon, we might as well use others external tools to replace internal working (this includes pacutils, aurutils and maybe others). I've been skeptic about adding new dependencies (jshon), but if we go all-in for it and leverage tools that have been created the past few months/few years... why not.
I'd like to start documenting the functions using makepkg
style as mentioned before in this issue. This will help me to better understand the program workflow and maybe give us some insight of what can be replaced with external tools.