genometools
genometools copied to clipboard
Please avoid bundling libraries
genometools bundles quite a number of libraries:
teythoon@europa /tmp/genometools-1.5.8 % ls -1 src/external bzip2-1.0.6/ cgilua-5.1.3/ expat-2.0.1/ lpeg-0.10.2/ lua-5.1.5/ luafilesystem-1.5.0/ md5-1.1.2/ samtools-0.1.18/ sqlite-3.8.7.1/ tre/ zlib-1.2.8/
This is generally frowned upon. See the Debian policy [0] or the Fedora equivalent [1]. The latter gives quite a number of reasons why this is considered bad practice. In short, it creates more work for both the genometools maintainers and all packagers.
0: https://www.debian.org/doc/debian-policy/ch-source.html#s-embeddedfiles 1: https://fedoraproject.org/wiki/Archive:Bundled_Library_Packaging_Draft#Why_no_Bundled_Libraries
I appreciate the argument from Debian and Fedora. However, our bundling strategy has served us very well in the past. Because we bundle everything except libcairo, users usually can compile GenomeTools out of the box with make cairo=no. Many users run on older systems which do not have the libraries installed or where the libraries are outdated, which would lead to problems. On university systems the users often do not have root access to install the libraries either.
If you want to, you can compile GenomeTools with linking against shared libraries by using useshared=yes. That's how our Debian packages are build to comply with the Debian policy (and I agree that it makes total sense in that case).
Bundling the necessary libraries also makes it possible for us to cross-compile GenomeTools for many platforms, because we simply cross-compile all dependencies with it.
So I don't see us dropping bundled libraries anytime soon. Sorry.
I completely agree with @gordon. This is a compromise we are making for our users' convenience. IMHO allowing people with no access to proper package management to build and run a version at all outweighs the indeed very legitimate disadvantages you mention. Please also notice that we usually do not introduce patches to the bundled code or require GenomeTools to be built using the code copies.
However, now that you mention it I (as the maintainer of the GenomeTools Debian package) will make sure that the embedded code copies disappear from Debian's source packages. The embedded source files are all DFSG free, so it's merely a question of cleanliness I guess. I will exclude src/external from the next upstream version onward.
Thanks for having such a watchful eye on the code though, much appreciated :)
On university systems the users often do not have root access to install the libraries either.
That's a red herring. You don't need root to do that. If you did, you couldn't install genometools either.
If you want to, you can compile GenomeTools with linking against shared libraries by using useshared=yes.
Interesting, however, that option is not documented.
IMHO allowing people with no access to proper package management to build and run a version at all outweighs the indeed very legitimate disadvantages you mention.
It also exposes them to the risk of running outdated versions of said libraries. In effect you shift the burden of making sure the libraries are up to date from distribution maintainers to the genometools maintainers. I feel that users should opt-in to that, not opt-out.
Please also notice that we usually do not introduce patches to the bundled code
A cursory look over the commits touching src/external says otherwise.
On university systems the users often do not have root access to install the libraries either.
That's a red herring. You don't need root to do that. If you did, you couldn't install genometools either.
Maybe the notion of absolutely needing root was misleading here. The reality is that many typical GenomeTools users work in environments where they are not their own system administrators. That means, to install new software, there are usually two options:
- get the institution's sysadmin to install the software using native package management, which might me difficult to motivate e.g. if one just wants to evaluate a new tool, or
- compile the code from source.
Of course the second option is always available as long as a compiler toolchain is available on the target system, which is a reasonable expectation IMHO.
However, we want to avoid having a user (who might well be struggling with make options to begin with!) to go through the pains of unpackaged software installation:
- locate and download all library dependencies (and their dependencies!) as source tarballs
- build all of them, potentially using different build systems, and
- make sure GenomeTools finds the new libraries' headers and object files, which are in unpredictable directories.
From my experience it would be a major obstacle for our user base if we required this kind of build process from source, and the support requests we've had over the years seem to confirm that building from source is usually one of the main sources of problems.
If you want to, you can compile GenomeTools with linking against shared libraries by using useshared=yes.
Interesting, however, that option is not documented.
Indeed it isn't. Thanks for pointing this out, will add a note to the INSTALL documentation.
IMHO allowing people with no access to proper package management to build and run a version at all outweighs the indeed very legitimate disadvantages you mention.
It also exposes them to the risk of running outdated versions of said libraries.
As I said, this is not the case when GenomeTools is part of a distribution. In this case it uses the shared libraries provided by the distribution, which can be updated separately.
In effect you shift the burden of making sure the libraries are up to date from distribution maintainers to the genometools maintainers.
See above. Do not get me wrong -- I understand your concerns and they bother me too. For Debian I went into quite a bit of additional work to allow the use of shared libraries, not only to comply with their policy but also to ensure the viability of GenomeTools in a more organized software environment such as Debian's. But scientific software also exists to serve a specific purpose, and this means one has to consider and address the user's computing reality by balancing ease of packaging versus ease of installation by end users.
I think that documenting that the option of using shared libraries exists is a step in the right direction. The point is that packagers are (at least expected to be) more knowledgeable about how dependencies should be organized, and it's reasonable to shift the burden of getting this right to them than to the end users.
I feel that users should opt-in to that, not opt-out.
Are you proposing to use a separate usebundle=yes option and making useshared=yes the default?
Please also notice that we usually do not introduce patches to the bundled code
A cursory look over the commits touching src/external says otherwise.
That's why I said 'usually', not 'absolutely' ;)
The patches to the lua source mostly deal with making the code compile with our default -Werror on newer compilers, IIRC. What I meant was that we don't add major API adjustments to the source, making it incompatible with unmodified upstream (and I've seen that case multiple times when packaging third party software). Replacing them with unmodified counterparts is still possible.
Are you proposing to use a separate usebundle=yes option and making useshared=yes the default?
Yes.
The patches to the lua source mostly deal with making the code compile with our default -Werror on newer compilers, IIRC.
So again this is creating more work for you that could have been better spent developing genometools or fixing the issues in the libraries upstream.
well, if anyone has the time and energy to prepare something like docker packages with genometools we could get rid of the external libraries. But as @satta said: we not only have to think about the work for us, but the user base. And I know how much I hate it to get dependencies right for some software I want to use but can't install directly because of no root - many of our users are not as experienced. Then again, those should maybe use just the binaries (possibly inside docker or something so it works everywere).
@satta: The Debian package uses the bundled libraries header files, I'm working on a patch to address this.
Excellent, thanks for your work on this :) Looking forward to your patch.
While working on that I realized that you don't only embed the libraries, but also build them using your own Makefile and just link them together. I'm somewhat surprised that this approach works on that scale.
Yep. That approach started out when substantially less external code was used and indeed grew over time.
It proves the capabilities of our powerful build system :smile:
As argued in https://github.com/genometools/genometools/pull/748#issuecomment-186992820 this allows to build statically linked binaries quite easily.
The classic paper Recursive Make Considered Harmful argues why it it is better to define all dependencies in a single makefile.
Err, no. Any decent build system allows the creation of statically linked binaries, no need to either bundle libraries nor, if you must do that, side-step the build system of said libraries.
That paper speaks about dependencies within a project. Also, it is about speed, and "classic" is a euphemism for "18 years old".
Hi, I am occasionally contributing packages into Gentoo Linux. I agree with others that you should not bundle copies of 3rd-party's code but cleanly reluy on that. Would you make the package cleaner it could be added to major distro's easily. Exactly due to the above issues it won't happen so quickly.
We (maintainers) need to tag packages with a LICENSE string and we are not going to pollute it with licenses of pango, samtools, sqlite, ... At least in my case they were linked in dynamically but still, our system-wide copies were well tested and often were carefully patched. We just cannot accept you code uses a local bundled copy.
I hit this thread because the *.so library does not have a version number. That blocks installation of this pakcage on Gentoo at the moment completely.
>>> Source compiled.
>>> Test phase [not enabled]: sci-biology/genometools-1.5.10
>>> Install genometools-1.5.10 into /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image/ category sci-biology
make -j2 DESTDIR=/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image/ install
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin || mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin
cp bin/gt /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin
strip /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin/gt
mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/share/genometools
cp -a gtdata /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/share/genometools/
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/core \
|| mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/core
cp src/core/*_api.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/core
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/extended \
|| mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/extended
cp src/extended/*_api.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/extended
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/annotationsketch \
|| mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/annotationsketch
cp src/annotationsketch/*_api.h \
/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/annotationsketch
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/ltr \
|| mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/ltr
cp src/ltr/*_api.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/ltr
cp obj/gt_config.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools
cp src/genometools.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib || mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib
cp lib/libgenometools.a /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib
cp lib/libgenometools.so /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib
[build config script install]
sed -e 's!@CC@!cc!' -e 's!@CFLAGS@!-O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize!' \
-e 's!@CPPFLAGS@!-I\\"/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include\\" -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DHAVE_MEMMOVE -D_LARGEFILE64_SOURCE=1 -DHAVE_HIDDEN -DLUA_DL_DLOPEN -DLUA_USE_MKSTEMP -DSQLITE_THREADSAFE=0 -DHAVE_SQLITE!' \
-e 's!@CXX@!c++!' -e 's!@CXXFLAGS@!-O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize!' \
-e 's!@LDFLAGS@!-L/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib -Wl,-O1 -Wl,--as-needed -L/usr/local/lib!' \
-e 's!@LIBS@! -lm -ldl -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lcairo -lpangocairo-1.0 -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lcairo -lpthread -ldl!' -e "s!@VERSION@!`cat VERSION`!" \
-e 's!@BUILDSTAMP@!"2018-04-12 09:34:33"!' \
-e 's!@SYSTEM@!Linux!' <src/genometools-config.in \
>/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin/genometools-config
chmod 755 /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin/genometools-config
>>> Completed installing genometools-1.5.10 into /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image/
* Final size of build directory: 252792 KiB (246.8 MiB)
* Final size of installed tree: 75092 KiB ( 73.3 MiB)
* QA Notice: The following shared libraries lack a SONAME
* /usr/lib/libgenometools.so
Files matching a file type that is not allowed:
usr/lib/libgenometools.so
* ERROR: sci-biology/genometools-1.5.10::science failed:
* multilib-strict check failed!
*
* Call stack:
* misc-functions.sh, line 603: Called install_qa_check
* misc-functions.sh, line 217: Called source 'install_symlink_html_docs'
* 80multilib-strict, line 46: Called multilib_strict_check
* 80multilib-strict, line 42: Called die
* The specific snippet of code:
* [[ ${abort} == yes ]] && die "multilib-strict check failed!"
*
* If you need support, post the output of `emerge --info '=sci-biology/genometools-1.5.10::science'`,
* the complete build log and the output of `emerge -pqv '=sci-biology/genometools-1.5.10::science'`.
* The complete build log is located at '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/temp/build.log'.
* The ebuild environment file is located at '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/temp/environment'.
* Working directory: '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image'
* S: '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/work/genometools-1.5.10'
!!! post install failed; exiting.
>>> Failed to emerge sci-biology/genometools-1.5.10, Log file:
>>> '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/temp/build.log'
Getting rid of src/external is not an option, that would complicate the build process for normal users. However, Debian has a GenomeTools package and they have the same requirements. @satta did a lot of work to make this work. @mmokrejs If you send a PR that helps with building GenomeTools as a Gentoo package without removing src/external or breaking the Debian package I'm happy to merge it.