yara icon indicating copy to clipboard operation
yara copied to clipboard

linux.c: don't use pread64

Open orbea opened this issue 1 year ago • 13 comments

Starting with musl-1.2.4 all LFS64 APIs have been removed since musl is always 64-bit and yara now fails with implicit function declarations for pread64.

However configure.ac already sets AC_SYS_LARGEFILE so changing them all to pread will work with both glibc and musl.

orbea avatar Feb 26 '24 22:02 orbea

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] avatar Feb 26 '24 22:02 google-cla[bot]

This was explicitely changed in #2005. @vthib, could you take a look a this PR?

plusvic avatar Feb 26 '24 23:02 plusvic

Yes, as explained in #2005, this change would break the "glibc, x86, LFS not enabled" case.

There is a bit in musl explaining a bit the differences with glibc here: https://wiki.musl-libc.org/faq#Q:-Do-I-need-to-define-%3Ccode%3E_LARGEFILE64_SOURCE%3C/code%3E-to-get-64bit-%3Ccode%3Eoff_t%3C/code%3E?

From what I can gather, we have:

  • musl always defines off_t to 64 bits regardless of architecture, see https://git.musl-libc.org/cgit/musl/tree/include/alltypes.h.in#n29
  • glibc defines off_t as long int, which means 32 bits on x86.

The pread64 function is in posix for support of large files, but musl seems to want to deprecate this. This leaves us in a bit of an weird situation:

  • in the end, musl will only expose pread(..., off_t offset), with off_t=uint64_t. We could fix it right now by defining _LARGEFILE64_SOURCE but they do say they plan to remove this in the future
  • but on glibc x86, using pread(... off_t offset) will break process scanning unless -D_FILE_OFFSET_BITS=64 is set.

As detailed in #2005, the issue isn't so much when compiling yara using autotools (since it should set this properly), but rather when using other tools that compile yara, such as yara-rust. If it breaks, it is completely silent as it will still compile, and it can be very annoying to debug.

I wonder if the best solution couldn't be something like this:

  • Merge this commit, ie move back to using pread. This makes musl work.
  • Add a #define _FILE_OFFSET_BITS=64 in proc/linux.c before including unistd.h, to properly include the 64 bits version of those syscalls. This change avoids depending on an autotools option, makes the code easier to build and more local. This is basically what using pread64 was, but we can no longer do that due to the musl change.
  • Possibly add a static assert checking that sizeof(off_t) == 8, to make sure that it does not compile. This ensures we never get to the previous situation where it compiles properly, but it silently casts the value and breaks process scanning. If the second point is not done, that would allow yara-rust or others to no longer be able to compile yara unless it properly defines _FILE_OFFSET_BITS=64

vthib avatar Feb 27 '24 00:02 vthib

Maybe I'm missing something, but shouldn't AC_SYS_LARGEFILE already add the required defines to config.h?

orbea avatar Feb 27 '24 02:02 orbea

Maybe I'm missing something

Yes, it seems yara doesn't use AC_CONFIG_HEADERS so there is no config.h, but unfortunately adding it is not as trivial as I hoped due to how AC_DEFINE is used makes AC_CONFIG_HEADERS unhappy.

orbea avatar Feb 27 '24 02:02 orbea

I force pushed with two new commits, the first to add a description to AC_DEFINE as shown in the autoconf documentation which seems required with AC_CONFIG_HEADERS and another to add AC_CONFIG_HEADERSso that pread should work on 32-bit systems too since AC_SYS_LARGEFILE is already used.

This needs testing to be sure config.h is not missing anywhere important.

orbea avatar Feb 27 '24 03:02 orbea

Adding a config.h does not solve the points i raised about avoiding the use of the 32-bits pread function when autotools is not used.

@plusvic any opinions on the changes i suggested?

vthib avatar Feb 27 '24 18:02 vthib

Adding a config.h does not solve the points i raised about avoiding the use of the 32-bits pread function when autotools is not used.

Is it possible to test it? Check out the documentation for AC_SYS_LARGEFILE.

https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/System-Services.html

— Macro: AC_SYS_LARGEFILE

    Arrange for 64-bit file offsets, known as [large-file support](http://www.unix-systems.org/version2/whatsnew/lfs20mar.html). On some hosts, one must use special compiler options to build programs that can access large files. Append any such options to the output variable CC. Define _FILE_OFFSET_BITS and _LARGE_FILES if necessary.

    Large-file support can be disabled by configuring with the --disable-largefile option.

    If you use this macro, check that your program works even when off_t is wider than long int, since this is common when large-file support is enabled. For example, it is not correct to print an arbitrary off_t value X with printf ("%ld", (long int) X).

    The LFS introduced the fseeko and ftello functions to replace their C counterparts fseek and ftell that do not use off_t. Take care to use AC_FUNC_FSEEKO to make their prototypes available when using them and large-file support is enabled.

These defines get added to config.h which is now included at the start of the relevant file, I think this is accomplishing what you are suggesting?

orbea avatar Feb 27 '24 19:02 orbea

The issue that I raised is that if autotools are not used, it is very easy to forget to define _FILE_OFFSET_BITS, and this can lead to invalid process scanning.

The config.h is generated by autotools, so this actually reinforces the coupling with autotools. That would force people that do not use autotools (like yara-rust) to create a config.h by hand to still be able to build, and there is still the possibility of forgetting to define _FILE_OFFSET_BITS

vthib avatar Feb 27 '24 20:02 vthib

Thanks for elaborating, I understand your concern now and yes another build system would have to replicate the defines and autoconf tests. I am not certain if setting _FILE_OFFSET_BITS unconditionally is correct? See this meson issue. https://github.com/mesonbuild/meson/issues/3519

Unfortunately this is not something easily solved outside of using autoconf regardless...

orbea avatar Feb 27 '24 21:02 orbea

Regarding your link, it also leads to this: https://github.com/jarun/nnn/commit/db8b61866b1cc6dbbf05f83a7ca623f42eb9fcb7

And defining it in proc/linux.c would mean this would also impact linux builds, and no other targets. This feels very safe and recommended, both by musl and glibc

vthib avatar Feb 27 '24 23:02 vthib

I prefer not having to deal with a config.h file, and go with @vthib's proposal.

plusvic avatar Feb 28 '24 14:02 plusvic

I prefer not having to deal with a config.h file, and go with @vthib's proposal.

Its your call, I removed the config.h commit and went with @vthib's proposal. I left the AC_DEFINE commit since it might save some debugging in the future in case autoconf ever becomes more pedantic about that, but I can remove it too if you wish?

orbea avatar Feb 28 '24 15:02 orbea

Fixed in 833a580430abe0fbc9bc17a21fb95bf36dacf367

plusvic avatar Apr 07 '24 15:04 plusvic

Thank you! And sorry for forgetting about this PR, I had other things distracting me and lost track of it...

orbea avatar Apr 07 '24 16:04 orbea