libarchive icon indicating copy to clipboard operation
libarchive copied to clipboard

Hide internal symbols in the shared library

Open ppentchev opened this issue 8 years ago • 8 comments

Hi,

Okay, this one's a bit of a doozy. While trying to add the library symbols file to the Debian package of libarchive (a kind of external-to-the-library symbol versioning scheme - a file that lists each symbol and the library version that first started providing it in its current form), I noticed that the shared library of libarchive contains a lot of internal symbols - functions and variables used only by the libarchive routines themselves, not part of the library's public API. This is not really a bad thing in and of itself, but it does make it just a little bit harder to do all kinds of API compliance checks on the library, as these internal symbols may appear and disappear without affecting anything, but still be noted as changes in the ELF library's symbol table.

So... what do you think of this change - use GCC's (and Clang's) -fvisibility command-line option to hide all the symbols by default, and then explicitly mark the public API ones by using the "visibility(default)" function attribute? Yes, this touches many files, but the changes to the vast majority of them are pretty trivial - add __LA_DECL before the function definition; the main changes are in the configure script and in the libarchive/archive.h and libarchive/archive_entry.h files.

Of course, this brings up another question; now that the __LA_DECL definition is actually exposed to consumers of the header files, should it really be prefixed with a double underscore, or should it be renamed to something in the public namespace. When I helped with the same kind of change in libwebsockets last year, the new preprocessor definition was named LWS_VISIBLE, so if you'd like me to, I can rename __LA_DECL to LA_VISIBLE or LA_PUBLIC or something like that.

Of course, it's ultimately your call whether this change is useful or whether you'd like it done in a different way.

Thanks again for everything you do for libarchive and FreeBSD in general, and keep up the great work!

G'luck, Peter

ppentchev avatar Jul 07 '16 08:07 ppentchev

From experience, I find the interactions of -fvisibility=hidden to create more problems than it solves. As such, I'm strongly against this approach.

jsonn avatar Jul 07 '16 09:07 jsonn

Fair enough, upstream's call is upstream's call.

Just for information, could you let me know what problems have you seen with -fvisibility? I am genuinely curious, since I've tried this with a couple of libraries on a couple of different platforms and I haven't run into any issues... yet :)

G'luck, Peter

ppentchev avatar Jul 07 '16 09:07 ppentchev

Don't get me wrong, I don't have a problem with the reverse approach of tagging internal functions explicitly as hidden. Let's just say that Mozilla's wrapping logic has burned me often enough...

jsonn avatar Jul 07 '16 19:07 jsonn

@jsonn: Could you elaborate on how Mozilla's wrapping logic has burned you?

I've not used this particular feature before, so don't have any experience with the drawbacks, but it sounds like a great idea.

We've generally tried to be careful in libarchive to keep internal functions static wherever possible (this is one reason why a lot of small utility functions are duplicated in multiple files; I think the advantages of keeping those static outweigh the desire to reduce duplication). But there are a number of shared functions. I have occasionally audited the symbols to try to ensure that internal shared functions are consistently named with an __archive prefix, but I'm sure we can do better.

What I would like is for every function to be declared with one of the following:

  • static -- everything that doesn't have to be shared or exported
  • ARCHIVE_INTERNAL -- all internal-only shared functions
  • ARCHIVE_PUBLIC (aka _LA_DECL) -- all exported functions

With that in place, we could experiment with different definitions of ARCHIVE_INTERNAL and ARCHIVE_PUBLIC try using -fvisibility to see if we can find a good workable answer.

kientzle avatar Jul 17 '16 20:07 kientzle

Skimming Microsoft's docs for __declspec(dllexport), it looks like that can only be used for the function declaration, not the function definition. So we may need to separately provide

  • ARCHIVE_PUBLIC_DECL (current _LA_DECL) - declaration of a public exported function
  • ARCHIVE_PUBLIC - definition of a public exported function
  • ARCHIVE_INTERNAL_DECL - declaration of a shared internal function
  • ARCHIVE_INTERNAL - definition of a shared internal function

That would allow us to define ARCHIVE_PUBLIC_DECL on Windows to be __declspec(dllexport) while leaving ARCHIVE_PUBLIC empty on that platform.

kientzle avatar Jul 17 '16 20:07 kientzle

The only point in tagging everything would be a slight simplification for windows users. For everyone else, just tagging internal (non-static) functions with something like LA_HIDDEN is good enough. Keeping a list of known exported symbols would be good enough for a regression test to verify no intentional leaks happened.

The problem with the Mozilla approach is that it can change the behavior of system headers and/or 3rd party headers.

jsonn avatar Jul 17 '16 21:07 jsonn

Seeing as how #1751 (merged Jul 31, 2022) adds -fvisibility=hidden and apparently there are a bunch of internal tokens/pragmas present in the sources that control visibility of various symbols, is this here PR obsolete?

dmacks avatar Apr 19 '23 12:04 dmacks

@ppentchev can you confirm that #1751 addresses this?

emaste avatar Apr 02 '24 13:04 emaste