Fix inconsistency in how `__blst_platform_cap` is defined
In C it's a BSS symbol, and in assembly it's a common symbol:
$ nm -og libblst.a | grep __blst_platform_cap
libblst.a:assembly.o:0000000000000004 C __blst_platform_cap
libblst.a:server.o:0000000000000000 B __blst_platform_cap
This causes problems for some toolchains.
With this change, it becomes:
$ nm -og libblst.a | grep __blst_platform_cap
libblst.a:assembly.o:0000000000000004 C __blst_platform_cap
libblst.a:server.o:0000000000000004 C __blst_platform_cap
In C it's a BSS symbol, and in assembly it's a common symbol:
$ nm -og libblst.a | grep __blst_platform_cap libblst.a:assembly.o:0000000000000004 C __blst_platform_cap libblst.a:server.o:0000000000000000 B __blst_platform_capThis causes problems for some toolchains.
Would you care to provide an example? The thing is that I've used this approach for many years, and I do mean many, without encountering any problems. What I did have problem at some point was with not-explicitly-initialized symbols not being zero. On Solaris. Yes, it was in violation of the language standard and was eventually fixed, but it's an actual example in favour of the original approach. Either way, as an ultimate out-of-touch-with-general-public example I can confirm that it even works on x86_64 VMS. As intended that is.
For reference, "a common symbol in one object file may be merged with a defined or common symbol of the same name in another object file." This from GNU assembler manual. On additional note, .comm-s don't belong in any particular segment in the sense that in the case it's merged with a defined symbol, the said symbol can reside anywhere, even in the .text segment. [And the merge with the common symbol doesn't affect its placement].
it [not zeroing uninitialized values in .bss segment] was in violation of the language standard
On a second though[t] I probably can't say this. The language standard stipulates that using uninitialized value constitutes undefined behaviour, which doesn't actually translate to ".bss segment being zeroed." Rather contrary in fact, ~because~ moreover, in contemporary parlance undefined behaviour is commonly associated with infamous "nasal daemons." And we don't want that. Just in case, .bss is specified to be zeroed, just not in the C language standard.
Yes, I see your point. According to the standards, this should work as-is.
The example that motivated this is an error when linking the static library into a Haskell program using ghc, the Haskell compiler. It says,
GHC runtime linker: fatal error: I found a duplicate definition for symbol
__blst_platform_cap
whilst processing object file
/$HOME/Local/lib/libblst.a
The symbol was previously defined in
/$HOME/Local/lib/libblst.a(#2:server.o)
This could be caused by:
* Loading two different object files which export the same symbol
* Specifying the same object file twice on the GHCi command line
* An incorrect `package.conf' entry, causing some object to be
loaded twice.
ghc-iserv: Could not load Object Code /$HOME/Local/lib/libblst.a(#1:assembly.o).
and
$ nm /$HOME/Local/lib/libblst.a | grep __blst_platform_cap
0000000000000004 C __blst_platform_cap
0000000000000000 B __blst_platform_cap
I'm wondering why you're using .comm in the assembly files instead of a simple external reference? Is it so that assembly.o can be used from the library without having to pull in server.o as well? If so, perhaps a better design would be to allocate and initialize the space in assembly.o and reference it from server.o(using extern int __blst_platform_cap) since server.o can't be used without assembly.o.
BTW, we don't have the problem when linking against the dynamic library, because the visibility("hidden") attribute stops the symbol from being exported by the dynamic library.
For reference, here's a commit that uses .extern in the assembly files and keeps the definition in the C file. The symbols are then:
$ nm -og libblst.a | grep __blst_platform_cap
libblst.a:assembly.o: U __blst_platform_cap
libblst.a:server.o:0000000000000000 B __blst_platform_cap
The example that motivated this is an error when linking the static library into a Haskell program using
ghc, the Haskell compiler.
I can't reproduce the problem, not on Linux with ghc 9.6.6 [or 9.4.7]. For reference, the snippet is
import Foreign.C.Types
foreign import ccall "blst_pairing_sizeof" foo :: CLong
main = print foo
I'm wondering why you're using
.commin the assembly files instead of a simple external reference? Is it so thatassembly.ocan be used from the library without having to pull inserver.oas well?
Yes. The goal is to use arbitrary combination of individual assembly modules without triggering linking problems, with the worst happening is suboptimal performance.
The example that motivated this is an error when linking the static library into a Haskell program using
ghc, the Haskell compiler.I can't reproduce the problem, not on Linux with ghc 9.6.6 [or 9.4.7].
And why would it fail if it calls gcc to link the final binary. Though your message mentions "GHC runtime linker," which ought to be something else...
your message mentions "GHC runtime linker," which ought to be something else...
If it's something used by ghci, then mine says that static libraries are not supported at all and insists on shared one.
your message mentions "GHC runtime linker," which ought to be something else...
If it's something used by ghci, then mine says that static libraries are not supported at all and insists on shared one.
Though I can extract server.o and assembly.o and start gchi server.o assembly.o and it doesn't seem to be a problem [to :load snippet.hs and invoke main]...
I don't get a failure with your minimal example, but then it isn't using anything from assembly.o. I'm looking at how we use blst in our code and I'll try to come up with a minimal example that fails.
I don't get a failure with your minimal example, but then it isn't using anything from
assembly.o.
I've examined ~that~ the resulting binary to confirm that assembly.o was in fact linked in. And [why] wouldn't it, server.o is a "monolith" block and linker will resolve its references.
But what is your ghc version? And OS?
I'm using ghc 9.6.7 and NixOS, but my colleague who was originally having the problem is using Ubuntu. He's now using ghc 9.12.2 and the problem as gone away, but since we use ghc 9.6.7 for our production builds I was concerned.
However, since I can't reproduce it I think it must be something specific to the build setup he's using and isn't a concern for our production builds. I'm happy to close this PR if you'd like, although I still think the current way of defining __blst_platform_cap is odd and .extern would be better than .comm.
Thanks for taking the time to look into this.
I'm confused. Is assertion that 9.6.7 doesn't work on NixOS? Or is it that some earlier version, presumably pre-9.x, didn't work on Ubuntu? I write "presumably pre-9.x," because I had no problem with 9.4.7 either, and it's unlikely that NixOS is that different from Ubuntu... Well, unless it's using musl or something...
Sorry for not being clear. The only occurrence of the problem is when my colleague experienced it using 9.6.7 on Ubuntu. I haven't been able to reproduce the problem on NixOS, with any version of ghc, and he doesn't have the problem on Ubuntu when using ghc 9.12.2.
His request for help encouraged me to look at the blst sources. That's when I identified the cause of the inconsistent symbol definitions and felt that it should be changed, even though I wasn't experiencing the problem myself (because in my workflow I'm linking against the dynamic version of the library which doesn't export the symbol at all).
To summarize. Normal compilations with ghc are not affected, because it invokes gcc to do the linking, which is perfectly capable of handling .comm-s. Therefore we're talking about interactive non-production compilations. ghci versions shipped with Ubuntu and Debian refuse to load static libraries with "Loading static libraries is not supported in this configuration. Try using a dynamic library instead." Hence we're looking either at support for static libraries being introduced in 9.6.7 or custom builds deployed at your site. Either way, it, the support for static libraries in interactive mode, apparently was found deficient and fixed somewhere between 9.6.7 and 9.6.12. In other words the problem is with ghci. One can make case that blst could be less "radical," but not that there is something wrong with it.
That's a fair summary.
This isn't a problem for us now, in development or production, and it's unlikely that any build setup will experience this problem in the future. I'm therefore happy to close this. However, the changes will still be here if anybody ever needs them in the future.
Thanks for your responsiveness to and careful consideration of my concerns.