HELP WANTED: Compiling with -fopenmp on the Clang toolchain increases .so size by nearly 400K

Open cocodark opened this issue 7 years ago • 22 comments

As described in the title: NDK r17 makes Clang the default toolchain, and building with -fopenmp increases the shared library size by nearly 400 KB. With GCC in previous NDK versions, -fopenmp only added about 30 KB. What is the difference between GCC and Clang here?

cocodark avatar Jul 14 '18 10:07 cocodark

@pirama-arumuga-nainar Any suggestions?

cocodark avatar Jul 17 '18 06:07 cocodark

I am able to reproduce this with a simple case.

int g[1024];

int main() {
  int i;
#pragma omp parallel
  for (i = 0; i < 1024; i ++)
    g[i] = i*i;
  return 0;
}

With OpenMP, executable from GCC is 40K, while the one from Clang is 380K. This seemed to have been worse with r16, where the Clang generated binary was ~800K.

Clang produces far more functions in the text section and symbol table than GCC does. I'll see why this is and whether we can trim it down. If it helps in the meantime, adding '-Wl,--exclude-libs,libomp.a' reduces the binary to 350K. The savings come mostly from a smaller symbol table; the text section is still big.

pirama-arumuga-nainar avatar Jul 17 '18 19:07 pirama-arumuga-nainar

@pirama-arumuga-nainar As GCC will be removed in r18, this bug causes a lot of trouble for us.

cocodark avatar Jul 18 '18 03:07 cocodark

Using -ffunction-sections when building the libomp runtime and passing -Wl,--gc-sections while linking (which, according to @DanAlbert, is passed by ndk-build and CMake by default) helps shave a further 100k from my test. With this change, the clang-build with openmp is ~250K, which is still higher than the 40K from gcc.
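To illustrate why the two flags work together, here is a minimal sketch, not code from libomp. The names used_helper, unused_helper, and run are hypothetical. With -ffunction-sections each function gets its own .text.<name> section, so a link with -Wl,--gc-sections can discard sections nothing refers to. Without -ffunction-sections both helpers would normally share one .text section and be kept or dropped together.

```c
/* Sketch only: compile with -ffunction-sections and link an executable
 * with -Wl,--gc-sections to let the linker drop unreferenced functions. */

/* Referenced from run(), so its section stays in the output. */
int used_helper(int x) { return x * 2; }

/* Never referenced: with -ffunction-sections it lives in its own section,
 * which --gc-sections is free to discard when linking an executable. */
int unused_helper(int x) { return x * 3 + 1; }

/* Hypothetical entry point standing in for the code that uses the library. */
int run(void) { return used_helper(21); }
```

In a shared library, symbols with default visibility count as referenced, so the linker keeps them unless they are hidden (for example with -fvisibility=hidden).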

This is all I can think of from a black box perspective. I'll kick off an email to openmp-dev asking for their opinion.

I've attached libomp.a.zip for arm64 built with -ffunction-sections for experimentation. Just drop it into your NDK installation.

pirama-arumuga-nainar avatar Jul 18 '18 22:07 pirama-arumuga-nainar

https://android-review.googlesource.com/c/toolchain/llvm_android/+/719087 passes -ffunction-sections when building the runtimes. This should be a part of r19 or whenever the NDK clang gets updated past the one in r18.

pirama-arumuga-nainar avatar Jul 18 '18 22:07 pirama-arumuga-nainar

@pirama-arumuga-nainar Is there any progress?

cocodark avatar Jul 27 '18 01:07 cocodark

@cocodark did this help?

I've attached libomp.a.zip for arm64 built with -ffunction-sections for experimentation. Just drop it into your NDK installation.

I have to ask upstream OpenMP developers about this - and for that I need to recreate my experiment with a newer gcc. I'll do that this week.

pirama-arumuga-nainar avatar Jul 30 '18 17:07 pirama-arumuga-nainar

@pirama-arumuga-nainar After replacing libomp.a, Clang still causes an increase of ~250K. This is still inexplicable, and it forces me to revert to r14 with GCC. I hope it will be resolved before NDK r18 is released.

cocodark avatar Jul 31 '18 03:07 cocodark

I hope it will be resolved before NDK R18 released.

If we find out that we're just building something wrong then that's possible, but if it's something that will require changes upstream then unfortunately that won't happen. It takes quite a bit of time to get changes made, merged to LLVM, pulled back to Android, tested, and then finally released.

Has anyone looked to see if the same size issues are present with prior NDKs? We've had openmp support for Clang for over a year but this bug was only opened about two weeks ago.

DanAlbert avatar Jul 31 '18 07:07 DanAlbert

As far as I know this has not been resolved upstream, so moving to r20.

DanAlbert avatar Oct 17 '18 20:10 DanAlbert

@DanAlbert Thank you

cocodark avatar Oct 22 '18 06:10 cocodark

Still no upstream progress.

DanAlbert avatar Feb 20 '19 07:02 DanAlbert

Still no changes upstream afaik.

I don't suppose the original problem was that the GCC build used for comparison was using a shared libomp whereas Clang was using a static one?

DanAlbert avatar Sep 04 '19 22:09 DanAlbert

@DanAlbert what's the upstream bug link?

nickdesaulniers avatar Sep 20 '19 17:09 nickdesaulniers

@pirama-arumuga-nainar might know. I've just been asking him iirc.

DanAlbert avatar Sep 20 '19 17:09 DanAlbert

There's no upstream bug. We'd need steps to reproduce before reporting upstream, so if we can reproduce for any Linux target, that'd be best. It'd also help us understand if this is an issue with how we're building for Android.

pirama-arumuga-nainar avatar Sep 20 '19 17:09 pirama-arumuga-nainar

Isn't https://github.com/android/ndk/issues/742#issuecomment-405695546 a repro case? I'll check rq to see if this is still a problem on r21. I sort of wonder if the problem was actually just that gcc was using a shared openmp and we didn't have that for Clang until r21...

DanAlbert avatar Sep 20 '19 18:09 DanAlbert

(no, we never had a shared omp runtime for gcc)

DanAlbert avatar Sep 20 '19 18:09 DanAlbert

Huh? IIRC, when adding openmp runtimes for Clang, we added static openmp for compatibility with gcc.

pirama-arumuga-nainar avatar Sep 20 '19 18:09 pirama-arumuga-nainar

Sorry, I meant we never had a shared omp runtime. Caffeine hasn't made it to my blood stream yet.

DanAlbert avatar Sep 20 '19 18:09 DanAlbert

I do still see a fairly significant increase in size when using openmp:

For the test case above:

without -fopenmp: 8.0K
with -fopenmp: 280K

Our openmp runtime doesn't seem to link properly on my debian machine. Pirama is looking...

DanAlbert avatar Sep 20 '19 18:09 DanAlbert

The 8K number was slightly misleading because the foo.cpp from earlier turned out to be a no-op for gcc/libgomp.
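For size comparisons, a test whose result is consumed is less likely to collapse into a no-op. The following is a minimal sketch under that assumption, not the program used for the numbers in this thread. fill_and_sum and max_threads are hypothetical names. The omp.h include is guarded so the file also builds without -fopenmp, which gives a baseline binary to compare against.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 1024

static int g[N];

/* Fill g[i] = i*i with a work-shared loop and return the sum of all
 * entries, so the parallel region has an observable result. */
long fill_and_sum(void) {
  long sum = 0;
  int i;
#pragma omp parallel for reduction(+ : sum)
  for (i = 0; i < N; i++) {
    g[i] = i * i;
    sum += g[i];
  }
  return sum;
}

/* Upper bound on threads the runtime may use; 1 when built without -fopenmp. */
int max_threads(void) {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```

Calling fill_and_sum() from main and printing the result keeps the loop live in both the -fopenmp build and the baseline build.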

I looked at it briefly and filed an upstream bug. Here's the content from there for cross reference:

This was originally reported a while ago in the Android NDK bug tracker as https://github.com/android/ndk/issues/742.

Statically-linking libomp.a for a simple OpenMP hello world program, https://computing.llnl.gov/tutorials/openMP/samples/C/omp_hello.c, produces a 468K binary (after strip). gcc + libgomp.a, in comparison, produces a binary of size 128K.

I want a sanity check if this is expected behavior or if the runtime can be better organized for static linking.

Steps to reproduce:

Build OpenMP statically with: -DLIBOMP_ENABLE_SHARED=OFF -DCMAKE_BUILD_TYPE=Release


$ du -sh a.out
548K

$ strip a.out && du -sh a.out
468K
  • gold produces a 464K binary.

  • Adding -fvisibility=hidden -ffunction-sections -fdata-sections to C, CXX flags, and -Wl,--gc-sections, the unstripped binary is 2.2M but the stripped binary is 344K.

  • Bloaty [https://github.com/google/bloaty]:

  75.9%   414Ki  71.2%   338Ki    [1339 Others]
   5.5%  30.3Ki   6.4%  30.3Ki    [section .rodata]
   2.8%  15.1Ki   3.2%  15.0Ki    __kmp_aux_affinity_initialize()
   1.9%  10.6Ki   2.2%  10.6Ki    __kmp_stg_parse_affinity(char const*, char const*, void*)
   0.0%      39   2.0%  9.65Ki    __kmp_sighldrs
   1.5%  8.35Ki   1.8%  8.35Ki    [section .text]
   1.5%  8.03Ki   1.7%  7.99Ki    __kmp_fork_call
   1.4%  7.89Ki   1.6%  7.78Ki    __kmp_affinity_create_cpuinfo_map(AddrUnsPair**, int*, kmp_i18n_id*, _IO_FILE*)
   1.4%  7.68Ki   1.6%  7.58Ki    __kmp_affinity_create_x2apicid_map(AddrUnsPair**, kmp_i18n_id*)
   0.9%  5.18Ki   1.1%  5.12Ki    __kmp_stg_parse_omp_schedule(char const*, char const*, void*)
   0.9%  4.97Ki   1.0%  4.93Ki    __kmp_allocate_team
   0.8%  4.52Ki   0.9%  4.43Ki    __kmp_affinity_create_apicid_map(AddrUnsPair**, kmp_i18n_id*)
   0.8%  4.36Ki   0.0%       0    [Unmapped]
   0.0%      52   0.8%  4.00Ki    __kmp_threadprivate_d_table
   0.7%  3.98Ki   0.8%  3.82Ki    void __kmp_dispatch_init_algorithm<long long>(ident*, int, dispatch_private_info_template<long long>*, sched_type, long long, long long, traits_t<long long>::signed_t, traits_t<long long>::signed_t, long long, long long)
   0.7%  3.96Ki   0.8%  3.92Ki    __kmp_balanced_affinity
   0.7%  3.77Ki   0.8%  3.71Ki    __kmp_stg_parse_schedule(char const*, char const*, void*)
   0.7%  3.56Ki   0.0%       0    [section .symtab]
   0.6%  3.42Ki   0.7%  3.37Ki    __kmp_aux_capture_affinity
   0.6%  3.23Ki   0.7%  3.16Ki    __kmp_partition_places(kmp_team*, int)
   0.6%  3.06Ki   0.6%  2.89Ki    int __kmp_dispatch_next_algorithm<unsigned long long>(int, dispatch_private_info_template<unsigned long long>*, dispatch_shared_info_template<unsigned long long> volatile*, int*, unsigned long long*, unsigned long long*, traits_t<unsigned long long>::signed_t*, unsigned long long, unsigned long long)
 100.0%   546Ki 100.0%   474Ki    TOTAL

pirama-arumuga-nainar avatar Sep 23 '19 22:09 pirama-arumuga-nainar