bosh-linux-stemcell-builder

Noble: consider removing `clang`

Open aramprice opened this issue 11 months ago • 10 comments

We need to investigate whether the performance issues encountered on Jammy with gcc-compiled Ruby still exist on Noble.

If not, it seems better to drop clang from the stemcell to reduce surface area.

aramprice avatar Jan 17 '25 22:01 aramprice

Note that clang is one of the packages not in Ubuntu's "Main" repository

  • https://github.com/cloudfoundry/bosh-linux-stemcell-builder/issues/328

aramprice avatar Jan 17 '25 22:01 aramprice

The earlier issue with Ruby (3.3.x) and gcc compilation on Jammy is documented here:

  • "Increased memory usage (RSS) for Ruby when compiled by gcc" https://bugs.ruby-lang.org/issues/19052

The results of running the following script:

puts "RubyVM::YJIT.enable   => #{RubyVM.const_defined?('YJIT') ? RubyVM::YJIT.enable : 'YJIT Undefined' }"
puts "RubyVM::YJIT.enabled? => #{RubyVM.const_defined?('YJIT') ? RubyVM::YJIT.enabled? : 'YJIT Undefined' }"

def print_memory(header:)
  puts "--------------------------------------"
  puts header
  puts
  puts "statm resident Memory: #{Hash[%i{size resident shared trs lrs drs dt}.zip(open("/proc/#{Process.pid}/statm").read.split)][:resident].to_i * 4}"
  puts open("/proc/#{Process.pid}/smaps_rollup").read
  puts "--------------------------------------"
end

print_memory(header: 'Before thread')

threads = []

10.times do |i|
  threads << Thread.new do
  end
end

threads.each(&:join)

print_memory(header: 'After thread ended')
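
The "statm resident Memory" line in the script multiplies the `resident` field of `/proc/<pid>/statm` by 4. That field is a page count, so the result is in KiB only if pages are 4 KiB. A minimal sketch of the same calculation with the page size made explicit follows; the helper names `parse_statm` and `pages_to_kib` and the sample line are made up for illustration:

```ruby
# Field names follow the script above; /proc/<pid>/statm reports page counts.
STATM_FIELDS = %i[size resident shared trs lrs drs dt].freeze

# Parse one line of /proc/<pid>/statm into a Hash of integer page counts.
def parse_statm(line)
  STATM_FIELDS.zip(line.split.map(&:to_i)).to_h
end

# Convert a page count to KiB. 4096 bytes per page is an assumption that
# matches the hardcoded "* 4" in the script above.
def pages_to_kib(pages, page_size_bytes: 4096)
  pages * page_size_bytes / 1024
end

# Illustrative statm line (not taken from the attached logs).
sample = '3296 3232 1500 1 0 800 0'
stats = parse_statm(sample)
puts "resident: #{pages_to_kib(stats[:resident])} KiB"
```

On a host with a different page size, passing `page_size_bytes:` explicitly would keep the reported figure in KiB.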

Here are the changes in "statm resident memory" (resident pages × 4, so KiB assuming 4 KiB pages) before and after Thread creation across the following configurations, using apt-provided clang, gcc, and rustc:

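  • With Ruby 3.4.1:

    OS    | gcc - yjit     | gcc            | clang - yjit         | clang
    ------+----------------+----------------+----------------------+---------------
    jammy | 12928 -> 13312 | 11776 -> 12032 | n/a - compile fails  | 11136 -> 11392
    delta | 384            | 256            | n/a                  | 256
    noble | 13184 -> 13568 | 11520 -> 11904 | 12544 -> 12928       | 11136 -> 11392
    delta | 384            | 384            | 384                  | 256

  • With Ruby 3.3.7:

    OS    | gcc - yjit     | gcc            | clang - yjit         | clang
    ------+----------------+----------------+----------------------+---------------
    jammy | 20352 -> 20608 | 19316 -> 19572 | n/a - compile fails  | 10496 -> 10624
    delta | 256            | 256            | n/a                  | 128
    noble | 20224 -> 20352 | 19316 -> 19572 | 11776 -> 11904       | 10496 -> 10752
    delta | 128            | 256            | 128                  | 256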
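
The delta rows are simply "after" minus "before". A short sketch that recomputes them for the noble / Ruby 3.4.1 row and adds growth relative to the starting value is shown here; the `MEASUREMENTS` constant and both helper names are illustrative, while the numbers are copied from the results in this comment:

```ruby
# Before/after resident values for Ruby 3.4.1 on noble, copied from the
# results in this comment. Keys are the column labels used there.
MEASUREMENTS = {
  'gcc - yjit'   => [13184, 13568],
  'gcc'          => [11520, 11904],
  'clang - yjit' => [12544, 12928],
  'clang'        => [11136, 11392]
}.freeze

# Absolute growth in resident memory across thread creation.
def delta(before, after)
  after - before
end

# Growth as a percentage of the starting value, rounded to two decimals.
def growth_percent(before, after)
  (delta(before, after) * 100.0 / before).round(2)
end

MEASUREMENTS.each do |config, (before, after)|
  puts format('%-12s delta=%d (%.2f%%)', config, delta(before, after),
              growth_percent(before, after))
end
```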

jammy-gcc-3.4.1.log jammy-gcc-yjit-3.4.1.log jammy-clang-3.4.1.log noble-clang-3.4.1.log noble-clang-yjit-3.4.1.log noble-gcc-3.4.1.log noble-gcc-yjit-3.4.1.log

jammy-gcc-yjit-3.3.7.log jammy-gcc-3.3.7.log noble-gcc-3.3.7.log noble-gcc-yjit-3.3.7.log jammy-clang-3.3.7.log noble-clang-yjit-3.3.7.log noble-clang-3.3.7.log

aramprice avatar Jan 22 '25 23:01 aramprice

Test script used for the results above:

#!/usr/bin/env bash
set -euo pipefail

DOCKERFILE_CONTENT=$(cat <<'DOCKER_FILE'
ARG UBUNTU_RELEASE

FROM ubuntu:${UBUNTU_RELEASE}

ARG RUBY_SOURCE
ARG RUBY_VERSION
ARG RUBY_YJIT
ARG RUBY_INSTALL_VERSION
ARG THREAD_TEST_FILE

RUN apt update \
    && apt install -y \
        build-essential \
        clang \
        curl \
        libssl-dev \
    && . /etc/lsb-release \
    && if [ "${RUBY_YJIT}" = "yes" ] \
    ; then \
      apt install -y rustc \
    ; fi

RUN if [ "${RUBY_SOURCE}" != "apt" ] \
    ; then \
        export CC="${RUBY_SOURCE}" \
        && echo 'gem: --no-document' > /etc/gemrc \
        && echo 'gem: --no-document' > "${HOME}/.gemrc" \
        && NUM_CPUS=$(grep -c ^processor /proc/cpuinfo) \
        && curl -L "https://github.com/postmodern/ruby-install/archive/refs/tags/v${RUBY_INSTALL_VERSION}.tar.gz" \
          | tar -xz \
        && ( cd ruby-install-*/ && make -s install && cd - && rm -rf ruby-install-*/ ) \
        && ruby-install --jobs=${NUM_CPUS} --cleanup --system ruby ${RUBY_VERSION} \
          -- --disable-install-doc --disable-install-rdoc \
    ; else \
        apt install -y ruby \
        && dpkg -l ruby \
    ; fi


ENV RUBY_SOURCE=${RUBY_SOURCE}
COPY ./${THREAD_TEST_FILE} /thread_test.rb


CMD echo \
    && echo "######################### CONFIG #########################" \
    && echo "Ruby installed via: '${RUBY_SOURCE}'" \
    && echo \
    && echo "# cat /etc/lsb-release" \
    && cat /etc/lsb-release \
    && echo \
    && echo "# clang --version" \
    && clang --version \
    && echo \
    && echo "# rustc --version" \
    && ( rustc --version || echo "no rustc" ) \
    && echo \
    && echo "# gcc --version" \
    && gcc --version \
    && echo \
    && echo "# which ruby" \
    && which ruby \
    && echo \
    && echo "# ruby --version" \
    && ruby --version \
    && echo \
    && echo "# ldd $(which ruby)" \
    && ldd $(which ruby) \
    && echo \
    && echo "######################### CONFIG #########################" \
    && echo \
    && echo "########################## TEST ##########################" \
    && ruby thread_test.rb \
    && echo \
    && echo "########################## TEST ##########################" \
    && echo
DOCKER_FILE
)

THREAD_TEST_CONTENT=$(cat <<'THREAD_TEST'
puts "RubyVM::YJIT.enable   => #{RubyVM.const_defined?('YJIT') ? RubyVM::YJIT.enable : 'YJIT Undefined' }"
puts "RubyVM::YJIT.enabled? => #{RubyVM.const_defined?('YJIT') ? RubyVM::YJIT.enabled? : 'YJIT Undefined' }"

def print_memory(header:)
  puts "--------------------------------------"
  puts header
  puts
  puts "statm resident Memory: #{Hash[%i{size resident shared trs lrs drs dt}.zip(open("/proc/#{Process.pid}/statm").read.split)][:resident].to_i * 4}"
  puts open("/proc/#{Process.pid}/smaps_rollup").read
  puts "--------------------------------------"
end

print_memory(header: 'Before thread')

threads = []

10.times do |i|
  threads << Thread.new do
  end
end

threads.each(&:join)

print_memory(header: 'After thread ended')
THREAD_TEST
)

ubuntu_release="${1}"
ruby_source="${2}"

if [ "${ubuntu_release}" = "jammy" ] && [ "${ruby_source}" = "clang" ]; then
  yjit_default="no"
else
  yjit_default="yes"
fi
ruby_yjit="${3:-${yjit_default}}"

ruby_install_version="0.9.4"
ruby_version="${4:-"3.4.1"}"

build_tag="mem_test-${ubuntu_release}-${ruby_source}-yjit-${ruby_yjit}-${ruby_version}"

docker_file=".tmp-${build_tag}-${ruby_install_version}-${ruby_version}-Dockerfile"
thread_test_file=".tmp-${build_tag}-${ruby_install_version}-${ruby_version}-thread_test.rb"

echo "${DOCKERFILE_CONTENT}" > "${docker_file}"
echo "${THREAD_TEST_CONTENT}" > "${thread_test_file}"

set -x
docker build \
  --tag "${build_tag}" \
  --build-arg UBUNTU_RELEASE="${ubuntu_release}" \
  --build-arg RUBY_VERSION="${ruby_version}" \
  --build-arg RUBY_YJIT="${ruby_yjit}" \
  --build-arg RUBY_INSTALL_VERSION="${ruby_install_version}" \
  --build-arg RUBY_SOURCE="${ruby_source}" \
  --build-arg THREAD_TEST_FILE="${thread_test_file}" \
  --file "${docker_file}" .

docker run "${build_tag}"

Example command:

# noble, using clang, no yjit, ruby v3.3.7
./run.sh noble clang no 3.3.7
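
To cover every configuration, the script has to be invoked once per combination of its four positional arguments (release, compiler, yjit, Ruby version). A hedged sketch that only prints those invocations is below. It assumes the script is saved as `./run.sh` as in the example command, and it skips jammy + clang + yjit because that combination is reported as failing to compile. The constant and method names are made up for illustration:

```ruby
# Argument values taken from the thread; the names are illustrative.
RELEASES  = %w[jammy noble].freeze
COMPILERS = %w[gcc clang].freeze
YJIT      = %w[yes no].freeze
VERSIONS  = %w[3.4.1 3.3.7].freeze

# Build one "./run.sh <release> <compiler> <yjit> <version>" line per
# combination, leaving out jammy + clang + yjit (reported as a compile failure).
def matrix_commands
  RELEASES.product(COMPILERS, YJIT, VERSIONS)
          .reject { |rel, cc, yjit, _| rel == 'jammy' && cc == 'clang' && yjit == 'yes' }
          .map { |combo| "./run.sh #{combo.join(' ')}" }
end

puts matrix_commands
```

This yields 14 commands, which lines up with the 14 attached log files.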

aramprice avatar Jan 22 '25 23:01 aramprice

Seems like the difference is inconsequential across all of the comparisons: Jammy v. Noble, Ruby 3.3 v. 3.4, YJIT v. not-YJIT, and clang v. gcc.

"Top Line[1]" memory impact:

  • Ruby 3.4 shows slightly higher memory consumption than Ruby 3.3
  • YJIT has slightly lower memory usage on Noble than non-YJIT
  • clang or gcc doesn't seem to impact memory usage

Perhaps there was a fix to gcc (at least the one on Jammy) in the past year, or there have been changes to the ruby patch versions.

[1] "top line" is literally the top line of the output from the ruby script above, originally from https://bugs.ruby-lang.org/issues/19052
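
Pulling the "top line" values out of a captured log can be sketched as follows. The sample text mirrors what `print_memory` prints, with the `smaps_rollup` dump left out for brevity; the method name `resident_values` is hypothetical and the two numbers are from the noble gcc + yjit 3.4.1 result:

```ruby
# Extract every "statm resident Memory: N" value from the script's output,
# in the order printed (before threads, then after).
def resident_values(log_text)
  log_text.scan(/^statm resident Memory: (\d+)$/).flatten.map(&:to_i)
end

# Abbreviated sample in the shape print_memory produces (smaps_rollup omitted).
sample_log = <<~LOG
  --------------------------------------
  Before thread

  statm resident Memory: 13184
  --------------------------------------
  --------------------------------------
  After thread ended

  statm resident Memory: 13568
  --------------------------------------
LOG

before, after = resident_values(sample_log)
puts "#{before} -> #{after} (delta #{after - before})"
```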

aramprice avatar Jan 23 '25 00:01 aramprice

Should we remove clang from Noble then?

beyhan avatar Jan 23 '25 06:01 beyhan

After discussion at the FIWG meeting, it seems like a good idea to remove clang packages from Noble

jpalermo avatar Jan 23 '25 16:01 jpalermo

I've been doing some benchmarking and in my testing /v3/apps with 50 apps is reliably ~5% slower when ruby is compiled with gcc. I don't know if those results will hold across the average cloud controller requests or not. That seems right at the threshold between being a rounding error and being significant.

mkocher avatar Jan 28 '25 23:01 mkocher

Have we tested any template rendering with Noble? We added clang because it reduced template rendering time by 10%, which is a lot when template rendering was taking 2-3 hours on high CF deployments.

lnguyen avatar Jan 30 '25 12:01 lnguyen

@lnguyen - we haven't - do you know if there is any test setup tooling lying around for that?

As an aside, @mkocher discovered that passing CFLAGS to Ruby's configure script causes the -O3 optimization flag for gcc to be dropped. The bosh-package-ruby-release has been passing CFLAGS="-fPIC" for a long time, which might be the cause of the slowdowns we've seen in the past.
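
One way to check a given Ruby build for this is to look at the flags recorded in `RbConfig::CONFIG`. The sketch below assumes the relevant flags show up under the `CC`, `CFLAGS`, and `optflags` keys for the build in question; the method name `build_flags_report` is made up for illustration:

```ruby
require 'rbconfig'

# Summarize the compiler and optimization flags recorded for a Ruby build.
# Accepts any Hash-like config so it can be exercised without a real build.
def build_flags_report(config = RbConfig::CONFIG)
  flags = [config['CFLAGS'], config['optflags']].compact.join(' ')
  {
    cc: config['CC'],
    cflags: config['CFLAGS'],
    optflags: config['optflags'],
    o3: flags.split.include?('-O3') # true only if -O3 appears as its own flag
  }
end

build_flags_report.each { |key, value| puts "#{key}: #{value}" }
```

Running this with both a gcc-built and a clang-built Ruby would show whether `-O3` survived in each build.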

aramprice avatar Jan 30 '25 17:01 aramprice

related: https://github.com/cloudfoundry/bosh-package-ruby-release/pull/41

ramonskie avatar Jun 24 '25 09:06 ramonskie