charlock_holmes icon indicating copy to clipboard operation
charlock_holmes copied to clipboard

Fix native extension build on M1 macOS (Ruby 3+) by adjusting compiler flags

Open restot opened this issue 10 months ago • 5 comments

This change addresses a build failure on M1 MacBooks (macOS 15.3, build 24D60) when compiling the charlock_holmes gem with Ruby 3+. The problem was caused by an explicit hardcoded flag (-x c++) in the ext/charlock_holmes/extconf.rb file, which caused the C compiler to receive a stray "C++" argument.

# ext/charlock_holmes/extconf.rb 
compile_options = +"-x c++"

After some troubleshooting I've come up with fixes:

  • Force the use of /usr/bin/clang++ instead of the default g++ on Apple Silicon (arm64) with Ruby ≥ 3.0.0.
  • Append the ICU C++ standard flag to $CXXFLAGS (not $CPPFLAGS) to prevent C files from receiving C++ flags.
  • Strip stray -x c++ from $CPPFLAGS for affected systems.

Tested on macOS 15.3 M1 with Ruby 3.0.2 and 3.3.5 using a custom script for encoding detection/conversion.

A test script was used to verify core functionality (encoding detection and conversion) with different sample strings.

For example: For the short sample "Caf\xe9".force_encoding("ISO-8859-1"), the detector returns low confidence (15) and defaults to UTF-8 (this is expected with such a short and ambiguous string). For a longer sample such as "Caf\xe9 au lait, d\xe9j\xe0 vu, cr\xe8me brûl\xe9e, voilà, à la mode".force_encoding("ISO-8859-1"), the detector correctly identifies the encoding as ISO-8859-1 with higher confidence (41) and the conversion produces a reasonably correct UTF-8 string.

I've shared the full test script source and output details below.

Error log I had before fix:

% gem install charlock_holmes-0.7.9.gem -- \
  --with-icu-dir="$(brew --prefix icu4c)" \
  --with-cxx=/usr/bin/clang++ \
  --with-cxxflags="-std=c++17"
  
Building native extensions. This could take a while...
ERROR:  Error installing charlock_holmes-0.7.9.gem:
        ERROR: Failed to build gem native extension.
checking for -licui18n... yes
checking for unicode/ucnv.h... yes
checking for -lz... yes
checking for -licuuc... yes
checking for -licudata... yes
checking for icu that requires explicit C++ version flag... yes
checking for icu that compiles with c++20 standard... yes
Static linking is disabled.
creating Makefile
clang: error: no such file or directory: 'c++'
make: *** [converter.o] Error 1

Because of the hardcoded c++ flag in extconf.rb --with-cxx flags have been ignored. I tried maybe everything to help the compiler recognize the C++ compiler installed in the system but always got the same error.

Test script source:

#!/usr/bin/env ruby
require 'charlock_holmes'

puts "Testing charlock_holmes gem..."

# Sample string encoded in ISO-8859-1 (Latin-1)
sample_text = "Caf\xe9".force_encoding("ISO-8859-1")
puts "Original string: #{sample_text} (encoding: #{sample_text.encoding})"

# Detect the encoding
detection = CharlockHolmes::EncodingDetector.detect(sample_text)
if detection
  puts "Detected encoding: #{detection[:encoding]}"
  puts "Detection confidence: #{detection[:confidence]}"
else
  puts "Encoding detection failed."
  exit(1)
end

# Convert the string to UTF-8
begin
  converted = CharlockHolmes::Converter.convert(sample_text, detection[:encoding], "UTF-8")
  puts "Converted string: #{converted} (encoding: #{converted.encoding})"
rescue => e
  puts "Error during conversion: #{e}"
end

detection = CharlockHolmes::EncodingDetector.detect(sample_text)
if detection
  puts "Detected encoding: #{detection[:encoding]}"
  puts "Detection confidence: #{detection[:confidence]}"
else
  puts "No encoding detected."
end

sample_text = "Caf\xe9 au lait, d\xe9j\xe0 vu, cr\xe8me brûl\xe9e".force_encoding("ISO-8859-1")
puts "Original string: #{sample_text} (encoding: #{sample_text.encoding})"
# Convert the string to UTF-8 based on the detection.
begin
  converted = CharlockHolmes::Converter.convert(sample_text, detection[:encoding], "UTF-8")
  puts "Converted string: #{converted} (encoding: #{converted.encoding})"
rescue => e
  puts "Conversion error: #{e}"
end

# Create a longer, less ambiguous ISO-8859-1 sample
sample_text = "Caf\xe9 au lait, d\xe9j\xe0 vu, cr\xe8me brûl\xe9e, voilà, à la mode".force_encoding("ISO-8859-1")
puts "Original string: #{sample_text} (encoding: #{sample_text.encoding})"

detection = CharlockHolmes::EncodingDetector.detect(sample_text)
if detection
  puts "Detected encoding: #{detection[:encoding]}"
  puts "Detection confidence: #{detection[:confidence]}"
else
  puts "No encoding detected."
end

converted = CharlockHolmes::Converter.convert(sample_text, detection[:encoding], "UTF-8")
puts "Converted string: #{converted} (encoding: #{converted.encoding})"

Test script output:

Testing charlock_holmes gem...
Original string: Caf� (encoding: ISO-8859-1)
Detected encoding: UTF-8
Detection confidence: 15
Converted string: Caf� (encoding: UTF-8)
Detected encoding: UTF-8
Detection confidence: 15
Original string: Caf� au lait, d�j� vu, cr�me brûl�e (encoding: ISO-8859-1)
Converted string: Caf� au lait, d�j� vu, cr�me brûl�e (encoding: UTF-8)
Original string: Caf� au lait, d�j� vu, cr�me brûl�e, voilà, à la mode (encoding: ISO-8859-1)
Detected encoding: ISO-8859-1
Detection confidence: 41
Converted string: Café au lait, déjà vu, crème brûlée, voilà, à la mode (encoding: UTF-8)

Thank you for your attention!

restot avatar Feb 07 '25 23:02 restot

Please take a look: @tenderlove, @stanhu

restot avatar Feb 07 '25 23:02 restot

I wasn't able to reproduce the problem, but maybe CFLAGS would make more sense than CPPFLAGS?

diff --git a/ext/charlock_holmes/extconf.rb b/ext/charlock_holmes/extconf.rb
index 903c297..e33ff81 100644
--- a/ext/charlock_holmes/extconf.rb
+++ b/ext/charlock_holmes/extconf.rb
@@ -67,7 +67,7 @@ if icu_requires_version_flag
     checking_for("icu that compiles with #{std} standard") do
       flags = compile_options + " -std=#{std}"
       if try_compile(minimal_program, flags)
-        $CPPFLAGS << flags
+        $CFLAGS << flags
 
         true
       end

stanhu avatar Feb 10 '25 03:02 stanhu

@stanhu

I couldn't reproduce the problem, but maybe CFLAGS would make more sense than CPPFLAGS?

diff --git a/ext/charlock_holmes/extconf.rb b/ext/charlock_holmes/extconf.rb
index 903c297..e33ff81 100644
--- a/ext/charlock_holmes/extconf.rb
+++ b/ext/charlock_holmes/extconf.rb
@@ -67,7 +67,7 @@ if icu_requires_version_flag
     checking_for("ICU that compiles with #{std} standard") do
       flags = compile_options + " -std=#{std}"
       if try_compile(minimal_program, flags)
-        $CPPFLAGS << flags
+        $CFLAGS << flags
 
         true
       end

Thank you, yes $CFLAGS helped to get rid of defining $CXXFLAGS and $CPPFLAGS but haven't helped to solve the root cause for my case.

My initial problem was in this explicit flag https://github.com/brianmario/charlock_holmes/blob/1c14a3fd5b08f629bba7ba478f8644a5296932bc/ext/charlock_holmes/extconf.rb#L59 So I solved it with conditional application to preserve backward compatibility because I don't have enough competence to justify or disproof its necessity

Without an explicit "-x c++" flag for the compiler, I was able to compile source code.

restot avatar Feb 10 '25 04:02 restot

I'm not sue what's going on your system. I have hundreds of team members who have had to compile and install this gem on all sorts of macOS and clang versions. You might want to inspect the output Makefile to see how this stray argument is being used.

stanhu avatar Feb 10 '25 05:02 stanhu

  • Force the use of /usr/bin/clang++ instead of the default g++ on Apple Silicon (arm64) with Ruby ≥ 3.0.0

When I compile, it uses clang++. Do you know why your system is using g++? It's not default for me (and I'm on Apple Silicon). You should be able to check by doing:

ruby -r rbconfig -e'pp RbConfig::CONFIG["CXX"]'

Do we really need to add extra flags anyway? It looks like icu4c installs pkg-config settings, so we should just use the flags stored in pkg config rather than messing with them.

To answer my own question, ICU requires certain versions of C++. In order to convince the C or C++ compiler to use that version, we have to specify a -std=c++17 (or whatever it wants).

For some reason, pkg-config doesn't specify that flag:

$ pkg-config --cflags icu-i18n
-I/opt/homebrew/Cellar/icu4c@76/76.1_1/include

Strangely, the .pc file knows about the flag:

$ cat /opt/homebrew/opt/icu4c@76/lib/pkgconfig/icu-i18n.pc | grep FLAGS
# CFLAGS contains only anything end users should set
CFLAGS =  -std=c11
# CXXFLAGS contains only anything end users should set
CXXFLAGS =  -std=c++17
# DEFS only contains those UCONFIG_CPPFLAGS which are not auto-set by platform.h
#SHAREDLIBCFLAGS=-fPIC

But it doesn't add CFLAGS in the Cflags section of the .pc file:

$ cat /opt/homebrew/opt/icu4c@76/lib/pkgconfig/icu-i18n.pc | grep Cflags
Cflags: -I${includedir}

The Cflags line is the only way pkg-config knows what to return when you run the pkg-config command according to this.

It seems like whatever is generating the .pc file is not including the right stuff in the Cflags line. I'm not sure who generates that file though.

tenderlove avatar Feb 10 '25 17:02 tenderlove