
Converting string that contains Japanese text breaks on Windows

Open keychera opened this issue 1 year ago • 0 comments

I'm using bootleg on Windows 10 Home Single Language in two ways: as a pod via babashka v1.2.174, and by running the jar file directly (Java 11.0.13).

Here is a reproducible example via babashka:

(ns user)

(require '[babashka.pods :as pods])
(pods/load-pod 'retrogradeorbit/bootleg "0.1.9")
(require '[pod.retrogradeorbit.bootleg.utils :as utils])
(require '[pod.retrogradeorbit.hickory.select :as s])

(let [jp-html "<div>読</div>"]
  (spit "test-jp-str.txt" jp-html)
  (spit "test-jp-converted.txt"
        (utils/convert-to jp-html :hickory)))
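To rule out the file-writing step as the source of the problem, a check like the following may help. It is only a sketch: `utf8-file-round-trip` is a hypothetical helper name, and it assumes the script file itself is saved as UTF-8. It writes the string to a temporary file with an explicit UTF-8 encoding and reads it back the same way, without involving bootleg at all.

```clojure
;; Hypothetical helper, not part of bootleg or babashka.
(defn utf8-file-round-trip
  "Writes s to a temp file as UTF-8, reads it back as UTF-8, returns the result."
  [s]
  (let [f (java.io.File/createTempFile "jp-round-trip" ".txt")]
    (try
      (spit f s :encoding "UTF-8")   ; explicit encoding on write
      (slurp f :encoding "UTF-8")    ; explicit encoding on read
      (finally
        (.delete f)))))

;; Prints true when the string survives the file round trip unchanged.
(println (= "<div>読</div>" (utf8-file-round-trip "<div>読</div>")))
```

If this prints `true`, `spit` is probably not where the character gets lost, and the conversion step is the more likely place to look.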

I also tried running the following with the jar via the java -jar command, redirecting stdout to converted.txt:

(let [jp-html "<div>読</div>"]
  (convert-to jp-html :hickory))

Both break the string 読, but with different results:

  • via babashka, it turns into {:type :element, :attrs nil, :tag :div, :content ["読"]}
  • via java -jar, it turns into {:type :element, :attrs nil, :tag :div, :content ["��"]} (not quite sure if the characters will be correctly shown here)

Note: I tried this on macOS and there was no conversion problem.
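Since the problem only shows up on Windows, one possible factor is the platform default charset, which on Windows is often not UTF-8. The sketch below is a diagnostic under that assumption, not a confirmed cause: `round-trip` is a hypothetical helper that encodes a string with a given charset and decodes the bytes as UTF-8, which imitates one side writing in the platform default while the other side reads UTF-8.

```clojure
(import '(java.nio.charset Charset StandardCharsets))

;; 読 written as a Unicode escape so the source file's own encoding is not a factor.
(def sample "\u8AAD")

;; Hypothetical helper: encode with charset cs, then decode the bytes as UTF-8.
(defn round-trip
  [^String s ^Charset cs]
  (String. (.getBytes s cs) StandardCharsets/UTF_8))

(println "file.encoding:  " (System/getProperty "file.encoding"))
(println "default charset:" (str (Charset/defaultCharset)))
(println "UTF-8 round trip intact?  "
         (= sample (round-trip sample StandardCharsets/UTF_8)))
(println "default round trip intact?"
         (= sample (round-trip sample (Charset/defaultCharset))))
```

If the last line prints `false` on the Windows machine and `true` on macOS, the default charset is a plausible suspect. In that case, starting the JVM with the standard `-Dfile.encoding=UTF-8` system property may be worth trying for the java -jar case.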

keychera, Mar 11 '23 04:03