clj-async-profiler icon indicating copy to clipboard operation
clj-async-profiler copied to clipboard

Errors generating flamegraph using java-temurin-20.0.1+9 with 1.2.0

Open cleeland opened this issue 3 months ago • 10 comments

Consistently get this error after stopping profiling. I can use the brendan gregg's flamegraph perl script to generate usable (static) svg flamegraphs, so I have reasonable confidence that the data in the raw .txt file isn't garbage.

java.lang.IncompatibleClassChangeError: Method 'java.util.Comparator java.util.Map$Entry.comparingByKey()' must be InterfaceMethodref constant              
        at clj_async_profiler.post_processing$raw_profile__GT_compact_profile.invoke(post_processing.clj:72)                                                                                  
        at clj_async_profiler.post_processing$read_raw_profile_file_to_compact_profile.invoke(post_processing.clj:95)                                                                         
        at clj_async_profiler.core$generate_flamegraph.invoke(core.clj:286)                                                                                                                   
        at clj_async_profiler.core$stop.invoke(core.clj:347)                                                                                                                                  
        at clj_async_profiler.core$stop.invoke(core.clj:331)                                                                                                                                  
        ....the rest of the frames are app code that I can't disclose...

I'm currently trying to wade through the code to understand enough to either solve it or give better information.

cleeland avatar May 07 '24 01:05 cleeland

Hello Chris! Which Clojure version are you using, is this 1.9.0 by chance? Clojure 1.10 contains several important compatibility fixes to work on Java 9 and onward, so I suggest updating the project to the latest Clojure version if possible. https://github.com/clojure/clojure/blob/master/changes.md#1-compatibility-and-dependencies

alexander-yakushev avatar May 07 '24 05:05 alexander-yakushev

It may even be older—possibly 1.7?

There are various reasons for that, and it’s not easily changed unfortunately.

Chris Cleeland

On Tue, May 7, 2024 at 12:15 AM Oleksandr Yakushev @.***> wrote:

Hello Chris! Which Clojure version are you using, is this 1.9.0 by chance?

— Reply to this email directly, view it on GitHub https://github.com/clojure-goes-fast/clj-async-profiler/issues/34#issuecomment-2097461451, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJQQRYG7N2VIOJMY7OFJ43ZBBPNHAVCNFSM6AAAAABHKB4FRSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJXGQ3DCNBVGE . You are receiving this because you authored the thread.Message ID: @.***>

cleeland avatar May 07 '24 05:05 cleeland

Huh, that's really old. Can you revert to Java 8 then?

alexander-yakushev avatar May 07 '24 05:05 alexander-yakushev

Possibly, though also not trivially.

Do you think that this is an issue that might be resolved if I were to build the async profiler and clj-async profiler packages in my own environment? Is it worth trying?

On Tue, May 7, 2024 at 12:38 AM Oleksandr Yakushev @.***> wrote:

Huh, that's really old. Can you revert to JDK8 then?

— Reply to this email directly, view it on GitHub https://github.com/clojure-goes-fast/clj-async-profiler/issues/34#issuecomment-2097486967, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJQQRYKKRFYLERA7PF7E4LZBBSFBAVCNFSM6AAAAABHKB4FRSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJXGQ4DMOJWG4 . You are receiving this because you authored the thread.Message ID: @.***>

-- Chris Cleeland

cleeland avatar May 07 '24 14:05 cleeland

You can try to add this after you require clj-async-profiler.core:

(.bindRoot #'clj-async-profiler.post-processing/raw-profile->compact-profile
           (fn [^HashMap raw-profile, count-total-samples?]
             (let [frame->id-map (HashMap.)
                   frame->id (fn [frame]
                               (or (.get frame->id-map frame)
                                   (let [cnt (.size frame->id-map)]
                                     (.put frame->id-map frame cnt)
                                     cnt)))
                   last-stack (object-array [nil])
                   total-samples (long-array [0])
                   acc (java.util.ArrayList. (.size raw-profile))
                   ;; Quite unconventional way to iterate over the map, but we want to sort
                   ;; by the key without creating intermediate sequences.
                   _ (->> (.entrySet raw-profile)
                          (sort-by key)
                          (run! (fn [entry]
                                  (let [stack (->> (.getKey ^HashMap$Node entry)
                                                   efficient-split-by-semicolon
                                                   (mapv frame->id))
                                        value (.getValue ^HashMap$Node entry)
                                        same (count-same stack (aget last-stack 0))
                                        compact-stack (into [same] (drop same stack))]
                                    (.add acc [compact-stack value])
                                    (aset last-stack 0 stack)
                                    (when count-total-samples?
                                      (aset total-samples 0 (+ (aget total-samples 0) ^long value)))))))
                   id->frame-arr (object-array (.size frame->id-map))]
               (run! (fn [[k v]] (aset id->frame-arr v k)) frame->id-map)
               (cond-> {:stacks (vec acc)
                        :id->frame (vec id->frame-arr)}
                 count-total-samples? (with-meta {:total-samples (aget total-samples 0)})))))

This will replace the function that's failing by an implementation that doesn't rely on calling static methods.

alexander-yakushev avatar May 07 '24 14:05 alexander-yakushev

As a side note, it is a bit novel to me that a project targets the latest JDK version but not the latest version of Clojure (as JDK updates usually bring more breaking changes). I suggest getting them in sync because you'll definitely run into more similar issues.

alexander-yakushev avatar May 07 '24 14:05 alexander-yakushev

I would much prefer to be on a more modern version of clojure, but moving it past the current version introduces some breakage that requires more investment than is currently reasonable. This code is part of a legacy system which is scheduled for complete replacement, so there is limited value in going through what's necessary to move fully forward.

I'm giving the substitute function a try in the next hour.

On Tue, May 7, 2024 at 9:50 AM Oleksandr Yakushev @.***> wrote:

As a side note, it is a bit novel to me that somebody is willing to bump JDK versions but not the version of Clojure (as JDK updates usually bring more breaking changes). I suggest getting them in sync because you'll definitely run into more similar issues.

— Reply to this email directly, view it on GitHub https://github.com/clojure-goes-fast/clj-async-profiler/issues/34#issuecomment-2098590764, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJQQRYHHUP3ONVGETEIUMTZBDS4RAVCNFSM6AAAAABHKB4FRSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJYGU4TANZWGQ . You are receiving this because you authored the thread.Message ID: @.***>

-- Chris Cleeland

cleeland avatar May 07 '24 15:05 cleeland

The replacement function idea worked, though I had to also bring in a couple of other private functions from post-processing.

The version skews are regrettable, and in other circumstances I would push clojure version beyond where it is. The code is part of a legacy system that has a scheduled EOL, and moving beyond where the current version breaks things in ways that require work that has little ROI. The goal right now is to squeeze just a little more time from it.

At some point I'm going to try to understand why the preferred implementation breaks. My naive thinking is that since it's focused on java, then it should only be affected by the version of java, and not the version of clojure.

cleeland avatar May 07 '24 16:05 cleeland

The reason it breaks is this bug in Clojure: https://clojure.atlassian.net/browse/CLJ-2284. The bug manifests only in Java 9 and later, and was fixed in Clojure 1.10.

alexander-yakushev avatar May 07 '24 18:05 alexander-yakushev

It's interesting to hear that things break if you upgrade the version of Clojure. Clojure is generally very backward-compatible. But I believe that it may happen.

alexander-yakushev avatar May 07 '24 18:05 alexander-yakushev

The breakage isn't necessarily in clojure itself, but in library dependencies, cross-library dependencies, and underlying java-native dependencies (i.e., moving past this version requires bumping some dependencies, which domino-effects into others and which bumps other native java tooling that changes architecture/API sufficiently that it would require some redesign in the legacy code base). Moving forward requires bumping the version, then seeing what breaks, assess, then fixing.

I arrived at 1.7 by effectively doing a bisect and finding the sweet spot between what could be fixed and validated (via existing or simple additional unit tests) and what required more extensive development either of code itself or unit/system tests. The code isn't where I'd prefer it to be, but it is what it is.

cleeland avatar May 14 '24 16:05 cleeland

To close this out, the suggested substitution functions properly generate flamegraphs. I had to also copy over count-same and efficient-split-by-semicolon since those private functions are referenced by the substitution functions.

cleeland avatar May 14 '24 16:05 cleeland

For future, you can reference private functions by using #' syntax.

alexander-yakushev avatar May 14 '24 17:05 alexander-yakushev