Use bitcode files instead of textual in snapshots

Open evatrn opened this issue 6 months ago • 8 comments

Changes in the Python code enabling the use of LLVM IR bitcode files. This version:

  • uses LLVM command line utilities and modified regular expressions,
  • introduces has_definition,
  • fixes an ambiguous unit test (test_get_module_for_symbol_fail),
  • adapts the tests to bitcode files.

As a result, the snapshot size is reduced without a significant increase in the runtime. Resolves #352.

evatrn avatar Jun 17 '25 14:06 evatrn

@evatrn I think the failing CI / build-and-test tests could be caused by a collision in the CI caches: GitHub currently has the kernels cached in .ll form, but the tests now expect the .bc format. You could work around it by changing the key that is used for caching and retrieving the kernels.

If you change in

https://github.com/diffkemp/diffkemp/blob/2d9792fdbc711f07c19b78096b3aa76f2fe143ef/.github/workflows/ci.yml#L64

and in

https://github.com/diffkemp/diffkemp/blob/2d9792fdbc711f07c19b78096b3aa76f2fe143ef/.github/workflows/ci.yml#L116

the key to something like

test-data-llvm${{ matrix.llvm }}-bc-${{ hashFiles('tests/regression/test_specs/*') }} (with bc added to the key), it could fix this: kernels cached earlier in .ll form stay under the old key without bc, while kernels in .bc form are stored and retrieved under the new key.

PLukas2018 avatar Aug 11 '25 08:08 PLukas2018

Thank you for the hints @PLukas2018. The tests pass locally, so the errors could have indeed been caused by caching. I will give it a try once I finish what I am currently working on.

evatrn avatar Aug 11 '25 13:08 evatrn

Also needs a rebase since cc_wrapper.py was moved to a different directory by #387.

viktormalik avatar Aug 15 '25 10:08 viktormalik

Also, do you plan to clean up the commits (so that the tests pass after each commit and there's no churn), or should I just squash everything (ok with me)?

I think that squashing the commits into one would be better.

evatrn avatar Sep 02 '25 19:09 evatrn

GitHub says it cannot be merged for some reason. Perhaps one more rebase is needed?

viktormalik avatar Sep 11 '25 14:09 viktormalik

I tried to run this PR using the bot, and it crashed. At first, I thought it was a bug in the bot, but when I ran the following locally

bin/diffkemp compare snapshots/linux-4.18.0-80.el8/ snapshots/linux-4.18.0-147.el8/

it also crashed:

__cpu_online_mask: unknown
__cpu_possible_mask: unknown
__cpu_present_mask: unknown
__fentry__: unknown
__get_user_2: unknown
__per_cpu_offset: unknown
__put_user_2: unknown
__put_user_4: unknown
__put_user_8: unknown
__uv_cpu_info: unknown
__uv_hub_info_list: unknown
__x86_indirect_thunk_r10: unknown
__x86_indirect_thunk_r11: unknown
__x86_indirect_thunk_r12: unknown
__x86_indirect_thunk_r13: unknown
__x86_indirect_thunk_r14: unknown
__x86_indirect_thunk_r15: unknown
__x86_indirect_thunk_r8: unknown
__x86_indirect_thunk_r9: unknown
__x86_indirect_thunk_rax: unknown
__x86_indirect_thunk_rbp: unknown
__x86_indirect_thunk_rbx: unknown
__x86_indirect_thunk_rcx: unknown
__x86_indirect_thunk_rdi: unknown
__x86_indirect_thunk_rdx: unknown
__x86_indirect_thunk_rsi: unknown
_ctype: unknown
Segmentation fault (core dumped)

So I would probably hold off on merging until the cause has been investigated.

PLukas2018 avatar Sep 11 '25 19:09 PLukas2018

Can the commits still be squashed after a rebase? When I looked at the PR before @PLukas2018 requested changes, I saw "changes can be cleanly merged". I can pull new changes from master if that's what you mean but then I probably need to squash groups of commits in this branch manually. Concerning the segfault, I will have a look. I tested on smaller projects but I do not recall running into a similar issue. Edit: I did not see @PLukas2018's comment when I was writing this, I will check it. Thanks for looking into that.

evatrn avatar Sep 12 '25 07:09 evatrn

When I looked at the PR before @PLukas2018 requested changes, I saw "changes can be cleanly merged".

Yeah, we may have merged some PRs between you checking and me posting my comment.

I can pull new changes from master if that's what you mean, but then I probably need to squash groups of commits in this branch manually.

You will need to fetch the changes from master and then rebase onto it with git rebase master. Your commits will remain intact; they will just be placed on top of the commits from the master branch. During the rebase, you may need to resolve merge conflicts. You can resolve them by installing meld and running git mergetool --tool=meld, or, e.g., in VS Code.
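To illustrate that a rebase keeps the commits intact, here is a minimal sketch that runs in a throwaway repository. The branch name feature, the file names, and the commit messages are made up for the demonstration. In the real PR you would first fetch the current master from the upstream remote (its name depends on your setup) and then run the same git rebase master step on the PR branch.

```shell
#!/bin/sh
# Demonstration only: everything happens in a temporary repository.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
# Name the initial branch "master" regardless of the local git defaults.
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# One commit on master, then a hypothetical feature branch with two commits.
echo base > base.txt; git add base.txt; git commit -q -m "base"
git checkout -q -b feature
echo one > one.txt; git add one.txt; git commit -q -m "feature: one"
echo two > two.txt; git add two.txt; git commit -q -m "feature: two"

# Meanwhile master moves on, as it did when other PRs were merged.
git checkout -q master
echo new > new.txt; git add new.txt; git commit -q -m "master: new"

# Replay the feature commits on top of the new master.
git checkout -q feature
git rebase -q master

# Both feature commits are still there, now sitting on top of master.
feature_commits=$(git rev-list --count master..feature)
echo "commits on feature after rebase: $feature_commits"
git log --oneline
```

If the rebase stops on a conflict, resolve it (for example with git mergetool --tool=meld) and run git rebase --continue.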

Concerning the segfault, I will have a look. I tested on smaller projects but I do not recall running into a similar issue. Edit: I did not see @PLukas2018's comment when I was writing this, I will check it. Thanks for looking into that.

This is a smaller reproducer:

  1. content of old.c:
    struct foo {
        int i;
    };
    int foo1() {
        return 1;
    }
    
  2. content of list file:
    foo
    
  3. bin/diffkemp build old.c old-snap list
    
    output
    foo: old.bc
    
  4. bin/diffkemp compare old-snap/ old-snap/
    
  5. output
    Segmentation fault (core dumped)
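
The steps above can be collected into one script. This is only a sketch: it assumes it is run from the root of a diffkemp checkout where bin/diffkemp is available, and it skips the diffkemp calls otherwise. The file contents and both commands are copied from the steps above.

```shell
#!/bin/sh
# Sketch of the reproducer; assumes the current directory is a diffkemp checkout.
set -eu

# Step 1: the source file, copied verbatim from the reproducer.
cat > old.c <<'EOF'
struct foo {
    int i;
};
int foo1() {
    return 1;
}
EOF

# Step 2: the symbol list containing the single name "foo".
printf 'foo\n' > list

# Steps 3 to 5: build a snapshot and compare it with itself.
if [ -x bin/diffkemp ]; then
    bin/diffkemp build old.c old-snap list
    # Capture the exit status so that "set -e" does not hide the crash.
    status=0
    bin/diffkemp compare old-snap/ old-snap/ || status=$?
    echo "compare exited with status $status"
else
    echo "bin/diffkemp not found; only old.c and list were written" >&2
fi
```

When a process is killed by a segmentation fault, a POSIX shell typically reports an exit status of 139 (128 plus signal 11), so that value in the last line would match the crash described above.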
    

PLukas2018 avatar Sep 12 '25 07:09 PLukas2018