PcapPlusPlus icon indicating copy to clipboard operation
PcapPlusPlus copied to clipboard

Check size of tcpFragmentList before delete fragment

Open MCredbear opened this issue 1 year ago • 15 comments

Fragment might be delete after calling m_OnMessageReadyCallback (eg. call TcpReassembly::closeConnection in m_OnMessageReadyCallback), so we need to check if it's already deleted .

MCredbear avatar Mar 21 '24 14:03 MCredbear

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 82.37%. Comparing base (e48be80) to head (2d1ac80).

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1346      +/-   ##
==========================================
+ Coverage   82.34%   82.37%   +0.03%     
==========================================
  Files         163      163              
  Lines       20937    20939       +2     
  Branches     7906     7910       +4     
==========================================
+ Hits        17241    17249       +8     
+ Misses       3025     3017       -8     
- Partials      671      673       +2     
Flag Coverage Δ
alpine317 72.39% <100.00%> (-0.03%) :arrow_down:
fedora37 72.40% <100.00%> (-0.03%) :arrow_down:
macos-12 61.41% <84.21%> (+<0.01%) :arrow_up:
macos-13 60.44% <84.21%> (-0.01%) :arrow_down:
macos-14 60.44% <84.21%> (-0.01%) :arrow_down:
macos-ventura 61.47% <84.21%> (-0.01%) :arrow_down:
mingw32 70.28% <100.00%> (-0.03%) :arrow_down:
mingw64 70.29% <100.00%> (-0.04%) :arrow_down:
npcap 83.36% <100.00%> (+<0.01%) :arrow_up:
rhel93 72.44% <100.00%> (-0.01%) :arrow_down:
ubuntu1804 74.75% <100.00%> (+0.01%) :arrow_up:
ubuntu2004 73.20% <100.00%> (-0.01%) :arrow_down:
ubuntu2204 72.24% <100.00%> (-0.01%) :arrow_down:
ubuntu2204-icpx 58.96% <84.21%> (-0.02%) :arrow_down:
unittest 82.37% <100.00%> (+0.03%) :arrow_up:
windows-2019 83.39% <100.00%> (+0.03%) :arrow_up:
windows-2022 83.41% <100.00%> (+0.03%) :arrow_up:
winpcap 83.37% <100.00%> (+0.05%) :arrow_up:
xdp 59.09% <0.00%> (-0.01%) :arrow_down:
zstd 73.84% <100.00%> (+0.05%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 22 '24 05:03 codecov[bot]

I add a new function to specially test that case, appologize for the ugly code

MCredbear avatar Mar 29 '24 04:03 MCredbear

图片 图片 Sorry, it occurs error again after I deploy my project in a 20Gb/s network..... Currently I'm still figuring what causes this.

MCredbear avatar Apr 08 '24 06:04 MCredbear

图片 图片 Sorry, it occurs error again after I deploy my project in a 20Gb/s network..... Currently I'm still figuring what causes this.

are you able to reproduce it? adding the root cause in the test is helpful for the edge cases.

tigercosmos avatar Apr 09 '24 00:04 tigercosmos

图片 图片 Sorry, it occurs error again after I deploy my project in a 20Gb/s network..... Currently I'm still figuring what causes this.

are you able to reproduce it? adding the root cause in the test is helpful for the edge cases.

Yes, and my new commit fixes it, thought I think this error shouldn't be caused...

MCredbear avatar Apr 11 '24 01:04 MCredbear

I am using TcpReassembly and it crashes with double free / invalid pointer in checkOutOfOrderFragments, but only on ARM, interestingly.

Looking at this, it seems this PR should fix crashes I see. I have a pcap with recorded traffic, but unfortunelly I cannot share it. But I could help you with testing.

davidbolvansky avatar Apr 11 '24 20:04 davidbolvansky

I am using TcpReassembly and it crashes with double free / invalid pointer in checkOutOfOrderFragments, but only on ARM, interestingly.

Looking at this, it seems this PR should fix crashes I see. I have a pcap with recorded traffic, but unfortunelly I cannot share it. But I could help you with testing.

That would be great, would you like to help test the PR with your pcap file on the ARM machine?

tigercosmos avatar Apr 12 '24 00:04 tigercosmos

Yes, the bug happens when you call closeConnection from OnMessageReadyCallback. Use build with address sanitizers (or valgrind?).

I tried you branch and I did not see crash

@seladb

davidbolvansky avatar Apr 13 '24 08:04 davidbolvansky

@davidbolvansky could you please furnish a concise reproducible code sample illustrating the issue you're encountering?

tigercosmos avatar Apr 13 '24 10:04 tigercosmos

As I said the bug happens when you call tcpReassembly.closeConnection(connData.flowKey) from OnMessageReadyCallback - OnMessageEndCallback is called for first time.

Then OnMessageEndCallback is called again, due normal fin. This second invocation triggers double free.

davidbolvansky avatar Apr 17 '24 20:04 davidbolvansky

@MCredbear can you please address the remaining comments so we can get the PR merged?

seladb avatar Apr 19 '24 07:04 seladb

@MCredbear please also address the previous comment: https://github.com/seladb/PcapPlusPlus/pull/1346#discussion_r1545563549 https://github.com/seladb/PcapPlusPlus/pull/1346#discussion_r1545563703 https://github.com/seladb/PcapPlusPlus/pull/1346#discussion_r1545564233 https://github.com/seladb/PcapPlusPlus/pull/1346#discussion_r1571927077

And fix the pre-commit issue: https://github.com/seladb/PcapPlusPlus/actions/runs/8772352536/job/24071180251?pr=1346

seladb avatar Apr 21 '24 21:04 seladb

@MCredbear the new test fails in CI 😕

seladb avatar Apr 26 '24 06:04 seladb

Emm, I don't know what causes the new CI test error

MCredbear avatar Apr 28 '24 10:04 MCredbear

@MCredbear In output side there are three different HTTP payloads with Content-Length 2248, 5261 and 6634. But for expected side there are four different HTTP payloads with Content-Length 2428, 5261, 6634 and 6649. So, it looks like the last request/reply pattern does not exist. If it is expected behavior, you should change the expected data.

egecetin avatar Apr 28 '24 12:04 egecetin

@MCredbear would you like to finish the PR?

tigercosmos avatar Jun 05 '24 06:06 tigercosmos

@MCredbear would you like to finish the PR?

Yes, sorry for the laggy reply, my computer's SSD broken last month, I'll commit later

MCredbear avatar Jun 06 '24 17:06 MCredbear

@seladb could you help me with the Tests? I'm confusing now....

MCredbear avatar Jun 14 '24 06:06 MCredbear

@seladb could you help me with the Tests? I'm confusing now....

@MCredbear I'm looking...

seladb avatar Jun 15 '24 10:06 seladb

@MCredbear I looked into why the tests don't pass and I'm working on a fix.

The main problem with calling closeConnection() inside onMessageReadyCallback is that it may cause a recursive call in checkOutOfOrderFragments(), which leads to issues. Here is a scenario describing how it happens:

  1. Assume we are in the middle of processing a TCP flow and there are some out-of-order fragments (packets) in the buffer
  2. checkOutOfOrderFragments() is called and manages to find a packet in the out-of-order buffer that matches the current sequence
  3. It calls the onMessageReadyCallback with that fragment (packet)
  4. The callback calls closeConnection()
  5. closeConnection() contains logic to call checkOutOfOrderFragments()
  6. checkOutOfOrderFragments() manages to find a packet in the out-of-order buffer that matches the current sequence
  7. It calls the onMessageReadyCallback
  8. The callback calls closeConnection() (again)
  9. closeConnection() calls checkOutOfOrderFragments() (again)
  10. And so on...

It's not an endless loop because eventually the packets in the out-of-order buffer will run out, but it creates issues when going back in the recursion. Also, checkOutOfOrderFragments() wasn't designed to run in a recursive manner.

The fix I'm working on is to prevent this recursion by adding a flag called m_ProcessingOutOfOrder that will be checked inside checkOutOfOrderFragments() and will return immediately if it's turned on.

seladb avatar Jun 17 '24 07:06 seladb

@MCredbear please take a look at the changes I made in this commit: https://github.com/seladb/PcapPlusPlus/pull/1346/commits/bdf16164177dfdff2f33c2350752dcaeba144b9e

All tests pass now.

seladb avatar Jun 17 '24 08:06 seladb

@MCredbear did you see my previous comments? Let me know if we can merge the PR

seladb avatar Jun 20 '24 08:06 seladb

Thank you @MCredbear for working on this, much appreciated! 🙏 ❤️

seladb avatar Jun 25 '24 21:06 seladb

Thank you @MCredbear for working on this, much appreciated! 🙏 ❤️

You are welcome, and sorry for the late reply, my school life is too busy this days that I forget to look at the PR

MCredbear avatar Jun 26 '24 10:06 MCredbear

Perfect, thanks.

Any plans for new release with this bugfix?

davidbolvansky avatar Jun 27 '24 16:06 davidbolvansky

@davidbolvansky Next release is aimed to be in August.

If you need the bugfix urgently, it is also part of the master branch now, you so could potentially build from that.

Dimi1010 avatar Jun 27 '24 17:06 Dimi1010