dogecoin icon indicating copy to clipboard operation
dogecoin copied to clipboard

[WIP] net: Implement ADDRv2 for TorV3 support

Open alamshafil opened this issue 3 years ago • 39 comments

This draft PR implements ADDRv2 based on BIP155 to add support for TorV3 since Tor V2 has been deprecated. Fix for #2866.

This PR contains the following attempted backports:

  • https://github.com/bitcoin/bitcoin/pull/19351
  • https://github.com/bitcoin/bitcoin/pull/19360
  • https://github.com/bitcoin/bitcoin/pull/19534
  • https://github.com/bitcoin/bitcoin/pull/19628
  • https://github.com/bitcoin/bitcoin/pull/19845
  • https://github.com/bitcoin/bitcoin/pull/19845
  • https://github.com/bitcoin/bitcoin/pull/19954

As of now, both net and net_base tests fail. I am suspected there may be a issue trying to serialize/unserialize V1+V2 addresses.

NOTE: This PR is a work-in-progress

alamshafil avatar Oct 19 '22 18:10 alamshafil

I have opened https://github.com/alamshafil/dogecoin/pull/2 to fix a sprintf issue in torcontrol.cpp that made it segfault

patricklodder avatar Oct 19 '22 19:10 patricklodder

I'm currently investigating the following segfault, it triggers when an addrdb exists and we read from it:

==1== Warning: set address range perms: large range [0x59c87040, 0xbd0917ce) (undefined)
==1== Invalid read of size 1
==1==    at 0x37F5DA: prevector (prevector.h:252)
==1==    by 0x37F5DA: CNetAddr (netaddress.h:117)
==1==    by 0x37F5DA: pair<const CNetAddr&, 0> (tuple:1823)
==1==    by 0x37F5DA: pair<const CNetAddr&> (tuple:1813)
==1==    by 0x37F5DA: construct<std::pair<const CNetAddr, int>, const std::piecewise_construct_t&, std::tuple<const CNetAddr&>, std::tuple<> > (new_allocator.h:162)
==1==    by 0x37F5DA: construct<std::pair<const CNetAddr, int>, const std::piecewise_construct_t&, std::tuple<const CNetAddr&>, std::tuple<> > (alloc_traits.h:516)
==1==    by 0x37F5DA: _M_construct_node<const std::piecewise_construct_t&, std::tuple<const CNetAddr&>, std::tuple<> > (stl_tree.h:595)
==1==    by 0x37F5DA: _M_create_node<const std::piecewise_construct_t&, std::tuple<const CNetAddr&>, std::tuple<> > (stl_tree.h:612)
==1==    by 0x37F5DA: std::_Rb_tree_iterator<std::pair<CNetAddr const, int> > std::_Rb_tree<CNetAddr, std::pair<CNetAddr const, int>, std::_Select1st<std::pair<CNetAddr const, int> >, std::less<CNetAddr>, std::allocator<std::pair<CNetAddr const, int> > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<CNetAddr const&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<CNetAddr const, int> >, std::piecewise_construct_t const&, std::tuple<CNetAddr const&>&&, std::tuple<>&&) [clone .isra.0] (stl_tree.h:2431)
==1==    by 0x382017: operator[] (stl_map.h:501)
==1==    by 0x382017: void CAddrMan::Unserialize<CDataStream>(CDataStream&) (addrman.h:411)
==1==    by 0x37B602: Unserialize<CDataStream, CAddrMan> (serialize.h:551)
==1==    by 0x37B602: operator>><CAddrMan> (streams.h:409)
==1==    by 0x37B602: CAddrDB::Read(CAddrMan&, CDataStream&) (addrdb.cpp:209)
==1==    by 0x37B89E: CAddrDB::Read(CAddrMan&) (addrdb.cpp:194)
==1==    by 0x1F63FE: CConnman::Start(CScheduler&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, CConnman::Options) (net.cpp:2289)

patricklodder avatar Oct 19 '22 20:10 patricklodder

Since this will change the addrman format, we need to make it is compatible with v1.21 since this PR does not include ASMAP support. (Issue only applies if we are not including ASMAP support)

I am not fully sure if we need to separate the versions (1.14 non-ASMAP addrv2 and 1.21 ASMAP addrv2) but it may be good to separate them.

In v1.21 branch: https://github.com/dogecoin/dogecoin/blob/d5d056d009a8c527d1debc1090a427019ce46d91/src/addrman.h#L180-L186

To my understanding 1.14 is using version 1 (V1_DETERMINISTIC). We could reserve V2_BIP155 for 1.14.7 to have BIP155 support without ASMAP and V3_ASMAP in v1.21 codebase.

Example use for 1.14.7 [without ASMAP] and 1.21 [with ASMAP]:

 //! Serialization versions. 
 enum Format : uint8_t { 
     V0_HISTORICAL = 0,    //!< historic format, before commit e6b343d88 
     V1_DETERMINISTIC = 1, //!< Dogecoin 1.14.x format
     V2_BIP155 = 2,         //!< for files with addresses in BIP155 format 
     V3_ASMAP = 3,        //!< for files including asmap version and in BIP155 format 
 }; 

Any code using version checks would also need to be updated.

Would like thoughts on this matter.

alamshafil avatar Oct 26 '22 19:10 alamshafil

To my understanding 1.14 is using version 1 (V1_DETERMINISTIC). We could reserve V2_BIP155 for 1.14.7 to have BIP155 support without ASMAP and V3_ASMAP in v1.21 codebase.

Good catch!

IF we pull this off including skipping v2, then I think swapping the versions is the preferred way (least impact) but we'll have to make sure once we're closer. Worst case we can set another bit because this is not p2p related at all and just local serialization versioning - it won't touch other clients.

patricklodder avatar Oct 26 '22 23:10 patricklodder

Opened https://github.com/alamshafil/dogecoin/pull/3 to fix deserialization for at least aarch64 and x86_64 win/linux. 32-bit and MacOs builds still need love. This makes net_tests pass.

netbase_tests still needs a fix, but the problem is caused by picking too much with 02e152455e3. However, I'm glad you did because we should anyway fix that. I'm preparing a PR for that separately.

patricklodder avatar Oct 29 '22 21:10 patricklodder

Thanks, I have merged the changes.

Based on #3157, should I move src/util/string.h to src/utilstring.h and remove unused code? It could save from trouble when rebasing, etc.

alamshafil avatar Oct 29 '22 22:10 alamshafil

Based on #3157, should I move src/util/string.h to src/utilstring.h and remove unused code? It could save from trouble when rebasing, etc.

Yes, and you should be able to retain the NODISCARD macro when rebasing on top of that.

patricklodder avatar Oct 29 '22 22:10 patricklodder

I have pushed some new commits and rebased against latest 1.14.7-dev branch.

This commits adds addrv2 support to GetAddrBytes() net: Addr addrv2 support to GetAddrBytes()

The other commits refactor CNetAddr::ToStringIP().

For Restore compatibility with old CSubNet serialization, I was not sure if I did it correct.

READWRITE(FLATDATA(REF(MakeSpan(netmask).first(4))));

For reference here is the Bitcoin version:

READWRITE(MakeSpan(obj.netmask).first(4));

This test in netbase_tests is different from this PR and the one backported. This PR:

BOOST_CHECK(LookupSubNet(std::string("5wyqrzbvrdsumnok.onion", 22), ret));

#3157:

BOOST_CHECK(LookupSubNet(std::string("5wyqrzbvrdsumnok.onion", 22), ret));

I am assuming we have some change which is changing the behavior of this test.

There might a couple of more commit we would need to backport.

alamshafil avatar Nov 01 '22 00:11 alamshafil

While the test suite passes on my end, running Dogecoin Core testnet segfaults.

Backtrace via gdb:

thread 30 "dogecoin-msghan" received signal SIGSEGV, Segmentation fault.
#0  0x00007ffff56afb4e in free () from /usr/lib/libc.so.6
#1  0x00005555555d1956 in CNetAddr::~CNetAddr (this=0x7fff5ffde010, __in_chrg=<optimized out>) at netaddress.h:117
#2  CService::~CService (this=0x7fff5ffde010, __in_chrg=<optimized out>) at netaddress.h:519
#3  CAddress::~CAddress (this=0x7fff5ffde010, __in_chrg=<optimized out>) at protocol.h:299
#4  ProcessMessage (pfrom=<optimized out>, strCommand=..., vRecv=..., nTimeReceived=<optimized out>, chainparams=..., connman=..., interruptMsgProc=...) at net_processing.cpp:1554
#5  0x00005555557bde92 in ProcessMessages (pfrom=0x7fff58002320, connman=..., interruptMsgProc=std::atomic<bool> = { false }) at net_processing.cpp:2981
...

During compiling I did see some warnings regrading the prevector m_addr, not sure if this has any effect.

alamshafil avatar Nov 01 '22 00:11 alamshafil

@xanimo pointed me to the solution for the 32-bit (and Mac 64-bit) serialization last night in a chat and is working on a proposal for it. Because that enhances and replaces some of the uint serialization logic, this may have positive effects on the workarounds we're currently doing in this PR with FLATDATA() - so I'm inclined to wait for that work to be visible.

This test in netbase_tests is different from this PR and the one backported.

Interesting. The negation there basically means that valid TORv2 address doesn't resolve into a subnet anymore. I checked against bitcoin v0.21.2 and it is still there in the same form as I backported - do you have any idea which particular change caused that?

patricklodder avatar Nov 01 '22 12:11 patricklodder

Based on https://github.com/bitcoin/bitcoin/commit/102867c587f5f7954232fb8ed8e85cda78bb4d32#diff-1d804b435c22c29ffc21022ad3a88a8b7d3bd44ae451c3d50e4700344ffef60dR439-R440

Subnetting was only supported for IPv4 and IPv6; however, this was changed later in development.

    // We only do subnetting for IPv4 and IPv6
    BOOST_CHECK(!LookupSubNet(std::string("5wyqrzbvrdsumnok.onion", 22), ret));

This was changed in https://github.com/bitcoin/bitcoin/commit/94d335da7f8232bc653c9b08b0a33b517b4c98ad (net: allow CSubNet of non-IP networks - v0.21.1)

    BOOST_CHECK(LookupSubNet(std::string("5wyqrzbvrdsumnok.onion", 22), ret));

Looks like there are a couple of more addrv2/torv3 follow-up commits we can backport.

alamshafil avatar Nov 02 '22 01:11 alamshafil

With commit net: Correctly serialize CService the segfault is fixed on my end and I am able to connect to peers unlike before.

All tests still pass for me on Linux x86_64. (qa/rpc-tests still needs work)

alamshafil avatar Nov 05 '22 02:11 alamshafil

This looks interesting...

tor: Got service ID dqm6pgvm..., advertising service dqm6pgvm...onion:44556
AddLocal(dqm6pgvm...onion:44556,4)

alamshafil avatar Nov 05 '22 03:11 alamshafil

Not sure if this commit would have any effect on the compiling of failed targets: https://github.com/bitcoin/bitcoin/commit/172f5fa738d419efda99542e2ad2a0f4db5be580

Would we need to wrap with REF()?

s << REF(COMPACTSIZE(ADDR_IPV6_SIZE));

alamshafil avatar Nov 06 '22 22:11 alamshafil

Alright yall, good news! I've finally finished my 'control' branch with BIP155 implemented according to unit tests passing. I chose to omit the 2 QA tests from 353a3fd for now since they interact with classes/components from the upstream testing frameworks but I did include the changes to mininode.py.

I decided to go this route as I didn't work on porting 1.21-dev (potential control) so I had no prior knowledge of what was included or omitted and wanted to see how things got from a-z for myself.

With that said, my branch is obviously gigantic and a lot of things can purged/are extraneous but at least we have something as another potential reference outside of 1.21-dev. For instance I did some protocol hacks that I'm quite certain wouldn't be acceptable but decided to include and omit later rather than remove before you 2 could have a look and the pertinent sections missing from this PR branch.

Anyway have a look. Hopefully it helps finish up the last pieces here. I'll not be working as aggressively in porting, but I do intend to slim it down/chop it up as much as possible and many apologies for the delay! I severely underestimated the amount of work it'd take heh. <3

https://github.com/xanimo/dogecoin/actions/runs/3432986939/jobs/5723117936

xanimo avatar Nov 10 '22 01:11 xanimo

Also note that probably 2/3 of the commits in that branch are unnecessary and in the coming days I'll be researching diff between this branch and that one for net/serialization stuff only. This was a partial exercise in studying core and finding the missing pieces for this PR.

I do still intend on opening a PR on your branch (@alamshafil) with the serialization stuff but if either of you (@patricklodder, @alamshafil) or anyone else comes up with something sooner please go with that instead.

xanimo avatar Nov 10 '22 02:11 xanimo

The nodehandling.py test is the last remaining obstacle I think - everything else seems to pass.

patricklodder avatar Nov 10 '22 04:11 patricklodder

I'll have a PR for you @alamshafil in about an hour. Just need to clean up commit title and description assuming it passes all architectures on CI.

https://github.com/xanimo/dogecoin/tree/1.14.7-dev-torv3-3149

xanimo avatar Nov 10 '22 05:11 xanimo

As promised: https://github.com/xanimo/dogecoin/commit/1aa40211370fe669b4ab8f4ef8493f1e23c7bc36

No worries if you all decide to go another route. :+1:

xanimo avatar Nov 10 '22 06:11 xanimo

First of all, thanks @xanimo for your hard work on the reference branch and don't worry about the delay.

I feel like we should make serialization changes in a separate PR instead of this one.

Looks like we are able to get all check to pass with this current branch. I commited https://github.com/dogecoin/dogecoin/pull/3149/commits/6f9e013a7d57bd7418eb872626006ed20e1531cd to see if it would fix macOS + 32-bit target in which it did; however, I feel like we should backport the fomratters to new Wrapper template instead of mixing old and new code together. We can go further and remove the use of FLATDATA and REF as found in @xanimo branch.

It seems that this commit https://github.com/dogecoin/dogecoin/pull/3149/commits/a68ce3c97e80cdf576b505659b763a90a329f89b caused failures. Would like thoughts on how we should handle this. Should try to fix the commit or remove it all together?

alamshafil avatar Nov 10 '22 22:11 alamshafil

First of all, thanks @xanimo for your hard work on the reference branch and don't worry about the delay.

I feel like we should make serialization changes in a separate PR instead of this one.

Agreed, definitely reducing scope atm.

Looks like we are able to get all check to pass with this current branch. I commited 6f9e013 to see if it would fix macOS + 32-bit target in which it did; however, I feel like we should backport the fomratters to new Wrapper template instead of mixing old and new code together. We can go further and remove the use of FLATDATA and REF as found in @xanimo branch.

Also in agreement re old/new code but after a brief chat with @patricklodder it makes sense to introduce new code and delete old code later after it's been thoroughly tested so for the time being I think this is fine during a transition like this.

It seems that this commit a68ce3c caused failures. Would like thoughts on how we should handle this. Should try to fix the commit or remove it all together?

I'm actually right in the middle of attempting to triage that commit while reducing scope of my PR on your branch as I hadn't rebased to include your revert. With that in consideration I'd say if this is passing then that's great but I wonder what implications this would have moving forward in your opinion?

xanimo avatar Nov 10 '22 22:11 xanimo

Great work both; this means that for any existing tests, this works. ❤️

It seems that this commit a68ce3c97 caused failures. Would like thoughts on how we should handle this. Should try to fix the commit or remove it all together?

This commit fixes a backward compatible issue with persisted banman files from previous versions, so I'd say the answer should be to fix it if we can.

From where I'm sitting, the trick now is to:

  1. Make sure that no consensus code is touched by this. 1.14.7 already has a huge secp256k1 update in and I think that we should really be super cautious to not create unnecessary risk to trigger any edge cases that we're not thinking about right now.
  2. We need to make a decision about asmap functionality and versioning. I think as it stands, the easiest way forward for this is to do the version number swap both here and in 1.21: addrv2 becomes version 2, asmap becomes version 3 (which basically means we worry about compatibility in 1.21, but preferably sooner rather than later.)
  3. I think it makes sense to break this PR down into smaller chunks so that code review is easier. It will probably also be easier to clean each chunk up on its own.
  4. Test this PR on testnet/mainnet thoroughly. We can do this in parallel with item 2.

Thoughts?

patricklodder avatar Nov 11 '22 00:11 patricklodder

Replying to @xanimo

I'm actually right in the middle of attempting to triage that commit while reducing scope of my PR on your branch as I hadn't rebased to include your revert. With that in consideration I'd say if this is passing then that's great but I wonder what implications this would have moving forward in your opinion?

It would be best to have that commit included since it keeps backward compatibly with older versions. Since you have the serialization changes on your branch, could you try to backport this Bitcoin commit https://github.com/bitcoin/bitcoin/commit/883cea7dea3cedc9b45b6191f7d4e7be2d9a11ca and check if all tests pass?

Replying to @patricklodder

  1. We need to make a decision about asmap functionality and versioning. I think as it stands, the easiest way forward for this is to do the version number swap both here and in 1.21: addrv2 becomes version 2, asmap becomes version 3 (which basically means we worry about compatibility in 1.21, but preferably sooner rather than later.)

If we go ahead with the serialization changes, then we should open a separate PR for easier code review. We can optionally consider adding asmap functionality PR so we don't have to worry about 1.21 compatibility. What are the risks on adding asmap functionality?

My thoughts are to get serialization changes merged in, think about asmap, and then break up this PR for code review (in addition to compatibility testing).

alamshafil avatar Nov 11 '22 00:11 alamshafil

So far in my testing I was able to fully sync a fresh testnet node only using a Tor proxy. Addresses were saved and loaded successfully without any errors.

2022-11-11 00:21:22 Loaded 1195 addresses from peers.dat  47ms

alamshafil avatar Nov 11 '22 00:11 alamshafil

@alamshafil Absolutely I'll double check that as soon as I finish up triaging the diff between the PR I opened against this branch last night. Think I've finally figured it out but want to make sure before I speak on it hehe.

And @patricklodder I think breaking into chunks would certainly make things easier for others to review who may not have been keeping track of this. I already have the commits to introduce asmap functionality to 1.14.7 and am also curious as to what the potential risks would be?

xanimo avatar Nov 11 '22 00:11 xanimo

What are the risks on adding asmap functionality?

I already have the commits to introduce asmap functionality to 1.14.7 and am also curious as to what the potential risks would be?

Can't tell without a proposal, so please propose it then. I am currently focusing on this PR, and I think we can defer that decision until later in the process, as long as we make absolutely sure that we don't forget about it - this is simply the difficulty of having two targets.

serialization changes

If you want to make further changes to this other than the CSubnet change, please let me know when you're ready. When that's the case, I'll go run some series of git blame on this PR and see if I can come up with a workable grouping/sequence, and I'll do some caller tracking to identify any callers of changed code here inside consensus code for this PR at that time.

patricklodder avatar Nov 11 '22 00:11 patricklodder

As of now I don't have any plans changing serialization. I was going to wait if the serialization changes made by @xanimo are needed for https://github.com/bitcoin/bitcoin/commit/883cea7dea3cedc9b45b6191f7d4e7be2d9a11ca to work properly.

alamshafil avatar Nov 11 '22 00:11 alamshafil

Sounds good to me @patricklodder, I'll submit a PR soon.

And to you both I figured it out! 😅 https://github.com/xanimo/dogecoin/actions/runs/3441739083

On the fence on whether or not I should've omitted the streams_test.cpp but it's considerably slimmer. Please lmk if I should strike the change to streams_test.cpp as it should still work with FLATDATA.

Anyway I propose you drop the reverted commit from today @alamshafil as my branch is up to date with the HEAD from last night and consider the existing PR I've opened as it solves the issue we saw in nodehandling.py on i686-linux.

xanimo avatar Nov 11 '22 02:11 xanimo

@alamshafil I have omitted the changes to serialization and stream tests so they continue to use FLATDATA so with that the only change in my PR is removing FLATDATA from the READWRITE macros in netaddress.h's CSubnet SerializationOp. I've also updated the description and title to reflect these changes. Now just waiting on CI but note that it can't be merged until a68ce3c is dropped.

Outside of that and from a review perspective, this also needs a rebase against upstream/1.14.7-dev.

xanimo avatar Nov 11 '22 02:11 xanimo

Will removing the use of FLATDATA break backward-compatibly?

alamshafil avatar Nov 11 '22 03:11 alamshafil