Dispersy walker has fewer candidates than expected (2)

Open boudewijn-tribler opened this issue 11 years ago • 7 comments

(This issue is a duplicate of #38, which was closed after a fix for a bug that caused the trackers to stop responding.)

We would expect more candidates sooner. The AllChannelCommunity in particular takes much longer to obtain a good number of candidates than we would expect.

Strangely enough the walk success rate is relatively high (around 90%). This contradicts the lack of available candidates.

This behavior must be either solved or explained. Please investigate.

boudewijn-tribler avatar Jun 20 '13 14:06 boudewijn-tribler

The behavior of Tribler improved on my home Ubuntu box. But the search community is still not bootstrapping.

UDP connectable peers seem to run fine, see http://jenkins.tribler.org/jenkins/job/Test_tribler_devel/73/

When behind a NAT, a lot of walk messages get lost. Do the logs indicate an incoming message from the IPv4 addresses listed in the screenshots below? These should be connectable (80.101.15.232), yet there are frequent "walk_fail" problems. This test was conducted between 12:00 and 13:00 on Saturday 22 June.

[Screenshot: tribler_6 2pre_no_searchcommunity_high_drop__walk_fail_to_our_own_server__anotheripv4address]

Another IPv4 address (tethering over 3G) gives the same problems.

[Screenshot: tribler_6 2pre_no_searchcommunity_high_drop__walk_fail_to_our_own_server]

synctext avatar Jun 22 '13 10:06 synctext

There are 81 entries between 12:24 and 13:00. With the exception of one entry, they are all for the BarterCommunity. Later in the log are entries for the other communities as well.

I have a NATed VirtualBox machine that seems to have similar low-connectability issues. I'll continue to test from there.

boudewijn-tribler avatar Jun 24 '13 13:06 boudewijn-tribler

The tests behind the screenshots above were repeated with the current branch. The Dispersy AllChannel community works. However, the Search community still fails to work.

The "walk_fail" shows my computer cannot connect to:

130.161.211.199 asmat.das2.ewi.tudelft.nl.

130.161.211.245 kayapo.das2.ewi.tudelft.nl.

130.161.211.194 superpeer9.das2.ewi.tudelft.nl.

130.37.198.19 om.cs.vu.nl

With logging we can determine whether the tracker, the client, or both are at fault.

synctext avatar Jun 26 '13 18:06 synctext

I found a possible explanation for this bug. The CommunityStatistics class was using yield_iter_categories, which filtered out all introduced candidates. Pull request #77 seems to fix it. Since applying this change, I have never seen fewer than 17 candidates in the allchannel or searchcommunity. Usually, they both hover around the 20 mark.
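A minimal sketch of the suspected effect, assuming a candidate list tagged with the categories named in this thread (walk, stumble, intro). The addresses, the `count_candidates` helper, and the data layout are hypothetical and are not taken from Dispersy:

```python
# Hypothetical illustration only: not Dispersy's actual statistics code.
# Each entry is (address, category); the addresses are made up.
candidates = [
    ("10.0.0.1", "walk"),
    ("10.0.0.2", "stumble"),
    ("10.0.0.3", "intro"),
    ("10.0.0.4", "intro"),
    ("10.0.0.5", "intro"),
]


def count_candidates(candidates, categories):
    """Count the candidates whose category is in the given set."""
    return sum(1 for _, category in candidates if category in categories)


# A count that filters out introduced candidates reports too few.
reported_before = count_candidates(candidates, {"walk", "stumble"})

# Including the intro category counts every candidate in the list.
reported_after = count_candidates(candidates, {"walk", "stumble", "intro"})

print(reported_before, reported_after)
```

With most candidates in the intro category, which is plausible shortly after startup, the filtered count stays low even though the walker knows many more peers.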

Only the "timeout_adjustment" property in candidates.py still seems to influence the number of candidates reported. During startup, I regularly see behavior similar to:

8 candidates
7 candidates
9 candidates

I feel this is caused by the timeout_adjustment property.

NielsZeilemaker avatar Jul 01 '13 13:07 NielsZeilemaker

Last week, the new test_overlay.py script was still reporting drops in candidates to as low as 4 at times. This was using the fixed community.dispersy_yield_candidates(), i.e. the one returning walk, stumble, and intro.

#77 does make the problem less 'severe' as the GUI will now include intro candidates in the count as well. But the problem isn't solved yet.

As for the timeout_adjustment property, this should cause a candidate that we walk towards to get category 'none' until the intro response is received. As this is not immediately clear, I suggest we define the exact behavior we want and clean this up with https://github.com/Tribler/dispersy/issues/68.
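To make the described behavior concrete, here is a rough sketch in which a candidate reports category 'none' while an introduction-request is outstanding. The class, its fields, and the `WALK_LIFETIME` value are assumptions for illustration and are not copied from candidates.py:

```python
# Hypothetical sketch: class, field names and WALK_LIFETIME are assumptions
# made for illustration; they are not copied from candidates.py.
WALK_LIFETIME = 60.0  # assumed number of seconds a walk reply stays fresh


class SketchCandidate:
    def __init__(self, last_reply):
        self.last_reply = last_reply  # time of the last introduction-response
        self.pending = False          # True while a request is outstanding

    def walk_to(self):
        self.pending = True

    def on_intro_response(self, now):
        self.last_reply = now
        self.pending = False

    def category(self, now):
        # Behavior described above: 'none' until the response arrives.
        if self.pending:
            return "none"
        if now - self.last_reply < WALK_LIFETIME:
            return "walk"
        return "none"


def count_walk(candidates, now):
    """Count the candidates currently in the walk category."""
    return sum(1 for c in candidates if c.category(now) == "walk")


peers = [SketchCandidate(last_reply=0.0) for _ in range(3)]
before = count_walk(peers, now=5.0)   # all three count
peers[0].walk_to()
during = count_walk(peers, now=6.0)   # the walked-to peer drops out
peers[0].on_intro_response(now=7.0)
after = count_walk(peers, now=8.0)    # it returns once it replies

print(before, during, after)
```

Under these assumptions the reported count dips by one for every outstanding request, which would look like the small fluctuations seen during startup.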

boudewijn-tribler avatar Jul 02 '13 07:07 boudewijn-tribler

Could you explain to me why a candidate to which we have just sent an introduction-request should not be in the walk category? For me, it makes sense to prevent it from being walked to again using is_eligible_for_walk, but removing it from the walk category does not.
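A sketch of the separation suggested here, assuming the category depends only on how recent the last reply is, while a pending request only blocks re-selection. The class, its fields, and `WALK_LIFETIME` are hypothetical; the method name follows the eligibility check mentioned above:

```python
# Hypothetical sketch of the proposed split between category and eligibility.
# Names and WALK_LIFETIME are assumptions, not Dispersy's implementation.
WALK_LIFETIME = 60.0  # assumed number of seconds a walk reply stays fresh


class SketchCandidate:
    def __init__(self, last_reply):
        self.last_reply = last_reply  # time of the last introduction-response
        self.pending = False          # True while a request is outstanding

    def walk_to(self):
        self.pending = True

    def category(self, now):
        # The category ignores outstanding requests entirely.
        return "walk" if now - self.last_reply < WALK_LIFETIME else "none"

    def is_eligible_for_walk(self, now):
        # A pending request only prevents walking to this candidate again.
        return not self.pending and self.category(now) != "none"


peer = SketchCandidate(last_reply=0.0)
peer.walk_to()
category_while_pending = peer.category(now=1.0)
eligible_while_pending = peer.is_eligible_for_walk(now=1.0)

print(category_while_pending, eligible_while_pending)
```

In this variant the candidate keeps counting towards the walk category while the request is in flight, so the reported total would not dip.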

NielsZeilemaker avatar Jul 02 '13 08:07 NielsZeilemaker

I agree with you; I'm guessing this was easier to implement at the time. As I said, we should properly define these cases, implement them, and verify them with unit tests.

boudewijn-tribler avatar Jul 02 '13 09:07 boudewijn-tribler