storm icon indicating copy to clipboard operation
storm copied to clipboard

Replace jar distribution strategy with bittorrent

Open ptgoetz opened this issue 11 years ago • 40 comments

Addresses: https://github.com/nathanmarz/storm/issues/435

This just uses bittorrent for moving the topology jar files around, the serialized conf and topology still go through the thrift API.

When nimbus starts up, it will create a tracker on port 6969. When a topology is deployed, nimbus will create a torrent file, announce it to the tracker, and start seeding (it will seed until the topology is killed). Supervisors will grab the .torrent and use it to download the topology jar. Supervisors will share pieces during the download, but will not continue to seed once the download completes.

I'm open to any change in this behavior and/or class naming, etc.

I didn't update the logback config, but the ttorrent client is pretty chatty at the INFO level. It might be a good idea to turn that off.

The bittorrent client (https://github.com/turn/ttorrent) is not available in a public maven repo, as far as I can tell, but it is simple to build.

ptgoetz avatar Jul 17 '13 21:07 ptgoetz

This overlaps some with the HA Nimbus work, but the two can work together.

For example, torrent files can list multiple trackers (I.e. nimbus). I'll take a closer look at the ha stuff when I'm not on a phone. ;)

ptgoetz avatar Jul 17 '13 23:07 ptgoetz

We should have all topology files (conf etc.) managed by this system. This will make HA Nimbus much easier and remove the need for the storage abstraction and dependencies on external systems like HDFS. We can set a policy that at least N Nimbus's need to have the full topology files before the topology gets launched (but files will eventually be copied to all Nimbus's if more than N).

Is it possible to set max download and upload throughputs on each node? We would want to set these limits to prevent supervisors from taking on too much load from this.

As for the Bittorrent library, I can just upload that to the same Maven repo used for other Storm jars once this pull request is done.

nathanmarz avatar Jul 18 '13 01:07 nathanmarz

Then the way to go is to package everything in one multi-file torrent (topo.jar, topo.ser, topo-conf.ser), and distribute that.. That's a relatively easy change... Do you want me to add that?

As for throughput, the way I have it now is nimbus will take the most load, depending on the size of the .jar. If it is massive, then there will be more sharing between supervisor clients. Otherwise, the supervisors may get the parts from nimbus before they discover the other supervisor seeds.

I'll also look into what throttling options are available, with the ttorrent lib.

ptgoetz avatar Jul 18 '13 02:07 ptgoetz

OK. The throttling is really important, so I won't merge this in until that's in place. We can make the throttle amounts configurable via the storm.yaml.

nathanmarz avatar Jul 18 '13 02:07 nathanmarz

I delved into the tttorrent code and found that it does not support throttling dl/ul rates, but I will look into what it would take to add it.

Can you elaborate on your concern with the supervisors taking on too much load? In the current approach, only nimbus acts as a seeder. The supervisors act as leechers and stop sharing as soon as they complete the download. The supervisors never seed. For the most part, the supervisors will prefer downloading parts from nimbus (since nimbus has the complete file), and only share amongst themselves while downloading if a supervisor peer could offer a part with better performance than the nimbus peer.

So the supervisors are "download heavy" and the nimbus's are "upload heavy".

I'll move forward with including all topology files as part of the torrent download. Let me know if you still think lack of throttling is a showstopper, and I'll look into adding/requesting that feature for ttorrent.

ptgoetz avatar Jul 18 '13 17:07 ptgoetz

I love the idea of using bit-torrent, but bit-torrent does not support authentication or authorization, except through extensions like http://www.rasterbar.com/products/libtorrent/auth.html that ttorrent does not support. This is a bit of a regression as the current thrift APIs do have the beginnings of auth. Seeing how this code does not yet remove the thrift upload/download APIs would it be possible to have the distribution mechanism configurable until we can work out the auth model?

revans2 avatar Jul 18 '13 18:07 revans2

This is a case where making it pluggable increases complexity. The Bittorrent based distribution greatly simplifies the construction of HA Nimbus (completely removes reliance on external systems), and it is clearly the optimal approach for distributing static files around the cluster. So I'd like us to figure out how to support our auth needs (or at least have a plan for it) within the context of a Bittorrent based approach. Perhaps this will require modifications to ttorrent, and that's fine.

@ptgoetz So the reason for the importance of throttling is it provides a guarantee of load on a supervisor. First of all, I think supervisors should be allowed to seed files and participate in sharing after they've finished downloading. This ensures jar distribution is fast and scalable to large clusters. But we need to make sure that the distribution process can only have a limited effect on active topologies – hence the throttling. Resource usage should always be explicit and not rely on implicit effects of the design. Things change, and my experience has repeatedly shown that if you want a guarantee, you must be explicit about it.

nathanmarz avatar Jul 18 '13 19:07 nathanmarz

I understand the desire to not clutter things up too much with plug-ability. I am fine with just having a plan for auth initially. I am also happy to help implement it. I just want to be sure that it is not missed.

revans2 avatar Jul 18 '13 20:07 revans2

Absolutely, you brought up a great point regarding auth.

nathanmarz avatar Jul 18 '13 22:07 nathanmarz

Ok. I will update the code so all the topology files are distributed via BitTorrent.

Are you willing to accept BitTorrent throttling (and auth) as separate pull requests? For those I will need to fork the ttorrent project and make the necessary modifications, which may take a while.

IMHO, the switch to BitTorrent is a good first step toward HA, and the choking mechanism in the BitTorrent protocol will help alleviate overload on supervisors. When the BitTorrent feature is in place, the planned HA features can move forward. When throttling is available, it would be a simple addition (I could even put the hooks in place now).

I can also add a configurable seed duration for supervisors (e.g. Don't seed, seed indefinitely, or seed for X seconds).

ptgoetz avatar Jul 18 '13 23:07 ptgoetz

Throttling will have to be part of this pull request, auth can be in a separate pull request.

nathanmarz avatar Jul 19 '13 01:07 nathanmarz

I dove into the ttorrent code last night and have a crude, but functional means of throttling throughput. Now I have a question and a caveat.

The question is, do you want to throttle at the peer level, or the torrent level? I'm assuming the torrent level so I'm headed in that direction.

The caveat is that the throttling is not exact, nor is the logging/reporting of throuhput. So setting max ul/dl thresholds is more like a "hint." For example if I set a threshold of 50 kb/sec., the actual throughput will fluctuate between ~45-55 kb/sec. but average very close to 50. The ttorrent logs will also report inaccurate throughputs. But testing with several bittorrent clients, throughput was very close to the requested threshold.

Let me know if that's okay and I'll proceed.

Thanks, Taylor

ptgoetz avatar Jul 19 '13 19:07 ptgoetz

Well we'd want to throttle the overall rate for the supervisor. I think that's what you mean by "peer". If the easiest way to do that is to restrict the supervisor to downloading / sharing one torrent at a time, that could be an acceptable first step. And then we'll open up an issue to fix that. The trick if taking that implementation strategy is choosing which torrent that supervisor should share.

We'd also want to add a special case for when the supervisor has no active workers in which there would be no ul/dl limits.

I think it's ok for there to be minor fluctuation in the ul/dl rate.

nathanmarz avatar Jul 20 '13 01:07 nathanmarz

Mods to ttorrent are here: https://github.com/ptgoetz/ttorrent/tree/rate-limits

ptgoetz avatar Jul 22 '13 05:07 ptgoetz

@ptgoetz So that implements peer-level throttling?

nathanmarz avatar Jul 23 '13 22:07 nathanmarz

@nathanmarz

Throttling has been added to ttorrent at the torrent level: https://github.com/turn/ttorrent/pull/49

But a potential bug was introduced in a separate pull request that got merged around the same time mine did: https://github.com/turn/ttorrent/issues/51

So if you want to see it action, I'd suggest pulling from my fork/branch until that gets sorted out.

The throttling in ttorrent works like so:

  • A ttorrent client instance has exactly one torrent.
  • The client can share (upload/download) with n peers (a peer is just another client that participates in downloading/uploading).
  • upload/download rate limits are set at the client/torrent level, the amount each peer is throttled will depend on how many peers are connected. For example, if you have a client sharing a torrent with 5 leeching (downloading) peers, and you set the max upload rate to 50 kB/sec., each downloading peer will only get 10 kB/sec. If 4 of the 5 leechers stop downloading (or disconnect), the remaining downloading peer will be able to download at 50 kB/sec.

How I see this playing out in storm: (I'm totally open to suggestions, etc.)

  • Nimbus('s)/Supervisors will have separate global upload/download rate limits (a limit <= 0.0 == unlimited).
  • "Toplogy Torrents" will contain: stormjar.jar, stormconf.ser, stormcode.ser (let me know if I'm missing anything).
  • Each Nimbus/Supervisor will have 1 topology torrent for every deployed topology. (right now I'm naming the file "${storm-id}.torrent")
  • If a rate limit is set for Nimbus/Supervisor, the amount of bandwidth allocated to each torrent/client will depend on the number of deployed topologies. For example, if I set nimbus.bittorrent.max.upload.rate: 50.0 and there are two topologies deployed, the 50 kB/sec. bandwidth will be split between the two (25 kB/sec each). Killing one of the two toplogies will rebalance the bandwidth allocation, so the remaining toplogy will then get the full 50 kB/sec.

Seeding:

  • By default, Nimbus will seed a topology torrent until the topology is killed.
  • Supervisors can be configured with: download-only, seed-for-duration, seed-indefinitely

Is that an acceptable feature set for this pull request?

I'd like to nail down those features and move on to sorting the ramifications it will have on supervisor.clj, et. al. There's some logic in there that I have yet to fully grok (sync, download cleanup, etc.), and I may need to lean on you for some clarification/assistance to make sure everything's right.

I'm still in a WIP state at this point, but largely functional. I can deploy a topology such that all its files are distributed (and rate limited) via bittorrent, the topology functions, and supervisor will recover from kill -9-ing a worker.

(I should also note that I haven't pushed many of the above changes yet).

-Taylor

ptgoetz avatar Jul 24 '13 02:07 ptgoetz

OK, sounds good. Let me know when all these changes are in there.

nathanmarz avatar Jul 24 '13 07:07 nathanmarz

OK, the changes are in.

Like I said, I'm not sure I got everything right in nimbus.clj and supervisor.clj -- specifically the sync-processes function. That process deletes the .torrent file, but that doesn't affect sharing since the torrent is already active (or complete) at that point.

Until this pull request gets merged, you'll probably want to use my branch of ttorrent: https://github.com/ptgoetz/ttorrent/tree/rate-limits

ptgoetz avatar Jul 24 '13 18:07 ptgoetz

Fixes are in to ttorrent... So we're good to use the master branch of that dependency now.

ptgoetz avatar Jul 25 '13 12:07 ptgoetz

@nathanmarz Have you had any free cycles to take a look at this? (No problem if you haven't, I'm just wondering if it is still under consideration.)

ptgoetz avatar Aug 02 '13 18:08 ptgoetz

Sorry I haven't had time to go through it more... but this pull request will definitely be merged at some point. This is a very important feature (especially since it paves the way for simpler HA)

nathanmarz avatar Aug 03 '13 00:08 nathanmarz

No problem at all. I'll try to keep it in line with master so the merge is clean.

We are very interested in the HA features, and would be happy to help out if we can. Let me know if there's anything I can help with.

ptgoetz avatar Aug 03 '13 03:08 ptgoetz

This PR might be aided by "topology packages" https://github.com/nathanmarz/storm/issues/557, as it bundles all the state that needs to be replicated.

jasonjckn avatar Aug 29 '13 02:08 jasonjckn

supervisor doesn't seed in the PR?

xumingming avatar Sep 26 '13 09:09 xumingming

@xumingming Supervisors can be configured with: download-only, seed-for-duration, seed-indefinitely.

By default, supervisors will only seed until the download is complete.

ptgoetz avatar Sep 26 '13 14:09 ptgoetz

@xumingming Thanks for the review! I just pushed a commit that addresses your points.

ptgoetz avatar Sep 26 '13 14:09 ptgoetz

+1

xumingming avatar Sep 26 '13 15:09 xumingming

@anfeng Thanks for the input. I pushed a commit to address your concerns.

Let me know if you feel strongly about the "Tracker" class names.

ptgoetz avatar Sep 26 '13 21:09 ptgoetz

@ptgoetz I could live with NimbusTracker, but don't like SupervisorTracker. How about that we rename it to SupervisorPeer? We may want to rename BaseTracker to BasePeer.

anfeng avatar Sep 27 '13 16:09 anfeng

With the introduction of BT, we don't need the following interface of Nimbus. Why are we still keeping them?

  • beginFileDownload
  • downloadChunk

We should also remove Utils::downloadFromMaster().

anfeng avatar Sep 27 '13 17:09 anfeng