Download directly to disk for host imaging

Open michaelstoops opened this issue 8 years ago • 0 comments

I'm looking at a use case very similar to Turn's original use case, except that my datacenter has Raspberry Pi hosts at a total cost of a few hundred bucks, and it's mostly for fun and learning. I'd appreciate your advice on how to accomplish this using the ttorrent code base. If new code is required, I could contribute (assuming my employer agrees to it). If it's possible without new code, no need to add reinvented wheels.

My specific use case is for imaging Raspberry Pi 2 devices, with 8GB Class 10 SD cards. This means I have about 1 GB RAM to work with, 10 MB/s bandwidth on the SD card, and 100 Mbps network bandwidth on each host. I want this job done quickly and efficiently. The seeder is also a Pi, with similar network and storage bandwidth, so traditional client-server download would be seriously bottlenecked at the server, and avoiding that bottleneck is my main motivation to use BT. The current system image is the Raspbian Wheezy distro, which gzips down to 1.0 GB. The clients are all on a 24-port, 100 Mbps full-duplex switch with 12.8 Gbps of internal switching capacity. If I could saturate it and put every bit of that to SD card, I'd be thrilled. Also, I could image a switch full of hosts in three seconds. In practice, the seeder would take 100 seconds to upload one full image. But if I could image 24 hosts in that time, I'd still brag about it.
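The 100-second figure for the seeder follows from the numbers above, and can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: `ImagingEstimate` and `seconds` are made-up names, and the inputs are the 1.0 GB compressed image, the 10 MB/s SD card, and the 100 Mbps link.

```java
/** Hypothetical helper for back-of-the-envelope transfer times. */
final class ImagingEstimate {
    /** Seconds needed to move the given number of bytes at the given rate. */
    static double seconds(double bytes, double bytesPerSecond) {
        return bytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        double image = 1.0e9;        // compressed image: 1.0 GB
        double sdCard = 10.0e6;      // SD card: 10 MB/s
        double nic = 100.0e6 / 8;    // 100 Mbps link: 12.5 MB/s
        System.out.printf("SD card: %.0f s, network: %.0f s%n",
                seconds(image, sdCard), seconds(image, nic));
    }
}
```

Reading one image off the seeder's SD card takes about 100 seconds and sending it over the link about 80 seconds, so the card, not the network, sets the floor for a single seeder.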

Some details on my use case:

  • My bootstrapper platform, derived from rpi-buildroot, uses ramdisk for the root filesystem. Therefore, I can freely write to any area of the SD card.
  • Most of the time, I will want to write partitions 2-4 to the SD card, but prevent writing over the MBR and partition 1. This is very specific to my application, but I'd think the general benefit to the ttorrent codebase would be the ability to give ttorrent windows of addresses where it should and shouldn't write in the output file.
    • I think the transfer algorithm wouldn't like a big hole where it can't read or write. I plan for 128 MiB for the no-write zone on partition 1, so I can afford to download and store that portion in memory. However, if I could block out that window from the propagation algorithm entirely, that would be more efficient.
    • I do need some of the data from the no-write window, because the clients need to know the downloaded image's partition table, so that I can align the real partition table accordingly.
    • I could easily provide a customized java.io.OutputStream that does what I want, with the interface the same as any other file stream. This would provide very convenient flexibility for ttorrent as a library. Maybe it has this already, but I'd appreciate your advice anyway.
  • In some circumstances, I want the downloaded image to write to the whole SD card, with no no-write windows. This is to update the bootstrapper that I store in partition 1.
  • I'm essentially imaging my entire storage resource, so I have no option to store the whole image as a file in a filesystem and then transfer it to the SD card. I want this to be a touchless process, so adding more storage just for imaging is not an option. Besides, staging the image to a file would be wasteful of the SD card's bandwidth, taking 200+ seconds just to move 1 GB of data around on the card.
  • I'll definitely want to compress the system image for transfer to conserve network bandwidth. But if I compress the file before transfer it won't map nicely to blocks on the SD card. I hope the protocol compresses pieces in transfer, so that I can get both the networking efficiency of compression and the 1:1 mapping of transferred file to disk sectors.
  • I would prefer to keep my system image master files as direct images of the SD card, without preprocessing them. This is so that I can mount them on loopback, and dd and fdisk the images directly, etc. However, I would be willing to pre-compress pieces to preserve precious SD card bandwidth on the seeder. Also, my seeder may have enough RAM to prefetch most of the master image. I don't know whether this would get ruined by the rarest-first replication algorithm.
  • Speaking of caching, if you imagined memcache techniques speeding up the distribution algorithm, how would you do it? My network easily has more than enough RAM to hold a distributed cache of the whole system image.
  • I would prefer not to have a dedicated tracker, and instead to have clients send a broadcast Ethernet frame to discover other peers, which would act as trackers. This is to let hosts on the same Ethernet segment find each other without dedicated infrastructure, reduce the system footprint by one host, and avoid overloading the uplinks between Ethernet switches. On the other hand, a single topology-aware tracker could factor in a costing algorithm to achieve the same goal, with the added benefit of utilizing the switch uplinks to the extent they are able. Which do you think is more efficient, and which approach would you prefer to add to the ttorrent codebase?
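To make the per-piece compression idea from the list above concrete, here is a minimal sketch of what pre-compressing pieces could look like if the protocol does not do it itself. Everything in it is an assumption for illustration: `PieceCodec` and its methods are hypothetical names and not part of ttorrent, the image is held in a byte array only to keep the example short, and `java.util.zip` deflate stands in for whatever compression would actually be used.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper (not part of ttorrent): compresses each piece on its own. */
final class PieceCodec {
    /** Byte offset on the SD card where the given piece begins. */
    static long diskOffset(int pieceIndex, int pieceLength) {
        return (long) pieceIndex * pieceLength;
    }

    /** Compresses one piece of the raw image independently of all other pieces. */
    static byte[] compressPiece(byte[] image, int pieceIndex, int pieceLength) {
        int from = Math.toIntExact(diskOffset(pieceIndex, pieceLength));
        int len = Math.min(pieceLength, image.length - from); // last piece may be short
        Deflater deflater = new Deflater();
        deflater.setInput(image, from, len);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the raw bytes of one piece, ready to be written at its disk offset. */
    static byte[] decompressPiece(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported piece");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt piece", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Because every piece is compressed on its own, a client can inflate piece `i` and write it at `diskOffset(i, pieceLength)` without having any other piece, which keeps the 1:1 mapping between image and disk sectors. The cost is a somewhat worse ratio than compressing the whole image as one stream.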

I was thinking that I could accomplish the windowed stream by giving the download file path as the raw SD card, /dev/mmcblk0, but shimming the normal file I/O stream with a stream that redirects to a memory stream for I/O in the no-write zone. This should allow the client to upload and download normally, without overwriting my MBR and bootstrapper partition.
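A minimal sketch of that shim, to show the idea rather than a finished design. `WindowedStorage` and its `read`/`write` methods are hypothetical names, not an existing ttorrent interface, and the sketch assumes a single protected window that fits in a `byte[]` (128 MiB does). It also assumes positional reads and writes, since pieces arrive out of order.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical shim (not a ttorrent class): shadows one protected byte range in RAM. */
final class WindowedStorage implements AutoCloseable {
    private final FileChannel device;  // e.g. opened on /dev/mmcblk0
    private final long windowStart;    // first protected offset, inclusive
    private final long windowEnd;      // end of the protected range, exclusive
    private final byte[] shadow;       // in-memory stand-in for the protected range

    WindowedStorage(Path path, long windowStart, int windowLength) throws IOException {
        this.device = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE);
        this.windowStart = windowStart;
        this.windowEnd = windowStart + windowLength;
        this.shadow = new byte[windowLength];
    }

    private boolean inWindow(long offset) {
        return offset >= windowStart && offset < windowEnd;
    }

    /** Number of bytes from offset that stay on one side of the next window boundary. */
    private int runLength(long offset, int remaining) {
        long boundary = offset < windowStart ? windowStart
                      : offset < windowEnd ? windowEnd
                      : Long.MAX_VALUE;
        return (int) Math.min(remaining, boundary - offset);
    }

    /** Writes src at offset; bytes aimed at the window go to RAM, the rest to the device. */
    void write(ByteBuffer src, long offset) throws IOException {
        while (src.hasRemaining()) {
            int n = runLength(offset, src.remaining());
            if (inWindow(offset)) {
                src.get(shadow, (int) (offset - windowStart), n);
            } else {
                ByteBuffer run = src.duplicate();
                run.limit(run.position() + n);
                long pos = offset;
                while (run.hasRemaining()) pos += device.write(run, pos);
                src.position(src.position() + n);
            }
            offset += n;
        }
    }

    /** Reads into dst from offset, serving the window from RAM; returns bytes read. */
    int read(ByteBuffer dst, long offset) throws IOException {
        int total = 0;
        while (dst.hasRemaining()) {
            int n = runLength(offset, dst.remaining());
            if (inWindow(offset)) {
                dst.put(shadow, (int) (offset - windowStart), n);
            } else {
                ByteBuffer run = dst.duplicate();
                run.limit(run.position() + n);
                n = device.read(run, offset);
                if (n <= 0) break;  // end of device
                dst.position(dst.position() + n);
            }
            offset += n;
            total += n;
        }
        return total;
    }

    @Override
    public void close() throws IOException {
        device.close();
    }
}
```

Reads of the window return zeros until the corresponding pieces have been downloaded, so this only works if the client never trusts existing on-disk data for that range. The partition table of the downloaded image can be read back out of `shadow` once the transfer completes.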

I would appreciate a point-by-point response, although I understand that responding to my thoughts is not your job. Many thanks. :)

michaelstoops, Sep 20 '15 07:09