crater Upload logs to S3 in parallel

At the moment Crater uploads logs to S3 one at a time, and that takes a long time when there are thousands of them. We should parallelize the upload loop.
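A minimal sketch of what a parallel upload loop could look like, using only the standard library. The names `upload_all` and `upload` are hypothetical, and the `upload` closure stands in for the real S3 call; this is not Crater's actual code. Each worker thread claims the next log from a shared counter, and failures are collected instead of aborting the whole run.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Run `upload` over every log on `workers` threads and return the failures
/// as (key, error) pairs. `upload` is a hypothetical stand-in for the S3 call.
fn upload_all<F>(
    logs: &[(String, Vec<u8>)],
    workers: usize,
    upload: F,
) -> Vec<(String, String)>
where
    F: Fn(&str, &[u8]) -> Result<(), String> + Sync,
{
    // Index of the next log that no worker has claimed yet.
    let next = AtomicUsize::new(0);
    let failures = Mutex::new(Vec::new());

    // Scoped threads may borrow `logs`, `next`, `failures` and `upload`;
    // the scope joins every worker before it returns.
    thread::scope(|scope| {
        for _ in 0..workers.max(1) {
            scope.spawn(|| loop {
                // Claim one log; stop once every log has been taken.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some((key, body)) = logs.get(i) else { break };
                if let Err(err) = upload(key.as_str(), body.as_slice()) {
                    failures.lock().unwrap().push((key.clone(), err));
                }
            });
        }
    });

    failures.into_inner().unwrap()
}
```

Calling `upload_all(&logs, 8, |key, body| ...)` with a closure that performs the actual request would upload with eight workers. The order in which failures are reported is not deterministic.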

Sep 21 '18 21:09 pietroalbini

Hm, it might be worth looking into an approach where the logs are concatenated or something like that. I think I've heard that uploading many individual files is slower than uploading the same amount of data as a single file.

Sep 21 '18 22:09 Mark-Simulacrum

What do you mean by concatenated logs?

Sep 21 '18 22:09 pietroalbini

I would love to take a stab at this! I will look into it and ask on the Discord if I have any questions.

Oct 16 '18 22:10 chaosteil

I made a (possibly incorrect) assumption that the slowness might come from the quantity of files, not the 'transfer size'; it might be worth looking at whether concatenating them into some larger groups would help. Maybe we just need some form of persistent connection, though, I'm not sure.

Oct 16 '18 22:10 Mark-Simulacrum

@Mark-Simulacrum hmm, I think rusoto already uses a persistent connection. How did you plan to serve the logs if you concatenate them though?

Oct 16 '18 22:10 pietroalbini

I would probably suggest that we serve the logs from crater.rust-lang.org, with maybe ~1000 buckets or so, and then the server could decide which section to reply with.
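To make the bucket idea concrete, here is a rough sketch under my own assumptions: each bucket is one concatenated blob plus an index of byte ranges, and the server slices out the requested log. `Bundle`, `pack`, `get` and `bucket_for` are hypothetical names, and picking the bucket with an FNV-1a hash of the log name is just one possible choice.

```rust
use std::collections::HashMap;

/// One concatenated blob plus an index from log name to its byte range.
struct Bundle {
    data: Vec<u8>,
    index: HashMap<String, (usize, usize)>, // name -> (offset, length)
}

/// Concatenate all logs into a single buffer, recording where each one starts.
fn pack(logs: &[(String, Vec<u8>)]) -> Bundle {
    let mut data = Vec::new();
    let mut index = HashMap::new();
    for (name, body) in logs {
        index.insert(name.clone(), (data.len(), body.len()));
        data.extend_from_slice(body);
    }
    Bundle { data, index }
}

impl Bundle {
    /// Return the section of the blob that belongs to `name`, if present.
    fn get(&self, name: &str) -> Option<&[u8]> {
        let &(offset, len) = self.index.get(name)?;
        Some(&self.data[offset..offset + len])
    }
}

/// Pick a bucket for a log name with FNV-1a, so the assignment does not
/// depend on the Rust version. Assumes `buckets` is greater than zero.
fn bucket_for(name: &str, buckets: u64) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in name.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash % buckets
}
```

With this layout only the bundles and their indexes need to be uploaded, so the number of S3 requests is bounded by the bucket count rather than the log count.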

Oct 16 '18 22:10 Mark-Simulacrum

Well, that's surely an option (and that would probably also improve compression ratio if we gzip them). I think it's out of scope for this issue though, which can be a big win with minimal effort.

Oct 16 '18 22:10 pietroalbini

Agreed -- I wanted to mention it here so that we at least keep it in mind (mostly for code architecture reasons, I don't think it's super important to think about)

Oct 16 '18 22:10 Mark-Simulacrum

@chaosteil hey! Did you make any progress on this? Do you need some help?

Nov 14 '18 10:11 pietroalbini

@pietroalbini Hey! Yes, on both counts :)

I did modify the ReportWriter trait to return a boxed Future for further processing, but I found it hard to test whether my changes were actually running in parallel. I will continue scratching my head on this for a little bit.
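One way to check for parallelism, sketched here with plain threads rather than the boxed Future approach, is to replace the real upload with a fake one that counts how many calls are in flight at once and records the peak. `ConcurrencyProbe`, `fake_upload` and `measure_peak` are hypothetical names, and the 20 ms sleep only simulates network latency.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Tracks how many fake uploads are in flight and the highest count seen.
#[derive(Default)]
struct ConcurrencyProbe {
    in_flight: AtomicUsize,
    peak: AtomicUsize,
}

impl ConcurrencyProbe {
    /// Simulate one upload: count it in, hold for a moment, count it out.
    fn fake_upload(&self) {
        let now = self.in_flight.fetch_add(1, Ordering::SeqCst) + 1;
        self.peak.fetch_max(now, Ordering::SeqCst);
        thread::sleep(Duration::from_millis(20));
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }

    fn peak(&self) -> usize {
        self.peak.load(Ordering::SeqCst)
    }
}

/// Run `jobs` fake uploads on `workers` threads and return the peak
/// number of uploads that overlapped in time.
fn measure_peak(jobs: usize, workers: usize) -> usize {
    let probe = ConcurrencyProbe::default();
    let next = AtomicUsize::new(0);
    thread::scope(|scope| {
        for _ in 0..workers.max(1) {
            scope.spawn(|| {
                // Each worker claims job numbers until all jobs are taken.
                while next.fetch_add(1, Ordering::SeqCst) < jobs {
                    probe.fake_upload();
                }
            });
        }
    });
    probe.peak()
}
```

A peak of 1 means the uploads ran strictly one after another, while a peak above 1 shows that at least two overlapped. The peak can never exceed the number of workers.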

Nov 23 '18 19:11 chaosteil
