
Upload logs to S3 in parallel

Open pietroalbini opened this issue 7 years ago • 10 comments

At the moment Crater uploads logs to S3 sequentially, and that takes a long time with thousands of them. We should parallelize the uploading loop.

pietroalbini, Sep 21 '18 21:09
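A minimal sketch of parallelizing the upload loop, assuming a plain thread pool is acceptable: a fixed number of worker threads pull logs from a shared queue and send their results back over a channel. Log and upload_one are hypothetical stand-ins, not crater's real types; upload_one is where the actual S3 PUT (e.g. through rusoto) would go.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A log waiting to be uploaded (hypothetical type for this sketch).
#[derive(Clone, Debug)]
struct Log {
    key: String,       // e.g. the S3 object key
    contents: Vec<u8>, // raw log bytes
}

/// Hypothetical stand-in for a single S3 PUT. A real version would call
/// the S3 client here; this one only rejects empty logs.
fn upload_one(log: &Log) -> Result<String, String> {
    if log.contents.is_empty() {
        return Err(format!("{}: empty log", log.key));
    }
    Ok(log.key.clone())
}

/// Upload all logs using at most `workers` threads at once.
/// Returns one result per log, in no particular order.
fn upload_all(logs: Vec<Log>, workers: usize) -> Vec<Result<String, String>> {
    // Shared work queue: each worker pops the next log until it is empty.
    let queue = Arc::new(Mutex::new(logs));
    let (tx, rx) = mpsc::channel();

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so the upload itself runs without holding the lock.
                let next = queue.lock().unwrap().pop();
                match next {
                    Some(log) => tx.send(upload_one(&log)).unwrap(),
                    None => break,
                }
            })
        })
        .collect();

    // Drop our own sender so `rx` ends once every worker has finished.
    drop(tx);
    let results: Vec<_> = rx.into_iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    results
}
```

The worker count bounds how many S3 requests are in flight at once. With a futures-based client, the same idea is usually expressed as a stream of upload futures driven with a concurrency limit (for example the futures crate's buffer_unordered).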

Hm, it might be worth looking into an approach where the logs are concatenated or something like that. I think I've heard that uploading many individual files is slower than uploading the same amount of data as a single file.

Mark-Simulacrum, Sep 21 '18 22:09

What do you mean by concatenated logs?

pietroalbini, Sep 21 '18 22:09

I'd love to take a stab at this! I'll look into it and ask questions on Discord if anything comes up.

chaosteil, Oct 16 '18 22:10

I'm assuming (possibly incorrectly) that the slowness comes from the number of files rather than the total transfer size, so it might be worth checking whether concatenating them into a few larger groups would help. Then again, maybe we just need some form of persistent connection; I'm not sure.

Mark-Simulacrum, Oct 16 '18 22:10
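A sketch of the concatenation idea under a simple assumed format (not something crater implements): many small logs are appended into one blob, and an index records each log's byte offset and length so any single log can be sliced back out. Span, concat_logs and extract are hypothetical names, and whether one large upload actually beats many small PUTs depends on per-request overhead being the bottleneck, which the comment above only assumes.

```rust
use std::collections::HashMap;

/// Where one log lives inside a concatenated blob (hypothetical format).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    offset: usize, // byte offset of the log inside the blob
    len: usize,    // length of the log in bytes
}

/// Concatenate many small logs into one blob plus an index of byte ranges,
/// so the blob can be uploaded as a single object instead of one per log.
fn concat_logs(logs: &[(String, Vec<u8>)]) -> (Vec<u8>, HashMap<String, Span>) {
    let mut blob = Vec::new();
    let mut index = HashMap::new();
    for (name, contents) in logs {
        // Record where this log starts before appending it.
        index.insert(name.clone(), Span { offset: blob.len(), len: contents.len() });
        blob.extend_from_slice(contents);
    }
    (blob, index)
}

/// Recover a single log from the blob using the index.
/// Returns None if the name is unknown or the span is out of range.
fn extract<'a>(blob: &'a [u8], index: &HashMap<String, Span>, name: &str) -> Option<&'a [u8]> {
    let span = index.get(name)?;
    blob.get(span.offset..span.offset + span.len)
}
```

The index would have to be stored alongside the blob (or in a separate object) so whatever serves the logs can find each one again.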

@Mark-Simulacrum hmm, I think rusoto already uses a persistent connection. How were you planning to serve the logs if they're concatenated, though?

pietroalbini, Oct 16 '18 22:10

I'd probably suggest serving the logs from crater.rust-lang.org, grouped into maybe ~1000 buckets or so; the server could then decide which section to reply with.

Mark-Simulacrum, Oct 16 '18 22:10
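One way the server could decide which section to reply with is to make the group a pure function of the log's key, so the uploader and the server compute the same bucket independently. This sketch assumes 1000 groups and uses FNV-1a as the hash; NUM_BUCKETS, fnv1a, bucket_for and group_by_bucket are hypothetical names. Within a group, the server would still need a per-log index (offset and length) to slice out the right log.

```rust
use std::collections::BTreeMap;

/// Number of concatenated groups (the thread floats "~1000 buckets").
const NUM_BUCKETS: u64 = 1000;

/// 64-bit FNV-1a. Chosen over std's DefaultHasher because that hasher's
/// algorithm is unspecified and may change between Rust releases, while the
/// bucket assignment must stay stable between uploader and server.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Which group a given log (identified by its key) is stored in.
fn bucket_for(log_key: &str) -> u64 {
    fnv1a(log_key.as_bytes()) % NUM_BUCKETS
}

/// Partition log keys by bucket, e.g. to build one concatenated blob per group.
fn group_by_bucket<'a>(keys: &[&'a str]) -> BTreeMap<u64, Vec<&'a str>> {
    let mut groups: BTreeMap<u64, Vec<&'a str>> = BTreeMap::new();
    for &k in keys {
        groups.entry(bucket_for(k)).or_default().push(k);
    }
    groups
}
```

Because the mapping only depends on the key, adding or re-running a crate's log never requires a lookup table to find its group, only the per-group index.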

Well, that's certainly an option (and it would probably also improve the compression ratio if we gzip them). I think it's out of scope for this issue, though; parallelizing the uploads on its own can be a big win with minimal effort.

pietroalbini, Oct 16 '18 22:10

Agreed. I wanted to mention it here so that we at least keep it in mind (mostly for code architecture reasons; I don't think it's super important to think about right now).

Mark-Simulacrum, Oct 16 '18 22:10

@chaosteil hey! Did you make any progress on this? Do you need some help?

pietroalbini, Nov 14 '18 10:11

@pietroalbini Hey! Yes, on both counts :)

I modified the ReportWriter trait to return a boxed Future for further processing, but I found it hard to test whether my changes actually run the uploads in parallel. I'll keep scratching my head on this for a bit.

chaosteil, Nov 23 '18 19:11
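One common way to check that uploads really overlap is to swap the real upload for a fake that counts how many calls are in flight and records the peak: a sequential loop always peaks at 1, a parallel one should peak above 1. This is a thread-based stand-in rather than the Future-based ReportWriter; ConcurrencyProbe and peak_concurrency are hypothetical test helpers. In a futures version, the same counter could be incremented when an upload future starts and decremented when it completes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Tracks how many fake uploads are running at the same moment and the
/// highest value seen (hypothetical test helper).
#[derive(Default)]
struct ConcurrencyProbe {
    in_flight: AtomicUsize,
    peak: AtomicUsize,
}

impl ConcurrencyProbe {
    /// Fake upload: mark start, hold for `dur` so other uploads can
    /// overlap with it, then mark end.
    fn fake_upload(&self, dur: Duration) {
        let now = self.in_flight.fetch_add(1, Ordering::SeqCst) + 1;
        self.peak.fetch_max(now, Ordering::SeqCst);
        thread::sleep(dur);
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Run `n` fake uploads on `workers` threads and report the peak overlap.
fn peak_concurrency(n: usize, workers: usize) -> usize {
    let probe = Arc::new(ConcurrencyProbe::default());
    let next = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let probe = Arc::clone(&probe);
            let next = Arc::clone(&next);
            thread::spawn(move || {
                // Each worker claims job indices until all `n` are taken.
                while next.fetch_add(1, Ordering::SeqCst) < n {
                    probe.fake_upload(Duration::from_millis(20));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    probe.peak.load(Ordering::SeqCst)
}
```

A test can then assert that one worker gives a peak of exactly 1 and several workers give a peak above 1 but never above the worker count.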