Upload logs to S3 in parallel
At the moment Crater uploads logs to S3 one at a time, and that takes a long time when there are thousands of them. We should parallelize the uploading loop.
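For illustration, here is a minimal sketch of what parallelizing the loop could look like using only the standard library: a fixed number of worker threads pull logs from a shared queue. `LogFile`, `upload_all` and the `upload` closure are hypothetical names made up for this example; they are not Crater's types, and the closure stands in for whatever call actually sends a file to S3.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for one log file; not Crater's actual type.
pub struct LogFile {
    pub key: String,
    pub body: Vec<u8>,
}

/// Runs `upload` over every log using `workers` threads that pull from a
/// shared queue. Returns `(key, error)` for each upload that failed.
pub fn upload_all<F>(logs: Vec<LogFile>, workers: usize, upload: F) -> Vec<(String, String)>
where
    F: Fn(&LogFile) -> Result<(), String> + Sync,
{
    let queue = Mutex::new(logs.into_iter());
    let failures = Mutex::new(Vec::new());

    // Scoped threads may borrow `queue`, `failures` and `upload` directly;
    // the scope joins every worker before returning.
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // The lock is held only while taking the next item,
                // not for the duration of the upload.
                let next = queue.lock().unwrap().next();
                let log = match next {
                    Some(log) => log,
                    None => break,
                };
                if let Err(e) = upload(&log) {
                    failures.lock().unwrap().push((log.key.clone(), e));
                }
            });
        }
    });

    failures.into_inner().unwrap()
}
```

Calling `upload_all(logs, 8, |log| send_to_s3(log))` would run up to eight uploads at once, where `send_to_s3` is again a placeholder. The order in which failures are reported is not deterministic.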
Hm, it might be worth looking into an approach where the logs are concatenated or something like that. I think I've heard that uploading many individual files is slower than uploading the same amount of data as a single file.
What do you mean by concatenated logs?
I would love to take a stab at this! I will look into it and ask questions on the Discord if anything comes up.
I made a (possibly incorrect) assumption that the slowness might come from the quantity of files, not the 'transfer size'; it might be worth looking at whether concatenating them into some larger groups would help. Maybe we just need some form of persistent connection, though, I'm not sure.
@Mark-Simulacrum hmm, I think rusoto already uses a persistent connection. How did you plan to serve the logs if you concatenate them though?
I would probably suggest that we serve the logs from crater.rust-lang.org and maybe have ~1000 buckets or so; the server could then decide which section of a bucket to reply with.
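To make the idea concrete, here is a rough sketch of one way it could work, not a proposal for the actual design: logs are packed back to back into a single blob with an index of byte ranges, and the server slices out the requested log. `Bundle`, `pack` and `bucket_for` are hypothetical names, and the FNV-1a hash used to pick a bucket is just an assumption chosen because it is stable across runs.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical container: many logs stored back to back in one blob,
/// plus an index from log name to its byte range.
pub struct Bundle {
    pub data: Vec<u8>,
    pub index: HashMap<String, Range<usize>>,
}

/// Concatenates the given `(name, contents)` pairs into a single bundle.
pub fn pack<'a, I>(logs: I) -> Bundle
where
    I: IntoIterator<Item = (&'a str, &'a [u8])>,
{
    let mut data = Vec::new();
    let mut index = HashMap::new();
    for (name, body) in logs {
        let start = data.len();
        data.extend_from_slice(body);
        index.insert(name.to_string(), start..data.len());
    }
    Bundle { data, index }
}

impl Bundle {
    /// What the server would do per request: look up the byte range and
    /// reply with only that section of the blob.
    pub fn get(&self, name: &str) -> Option<&[u8]> {
        let range = self.index.get(name)?;
        self.data.get(range.clone())
    }
}

/// Assigns a log name to one of `buckets` bundles using FNV-1a, so the
/// same name always maps to the same bucket.
pub fn bucket_for(name: &str, buckets: u64) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in name.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash % buckets.max(1)
}
```

With this layout an upload would send roughly a thousand objects instead of one per log, and the index would have to be stored alongside each bundle so the server can find the section to return.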
Well, that's surely an option (and it would probably also improve the compression ratio if we gzip them). I think it's out of scope for this issue though; parallelizing the uploads can be a big win with minimal effort.
Agreed -- I wanted to mention it here so that we at least keep it in mind (mostly for code architecture reasons, I don't think it's super important to think about)
@chaosteil hey! Did you make any progress on this? Do you need some help?
@pietroalbini Hey! Yes, on both counts :)
I did modify the ReportWriter trait to return a boxed Future for further processing, but I found it hard to test if my changes were properly running in parallel when executing. I will continue scratching my head on this for a little bit.
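One possible way to check for parallelism, sketched here with only the standard library, is to count how many uploads are in flight at once and record the peak. `ConcurrencyProbe` and `InFlight` are hypothetical helpers, not part of Crater or rusoto; the idea would be to call `enter()` at the start of each upload (for example inside a mock writer) and assert on `peak()` afterwards.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical test helper that tracks how many operations are running
/// at the same time and remembers the highest value seen.
#[derive(Default)]
pub struct ConcurrencyProbe {
    current: AtomicUsize,
    peak: AtomicUsize,
}

/// Guard returned by `enter`; dropping it marks the operation as finished.
pub struct InFlight<'a> {
    probe: &'a ConcurrencyProbe,
}

impl ConcurrencyProbe {
    /// Marks the start of one operation and updates the recorded peak.
    pub fn enter(&self) -> InFlight<'_> {
        let now = self.current.fetch_add(1, Ordering::SeqCst) + 1;
        self.peak.fetch_max(now, Ordering::SeqCst);
        InFlight { probe: self }
    }

    /// Highest number of operations that overlapped so far.
    pub fn peak(&self) -> usize {
        self.peak.load(Ordering::SeqCst)
    }
}

impl Drop for InFlight<'_> {
    fn drop(&mut self) {
        self.probe.current.fetch_sub(1, Ordering::SeqCst);
    }
}
```

If the uploads ran strictly one after another, `peak()` would stay at 1; any value above 1 means at least two of them overlapped. Since the probe only uses atomics, the same guard could in principle be held across an upload future as well, though that is untested here.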