                        Report generation is slow and prone to failure
Generating Crater reports is now really slow compared to a year or two ago, as Crater has an ever-increasing number of crates to test. Lately the Crater server has also started crashing while generating the reports of some runs. I see two main causes for this:
- When generating the report, Crater creates tarballs of all the logs to aid processing the results locally. Each tarball is kept entirely in memory before being persisted to S3, though, so with a high enough number of logs the server can run out of memory (OOM).
- Uploading the logs is really slow, as Crater currently uploads one file to S3 at a time. With hundreds of thousands of logs to upload, this quickly becomes a problem.
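
To illustrate the second point, here is a minimal sketch of uploading logs in parallel with a bounded pool of threads instead of one file at a time. It assumes a hypothetical `upload_one(key)` callback that reads a single log and PUTs it to S3; no real S3 client API is shown. Because uploads are I/O-bound, threads can overlap the network waits, and `max_workers` caps how many requests are in flight at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def upload_all(
    keys: Iterable[str],
    upload_one: Callable[[str], None],
    max_workers: int = 16,
) -> int:
    """Upload every log identified by `keys` using a bounded thread pool.

    `upload_one` is a hypothetical callback that reads one log and uploads
    it. Returns the number of logs uploaded; the first failure is re-raised
    once the remaining uploads have finished.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_one, key) for key in keys]
        for future in futures:
            future.result()  # re-raise any exception from the worker thread
    return len(futures)


if __name__ == "__main__":
    uploaded = set()
    lock = threading.Lock()

    def fake_upload(key: str) -> None:
        # Stand-in for an S3 PUT: just record the key.
        with lock:
            uploaded.add(key)

    n = upload_all((f"logs/crate-{i}.txt" for i in range(1000)), fake_upload)
    print(n, len(uploaded))  # 1000 1000
```

Passing keys rather than log contents keeps only small strings queued in the pool, so memory stays modest even with hundreds of thousands of pending uploads.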
 
While the proper solution would be to fix both issues (by using the filesystem as temporary storage for the archives while they're being created, and by uploading logs to S3 in parallel), I think there is a quicker approach that could postpone both problems.
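
As a rough sketch of the filesystem-backed approach, assuming the logs already live on disk as `(archive_name, path)` pairs (a hypothetical input shape), Python's `tarfile` can stream each log into a gzip-compressed archive written to a temporary file. Memory use then stays roughly constant no matter how many logs are archived, and the finished file can be uploaded afterwards.

```python
import os
import tarfile
import tempfile
from typing import Iterable, Tuple


def build_archive_on_disk(log_paths: Iterable[Tuple[str, str]], out_dir: str) -> str:
    """Stream logs into a .tar.gz written to a temporary file in `out_dir`.

    `log_paths` yields (archive_name, path_on_disk) pairs. tarfile copies
    each file into the compressed stream in chunks, so the whole archive
    is never held in memory. Returns the path of the finished archive.
    """
    fd, archive_path = tempfile.mkstemp(suffix=".tar.gz", dir=out_dir)
    os.close(fd)  # tarfile reopens the path itself
    with tarfile.open(archive_path, "w:gz") as tar:
        for name, path in log_paths:
            tar.add(path, arcname=name)
    return archive_path
```

The upload step would then read the archive from disk in chunks rather than from an in-memory buffer; which S3 client call to use for that is left open here.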
Right now we're processing and uploading the logs for all crates, even those that were not regressions. Because of that, most of the logs we upload are actually useless (like the logs for test-pass crates). If we changed the report process to skip uninteresting logs, we would save storage space and keep these problems at bay for a long time.
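
A minimal sketch of that filtering step, assuming each crate's result carries a comparison label and a log key. The `CrateResult` type and the label strings (`"regressed"`, `"same-test-pass"`) are hypothetical and may not match Crater's actual result categories.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Iterator


@dataclass(frozen=True)
class CrateResult:
    crate: str
    comparison: str  # hypothetical label, e.g. "regressed" or "same-test-pass"
    log_key: str


def logs_to_upload(
    results: Iterable[CrateResult],
    keep: AbstractSet[str] = frozenset({"regressed"}),
) -> Iterator[str]:
    """Yield the log keys of results whose comparison is in `keep`.

    By default only regressions are kept; everything else is skipped
    before any log is read or uploaded.
    """
    for result in results:
        if result.comparison in keep:
            yield result.log_key
```

Making `keep` a parameter leaves room to include other categories later if someone does rely on them.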
I think we should move the tarballs to disk, but I'd keep uploading all the logs: I, at least, sometimes find it helpful to look up past successful crate builds. Personally, I think the reliability issues here should be solved by moving the tarballs to disk storage.
Honestly, I thought test-pass + test-pass results were never looked at. If someone actually uses them, it's fine to keep them!
https://github.com/rust-lang/crater/pull/659 makes a start on this by uploading only the logs of regressed crates as raw files (as opposed to compressed tarballs). This should drastically speed things up for most Crater runs.