High Initial RAM Usage Leads to Crashes
I've been downloading selected URLs from LAION-400M, LAION-5B, and SBU, and have noticed a significant spike in RAM usage on startup that crashes instances with <=32GB RAM, such as AWS's c6i.4xlarge. Once img2dataset is running, however, RAM usage stays very low. I'd love it if we could somehow mitigate that initial spike so that lower-RAM instances can be used throughout. Here's a screenshot from wandb.ai showing the initial spike on a 64GB instance:
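Here's another sample from a crashed c6i.4xlarge instance, where we can see available process memory approach 0 before the crash: https://user-images.githubusercontent.com/50557586/259449237-e6467005-19a0-4748-a96d-2b00bac37eef.png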
Maybe the cause of the crashes is something else, but I have not yet been able to run img2dataset on a c6i.4xlarge instance.
Interesting. I think that's due to how the parquet file is processed (the reader file). That's probably easy enough to fix.
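To illustrate the kind of fix that might apply here, below is a rough sketch of reading a URL column from a parquet file in batches instead of loading the whole file at once. This is not img2dataset's actual reader code: the function names `read_url_batches` and `regroup`, the column name `URL`, and the batch size are all assumptions made for the example. The only library calls used are pyarrow's `ParquetFile.iter_batches` and `to_pylist`.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_url_batches(
    path: str, column: str = "URL", batch_size: int = 10_000
) -> Iterator[List[str]]:
    """Yield lists of URLs from a parquet file, one batch at a time.

    Hypothetical helper: the column name and batch size are assumptions.
    """
    import pyarrow.parquet as pq  # imported lazily; requires pyarrow

    parquet_file = pq.ParquetFile(path)
    # Only the requested column is decoded, and only batch_size rows at a time.
    for batch in parquet_file.iter_batches(batch_size=batch_size, columns=[column]):
        yield batch.column(0).to_pylist()


def regroup(batches: Iterable[List[str]], shard_size: int) -> Iterator[List[str]]:
    """Lazily regroup variable-size batches into shards of shard_size URLs."""
    flat = (url for batch in batches for url in batch)
    while True:
        shard = list(islice(flat, shard_size))
        if not shard:
            return
        yield shard
```

With something like `for shard in regroup(read_url_batches("part-00000.parquet"), 10_000): ...` (the file name is made up), only about one batch plus one shard of URLs is held in memory at a time, so peak usage would depend on the batch and shard sizes rather than on the size of the input file.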