DeepLog

Memory Error while running the training code

Open Rufaida94 opened this issue 4 years ago • 3 comments

Hi @wuyifan18, thank you for the great tool. It works perfectly with a very small dataset, but whenever I try running it with a larger dataset I get this error during the training phase:

[enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 278528000 bytes. Error code 12 (Cannot allocate memory)

My machine has plenty of RAM, so I am not sure why this is happening or how to resolve it. Alternatively, how can we edit the code so that it uses less memory with each epoch? Right now each epoch takes approximately 1 GB of memory.

Any suggestion is highly appreciated.

Thanks

Rufaida94 avatar Jun 21 '21 13:06 Rufaida94

Hi @Rufaida94, try reducing the batch_size?

wuyifan18 avatar Jun 22 '21 04:06 wuyifan18

I've reduced the batch size and removed any writing into memory except for the model and it worked fine. Thanks
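
For anyone hitting the same error, here is a minimal sketch of what those two changes might look like in a PyTorch training loop. It is not the repository's actual training script: the toy data, `TinyLSTM`, `train_one_epoch`, and the values `window_size=10`, `num_classes=28`, and `batch_size=64` are all hypothetical placeholders. It also assumes that "writing into memory" means keeping per-batch tensors around across iterations, which is one common cause of memory growing with each epoch.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data standing in for the real log-key windows.
window_size, num_classes = 10, 28
inputs = torch.randint(0, num_classes, (1000, window_size)).float().unsqueeze(-1)
labels = torch.randint(0, num_classes, (1000,))
dataset = TensorDataset(inputs, labels)


class TinyLSTM(nn.Module):
    """Small placeholder model: an LSTM followed by a linear classifier."""

    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])  # classify from the last time step


def train_one_epoch(model, loader, optimizer, criterion):
    model.train()
    total_loss = 0.0
    for seq, label in loader:
        optimizer.zero_grad()
        loss = criterion(model(seq), label)
        loss.backward()
        optimizer.step()
        # .item() copies out a plain Python float, so the loss tensor and
        # its autograd graph are not kept alive across iterations.
        total_loss += loss.item()
    return total_loss / len(loader)


model = TinyLSTM()
# A smaller batch_size lowers the peak size of each allocation.
loader = DataLoader(dataset, batch_size=64, shuffle=True)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()
epoch_loss = train_one_epoch(model, loader, optimizer, criterion)
```

The two points that matter are the `batch_size` argument to `DataLoader` and accumulating `loss.item()` rather than the `loss` tensor itself. Appending outputs or losses to a list as tensors would hold on to their computation graphs and make memory use climb over time.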

Rufaida94 avatar Jun 22 '21 13:06 Rufaida94

@wuyifan18 Is there any way to make the code run faster (especially the training code) on a very large dataset? Currently, each epoch takes about 1 hour, whether I run it on a GPU or a CPU.

Rufaida94 avatar Jun 22 '21 16:06 Rufaida94