matchingR

RStudio crashes when running the galeShapley.collegeAdmissions function on a large population

Open andruskos opened this issue 2 years ago • 3 comments

Hi, I have been trying to apply the galeShapley.collegeAdmissions() function in R to a dataset with the following dimensions:

Number of applicants: 62,941
Number of colleges: 3,534

RStudio crashes after some time when I run the function on the full dataset (it "encounters a fatal error and aborts the session"). I am, however, able to run the function successfully with 55,000 applicants and all 3,534 colleges, so the problem seems to be related to the size of the dataset. Do you have any suggestions for how one might tweak the function to make it run on the full population?

Please let me know if you require more details on the problem at hand.

Thank you in advance!

andruskos, Jan 31 '23 09:01

I'm afraid you're running out of memory. I've used this package to run the Gale-Shapley algorithm with 26,000 participants on each side of the market (see this paper: https://sangmok81.github.io/website/wp/13_large_matching.pdf), but nothing larger than that.

It's most likely possible to memory-optimize the implementation quite a bit, but it's probably easier to just run this on a larger machine if you can 😉

jtilly, Jan 31 '23 09:01
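
The out-of-memory diagnosis above can be sanity-checked with a rough estimate. The sketch below is only a back-of-the-envelope calculation, assuming the inputs are dense matrices of R doubles (8 bytes per cell) with one cell per applicant-college pair. The helper `dense_gib` is hypothetical and not part of matchingR, and the estimate ignores any copies or slot expansion the package makes internally.

```r
# Rough size of one dense matrix in GiB.
# bytes_per_cell = 8 for R doubles, 4 for R integers.
dense_gib <- function(n_rows, n_cols, bytes_per_cell = 8) {
  n_rows * n_cols * bytes_per_cell / 2^30
}

n_colleges <- 3534
full  <- dense_gib(62941, n_colleges)  # full population from the report
small <- dense_gib(55000, n_colleges)  # subset that ran successfully

round(c(full = full, small = small), 2)
```

Under these assumptions a single utility matrix for the full population takes roughly 1.7 GiB, and the function needs at least one such matrix per side of the market, so peak usage is likely a multiple of that figure.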

Thank you for the fast response! :)

I am afraid that I do not have access to more memory as of now. How would you approach memory-optimizing the implementation? It seems I am not far off being able to run the function on the full population, so a few adjustments would likely make the difference.

andruskos, Jan 31 '23 10:01

I would need to profile the code a bit. One source of wasted memory is that I replicate colleges with multiple slots, i.e., one college with 3 slots becomes 3 rows. Another thing to check is that all the data types are as small as possible. E.g., the matrices with the utilities probably don't need to be float64 (float32 is probably enough).

Just get an AWS EC2 instance with more memory for a couple of hours! 😉

jtilly, Jan 31 '23 10:01
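
The two memory costs mentioned above (slot replication and element size) can be illustrated on a toy market. This is a minimal sketch with made-up numbers, not the package's internal code: the colleges-by-students layout and the variable names are assumptions for illustration. Base R has no built-in 32-bit float type, so the size comparison uses integers, which would only suit ordinal preference rankings.

```r
# Toy market with made-up numbers: 5 students, 2 colleges with 3 and 1 slots.
set.seed(1)
slots <- c(3, 1)
student_utils <- matrix(runif(2 * 5), nrow = 2)  # assumed layout: colleges x students

# One row per slot instead of one per college, mimicking the replication.
expanded <- student_utils[rep(seq_along(slots), times = slots), , drop = FALSE]
dim(expanded)  # sum(slots) = 4 rows, 5 columns

# Ordinal preferences (each column ranks the colleges for one student)
# are stored as integers: 4 bytes per cell instead of 8 for doubles.
student_pref <- apply(student_utils, 2, order, decreasing = TRUE)
storage.mode(student_pref)

# Size comparison on a larger dummy matrix.
dbl <- matrix(0,  nrow = 1000, ncol = 1000)
int <- matrix(0L, nrow = 1000, ncol = 1000)
size_ratio <- as.numeric(object.size(dbl)) / as.numeric(object.size(int))
size_ratio
```

With many slots per college, the expanded matrix grows with the total number of slots rather than the number of colleges, which is why the replication can dominate memory use.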