[c++] Fix memory leak in Arrow table implementation
Motivation
When trying to integrate LightGBM v4.2 into our codebase, we noticed an unexpected increase in memory consumption when using LightGBM's new Arrow interface under certain circumstances.
It turns out that the Arrow implementation causes memory leaks as it does not properly release the Arrow arrays being passed to it. Unfortunately, this issue only really surfaces if the data being passed to Arrow is not in memory during the entire program execution (which is frequently the case) and the leakage really only becomes an issue if different data is passed to the LightGBM Arrow interface multiple times (this is where we discovered it).
The fix is simple and the reasoning behind it is outlined clearly in the Arrow C Data interface specification.
Huge shoutout to @delsner who helped me debug this memory leak and helped me learn a lot more about Arrow, allocators and memory debugging 😁
Sorry for not already including this in #6034 and its sibling PRs... I still had some gaps in my understanding of the Arrow C Data interface :eyes:
FYI, if anyone has a good idea how to write good tests for this, please let me know :eyes: once this is merged, we should probably also cut a new release soonish to allow people to confidently use the Arrow interface.
Just want to add that if/when we add support for Arrow data via the R-package here
Would be great! I myself, however, have close to no experience with R, unfortunately 😅
I've approved this PR. Could you update the branch to sync with master so that it can be merged?
@jameslamb can you tell me how to retry jobs run on Azure? I can't seem to find a button that allows me to do this.
@jameslamb can you tell me how to retry jobs run on Azure? I can't seem to find a button that allows me to do this.
It's a Microsoft-controlled account so I suspect @shiyu1994 would have to add explicitly add you. I believe @guolinke did that for me a few years ago while still at Microsoft. Message @shiyu1994 in the maintainer slack and hopefully he can get you set up.
Until then, @ me any time and I can re-run jobs for you. I'll re-run the ones from the PR in a few minutes.
@borchero Could you register an Azure DevOps account with the same email as your Github? Then, I'll add you to the Azure DevOps Team for our CI jobs.
Done @shiyu1994, thanks a lot! :)