LightGBM
feature: Add serialization of reference dataset
Summary
This is in reference to feature request: https://github.com/microsoft/LightGBM/issues/5426
This PR adds APIs for serializing/deserializing Datasets without their data to a byte array, effectively creating a "schema" or "reference" that can be used to create other Datasets.
Implementation
The existing code for serializing Datasets to file was refactored so that it can write to any generic BinaryWriter, whether memory-backed or file-backed. The verbose serialization code was shared as much as possible, splitting methods into Header vs Data components.
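To illustrate the refactoring described above, here is a minimal sketch of a writer abstraction with a memory sink and a file sink, plus a toy dataset whose serialization is split into header and data parts. Everything in the `sketch` namespace (`MemoryWriter`, `FileWriter`, `ToyDataset`, and all method names and signatures) is hypothetical and chosen for illustration; it is not the code in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

namespace sketch {  // hypothetical names, not LightGBM's actual classes

// Abstract sink: serialization code only ever sees this interface.
struct BinaryWriter {
  virtual ~BinaryWriter() = default;
  // Writes `bytes` bytes from `data`; returns the number of bytes written.
  virtual std::size_t Write(const void* data, std::size_t bytes) = 0;
};

// Memory-backed sink that appends to a growable byte vector.
class MemoryWriter : public BinaryWriter {
 public:
  std::size_t Write(const void* data, std::size_t bytes) override {
    const auto* p = static_cast<const std::uint8_t*>(data);
    buffer_.insert(buffer_.end(), p, p + bytes);
    return bytes;
  }
  const std::vector<std::uint8_t>& buffer() const { return buffer_; }

 private:
  std::vector<std::uint8_t> buffer_;
};

// File-backed sink built on C stdio.
class FileWriter : public BinaryWriter {
 public:
  explicit FileWriter(const std::string& path)
      : file_(std::fopen(path.c_str(), "wb")) {}
  ~FileWriter() override {
    if (file_ != nullptr) std::fclose(file_);
  }
  FileWriter(const FileWriter&) = delete;
  FileWriter& operator=(const FileWriter&) = delete;

  bool ok() const { return file_ != nullptr; }
  std::size_t Write(const void* data, std::size_t bytes) override {
    return file_ != nullptr ? std::fwrite(data, 1, bytes, file_) : 0;
  }

 private:
  std::FILE* file_;
};

// Toy stand-in for a dataset: the header is the schema, the data is the rows.
struct ToyDataset {
  std::int32_t num_features = 0;
  std::vector<double> values;

  void SerializeHeader(BinaryWriter* w) const {
    w->Write(&num_features, sizeof(num_features));
  }
  void SerializeData(BinaryWriter* w) const {
    const std::uint64_t n = values.size();
    w->Write(&n, sizeof(n));
    w->Write(values.data(), values.size() * sizeof(double));
  }
  // A "reference" carries only the header, i.e. the schema without the data.
  void SerializeReference(BinaryWriter* w) const { SerializeHeader(w); }
  // A full save reuses the same header code and then appends the data.
  void SaveBinary(BinaryWriter* w) const {
    SerializeHeader(w);
    SerializeData(w);
  }
};

}  // namespace sketch
```

Because both sinks implement the same interface, the header code is written once and reused for the full binary file and for the in-memory reference.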
Also, a generic ByteBuffer was created so that higher-level languages (e.g. Java) do not have to manage the memory of the serialized buffer themselves.
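As a rough illustration of why such a buffer helps language bindings, the sketch below keeps ownership of the bytes on the native side and exposes them through an opaque handle with C-style accessors. The `Sketch_*` function names, their signatures, and the return-code convention are assumptions made for this example, not the API added by this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {  // hypothetical names for illustration only

// Owns the serialized bytes; callers never allocate or free the storage.
class ByteBuffer {
 public:
  void Append(const void* data, std::size_t bytes) {
    const auto* p = static_cast<const std::uint8_t*>(data);
    bytes_.insert(bytes_.end(), p, p + bytes);
  }
  std::size_t GetSize() const { return bytes_.size(); }
  std::uint8_t GetAt(std::size_t index) const { return bytes_.at(index); }

 private:
  std::vector<std::uint8_t> bytes_;
};

}  // namespace sketch

// Opaque handle as it would be seen from a binding layer such as JNI.
using ByteBufferHandle = void*;

extern "C" {

ByteBufferHandle Sketch_ByteBufferCreate() { return new sketch::ByteBuffer(); }

void Sketch_ByteBufferAppend(ByteBufferHandle handle, const void* data,
                             std::int32_t bytes) {
  static_cast<sketch::ByteBuffer*>(handle)->Append(
      data, static_cast<std::size_t>(bytes));
}

std::int32_t Sketch_ByteBufferGetSize(ByteBufferHandle handle) {
  return static_cast<std::int32_t>(
      static_cast<sketch::ByteBuffer*>(handle)->GetSize());
}

// Copies one byte into *out_val. Returns 0 on success, -1 if out of range.
int Sketch_ByteBufferGetAt(ByteBufferHandle handle, std::int32_t index,
                           std::uint8_t* out_val) {
  auto* buf = static_cast<sketch::ByteBuffer*>(handle);
  if (index < 0 || static_cast<std::size_t>(index) >= buf->GetSize()) return -1;
  *out_val = buf->GetAt(static_cast<std::size_t>(index));
  return 0;
}

// Releases the native memory; the handle must not be used afterwards.
void Sketch_ByteBufferFree(ByteBufferHandle handle) {
  delete static_cast<sketch::ByteBuffer*>(handle);
}

}  // extern "C"
```

A caller in a managed language would read the size, copy the bytes out one at a time (or in bulk, if a bulk accessor were added), and then free the handle, without ever touching a raw native pointer.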
Test
New C++ tests were created to test both the serialization/deserialization and the new ByteBuffer functionality.
/gha run r-valgrind
Workflow R valgrind tests has been triggered! 🚀 https://github.com/microsoft/LightGBM/actions/runs/2893166788
Status: success ✔️.
@shiyu1994 can you help to review?
@shiyu1994 can you take a look? ty
@shiyu1994 Just checking in
/gha run r-valgrind
Workflow R valgrind tests has been triggered! 🚀 https://github.com/microsoft/LightGBM/actions/runs/3826303140
Status: success ✔️.
@shiyu1994 @guolinke can you help with a review on this?
Sorry for the late response. Will review it within the next two days.
@shiyu1994 I made the requested changes. Can you look it over and try rerunning the failures? They don't seem related to this PR.
/gha run r-valgrind
Workflow R valgrind tests has been triggered! 🚀 https://github.com/microsoft/LightGBM/actions/runs/4169689163
Status: success ✔️.