feature: Add serialization of reference dataset

Open svotaw opened this issue 2 years ago • 1 comments

Summary

This is in reference to feature request: https://github.com/microsoft/LightGBM/issues/5426

This PR adds APIs for serializing/deserializing Datasets without their data to a byte array, effectively creating a "schema" or "reference" that can be used to create other Datasets.

Implementation

The existing code for serializing Datasets to file was refactored to write to any generic BinaryWriter, whether backed by memory or by a file. The verbose serialization code is shared as much as possible by splitting the methods into Header and Data components.
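To illustrate the refactoring described above, here is a minimal sketch of a writer abstraction with a Header/Data split. It is not the PR's actual code: `MemoryWriter`, `FileWriter`, `ToyDataset`, `WriteHeader`, and `WriteData` are hypothetical names, and the interface of the real `BinaryWriter` in LightGBM may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical interface; the real BinaryWriter in the PR may look different.
struct BinaryWriter {
  virtual ~BinaryWriter() = default;
  // Writes `bytes` bytes from `data` and returns the number of bytes written.
  virtual std::size_t Write(const void* data, std::size_t bytes) = 0;
};

// Sink that appends to a growable in-memory buffer.
class MemoryWriter : public BinaryWriter {
 public:
  std::size_t Write(const void* data, std::size_t bytes) override {
    const auto* p = static_cast<const std::uint8_t*>(data);
    buffer_.insert(buffer_.end(), p, p + bytes);
    return bytes;
  }
  const std::vector<std::uint8_t>& buffer() const { return buffer_; }

 private:
  std::vector<std::uint8_t> buffer_;
};

// Sink that writes to an already opened C file stream (not owned).
class FileWriter : public BinaryWriter {
 public:
  explicit FileWriter(std::FILE* file) : file_(file) { assert(file_ != nullptr); }
  std::size_t Write(const void* data, std::size_t bytes) override {
    return std::fwrite(data, 1, bytes, file_);
  }

 private:
  std::FILE* file_;
};

// Toy stand-in for a Dataset: one schema field plus the raw values.
struct ToyDataset {
  std::int32_t num_features;
  std::vector<double> values;
};

// Header only: enough to describe the schema, no row data.
std::size_t WriteHeader(const ToyDataset& ds, BinaryWriter* writer) {
  return writer->Write(&ds.num_features, sizeof(ds.num_features));
}

// Data only: the raw values that a "reference" serialization would skip.
std::size_t WriteData(const ToyDataset& ds, BinaryWriter* writer) {
  return writer->Write(ds.values.data(), ds.values.size() * sizeof(double));
}
```

With this split, a full save would call `WriteHeader` and then `WriteData` on a `FileWriter`, while a schema-only "reference" would call just `WriteHeader` on a `MemoryWriter`.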

Also, a generic ByteBuffer was created so that higher-level languages (e.g. Java) do not have to manage the byte memory of the serialized buffer themselves.
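As a rough sketch of the idea, a natively owned buffer can be exposed through an opaque handle with size, read, and free functions, so the caller never allocates or frees the bytes itself. The `Sketch*` function names and signatures below are assumptions made for illustration, not the API added by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Native-side buffer that owns the serialized bytes.
class ByteBuffer {
 public:
  ByteBuffer(const std::uint8_t* src, std::size_t len) : data_(src, src + len) {}
  std::size_t Size() const { return data_.size(); }
  std::uint8_t GetAt(std::size_t index) const {
    assert(index < data_.size());
    return data_[index];
  }

 private:
  std::vector<std::uint8_t> data_;
};

// Opaque handle handed to the foreign caller (e.g. through JNI or SWIG).
using ByteBufferHandle = void*;

// Hypothetical C-style accessors; a real FFI layer would likely declare
// them extern "C". Each returns 0 on success and -1 on failure.
ByteBufferHandle SketchByteBufferCreate(const std::uint8_t* src, std::int32_t len) {
  return new ByteBuffer(src, static_cast<std::size_t>(len));
}

int SketchByteBufferGetSize(ByteBufferHandle handle, std::int32_t* out_size) {
  if (handle == nullptr || out_size == nullptr) return -1;
  *out_size = static_cast<std::int32_t>(static_cast<ByteBuffer*>(handle)->Size());
  return 0;
}

int SketchByteBufferGetAt(ByteBufferHandle handle, std::int32_t index,
                          std::uint8_t* out_value) {
  if (handle == nullptr || out_value == nullptr) return -1;
  const auto* buffer = static_cast<const ByteBuffer*>(handle);
  if (index < 0 || static_cast<std::size_t>(index) >= buffer->Size()) return -1;
  *out_value = buffer->GetAt(static_cast<std::size_t>(index));
  return 0;
}

int SketchByteBufferFree(ByteBufferHandle handle) {
  if (handle == nullptr) return -1;
  delete static_cast<ByteBuffer*>(handle);
  return 0;
}
```

The caller copies bytes out one at a time (or in whatever chunks the binding supports) and then releases the handle, so the lifetime of the native memory stays under the native library's control.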

Test

New C++ tests were created to test both the serialization/deserialization and the new ByteBuffer functionality.

svotaw avatar Aug 16 '22 22:08 svotaw

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀 https://github.com/microsoft/LightGBM/actions/runs/2893166788

Status: success ✔️.

jameslamb avatar Aug 20 '22 02:08 jameslamb

@shiyu1994 can you help to review?

guolinke avatar Sep 27 '22 04:09 guolinke

@shiyu1994 can you take a look? ty

svotaw avatar Oct 22 '22 17:10 svotaw

@shiyu1994 Just checking in

svotaw avatar Nov 08 '22 18:11 svotaw

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀 https://github.com/microsoft/LightGBM/actions/runs/3826303140

Status: success ✔️.

jameslamb avatar Jan 03 '23 02:01 jameslamb

@shiyu1994 @guolinke can you help with a review on this?

jameslamb avatar Jan 03 '23 02:01 jameslamb

Sorry for the late response. Will review it within the next two days.

shiyu1994 avatar Jan 03 '23 04:01 shiyu1994

@shiyu1994 I made the requested changes. Can you look it over and try rerunning the failures? They don't seem related to this PR.

svotaw avatar Feb 14 '23 01:02 svotaw

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀 https://github.com/microsoft/LightGBM/actions/runs/4169689163

Status: success ✔️.

jameslamb avatar Feb 14 '23 02:02 jameslamb

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions[bot] avatar Aug 15 '23 20:08 github-actions[bot]