lm_dataformat

"current chunk incomplete" without any jsonl.zst file

Open Guruprasad93 opened this issue 2 years ago • 1 comment

I'm trying to write lmd files using the ar.commit() method. After adding a bunch of data to the archive in a loop, the only output is a "current chunk incomplete" file about 10 GB in size; there is no jsonl.zst file.

Should I split the data and create multiple jsonl.zst files instead of adding everything to the same file? Or is there a better fix?

Guruprasad93, Sep 16 '22 17:09

If you call commit(), it should rename the "current chunk incomplete" file and leave you with your .zst file. I'm not sure about this version, but if you are still interested, you can check out my fork: https://github.com/lfoppiano/lm_dataformat

https://github.com/lfoppiano/stackexchange-dataset/blob/master/pairer.py#L82C1-L85C23

lfoppiano, Dec 08 '23 04:12
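
For anyone landing here with the same symptom, below is a minimal, standard-library-only sketch of the write-then-rename pattern described above. It is not lm_dataformat's actual code: `ChunkedWriter`, `write_in_chunks`, the `data_N.jsonl.gz` naming, and the use of gzip in place of zstd are all assumptions made so the example runs without extra dependencies. It only illustrates why nothing but the incomplete file exists until commit() runs, and how committing every so often yields several smaller chunk files instead of one 10 GB temp file.

```python
import gzip
import json
import os
import tempfile


class ChunkedWriter:
    """Toy stand-in for an archive writer (hypothetical, NOT lm_dataformat)."""

    def __init__(self, out_dir):
        self.out_dir = out_dir
        os.makedirs(out_dir, exist_ok=True)
        self.index = 0
        self._open_chunk()

    def _open_chunk(self):
        # All writes go to a temp file until commit() is called.
        self.tmp_path = os.path.join(self.out_dir, "current_chunk_incomplete")
        self.fh = gzip.open(self.tmp_path, "wt", encoding="utf-8")

    def add_data(self, text, meta=None):
        # One JSON document per line (JSON Lines).
        self.fh.write(json.dumps({"text": text, "meta": meta or {}}) + "\n")

    def commit(self):
        # Close the temp file, rename it to its final name, start a new chunk.
        self.fh.close()
        final = os.path.join(self.out_dir, f"data_{self.index}.jsonl.gz")
        os.replace(self.tmp_path, final)
        self.index += 1
        self._open_chunk()
        return final

    def close(self):
        # Discard the (empty) temp file left open after the last commit.
        self.fh.close()
        os.remove(self.tmp_path)


def write_in_chunks(out_dir, docs, docs_per_chunk):
    """Add docs one by one, committing every docs_per_chunk documents."""
    writer = ChunkedWriter(out_dir)
    written, pending = [], 0
    for doc in docs:
        writer.add_data(doc)
        pending += 1
        if pending == docs_per_chunk:
            written.append(writer.commit())
            pending = 0
    if pending:
        # Without this final commit, the tail would stay in the temp file.
        written.append(writer.commit())
    writer.close()
    return written
```

Calling `write_in_chunks(tempfile.mkdtemp(), docs, 1000)` would return the list of committed chunk paths. If commit() is never reached (for example because the loop raises, or the call sits outside the code path that actually runs), only the temp file remains, which matches the behaviour reported in this issue.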