
paper.csv is too large to save on my computer

Open rebecca312 opened this issue 6 years ago • 1 comments

When I ran the pipeline, paper.csv was generated from Miner-Papertxt (about 2.2G). The resulting paper.csv grew past 1.7T, but my computer has only about 2T of storage space, so the run fails every time. Do you know how to fix this?

rebecca312, May 09 '18 07:05

I'm surprised it's so big. I didn't catalog file sizes, but I don't remember anything being even close to 1T in size. IIRC, I was able to store everything on a machine with only 500G. It's been a while though, so I may be misremembering.

You could try modifying the code that writes the file so that it compresses the output as it is written. pandas supports this through an extra keyword argument: DataFrame.to_csv accepts compression (for example, compression="gzip").
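As a rough illustration of the compression idea, here is a minimal sketch. It assumes the pipeline holds the data in a pandas DataFrame before writing it; the helper name write_compressed_csv, the column names, and the sample rows are hypothetical stand-ins, not taken from the project's code.

```python
import os
import tempfile

import pandas as pd


def write_compressed_csv(df: pd.DataFrame, path: str) -> None:
    """Write df as a gzip-compressed CSV (hypothetical helper)."""
    # compression="gzip" makes pandas gzip the output while writing,
    # so the uncompressed CSV never has to exist on disk.
    df.to_csv(path, index=False, compression="gzip")


# Stand-in data; the real paper.csv columns are not known here.
df = pd.DataFrame({
    "paper_id": range(10_000),
    "venue": ["ICML"] * 10_000,
})

tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "paper.csv")
gz_path = os.path.join(tmp_dir, "paper.csv.gz")

# Write both versions only to compare their sizes on disk.
df.to_csv(plain_path, index=False)
write_compressed_csv(df, gz_path)

plain_size = os.path.getsize(plain_path)
gz_size = os.path.getsize(gz_path)
print(f"plain: {plain_size} bytes, gzip: {gz_size} bytes")

# read_csv infers gzip compression from the .gz extension by default.
round_trip = pd.read_csv(gz_path)
```

Any later step that reads paper.csv would need to point at the .gz file instead. How much space this saves depends on how repetitive the data is, so it may not be enough on its own if the file really is 1.7T uncompressed.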

macks22, Jan 31 '23 13:01