WSDM2021_NSM

preprocess_step1 takes too long

Open qiaopr opened this issue 3 years ago • 6 comments

Hi, how fast does preprocess_step1.py run for you? Mine has taken several hours to process only 150 pieces of data. It's unbelievable!

qiaopr, Dec 26 '21 08:12

Did you modify any parameters, such as the graph size? I don't think it should be that slow. When I processed the data, I just kept it running in the background and got the data the next day. I guess any dataset can be processed in one day.

RichardHGL, Dec 26 '21 16:12

I followed preprocess > Freebase > README.md exactly. I didn't modify any parameters. The data I'm processing is CWQ.

qiaopr, Dec 26 '21 16:12

I also faced this problem.

LLLiaomeng, Dec 27 '21 02:12

Also ran into this problem. It seems that preprocess_step1 takes a lot of time to produce only a little output, while my CPU, GPU and memory usage all look healthy. sad :(

JasonCen-sweetdreams, Dec 27 '21 03:12

Okay, I'll check this problem next month. You can also try looking into ppr_util.py; I think the majority of the time is spent on calculating PPR for every graph.

RichardHGL, Dec 27 '21 08:12

The preprocessed datasets can be found in the README of this repo. If you find any way to improve the efficiency of preprocessing, please let me know.

RichardHGL, Feb 13 '22 19:02
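
For anyone following up on the pointer to ppr_util.py: the sketch below is a minimal pure-Python personalized PageRank (PPR) by power iteration, included only to illustrate why computing PPR for every graph could dominate the run time. It is not the code from this repo. The function name `personalized_pagerank`, its parameters (`alpha`, `max_iter`, `tol`) and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, alpha=0.8, max_iter=100, tol=1e-8):
    """Approximate personalized PageRank by power iteration.

    Hypothetical helper, not the repo's implementation.
    edges: iterable of (head, tail) pairs, treated as undirected.
    seeds: nodes that receive the restart probability mass.
    alpha: probability of following an edge; 1 - alpha restarts at a seed.
    """
    # Build an undirected adjacency map.
    neighbors = defaultdict(set)
    for head, tail in edges:
        neighbors[head].add(tail)
        neighbors[tail].add(head)

    # Ignore seeds that do not appear in the graph.
    seeds = [s for s in set(seeds) if s in neighbors]
    if not seeds:
        return {}

    restart = {s: 1.0 / len(seeds) for s in seeds}
    scores = dict(restart)

    for _ in range(max_iter):
        new_scores = defaultdict(float)
        # Spread alpha * score evenly over each node's neighbors.
        for node, score in scores.items():
            share = alpha * score / len(neighbors[node])
            for nb in neighbors[node]:
                new_scores[nb] += share
        # Return the remaining (1 - alpha) mass to the seeds.
        for seed, mass in restart.items():
            new_scores[seed] += (1.0 - alpha) * mass

        # Stop once the total change (L1 distance) is small enough.
        keys = set(scores) | set(new_scores)
        delta = sum(abs(new_scores.get(k, 0.0) - scores.get(k, 0.0)) for k in keys)
        scores = dict(new_scores)
        if delta < tol:
            break

    return scores


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("b", "c"), ("d", "e")]
    print(personalized_pagerank(toy_edges, seeds=["a"]))
```

Each iteration visits every edge reachable from the seeds, so the cost grows roughly with (edges in the subgraph) × (iterations) × (number of questions). If the real implementation behaves similarly, profiling one question with Python's built-in `cProfile` should show whether the PPR step is in fact the bottleneck.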