ICLR2023-OpenReviewData

Cannot crawl the data from the OpenReview website

Open DongXingshuai opened this issue 9 months ago • 4 comments

Hi there, I tried to run parse_data.py to crawl data from OpenReview, but it did not work. The error messages are below. Can anybody give me a hand? Thank you!

ipython parse_data.py
Offset: 0 Data: 0
Offset: 1000 Data: 1000
Offset: 2000 Data: 2000
Offset: 3000 Data: 3000
Offset: 4000 Data: 3809
Number of submissions: 3809
Number of papers (including old): 4874
0%| | 0/4874 [00:00<?, ?it/s]
0%| | 0/4874 [00:00<?, ?it/s]

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/dongxingshuai/anaconda3/envs/nlp/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/dongxingshuai/research_associate/nlp/ICLR2023-OpenReviewData-main/notebooks/parse_data.py", line 166, in filter_data
    withdraw = 1 if 'Withdrawn_Submission' in meta_note[0]['invitation'] else 0
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

IndexError Traceback (most recent call last)
File ~/research_associate/nlp/ICLR2023-OpenReviewData-main/notebooks/parse_data.py:195
    190 # In[59]:
    191
    192
    193 # filter data in a pool of processes
    194 with Pool(8) as p:
--> 195     filtered_notes = list(tqdm(p.imap(filter_data, notes), total=len(notes)))
    198 # In[60]:
    199
    200
    201 # create dataframe
    202 ratings = pd.DataFrame(filtered_notes)

File ~/anaconda3/envs/nlp/lib/python3.8/site-packages/tqdm/notebook.py:249, in tqdm_notebook.__iter__(self)
    247 try:
    248     it = super(tqdm_notebook, self).__iter__()
--> 249     for obj in it:
    250         # return super(tqdm...) will not catch exception
    251         yield obj
    252 # NB: except ... [ as ...] breaks IPython async KeyboardInterrupt

File ~/anaconda3/envs/nlp/lib/python3.8/site-packages/tqdm/std.py:1182, in tqdm.__iter__(self)
   1179 time = self._time
   1181 try:
-> 1182     for obj in iterable:
   1183         yield obj
   1184         # Update and possibly print the progressbar.
   1185         # Note: does not call self.update(1) for speed optimisation.

File ~/anaconda3/envs/nlp/lib/python3.8/multiprocessing/pool.py:868, in IMapIterator.next(self, timeout)
    866 if success:
    867     return value
--> 868 raise value

IndexError: list index out of range
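
Reading the remote traceback: the worker failed at line 166 of parse_data.py, where filter_data evaluates meta_note[0]['invitation']. An IndexError there means meta_note was an empty list for at least one note. Below is a minimal sketch of a guard for that one expression. It assumes meta_note is a list of dicts with an 'invitation' key. The helper name withdraw_flag and the choice to return None for a missing meta note are hypothetical and are not taken from the repository.

```python
from typing import Any, Dict, List, Optional


def withdraw_flag(meta_note: List[Dict[str, Any]]) -> Optional[int]:
    """Return 1 if the first meta note is a withdrawal, 0 if not.

    Returns None when meta_note is empty, which is the case that raised
    IndexError in the traceback. (Hypothetical helper, not repository code.)
    """
    if not meta_note:
        # Nothing to index; let the caller decide whether to skip this note.
        return None
    # .get() also tolerates a meta note that lacks an 'invitation' key.
    invitation = meta_note[0].get('invitation', '')
    return 1 if 'Withdrawn_Submission' in invitation else 0
```

With a guard like this, filter_data could skip or log notes for which the helper returns None instead of crashing the whole pool. Why meta_note comes back empty is not visible from the output above and would still need to be checked.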

DongXingshuai Sep 07 '23 03:09