cdQA
MemoryError workaround
Please consider changing the _expand_paragraphs function in cdqa_sklearn.py to accommodate larger datasets. Expanding the dataframe directly needs a lot of memory for larger data, so it would be better to build a list of dicts first and convert it to a dataframe at the end.
Below is the modification I made to avoid the MemoryError:
@staticmethod
def _expand_paragraphs(df):
    data = []
    for n in range(len(df)):
        stringlist = df.iloc[n][1]
        for m in range(len(stringlist)):
            a = df.iloc[n][0]
            b = stringlist[m]
            data.append({'title': a, 'content': b})
    dfx = pd.DataFrame(data)
    return dfx
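For anyone adapting this, here is a minimal sketch of the same list-of-dicts idea written against plain (title, paragraphs) pairs instead of column positions. The helper names iter_expanded and expand_records are hypothetical and not part of cdQA. The string check is my own addition, on the assumption that a paragraphs cell should always be a list.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple


def iter_expanded(
    rows: Iterable[Tuple[str, Sequence[str]]]
) -> Iterator[Dict[str, str]]:
    """Yield one {'title', 'content'} record per paragraph (hypothetical helper)."""
    for title, paragraphs in rows:
        # A plain string would be iterated character by character,
        # so reject it instead of emitting one-letter "paragraphs".
        if isinstance(paragraphs, str):
            raise TypeError(
                f"paragraphs for {title!r} is a string, expected a list"
            )
        for paragraph in paragraphs:
            yield {"title": title, "content": paragraph}


def expand_records(rows) -> List[Dict[str, str]]:
    """Collect the expanded records into a list of dicts."""
    return list(iter_expanded(rows))


# Small illustrative input (made-up data).
rows = [
    ("Doc A", ["first paragraph", "second paragraph"]),
    ("Doc B", ["only paragraph"]),
]
records = expand_records(rows)
```

With pandas, the rows could come from something like zip(df["title"], df["paragraphs"]) and the result be wrapped in pd.DataFrame(records), assuming the dataframe really has columns with those names. Selecting by name rather than by position avoids depending on the column order.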
Very good point. +1 @nortz8
However, your workaround did not work for me. I ended up with the following error:
ValueError: empty vocabulary; perhaps the documents only contain stop words
Any idea why?
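One possible cause, offered as a guess rather than a confirmed diagnosis: the workaround reads columns by position, so if the second column is not the list of paragraphs (or the cell holds a string rather than a list), stringlist[m] returns single characters. scikit-learn's text vectorizers use a default token pattern that only keeps tokens of two or more word characters, so one-character documents would leave the vocabulary empty. The sketch below reproduces that effect with the standard library only. The sample data is made up.

```python
import re

# Default token pattern of scikit-learn's text vectorizers:
# a token needs at least two word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokens(doc):
    """Return the tokens the default pattern would extract from doc."""
    return TOKEN_PATTERN.findall(doc)


paragraphs_as_list = ["The cat sat.", "On the mat."]
paragraphs_as_str = "The cat sat."  # assumed failure case: a cell that is not a list

# Same indexing style as the workaround: stringlist[m] for each m.
docs_ok = [paragraphs_as_list[m] for m in range(len(paragraphs_as_list))]
docs_bad = [paragraphs_as_str[m] for m in range(len(paragraphs_as_str))]

vocab_ok = {t for d in docs_ok for t in tokens(d)}
vocab_bad = {t for d in docs_bad for t in tokens(d)}

print(sorted(vocab_ok))
print(vocab_bad)  # empty: every "document" is a single character
```

If this is what is happening, checking type(df.iloc[0][1]) and the column order of the dataframe before calling the function should show it.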