cdQA
numpy core fromnumeric.py error in QAPipeline.fit_retriever
Describe the bug
Replicating a QAPipeline as in your example raises an error in fit_retriever() coming from numpy.core.fromnumeric.
To Reproduce
Steps to reproduce the behavior:
- Go to tutorial-use-pdf-converter.ipynb and run:

cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib', max_df=1.0)

# Fit Retriever to documents
cdqa_pipeline.fit_retriever(df=df)
Screenshots
ValueError                                Traceback (most recent call last)
/mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in fit_retriever(self, df)
    109             )
    110         else:
--> 111             self.metadata = self._expand_paragraphs(df)
    112 
    113         self.retriever.fit(self.metadata)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in _expand_paragraphs(df)
    230             {
    231                 col: np.repeat(df[col].values, df[lst_col].str.len())
--> 232                 for col in df.columns.drop(lst_col)
    233             }
    234         ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

/mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in

<array_function internals> in repeat(*args, **kwargs)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in repeat(a, repeats, axis)
    479            [3, 4]])
    480 
--> 481     """
    482     return _wrapfunc(a, 'repeat', repeats, axis=axis)
    483 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     59 
     60     try:
---> 61         return bound(*args, **kwds)
     62     except TypeError:
     63         # A TypeError occurs if the object does have such a method in its
ValueError: repeats may not contain negative values.

Desktop (please complete the following information):
Running the notebook examples on Azure ML with a V100 GPU.
Additional context
What numpy version is required? I have numpy 1.18.2 installed, and all other requirements in requirements.txt are met.
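To see where the traceback fails: _expand_paragraphs repeats every non-paragraph column once per paragraph, using df[lst_col].str.len() as the repeat counts passed to np.repeat, and then concatenates the paragraph lists. The sketch below is a simplified, hypothetical re-implementation of that explode step in plain numpy (expand_paragraphs is not cdQA's function). It assumes each paragraphs cell is a list of strings, and it also shows that np.repeat raises ValueError as soon as any repeat count is negative, so the counts derived from the input DataFrame are the first thing worth checking.

```python
import numpy as np


def expand_paragraphs(titles, paragraphs):
    """Simplified explode step: one (title, paragraph) pair per paragraph.

    Hypothetical helper, not cdQA's code. `paragraphs` holds one list of
    strings per document, aligned with `titles`.
    """
    # Repeat counts: the number of paragraphs in each document, which is what
    # Series.str.len() returns when every cell is a list.
    counts = np.array([len(p) for p in paragraphs])
    # Each title is repeated once per paragraph of its document.
    expanded_titles = np.repeat(np.asarray(titles), counts)
    # All paragraphs flattened into one 1-D array, in document order.
    expanded_paragraphs = np.concatenate([np.asarray(p) for p in paragraphs])
    return list(zip(expanded_titles.tolist(), expanded_paragraphs.tolist()))


print(expand_paragraphs(["doc A", "doc B"], [["p1", "p2"], ["p3"]]))
# [('doc A', 'p1'), ('doc A', 'p2'), ('doc B', 'p3')]

# np.repeat rejects negative repeat counts with a ValueError, the same
# exception type raised at the bottom of the traceback above.
try:
    np.repeat(np.array(["doc A", "doc B"]), np.array([2, -1]))
except ValueError as err:
    print(type(err).__name__, err)
```

With well-formed list cells the repeat counts are plain non-negative lengths, so the explode step succeeds; any input that makes those counts invalid surfaces only deep inside numpy, as in the report.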
Hi, I analyzed the issue: the problem lies in the format of the DataFrame passed to the fit_retriever() method. QAPipeline.fit_retriever() works fine for a df in the same format as the BNP one. Could you tell me what format the df DataFrame should have (a DataFrame with title and paragraphs columns)?
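A minimal sketch for checking a df before calling fit_retriever(), assuming it expects one row per document with a title string and a paragraphs list of strings (the format asked about above, and the one the _expand_paragraphs frame in the traceback explodes with np.repeat and np.concatenate). The helper names normalize_paragraphs, invalid_rows and _parse_cell, and the toy data, are hypothetical. It parses stringified lists, such as those left by a CSV round trip, back into lists and reports rows that still do not match; treating empty lists as invalid is a conservative choice of this sketch.

```python
from ast import literal_eval

import pandas as pd


def _parse_cell(cell):
    # Stringified lists (e.g. "['p1', 'p2']" after a CSV round trip) are parsed
    # back into Python lists; anything else is returned unchanged.
    if isinstance(cell, str) and cell.strip().startswith("["):
        return literal_eval(cell)
    return cell


def normalize_paragraphs(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with stringified 'paragraphs' cells parsed into lists."""
    out = df.copy()
    out["paragraphs"] = out["paragraphs"].apply(_parse_cell)
    return out


def invalid_rows(df: pd.DataFrame) -> list:
    """Index labels of rows whose 'paragraphs' cell is not a non-empty list of str."""
    bad = []
    for idx, cell in df["paragraphs"].items():
        ok = (
            isinstance(cell, list)
            and len(cell) > 0
            and all(isinstance(p, str) for p in cell)
        )
        if not ok:
            bad.append(idx)
    return bad


# Hypothetical toy corpus: row 0 is a stringified list, row 1 is already
# well formed, row 2 has no paragraphs at all.
df = pd.DataFrame(
    {
        "title": ["doc A", "doc B", "doc C"],
        "paragraphs": pd.Series(["['p1', 'p2']", ["p3"], None], dtype=object),
    }
)
df = normalize_paragraphs(df)
print(invalid_rows(df))  # [2]
clean_df = df.drop(index=invalid_rows(df))
```

The cleaned frame would then be passed as before, with cdqa_pipeline.fit_retriever(df=clean_df). Whether to drop bad rows or fix them upstream (for example in the PDF conversion step) depends on the corpus.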