MachineLearningNotebooks

AzureML pipeline designer pyarrow dependency while installing transformers

Open poojithag554 opened this issue 2 years ago • 3 comments

I was trying to import transformers in an AzureML designer pipeline. The import fails with a message that transformers and datasets need pyarrow >=3.0.0, but after I upgrade pyarrow to 3.0.0 and import transformers, the pyarrow version is reset to the original 0.16.0. I am attaching a few error samples; please have a look.

Got exception when invoking script: 'RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
To use datasets, the module pyarrow>=3.0.0 is required, and the current version of pyarrow doesn't match this condition.
If you are running this in a Google Colab, you should probably just restart the runtime to use the right version of pyarrow.'

azureml-designer-core 0.0.68 requires pyarrow==0.16.0, but you'll have pyarrow 3.0.0 which is incompatible.

poojithag554 avatar Feb 28 '22 13:02 poojithag554
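The two requirements quoted in the error output above (pyarrow==0.16.0 from azureml-designer-core 0.0.68 and pyarrow>=3.0.0 from datasets) can be compared with a small sketch. The helpers `parse_version` and `satisfies` are hypothetical names written for this illustration, and they handle only plain numeric versions, not the full pip specifier grammar.

```python
def parse_version(text):
    """Turn a plain numeric version such as '3.0.0' into (3, 0, 0)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, constraint):
    """Check one version string against a constraint such as '>=3.0.0'."""
    # Two-character operators are tried first so '>=' is not read as '>'.
    for op in ("==", ">=", "<=", ">", "<"):
        if constraint.startswith(op):
            bound = parse_version(constraint[len(op):])
            current = parse_version(version)
            return {
                "==": current == bound,
                ">=": current >= bound,
                "<=": current <= bound,
                ">": current > bound,
                "<": current < bound,
            }[op]
    raise ValueError(f"unsupported constraint: {constraint}")


# Constraints copied from the messages quoted in this thread.
designer_core_pin = "==0.16.0"
datasets_requirement = ">=3.0.0"

# Versions mentioned in the thread; none can satisfy both constraints.
candidates = ["0.16.0", "3.0.0", "5.0.0"]
compatible = [
    v for v in candidates
    if satisfies(v, designer_core_pin) and satisfies(v, datasets_requirement)
]
print(compatible)
```

Because the first constraint is an exact pin, no pyarrow release can satisfy both packages at once, which is consistent with the pip warning quoted above.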

Same issue here. The Hugging Face datasets package requires pyarrow 3.0.0 or higher. Can you please lift the upper bound on the version?

dunalduck0 avatar Mar 23 '22 03:03 dunalduck0

Hi @poojithag554, sorry for the inconvenience. Could you please share details of the use case in which you use pyarrow in Designer? It would also be great if we could set up a quick call to learn about your scenario and see whether there is a solution that fits your case. Please contact [email protected].

likebupt avatar Apr 14 '22 07:04 likebupt

Hi @likebupt, in the designer I was trying to execute a custom Python script that uses the transformers and datasets Python packages. For these, the pyarrow version needs to be >=5.0.0, but even after a successful installation of pyarrow 5.0.0 it is downgraded to 0.16.0.

azureml-dataset-runtime 1.36.0 requires pyarrow<4.0.0,>=0.17.0, but you'll have pyarrow 5.0.0 which is incompatible.
Successfully installed pyarrow-5.0.0
----------PYARROW VERSION  INSTALLED______________ 0.16.0

I understand that in designer-core the upper bound of pyarrow is 4.0.0, but the datasets package requires 5.0.0, and as you can see above, even after installing 5.0.0 the version returns to 0.16.0.

Installing packages:
os.system(f"pip install transformers")
os.system(f"pip install datasets")

poojithag554 avatar Apr 14 '22 12:04 poojithag554
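The thread does not establish why the reported version stays at 0.16.0. One thing that may be worth ruling out is whether the `pip` started by the `os.system` calls quoted above belongs to the same interpreter that runs the script, and whether a freshly started interpreter sees the new version. The following is a minimal sketch under that assumption, not a confirmed fix for Designer; `pip_install_command` and `version_in_fresh_process` are hypothetical helper names.

```python
import subprocess
import sys


def pip_install_command(requirement):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", requirement]


def version_in_fresh_process(package):
    """Return the version a new interpreter sees, or None if not installed."""
    probe = (
        "import sys, importlib.metadata as md\n"
        "print(md.version(sys.argv[1]))"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe, package],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # The probe raised, for example because the package is missing.
        return None
    return result.stdout.strip()


# Example usage (not executed here, since it modifies the environment):
#   subprocess.run(pip_install_command("pyarrow==5.0.0"), check=True)
#   print(version_in_fresh_process("pyarrow"))
```

If the fresh process reports 5.0.0 while the running script still reports 0.16.0, the old module was probably imported before the upgrade. If both report 0.16.0, the install likely went to a different environment or was reverted, which this sketch cannot distinguish.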