pytorch_fnet

Could not train the model on AWS

Open · Li-En-Good opened this issue 4 years ago · 2 comments

I was trying to use pytorch_fnet on an Amazon EC2 instance (a g4 instance). When I run `download_and_train.py`, it always fails with:

`mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-7c85b1e2.so.1 library. Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.`

I tried importing numpy first, but it still gave the same error.

I also tried release_1, but its `pytorch=0.1.8` requirement doesn't seem to be compatible with the EC2 instance, so `conda env create -f environment.yml` failed. Please let me know if there is a way to fix this. Thanks a lot!

Li-En-Good · May 01 '20 19:05
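The error message itself names two knobs: the threading layer (`MKL_THREADING_LAYER`) and `MKL_SERVICE_FORCE_INTEL`. A commonly used workaround, not confirmed in this thread, is to select MKL's GNU threading layer so that it cooperates with the `libgomp` that is already loaded, for example by running `MKL_THREADING_LAYER=GNU python download_and_train.py`. The Python sketch below does the same from inside a script. The helper name `set_mkl_threading_layer` is hypothetical, and it can only take effect if it runs before numpy or torch is first imported. Setting `MKL_SERVICE_FORCE_INTEL=1` instead would force the Intel layer, as the message suggests.

```python
import os
from typing import MutableMapping, Optional

# Threading-layer values Intel documents for MKL_THREADING_LAYER; checked here
# only to catch typos in the argument.
KNOWN_LAYERS = {"INTEL", "SEQUENTIAL", "GNU", "TBB", "PGI"}


def set_mkl_threading_layer(layer: str = "GNU",
                            env: Optional[MutableMapping[str, str]] = None) -> str:
    """Hypothetical helper: pick MKL's threading layer for this process.

    MKL reads MKL_THREADING_LAYER when it is loaded, so call this before the
    first `import numpy` / `import torch`. Returns the value now in effect.
    """
    target = os.environ if env is None else env
    value = layer.upper()
    if value not in KNOWN_LAYERS:
        raise ValueError(f"unknown MKL threading layer: {layer!r}")
    # Overwrite any existing value (the reported error shows INTEL in effect).
    target["MKL_THREADING_LAYER"] = value
    return value


if __name__ == "__main__":
    print("MKL_THREADING_LAYER =", set_mkl_threading_layer("GNU"))
    # Only after this point import the MKL-backed libraries, e.g.:
    # import numpy
    # import torch
```

Passing an explicit `env` mapping lets the helper be tried without touching the real process environment; with no argument it edits `os.environ`.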

This seems to be caused by differences between the compilers used to build the numpy release installed in your environment and those used for the other libraries you load (`libgomp` in the message is GCC's OpenMP runtime). It is most likely the result of mixing conda and pip installations of different libraries. Since you are on a Linux instance, I would try falling back to pip alone to install and compile packages, rather than relying on pre-compiled binaries from conda.

fcollman · May 02 '20 18:05
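One way to check the mixed-installation diagnosis above is to list which OpenMP runtimes end up loaded in the process after importing numpy and torch; seeing both GNU `libgomp` and Intel `libiomp5` would be consistent with it. The sketch below is Linux-only (it reads `/proc/self/maps`, which an EC2 Linux instance provides), and the function names are hypothetical. The hashed name in the reported error, `libgomp-7c85b1e2.so.1`, looks like a copy vendored inside a pip wheel, so the pattern allows an optional hash suffix.

```python
import re
import sys
from typing import Iterable, Set

# File names of common OpenMP runtimes: GNU (libgomp), Intel (libiomp5) and
# LLVM (libomp), with an optional "-<hex hash>" suffix as used for libraries
# vendored inside pip wheels (e.g. libgomp-7c85b1e2.so.1).
_OPENMP_RE = re.compile(r"lib(?:gomp|iomp5|omp)(?:-[0-9a-f]+)?\.so[.0-9]*")


def openmp_runtimes(map_lines: Iterable[str]) -> Set[str]:
    """Return the OpenMP runtime file names mentioned in /proc/<pid>/maps lines."""
    found: Set[str] = set()
    for line in map_lines:
        for match in _OPENMP_RE.finditer(line):
            found.add(match.group(0))
    return found


def loaded_openmp_runtimes() -> Set[str]:
    """Linux only: OpenMP runtimes currently mapped into this process."""
    with open("/proc/self/maps") as maps:
        return openmp_runtimes(maps)


if __name__ == "__main__" and sys.platform.startswith("linux"):
    # Import the libraries under suspicion first, e.g.:
    # import numpy
    # import torch
    runtimes = loaded_openmp_runtimes()
    print(sorted(runtimes))
    if len(runtimes) > 1:
        print("More than one OpenMP runtime is loaded.")
```

If the imports themselves abort with the MKL error, running the check with `MKL_SERVICE_FORCE_INTEL=1` set should let them complete so the loaded libraries can be inspected.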

You could also try relaxing some of the version constraints in the environment.yml file, depending on what the conda failure was. Usually such a failure means that no compatible set of pre-compiled binaries is available for the pinned versions.

fcollman · May 02 '20 18:05
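Relaxing constraints, as suggested above, usually means deleting the version pin from the entries conda cannot satisfy (for release_1, `pytorch=0.1.8`). Below is a minimal sketch of that edit on the `dependencies` list of an `environment.yml`, assuming simple match specs of the form `name`, `name=version` or `name>=version`. The function name `relax_pins` is hypothetical and not part of the pytorch_fnet repository.

```python
import re
from typing import Any, Iterable, List, Optional, Set

# Package name at the start of a conda match spec such as "pytorch=0.1.8",
# "numpy>=1.13" or "numpy >=1.13"; everything after it is the constraint.
_NAME_RE = re.compile(r"\s*([A-Za-z0-9_.\-]+)")


def relax_pins(deps: Iterable[Any], names: Optional[Set[str]] = None) -> List[Any]:
    """Hypothetical helper: strip version/build constraints from conda specs.

    Only packages listed in `names` are relaxed (all of them if `names` is
    None). Non-string entries, such as the nested {"pip": [...]} mapping an
    environment.yml may contain, are passed through unchanged.
    """
    relaxed: List[Any] = []
    for dep in deps:
        if isinstance(dep, str):
            match = _NAME_RE.match(dep)
            name = match.group(1) if match else dep.strip()
            if names is None or name in names:
                relaxed.append(name)
                continue
        relaxed.append(dep)
    return relaxed
```

To apply it, one could load the file with PyYAML (`yaml.safe_load`), replace the `dependencies` entry with the relaxed list, write it back with `yaml.safe_dump`, and retry `conda env create -f environment.yml`. Dropping a pin lets conda pick a newer build, which may not behave exactly like the version that release was written against.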