sr4rs
Tensorflow Issue 'Illegal Instruction'
Hello,
I really appreciate the work done here; super-resolution imagery would be very useful for fine-grained classification. I am using macOS Big Sur, version 11.4, and had no issues with the docker installation. Within the docker container, when I try to import tensorflow in Python, or even paste the following code after importing otbApplication successfully:
```python
infer = otbApplication.Registry.CreateApplication("TensorflowModelServe")
```
I receive an error stating "Illegal Instruction", which terminates Python immediately.
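For reference, both of these fail immediately inside the container (a minimal reproduction of what I am doing):

```bash
# Both commands crash with "Illegal Instruction" before printing anything
# (run inside the otbtf docker container).
python -c "import tensorflow"
python -c "import otbApplication; otbApplication.Registry.CreateApplication('TensorflowModelServe')"
```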
I have tried multiple things, such as reinstalling the docker container and restarting my computer, and I am not able to uninstall and reinstall tensorflow. Would you have any comments or solutions to this?
Hi @ankshah131,
Thanks.
Maybe it's because your computer does not support some of the CPU flags that were used to build the docker image.
Can you give it a try with the cpu-basic docker image?
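For example, something like this should work (the exact image name and tag depend on the otbtf release, so check the otbtf docker documentation before pulling):

```bash
# Pull a CPU image built without the optional CPU optimization flags
# (image tag is an example; pick the cpu-basic tag of your otbtf version).
docker pull mdl4eo/otbtf3.1:cpu-basic

# Start an interactive shell in the container.
docker run -ti mdl4eo/otbtf3.1:cpu-basic /bin/bash
```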
Thanks a lot! That certainly solved the illegal instruction issue, and I am now able to import tensorflow. The next issue I am facing is shown below:
```
2022-01-18 22:51:28 (INFO) TensorflowModelServe: Default RAM limit for OTB is 256 MB
2022-01-18 22:51:28 (INFO) TensorflowModelServe: GDAL maximum cache size is 99 MB
2022-01-18 22:51:28 (INFO) TensorflowModelServe: OTB will use at most 4 threads
2022-01-18 22:51:28.903583: I tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: sr4rs_sentinel2_bands4328_france2020_savedmodel
2022-01-18 22:51:32.954066: I tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-01-18 22:51:32.954345: I tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: sr4rs_sentinel2_bands4328_france2020_savedmodel
2022-01-18 22:51:32.957233: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-18 22:51:33.942768: I tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2022-01-18 22:51:36.573913: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2900000000 Hz
2022-01-18 22:51:37.390802: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2022-01-18 22:51:37.390906: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
2022-01-18 22:51:37.569339: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 9437184 exceeds 10% of free system memory.
2022-01-18 22:51:37.684794: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 9437184 exceeds 10% of free system memory.
2022-01-18 22:51:38.425301: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 21233664 exceeds 10% of free system memory.
2022-01-18 22:51:39.343817: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 21233664 exceeds 10% of free system memory.
2022-01-18 22:51:39.589912: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 9437184 exceeds 10% of free system memory.
```
I tried changing the tile size to 256 and even reducing the batch size in training to 2. Any advice or tips to resolve this would be appreciated. Thanks a lot!
Another error I get after some attempts to adjust the memory is the following:
```
RuntimeError: Exception thrown in otbApplication Application_ExecuteAndWriteOutput: /src/otb/otb/Modules/IO/IOGDAL/src/otbGDALImageIO.cxx:1406: itk::ERROR: GDALImageIO(0x45afd0f0): Free disk space available is 7609483264 bytes, whereas 29481762816 are at least necessary. You can disable this check by defining the CHECK_DISK_FREE_SPACE configuration option to FALSE
```
Here is the full command with its output and the error message:
```
otbuser@d056e566b158:/code_data$ python sr4rs/code/sr.py --savedmodel sr4rs_sentinel2_bands4328_france2020_savedmodel --input otb_example.tif --output test.tif
Namespace(encoding='auto', input='otb_example.tif', output='test.tif', pad=64, savedmodel='sr4rs_sentinel2_bands4328_france2020_savedmodel', ts=1024)
2022-01-19 00:00:56 (INFO) ReadImageInfo: Image general information:
  Number of bands : 4
  Data type : float
  No data flags : Not found
  Start index : [0,0]
  Size : [10980,10980]
  Origin : [300005,1.60002e+06]
  Spacing : [10,-10]
  Estimated ground spacing (in meters): [9.98841,10.0512]
Image acquisition information:
  Sensor :
  Image identification number:
  Image projection : PROJCS["WGS 84 / UTM zone 28N", GEOGCS["WGS 84", DATUM["WGS_1984", SPHEROID["WGS 84",6378137,298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]], UNIT["degree",0.0174532925199433, AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4326"]], PROJECTION["Transverse_Mercator"], PARAMETER["latitude_of_origin",0], PARAMETER["central_meridian",-15], PARAMETER["scale_factor",0.9996], PARAMETER["false_easting",500000], PARAMETER["false_northing",0], UNIT["metre",1, AUTHORITY["EPSG","9001"]], AXIS["Easting",EAST], AXIS["Northing",NORTH], AUTHORITY["EPSG","32628"]]
Image default RGB composition: [R, G, B] = [0,1,2]
Ground control points information:
  Number of GCPs = 0
  GCPs projection =
2022-01-19 00:00:56 (INFO) TensorflowModelServe: Default RAM limit for OTB is 256 MB
2022-01-19 00:00:56 (INFO) TensorflowModelServe: GDAL maximum cache size is 800 MB
2022-01-19 00:00:56 (INFO) TensorflowModelServe: OTB will use at most 6 threads
2022-01-19 00:00:56.691656: I tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: sr4rs_sentinel2_bands4328_france2020_savedmodel
2022-01-19 00:00:56.973368: I tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-01-19 00:00:56.973459: I tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: sr4rs_sentinel2_bands4328_france2020_savedmodel
2022-01-19 00:00:56.974532: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-19 00:00:57.761886: I tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2022-01-19 00:00:57.990174: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2900000000 Hz
2022-01-19 00:00:58.368166: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2022-01-19 00:00:58.368250: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
2022-01-19 00:00:58.576866: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session tear down.
2022-01-19 00:00:58.580305: I tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 1888652 microseconds.
2022-01-19 00:00:58 (INFO) TensorflowModelServe: Source info :
2022-01-19 00:00:58 (INFO) TensorflowModelServe: Receptive field : [288, 288]
2022-01-19 00:00:58 (INFO) TensorflowModelServe: Placeholder name : lr_input
2022-01-19 00:00:58 (INFO) TensorflowModelServe: Output spacing ratio: 0.25
2022-01-19 00:00:58 (INFO) TensorflowModelServe: The TensorFlow model is used in fully convolutional mode
2022-01-19 00:00:58 (INFO) TensorflowModelServe: Output field of expression: [1024, 1024]
2022-01-19 00:00:58 (INFO) TensorflowModelServe: Tiling disabled
2022-01-19 00:00:58 (WARNING): Streaming configuration through extended filename is used. Any previous streaming configuration (ram value, streaming mode ...) will be ignored.
2022-01-19 00:00:58 (INFO): File test.tif will be written in 1764 blocks of 1024x1024 pixels
Writing test.tif??&streaming:type=tiled&streaming:sizemode=height&streaming:sizevalue=1024...: 0% [ ]2022-01-19 00:01:00.649782: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1358954496 exceeds 10% of free system memory.
2022-01-19 00:01:08.760920: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1358954496 exceeds 10% of free system memory.
2022-01-19 00:01:08.761004: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 5435817984 exceeds 10% of free system memory.
2022-01-19 00:01:38.839411: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1358954496 exceeds 10% of free system memory.
2022-01-19 00:01:40.678089: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1358954496 exceeds 10% of free system memory.
ERROR 3: Free disk space available is 2427334656 bytes, whereas 29595009024 are at least necessary. You can disable this check by defining the CHECK_DISK_FREE_SPACE configuration option to FALSE.
Traceback (most recent call last):
File "sr4rs/code/sr.py", line 78, in
Hi @ankshah131
- The batch size reduces the memory footprint during training, not during inference.
- You can adjust the memory footprint during inference using the `--pad` and `--ts` parameters. Try `--pad 32 --ts 128`, which is the minimum you can do; note however that it slows down the processing (see the example commands after this list).
- Regarding the other error message (`Free disk space available is 2427334656 bytes, whereas 29595009024 are at least necessary`), you need to mount storage into your docker container with the `-v` option (see the docker usage documentation in otbtf, or the docker documentation).
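For example (the host path and the image tag are placeholders to adapt to your setup):

```bash
# Mount a host folder with enough free disk space into the container, so the
# ~30 GB output image can be written (host path and image tag are examples).
docker run -ti -v /path/to/big/disk:/code_data mdl4eo/otbtf3.1:cpu-basic /bin/bash

# Inside the container: run the inference with the smallest memory footprint
# (--pad 32 --ts 128 is slower, but uses much less RAM).
python sr4rs/code/sr.py \
    --savedmodel sr4rs_sentinel2_bands4328_france2020_savedmodel \
    --input otb_example.tif \
    --output /code_data/test.tif \
    --pad 32 --ts 128
```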
Finally, I would strongly advise using modern hardware, equipped with a GPU with at least 8 GB of RAM, and the otbtf-gpu docker image. The super-resolution network is quite greedy in convolutions, and running it on CPU is not an option in an operational situation. You can train a smaller network by reducing the depth and the number of residual blocks, but that is another topic.
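For instance, a GPU run could look roughly like this (it requires the NVIDIA container toolkit on the host; the image tag is again an example):

```bash
# Run the GPU-enabled otbtf image with access to the host GPU(s)
# (requires the NVIDIA container toolkit; tag is an example).
docker run -ti --gpus all -v /path/to/big/disk:/code_data mdl4eo/otbtf3.1:gpu /bin/bash
```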
Hi Remi! Thank you so much for the tips. I adjusted ts to 256 and pad to 32 and it worked, although, as you said, it was very slow. I will definitely use modern hardware with a good GPU and the otbtf-gpu docker image. I really appreciate your work, and thank you for sharing it as an open-source contribution.