
How to debug CARS if things don't work well

Open lionlai1989 opened this issue 2 years ago • 6 comments

Hi:
This project is awesome. I've encountered some issues and hope you can help.

  1. I am running CARS on a pair of Pleiades stereo imagery and the result is not optimal. Here is the CARS result:
    Screenshot from 2022-08-12 13-51-55

  2. In the meantime, I used another open-source software package to run the same Pleiades data, and the result is as follows:
    Screenshot from 2022-08-12 13-50-25

  3. Basically, I would like to use just one piece of software and stick with it, and I would like to understand what I can do to debug CARS in this situation. I am thinking of checking intermediate files, since CARS processes a raster by tiles. Also, I did not do bundle adjustment before running. Would it help if bundle adjustment were run beforehand?

  4. I am expecting to see some intermediate files created during processing. For example, since a raster is first split into smaller tiles, is it possible to visualize the tiled raster and see what's going on, or to check the SIFT matching points? (A rough sketch of what I have in mind follows below.)
    Thank you.
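For instance, something like this is what I have in mind; the file name is only a placeholder, since I do not know yet which intermediate files CARS actually writes:

# list whatever CARS leaves in its output directory after a run
ls -R compute_dsm_outdir/
# inspect an intermediate GeoTIFF tile: size, georeferencing, statistics
gdalinfo -stats compute_dsm_outdir/<some_tile>.tif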
    Here is the traceback:

2022-08-12 10:27:18,991 - distributed.nanny - WARNING - Worker process still alive after 3.9999990463256836 seconds, killing
2022-08-12 10:27:19,043 - distributed.nanny - WARNING - Worker process still alive after 3.999998474121094 seconds, killing
2022-08-12 10:27:19,043 - distributed.nanny - WARNING - Worker process still alive after 3.9999982833862306 seconds, killing
2022-08-12 10:27:19,044 - distributed.nanny - WARNING - Worker process still alive after 3.999998474121094 seconds, killing
2022-08-12 10:27:19,044 - distributed.nanny - WARNING - Worker process still alive after 3.999998474121094 seconds, killing
2022-08-12 10:27:19,044 - distributed.nanny - WARNING - Worker process still alive after 3.999998664855957 seconds, killing
2022-08-12 10:27:19,044 - distributed.nanny - WARNING - Worker process still alive after 3.9999988555908206 seconds, killing
2022-08-12 10:27:19,095 - distributed.nanny - WARNING - Worker process still alive after 3.9999990463256836 seconds, killing
2022-08-12 10:27:19,095 - distributed.nanny - WARNING - Worker process still alive after 3.999998664855957 seconds, killing
2022-08-12 10:27:19,096 - distributed.nanny - WARNING - Worker process still alive after 3.999998664855957 seconds, killing
2022-08-12 10:27:19,096 - distributed.nanny - WARNING - Worker process still alive after 3.9999988555908206 seconds, killing
2022-08-12 10:27:19,149 - distributed.nanny - WARNING - Worker process still alive after 3.999998474121094 seconds, killing
2022-08-12 10:27:19,149 - distributed.nanny - WARNING - Worker process still alive after 3.999998474121094 seconds, killing
2022-08-12 10:27:20,542 - tornado.application - ERROR - Exception in callback functools.partial(<bound method AsyncProcess._on_exit of <AsyncProcess Dask Worker process (from Nanny)>>, -15)
Traceback (most recent call last):                                                                                                   
  File "/cars/venv/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback                                        
    ret = callback()                                                                                                                 
  File "/cars/venv/lib/python3.8/site-packages/distributed/process.py", line 139, in _on_exit                                        
    self._exit_callback(self)                                                                                                        
  File "/cars/venv/lib/python3.8/site-packages/distributed/nanny.py", line 695, in _on_exit                                          
    self.mark_stopped()                                                                                                              
  File "/cars/venv/lib/python3.8/site-packages/distributed/nanny.py", line 733, in mark_stopped                                      
    self.on_exit(r)                                                                                                                  
  File "/cars/venv/lib/python3.8/site-packages/distributed/nanny.py", line 501, in _on_exit_sync                                     
    self._ongoing_background_tasks.call_soon(self._on_exit, exitcode)                                                                
  File "/cars/venv/lib/python3.8/site-packages/distributed/core.py", line 190, in call_soon                                          
    raise AsyncTaskGroupClosedError(                                                                                                 
distributed.core.AsyncTaskGroupClosedError: Cannot schedule a new coroutine function as the group is already closed.                 
2022-08-12 10:27:20,543 - tornado.application - ERROR - Exception in callback functools.partial(<bound method AsyncProcess._on_exit of <AsyncProcess Dask Worker process (from Nanny)>>, -15)
Traceback (most recent call last):                                                                                                   
  File "/cars/venv/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback                                        
    ret = callback()                                                                                                                 
  File "/cars/venv/lib/python3.8/site-packages/distributed/process.py", line 139, in _on_exit                                        
    self._exit_callback(self)                                                                                                        
  File "/cars/venv/lib/python3.8/site-packages/distributed/nanny.py", line 695, in _on_exit                                          
    self.mark_stopped()                                                                                                              
  File "/cars/venv/lib/python3.8/site-packages/distributed/nanny.py", line 733, in mark_stopped                                      
    self.on_exit(r)                                                                                                                  
  File "/cars/venv/lib/python3.8/site-packages/distributed/nanny.py", line 501, in _on_exit_sync                                     
    self._ongoing_background_tasks.call_soon(self._on_exit, exitcode)                                                                
  File "/cars/venv/lib/python3.8/site-packages/distributed/core.py", line 190, in call_soon
    raise AsyncTaskGroupClosedError(
distributed.core.AsyncTaskGroupClosedError: Cannot schedule a new coroutine function as the group is already closed.

And in compute_dsm_outdir/22-08-12_10h00m_compute_dsm.log,

22-08-12 10:27:14 :: ERROR :: CARS terminated with following error
Traceback (most recent call last):
  File "/cars/cars/cars.py", line 748, in main_cli
    run_compute_dsm(args, dry_run)
  File "/cars/cars/cars.py", line 680, in run_compute_dsm
    compute_dsm.run(
  File "/cars/cars/pipelines/compute_dsm.py", line 1172, in run
    write_dsm.write_geotiff_dsm(
  File "/cars/cars/pipelines/write_dsm.py", line 324, in write_geotiff_dsm
    for future, raster_tile in tqdm(
  File "/cars/venv/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/cars/venv/lib/python3.8/site-packages/distributed/client.py", line 4883, in __next__
    return self._get_and_raise()
  File "/cars/venv/lib/python3.8/site-packages/distributed/client.py", line 4872, in _get_and_raise
    raise exc.with_traceback(tb)
distributed.scheduler.KilledWorker: ('images_pair_to_3d_points-13efe864-f5ec-4d74-a056-0964cffb9b57', <WorkerState 'tcp://127.0.0.1:38559', name: 40, status: closed, memory: 0, processing: 11>)

lionlai1989 avatar Aug 12 '22 12:08 lionlai1989

Hi @lionlai1989,

I think your issue is due to instabilities in dask mode. I would recommend using the multiprocessing mode of the compute_dsm pipeline:

cars compute_dsm --mode mp --nb_workers 4

Also, we will soon present CARS 0.5.0 (be careful, this version changes the API and configuration) at a FOSS4G workshop, with information you might be interested in: https://talks.osgeo.org/foss4g-2022-workshops/talk/SFMBCR/. The materials will be available on GitHub.

We will keep you informed, and you can also send me an e-mail for more information: [email protected]. Regards, David

dyoussef avatar Aug 17 '22 07:08 dyoussef

Hey @dyoussef: Thanks for your reply. I am running CARS with the new API, but I found that the new CLI does not support the --mode mp or --nb_workers 4 arguments. How can users provide these arguments with the new API? The error:

usage: cars [-h] [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--version]
            conf
cars: error: unrecognized arguments: --nb_workers --mode mp /data/cars_duesseldorf_config.json

So I ran the following command and config without any extra arguments: docker run -w /data -v /home/ubuntu/data:/data cnes/cars /data/cars_duesseldorf_config.json

{
    "inputs": {
        "sensors": {
            "one": {
                "image": "Duesseldorf/IMG_PHR1A_P_001/IMG_PHR1A_P_201807271102180_SEN_3385144101-1_R1C1.JP2",
                "geomodel": "Duesseldorf/IMG_PHR1A_P_001/RPC_PHR1A_P_201807271102180_SEN_3385144101-1.XML",
                "no_data": 0
            },
            "two": {
                "image": "Duesseldorf/IMG_PHR1A_P_002/IMG_PHR1A_P_201807271101330_SEN_3385144101-1_R1C1.JP2",
                "geomodel": "Duesseldorf/IMG_PHR1A_P_002/RPC_PHR1A_P_201807271101330_SEN_3385144101-1.XML",
                "no_data": 0
            }
        },
        "pairing": [
            [
                "one",
                "two"
            ]
        ],
        "initial_elevation": "srtm_dir"
    },
    "output": {
        "out_dir": "outresults"
    }
}

Thank you.

lionlai1989 avatar Aug 24 '22 06:08 lionlai1989

Hi @lionlai1989, with the new API you have to provide all the arguments through the config.json file:

{
    "inputs": {
        "sensors": {
            "one": {
                "image": "Duesseldorf/IMG_PHR1A_P_001/IMG_PHR1A_P_201807271102180_SEN_3385144101-1_R1C1.JP2",
                "geomodel": "Duesseldorf/IMG_PHR1A_P_001/RPC_PHR1A_P_201807271102180_SEN_3385144101-1.XML",
                "no_data": 0
            },
            "two": {
                "image": "Duesseldorf/IMG_PHR1A_P_002/IMG_PHR1A_P_201807271101330_SEN_3385144101-1_R1C1.JP2",
                "geomodel": "Duesseldorf/IMG_PHR1A_P_002/RPC_PHR1A_P_201807271101330_SEN_3385144101-1.XML",
                "no_data": 0
            }
        },
        "pairing": [
            [
                "one",
                "two"
            ]
        ],
        "initial_elevation": "srtm_dir"
    },
    "output": {
        "out_dir": "outresults"
    },
    "orchestrator": {
        "mode": "mp",
        "nb_workers": 4
    }
}

(cf. https://cars.readthedocs.io/en/latest/user_guide/configuration.html#orchestrator)
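With such a file, the new CLI only takes the configuration path as its single positional argument, so (assuming the same docker setup as in your previous command, with the file saved as /data/cars_duesseldorf_config.json on the mounted volume) the run would simply be:

docker run -w /data -v /home/ubuntu/data:/data cnes/cars /data/cars_duesseldorf_config.json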

Regards, David

dyoussef avatar Aug 24 '22 07:08 dyoussef

Hi @dyoussef, I've tried your advice, but there is another error and the program just hangs without running (observed via htop), so eventually I have to use CTRL+C to terminate it. Here are the step-by-step outputs:

  1. Run the program and it shows the following error.
$ docker run -w /data -v /home/ubuntu/cars/data:/data cnes/cars /data/cars_duesseldorf_config.json
Computing epipolar grids ...: 100% [**************************************************] (5s)                                         
                                                                                                                                     
Processing Tiles : [ epi_matches_left ] ...:   0%|          | 0/1886 [00:00<?, ?it/s]
2022-08-24 07:31:47 (WARNING) EpipolarSparseMatching: Warning: tile ignored because of the following error
itk::ExceptionObject (0x7f989c5b3a80)
Location: "unknown"                                                                                                                  
File: /usr/include/ITK-4.13/itkEuclideanDistanceMetric.hxx                                                                           
Line: 64                                                                                                                             
Description: itk::ERROR: EuclideanDistanceMetric(0x7f989c0b7cc0): The two measurement vectors have unequal size (128 and 2618117160) 
                                                              
2022-08-24 07:51:42 (WARNING) EpipolarSparseMatching: Warning: tile ignored because of the following error                           
itk::ExceptionObject (0x7f989c40c2b0)                                                                                                
Location: "unknown"                                                                                                                  
File: /usr/include/ITK-4.13/itkEuclideanDistanceMetric.hxx                                                                           
Line: 64                                                                                                                             
Description: itk::ERROR: EuclideanDistanceMetric(0x7f989c637a30): The two measurement vectors have unequal size (128 and 1181730355) 
                                                                                                                                     
Processing Tiles : [ epi_matches_left ] ...: 100%|██████████| 1886/1886 [29:28<00:00,  1.07it/s]                                     
Processing Tiles : [ color , dsm ] ...:   0%|          | 0/420 [00:00<?, ?it/s]
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.                                                  
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.     

The program hangs at this point. I can see in htop that no CPU or memory resources are being used.

  2. Type CTRL+C. The output:
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.
^C <-- (CTRL+C IS HERE)
22-08-24 13:45:05 :: ERROR :: CARS terminated with following error
Traceback (most recent call last):
  File "/cars/venv/lib/python3.8/site-packages/cars/cars.py", line 181, in main_cli
    used_pipeline.run()
  File "/cars/venv/lib/python3.8/site-packages/cars/pipelines/sensor_to_full_resolution_dsm/sensor_to_full_resolution_dsm_pipeline.py", line 558, in run
    _ = self.rasterization_application.run(
  File "/cars/venv/lib/python3.8/site-packages/cars/orchestrator/orchestrator.py", line 314, in __exit__
    self.compute_futures()
  File "/cars/venv/lib/python3.8/site-packages/cars/orchestrator/orchestrator.py", line 253, in compute_futures
    for future_obj in self.cluster.future_iterator(future_objects):
  File "/cars/venv/lib/python3.8/site-packages/cars/orchestrator/cluster/multiprocessing_cluster.py", line 694, in __next__
    if item.ready():
KeyboardInterrupt
Processing Tiles : [ color , dsm ] ...:   0%|          | 0/420 [5:52:10<?, ?it/s]

Do you know what might be the problem? Thank you.

The config file:

{
    "inputs": {
        "sensors": {
            "one": {
                "image": "Duesseldorf/IMG_PHR1A_P_001/IMG_PHR1A_P_201807271102180_SEN_3385144101-1_R1C1.JP2",
                "geomodel": "Duesseldorf/IMG_PHR1A_P_001/RPC_PHR1A_P_201807271102180_SEN_3385144101-1.XML",
                "no_data": 0
            },
            "two": {
                "image": "Duesseldorf/IMG_PHR1A_P_002/IMG_PHR1A_P_201807271101330_SEN_3385144101-1_R1C1.JP2",
                "geomodel": "Duesseldorf/IMG_PHR1A_P_002/RPC_PHR1A_P_201807271101330_SEN_3385144101-1.XML",
                "no_data": 0
            }
        },
        "pairing": [
            [
                "one",
                "two"
            ]
        ],
        "initial_elevation": "srtm_dir"
    },
    "output": {
        "out_dir": "outresults"
    },
    "orchestrator": {
        "mode": "mp",
        "nb_workers": 16
    }
}

The spec of my laptop:

  • 16 CPUs
  • 64 GB memory

Docker image: The latest 0.5.0

lionlai1989 avatar Aug 24 '22 14:08 lionlai1989

Hi @lionlai1989,

Most of the people on the team are on vacation or at FOSS4G this week (me included), so we will analyse your issue next week.

We will keep you informed. Regards, David

dyoussef avatar Aug 24 '22 15:08 dyoussef

Hi @lionlai1989,

https://github.com/CNES/cars/commit/5dd71e60fa8b0006b2a09ce11edc101799be3982 should fix the problem. Can you "docker pull" cnes/cars:0.5.1 and try again?

Thanks in advance, David

dyoussef avatar Oct 01 '22 12:10 dyoussef

Hi David: I've tried it again. The result looks great. Thanks a lot.
Screenshot from 2022-10-06 13-39-33

lionlai1989 avatar Oct 06 '22 11:10 lionlai1989