ros-bridge icon indicating copy to clipboard operation
ros-bridge copied to clipboard

Sync carla actor creation

Open berndgassmann opened this issue 2 years ago • 16 comments

Wait one tick before creating carla actors to ensure the transforms of the newly spawned actors have been updated by the server.

closes #565


This change is Reviewable

berndgassmann avatar Sep 22 '21 15:09 berndgassmann

Hi @berndgassmann , I've tested this pr with roslaunch command. However, sometimes there are exception messages that prevent the vehicle from being generated properly. Maybe it has something to do with the order in which the objects are spawned?

roslaunch carla_ros_bridge carla_ros_bridge_with_example_ego_vehicle.launch

[INFO] [1632376143.101234, 2.166313]: Created EgoVehicle(id=1467) [INFO] [1632376143.102425, 2.166313]: Object (type='actor.pseudo.control', id='control') spawned successfully as 10005. [INFO] [1632376143.102654, 2.166313]: Delaying task on actor 1468 to next tick [INFO] [1632376143.103466, 2.166313]: Delaying task on actor 1469 to next tick Exception in thread Thread-7: Traceback (most recent call last): File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/usr/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/home/kevinlad/carla1s/carla1s_dev_ws/src/carla1s-ros/ros-bridge/carla_ros_bridge/src/carla_ros_bridge/bridge.py", line 273, in _synchronous_mode_update self.actor_factory.update_available_objects(world_snapshot.timestamp) [INFO] [1632376143.104582, 2.166313]: Object (type='sensor.lidar.ray_cast', id='lidar') spawned successfully as 1470. File "/home/kevinlad/carla1s/carla1s_dev_ws/src/carla1s-ros/ros-bridge/carla_ros_bridge/src/carla_ros_bridge/actor_factory.py", line 135, in update_available_objects self._create_object(pseudo_object[0], pseudo_object[1].type, pseudo_object[1].id, File "/home/kevinlad/carla1s/carla1s_dev_ws/src/carla1s-ros/ros-bridge/carla_ros_bridge/src/carla_ros_bridge/actor_factory.py", line 292, in _create_object raise IndexError("Parent object {} not found".format(attach_to)) IndexError: Parent object 1469 not found [INFO] [1632376143.106144, 2.166313]: Object (type='sensor.lidar.ray_cast_semantic', id='semantic_lidar') spawned successfully as 1471.

KevinLADLee avatar Sep 23 '21 05:09 KevinLADLee

ok, seems like the pseudo object in your case requires the real actor that is postponed to the next tick. So we should also postpone those... I thought it might be sufficient to postpone the actual carla actors. I'll change that and provide an update here, then all commands are executed in the next tick. Thanks for testing and bringing this up.

berndgassmann avatar Sep 23 '21 07:09 berndgassmann

Sometimes I still get the wrong transformation message, especially for rgb_front and rgb_view. I think it may also be caused by the fact that the sensor position is still (0, 0, 0) when the following function is used to get the sensor position.

relative_transform = trans.carla_transform_to_ros_pose(carla_actor.get_transform())

During my testing, I added one more frame delay here and the problem did not recur. I don't know if you are having the same problem, maybe this is due to differences in hardware performance? Thanks a lot for your replies.

                if timestamp and task[2] and task[2].frame+1 >= timestamp.frame:
                    task_queue.put(task)

KevinLADLee avatar Sep 23 '21 13:09 KevinLADLee

@berndgassmann @KevinLADLee I've been testing this PR and this solution may still lead to the same behavior as before. Most of the times is working as expected but there is a race condition that reproduces the same issue as before (actually, I've only reproduced the problem once after several tests).

The problem is that the update cycle and the spawning service run in different threads. It may be the case where we tick the CARLA server right after getting the timestamp (https://github.com/carla-simulator/ros-bridge/blob/sync_carla_actor_creation/carla_ros_bridge/src/carla_ros_bridge/actor_factory.py#L167) but before the actual spawning of the actor (https://github.com/carla-simulator/ros-bridge/blob/sync_carla_actor_creation/carla_ros_bridge/src/carla_ros_bridge/actor_factory.py#L177). We do have a lock but it only prevents spawning actors while generating the objects.

Also, we need to check that this change doesn't break the asynchronous mode.

joel-mb avatar Sep 23 '21 13:09 joel-mb

Sometimes I still get the wrong transformation message, especially for rgb_front and rgb_view. I think it may also be caused by the fact that the sensor position is still (0, 0, 0) when the following function is used to get the sensor position.

relative_transform = trans.carla_transform_to_ros_pose(carla_actor.get_transform())

During my testing, I added one more frame delay here and the problem did not recur. I don't know if you are having the same problem, maybe this is due to differences in hardware performance? Thanks a lot for your replies.

                if timestamp and task[2] and task[2].frame+1 >= timestamp.frame:
                    task_queue.put(task)

@KevinLADLee Yeah, I think this can be happening due to that race condition. In principle, once you spawn the actor you only need one tick to reliable get all the data. So it shouldn't be necessary to postpone the creation of the object two ticks.

joel-mb avatar Sep 23 '21 13:09 joel-mb

Yes, I encountered the same. Now it should be robust (at least time now starting 15 times one after another all worked). I also added some notes on why this is required. Hope now all works reliably.

berndgassmann avatar Sep 23 '21 14:09 berndgassmann

@joel-mb I added the frame information also to async mode, even though there the 2 frame might sill not be enough since the server can rush in the meantime and multiple frames theoretically can already be in the processing queue inside the client. But I don't see any chance at the moment to access this data via python. Even in C++ I don't know if it's easily possible of querying the queue size of the boost::asio::post on the current context. So the client just has to be fast enough... but for async mode that information might be useful for the client tough (so it could drop calls when for some reason the queue size is getting bigger and bigger). If I know there are already 2 other callbacks with newer data waiting for execution, I could skip the current one.

berndgassmann avatar Sep 23 '21 15:09 berndgassmann

Hmm, maybe we can also deploy the actor.is_alive flag to get rid of this problem (for both). At least if it's getting valid if data has been received via rpc from server, that should then also work.

berndgassmann avatar Sep 23 '21 15:09 berndgassmann

Alive flag is already True directly after creation at client side ... but actually the actor is not yet fully alive after creation. At least this could be an option of how this might be solved in future versions; the actor on client side is kept in state e.g. ActorState::PendingCreation until the server had actually created it and sent back a message with ActorState::Active state. Then one would be able to differentiate between those locally at client side. For now looks as if we have to stick in async mode with this solution, I guess.

berndgassmann avatar Sep 23 '21 16:09 berndgassmann

@KevinLADLee your fix is working thank you! but ive noticed that the ground trouth markers of ego and adversary vehicles are located below the ground grid and do not match the point cloud anymore. Is this problem related to your fix?

image

image

Roosstef avatar Sep 28 '21 11:09 Roosstef

@Roosstef I've just checked the markers without this fix. And they look the same (below the streetlevel). Therefore, it's a separate issue.

berndgassmann avatar Sep 29 '21 10:09 berndgassmann

@joel-mb can we then merge this one, as it's definitely solving the issue with current CALRA version.

berndgassmann avatar Oct 01 '21 15:10 berndgassmann

@joel-mb can we then merge this one, as it's definitely solving the issue with current CALRA version.

@berndgassmann Sorry for the delay, This PR is definitely fixing the issue but I'm not sure if delaying the creation of actors in the ROS bridge is the best solution. For instance, for the integration of the ROS-bridge with the leadeboard this will require doing extra ticks during the initialization of the stack. I have proposed another solution in #576 . Let me know if it also works for your use cases.

Basically, the root of the problem is a bug in CARLA. Right now, when you spawn an actor in CARLA and you try to get its information with the same client without doing a tick you get wrong information. This bug will be fixed for the next release. Actors spawned by other clients (e.g., manual control, scenario runner, ...) don't have this problem because for the ROS bridge to be aware of them a tick is necessary.

Probably we will need to revisit this part of the code for the next release, when fixed this bug in CARLA.

joel-mb avatar Oct 01 '21 15:10 joel-mb

PR #576 also solves the issue without delaying. So we might want to go with Joel's one.

berndgassmann avatar Oct 06 '21 11:10 berndgassmann

@berndgassmann Thanks for testing the PR. #576 has already been merged. I'm closing this one.

joel-mb avatar Oct 06 '21 13:10 joel-mb

Seems like this PR is still relevant, as with larger maps the #576 still doesn't fix the issue.

berndgassmann avatar Dec 19 '22 15:12 berndgassmann