question about traffic light

EcustBoy opened this issue 2 years ago · 7 comments

Hi~ author, in my opinion, the TCP model directly uses raw images and some measurement signals as input, without relying on intermediate perception results. But how does it learn traffic light information? If it only relies on expert trajectory samples for training, I would think the traffic light is too small in the front view for the model to actually learn the "red-stop, green-start" behavior?

Besides, does the training dataset size have a crucial impact on the final performance of understanding traffic lights? Are there any relevant ablation experiments on this?

EcustBoy avatar May 31 '23 13:05 EcustBoy

Yes, it learns the "red-stop, green-start" behavior from the expert demonstrations, and I think the current camera setup can capture the traffic light information. But you could also try adding another camera with an explicit traffic light detection module to enhance this ability, similar to LAV.
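
For instance, such an auxiliary branch could look roughly like this (just a sketch; `TrafficLightHead` and all dimensions are hypothetical, not from the TCP or LAV codebases):

```python
import torch.nn as nn

class TrafficLightHead(nn.Module):
    """Hypothetical auxiliary branch: classifies the traffic light state
    (red / yellow / green / none) from the backbone's feature map, so the
    light is supervised explicitly instead of only through trajectories."""
    def __init__(self, feat_dim=512, n_states=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(feat_dim, n_states)

    def forward(self, feat_map):            # feat_map: (B, C, H, W)
        x = self.pool(feat_map).flatten(1)  # global average pool -> (B, C)
        return self.fc(x)                   # logits over the light states
```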

Most of the training routes contain junctions with traffic lights, so traffic-light-related data is abundant. I think the dataset size is important for learning the traffic light rules, but we do not have such ablations.

penghao-wu avatar May 31 '23 18:05 penghao-wu

Thanks for your reply. Right now I only train on my own small dataset (about 75K samples), and I haven't fed the image to the planner decoder directly; I think this is the main reason my model can't learn to understand traffic lights. :-)

I'm going to try to design a front-view feature extraction network similar to TCP's. It seems the ego car is able to learn the "red-stop, green-start" behavior as long as I feed the raw image to a simple network and train on a relatively big dataset, without any complicated design, right? Many thanks for your answer~

EcustBoy avatar May 31 '23 18:05 EcustBoy

So currently what is the input to your planner decoder if you do not feed the image features to it?

penghao-wu avatar May 31 '23 19:05 penghao-wu

Actually, I input (1) the embedding features of other cars and the map, output by the front backbone and detection head, and (2) some ego-car state (including the command waypoint and speed). So I think I shouldn't use only the intermediate features; it seems the raw image is also needed.

EcustBoy avatar May 31 '23 20:05 EcustBoy

Yes, you need to include input that carries the traffic light information (such as raw images or traffic light detection results).
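
For example, something along these lines (a minimal sketch of fusing your two inputs with raw-image features; the module and all dimensions are hypothetical, not from the TCP codebase):

```python
import torch
import torch.nn as nn
from torchvision import models

class FusedPlannerInput(nn.Module):
    """Hypothetical fusion of detection embeddings and ego state with
    raw-image features, so traffic-light cues reach the planner decoder."""
    def __init__(self, det_dim=256, ego_dim=8, img_dim=512, out_dim=256):
        super().__init__()
        resnet = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
        # Drop the classification layer; keep conv stages + global avg pool.
        self.img_encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.fuse = nn.Linear(det_dim + ego_dim + img_dim, out_dim)

    def forward(self, det_emb, ego_state, image):
        img_feat = self.img_encoder(image).flatten(1)  # (B, 512)
        x = torch.cat([det_emb, ego_state, img_feat], dim=1)
        return self.fuse(x)  # fused feature for the planner decoder
```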

penghao-wu avatar May 31 '23 23:05 penghao-wu

Hi~ author, I read your code again and noticed that you use a pretrained ResNet-34 to extract image features.

I want to ask: is a pretrained image backbone necessary if I only want to get traffic light information from the front view? To limit the network size, perhaps a shallow custom-designed network is already enough? Not sure whether you've made such a comparison~
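
Something like this is what I have in mind (just a sketch of the kind of shallow network I mean; all layer sizes are made up):

```python
import torch.nn as nn

# A shallow custom backbone for traffic-light cues only (hypothetical,
# for comparison against the pretrained ResNet-34 used in TCP).
shallow_tl_net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(256, 128),  # compact feature for the planner or a TL classifier
)
```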

EcustBoy avatar Jun 01 '23 04:06 EcustBoy

I think a shallow network would suffice if you have direct supervision on the traffic light states.
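
For example, with per-frame traffic light labels you could add an auxiliary classification loss on top of such a shallow head (a minimal sketch; `lambda_tl` and the label format are assumptions):

```python
import torch.nn.functional as F

def total_loss(planner_loss, tl_logits, tl_labels, lambda_tl=1.0):
    """Hypothetical combined objective: the usual planning/control loss
    plus direct cross-entropy supervision on the traffic light state."""
    tl_loss = F.cross_entropy(tl_logits, tl_labels)  # tl_labels: (B,) class ids
    return planner_loss + lambda_tl * tl_loss
```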

penghao-wu avatar Jun 25 '23 06:06 penghao-wu