pytorch-auto-drive
Roadmap
Roadmap for our users (to state feature requests) and contributors.
Tasks marked with an asterisk (*) are low priority.
2022Q? (2022.4.2 - ):
- Lane detection methods
- [ ] *ConvNeXt
- [ ] Add and test VGG16-RESA
- [x] Explore ERFNet-RESA #74
- [ ] Add and test ERFNet-LSTR
- [x] LaneATT #90 #102
- [ ] Explore CondLaneNet
- [ ] *Explore FOLOLanes
- Semantic segmentation methods
- [ ] *MobileNetV2
- [ ] *MobileNetV3
- [ ] *RepVGG
- [ ] *Swin Transformer V1
- [ ] *ConvNeXt
- [ ] *Add ERFNet training from scratch
- Datasets and pre-processings
- [ ] *Support CurveLanes
- [ ] *Support Comma10K
- Visualization
- [x] Comparison with GT in lane detection #72
- Framework
- [ ] *ONNX inference support
- [ ] *TensorRT inference support
- [ ] *SCNN TensorRT conversion
- [ ] Semi-real profiling benchmark
- [ ] *Get rid of mmcv dependency #65 #94 #97
- Documentation
- [ ] More advanced tutorials #72
- [ ] Per-model docs
2022Q1 (2022.1.10 - 2022.3.31):
- Lane detection methods
- [x] BezierLaneNet #60
- [x] MobileNetV2 #53
- [x] MobileNetV3 #53
- [x] RepVGGs #54
- [x] Swin Transformer V1 #56
- [ ] *ConvNeXt
- [ ] Add and test VGG16-RESA
- [ ] Explore ERFNet-RESA
- [ ] Add and test ERFNet-LSTR
- [ ] LaneATT
- [ ] Explore CondLaneNet
- [ ] Explore FOLOLanes
- Semantic segmentation methods
- [ ] *MobileNetV2
- [ ] *MobileNetV3
- [ ] *RepVGG
- [ ] *Swin Transformer V1
- [ ] *ConvNeXt
- [ ] *Add ERFNet training from scratch
- Datasets and pre-processings
- [ ] *Support CurveLanes
- [x] Cherry-pick keypoint affine transform from private branch
- [ ] Support Comma10K
- Visualization
- [ ] Comparison with GT in lane detection
- Framework
- [x] *Merge private branch
- [ ] *ONNX inference support
- [ ] *TensorRT inference support
- [ ] *SCNN TensorRT conversion
- [ ] Semi-real profiling benchmark
- [ ] *Get rid of mmcv dependency #65
- Documentation
- [ ] More advanced tutorials
- [ ] Per-model docs
2021Q4 (2021.10.1 - 2021.12.31):
- Lane detection methods
- [ ] Add and test VGG16-RESA (moved from Q3)
- [x] Test RESA (ResNets) (moved from Q3) #27 #31
- [ ] Explore ERFNet-RESA (moved from Q3)
- [x] Add and test ResNet34-LSTR (moved from Q3) #29
- [ ] Add and test ERFNet-LSTR (moved from Q3)
- [ ] Explore CondLaneNet
- [ ] Explore FOLOLanes
- Semantic segmentation methods
- [ ] *Add ERFNet training from scratch (moved from Q3)
- Datasets and pre-processings
- [ ] *Support CurveLanes (moved from Q3)
- [ ] Cherry-pick keypoint affine transform from private branch
- [ ] Support Comma10K
- Visualization
- [ ] Comparison with GT in lane detection
- [x] *More inference-free visualizations (folder, etc.) #48 #45
- Framework
- [x] Refactor with configs #45
- [ ] *Merge private branch
- [x] requirements.txt #37
- ~~*Replace thop with fvcore~~
- [x] *Torch -> ONNX #43
- [x] *ONNX -> TensorRT #47
- [ ] *ONNX inference support
- [ ] *TensorRT inference support
- [ ] *SCNN TensorRT conversion
2021Q3 (2021.7.1 - 2021.9.30):
- Datasets and pre-processings
- [x] *Investigate a possible shared memory leak from padding mask, Python native List or transforms in keypoint datasets
- Lane detection methods
- [x] Add RESA (ResNets) #22
- [ ] Add and test VGG16-RESA
- [ ] Test RESA (ResNets)
- [ ] Explore ERFNet-RESA
- [ ] Add and test ResNet34-LSTR
- [ ] Add and test ERFNet-LSTR
- [x] Test LSTR with simple data augmentation on TuSimple 844ebd7
- [x] Test LSTR on CULane 0c5bcc5
- [x] Test ResNet34, ERFNet with strong data augmentation on TuSimple 721fc26
- Semantic segmentation methods
- [ ] *Add ERFNet training from scratch
- Datasets and pre-processings
- [ ] *Support BDD100K
- [ ] *Support CurveLanes
- Visualization (#7)
- [x] Support demo with video input for lane detection #24
- [x] Support demo with video input for semantic segmentation #23
- [x] Support demo with image dir input for lane detection #24
- [x] Support demo with image dir input for semantic segmentation #23
- [ ] *Support demo with camera input for lane detection
- [ ] *Support demo with camera input for semantic segmentation
- Framework
- [x] Support multi-GPU training with Torch DDP 6a31436
- [x] Support lower PyTorch/CUDA/CuDNN versions #25
- [x] Support PyTorch cross-version loading solution
2021Q2 (2021.4.1 - 2021.6.30):
- Lane detection methods
- [ ] Add RESA (VGG16, ResNets)
- [ ] Explore ERFNet-RESA
- [x] Add ResNet18-LSTR #11 57c7acc #13 #18 #19 #20 c260e6a
- [ ] Add ResNet34-LSTR/ERFNet-LSTR
- ~~Add ERFNet-PRNet~~ Awaiting more info
- ~~*Add ENet-SAD~~ Unable to re-implement
- ~~*Explore ERFNet-SAD~~ Unable to re-implement
- Semantic segmentation methods
- [ ] *Add ERFNet training from scratch
- Datasets and pre-processings
- [ ] *Support general affine transforms for keypoints
- [ ] *Support BDD100K
- [x] Support LLAMAS #15
- Visualization (#7)
- [ ] *Support demo with video input for lane detection
- [ ] *Support demo with video input for semantic segmentation
- [ ] *Support demo with camera input for lane detection
- [ ] *Support demo with camera input for semantic segmentation
- Benchmark
- [x] Investigate the "ENet slower than ERFNet" problem
- [x] Count fps/flops/mem for transformer-based method LSTR 82d2c5d
- Documentation
- [x] *Explanations/Descriptions for re-implemented methods, especially the improved parts
- Framework
- [ ] *Support multi-GPU training
2021Q1 (-2021.3.31):
- Lane detection methods
- [x] Add ResNet backbones cea2ce8
- ~~Add RESA (VGG16, ResNets)~~
- ~~Explore ERFNet-RESA~~
- ~~Add ResNet18-LSTR~~
- ~~Try to add an LSTR that is directly comparable with other methods, i.e. on a common backbone~~
- ~~Add ERFNet-PRNet~~
- [x] Add ENet Baseline ed3f739
- ~~Add ENet-SAD~~
- ~~*Explore ERFNet-SAD~~
- Semantic segmentation methods
- [x] Add and test ENet on Cityscapes ee3444b
- ~~*Add ERFNet training from scratch~~
- [x] Add --workers option 89d3695
- Datasets and pre-processings
- [x] Support TuSimple and CULane loading as keypoints & keypoint transforms (Rotation, Resize) #5, 839d096
- ~~Support BDD100K~~
- ~~*Support LLAMAS~~
- Visualization
- [x] Add lane markers visualization toolkit for images 6b29f02 (only supports visualization from files)
- [x] Organize segmentation result visualization toolkit for images 5ed43d9
- Benchmark
- [x] *Explore FPS tests 93b2a21
- [x] *Try provide FLOPs and memory counts for implemented methods 93b2a21
- Documentation
- [x] Better guide for downloading and preparing datasets (partly addressed by #8)
- [x] *Guide for visualization toolkits c680c90
The maintainers have rather low bandwidth these days; about half of the Q1 features remain unfinished and have been pushed to Q2. Any help would be much appreciated!
I am interested in helping to support this library. I am in a position to add support for comma10k as well as GAN-based weather augmentations soon. I have previously trained this on BDD100K: https://github.com/hustvl/YOLOP
I really do not recommend this architecture because, despite seeming very attractive, being very flexible in training, and learning all the tasks well, I could not convert it to TensorRT. I opened issues both there and in the tensorrtx repository, but the HKUST students did not help me at all, and many files were missing from their repository.
See here:
- https://github.com/hustvl/YOLOP/issues/12
- https://github.com/wang-xinyu/tensorrtx/issues/793
I would also like to move these two papers into the Roadmap:
- CondLaneNet: https://arxiv.org/abs/2105.05003
- FOLOLane: https://arxiv.org/abs/2105.13680
I am an MS in AI candidate at Boston University (taking a semester off because the courses are useless). I see that you are a student at SJTU, @voldemortX. I am a huge fan of your university and have read many papers from there; I regard it as the #1 world leader in many computer vision application fields, particularly surveillance. I would enjoy working with you.
Right now I have trained DDRNet for real-time semantic segmentation on the comma10k dataset. I think the comma10k dataset is hugely useful to the community because it is fully permissive, so we can augment it with labels it is missing, new formats, etc. I will update when I can submit some code; I will release it slightly after I build it into my pipeline, to give my company a slight edge before I make it open-source. I do not have much experience with pull requests, but I will do my best.
@SikandAlex It will be an honor to have your help as well!
> I am interested in helping to support this library. I am in a position to add support for comma10k as well as GAN-based weather augmentations soon. I have previously trained this on BDD100K: https://github.com/hustvl/YOLOP
Any new dataset support is welcome!
> I would also like to move these two papers into the Roadmap:
> CondLaneNet: https://arxiv.org/abs/2105.05003
> FOLOLane: https://arxiv.org/abs/2105.13680
CondLaneNet is open-sourced and could be easier to implement, while FOLOLane might prove a harder method that needs more work, since we do not yet have one of its backbones (BiSeNet).
I'll add them to the Roadmap for Q4, and they can of course continue into '22. You can submit PRs whenever you have a ready-to-go batch of code (e.g., implemented one of the dataset classes and tested its loading, or finished an algorithm). Thanks again for your help!
TensorRT support is also what @cedricgsh and I have talked about recently. We both agree that pytorch-auto-drive should not stop at a research codebase. Our primary aim would be a TensorRT benchmark for model speed, plus op-based FLOPs calculation with fvcore. But given our current bandwidth, I think that would need to wait until 22Q1 at least (a refactor of the framework might be required).
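For illustration, here is a minimal sketch of what op-based FLOPs counting with fvcore could look like. This is not the planned benchmark code of pytorch-auto-drive; the stand-in torchvision model and the 288x800 input size are assumptions.

```python
# Rough sketch: op-based FLOPs counting with fvcore.
# The ResNet-18 stand-in and input resolution are placeholders.
import torch
import torchvision
from fvcore.nn import FlopCountAnalysis, flop_count_table

model = torchvision.models.resnet18().eval()
dummy_input = torch.randn(1, 3, 288, 800)  # assumed lane-detection input size

flops = FlopCountAnalysis(model, dummy_input)
print(flops.total())            # total operation count
print(flop_count_table(flops))  # per-module breakdown
```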
I have an AGX Xavier on hand, so I will hopefully be able to provide some benchmarks for certain models. Unfortunately, I'm no expert at TensorRT custom layers, etc., and it seems some operations cannot be supported by many developers.
There are so many papers that claim a certain FPS when deployed to embedded GPUs, but they never release their code, or they release only testing code and no training code, and the results are not reproducible even when there is training code, so really it is a huge mess. I have spent the past 2 months trying to determine the best papers and the best approach as of Fall 2021, given my computational limitations.
After reading as many papers as I could over the past 3 months, driving around in my friend's Tesla, and looking at many other models (the amazing AI research from Asia is just destroying us here in the US, in my opinion), I believe there are not that many critical components to Level 2 highway autonomy (which should be the first step):
- Object Detection for Vehicles/Pedestrians
- YOLOX by Megvii (won the AI City Challenge for Streaming Perception for Autonomous Driving 2021, a lot of hype, good for edge)
- YOLOv5 by Ultralytics (great support and community, I have extensive experience)
- Yolo-FastestV2 https://github.com/dog-qiuqiu/Yolo-FastestV2 (super lightweight variant)
- Segmentation of Driveable Surface / Road
After much research I narrowed down the candidate models to the following:
- SFNet
- DDRNet (I successfully trained a model that does inference at 100+ FPS on an RTX 2080 Ti)
- DF-Seg
- AttaNet
- STDC
- Lane Detection (Segmentation with Post-Processing OR Polynomial/Keypoint/Row-Wise/Transformer/Non-Segmentation etc)
I have found that real-time segmentation approaches for lane markings need to incorporate high-resolution features and should not downscale the input image before doing anything, as many networks do. This is because the semantics of the lane lines beyond a certain distance from the vehicle are lost at lower resolution, and then you can't predict the path far enough in advance. I also do not know how to properly post-process the segmentation-based approach, as I have not yet experimented with DBSCAN or RANSAC. From all my reading, this shows the most potential to me, but it seems to require the 4-lane probability output rather than just the binary mask for lane segmentation I am currently producing:
https://github.com/czming/RONELD-Lane-Detection (paper: https://arxiv.org/abs/2010.09548)
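As a reference point for the post-processing question above, here is a minimal sketch of one possible route: cluster the pixels of a binary lane mask into lane instances with DBSCAN and fit a low-order polynomial per instance. This is not code from pytorch-auto-drive or RONELD; the mask layout and the `eps`/`min_samples` values are assumptions.

```python
# Rough sketch: binary lane mask -> lane instances via DBSCAN -> per-lane polynomial fit.
import numpy as np
from sklearn.cluster import DBSCAN

def lanes_from_binary_mask(mask: np.ndarray, eps: float = 5.0, min_samples: int = 50):
    """mask: (H, W) array of 0/1 lane-marking pixels; returns one set of poly coefficients per lane."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return []
    points = np.stack([xs, ys], axis=1).astype(np.float32)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    lanes = []
    for lane_id in set(labels) - {-1}:  # -1 is DBSCAN noise
        lane_pts = points[labels == lane_id]
        # Fit x = f(y) with a 2nd-degree polynomial, since lanes are roughly vertical in the image.
        coeffs = np.polyfit(lane_pts[:, 1], lane_pts[:, 0], deg=2)
        lanes.append(coeffs)
    return lanes
```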
When I tested the models at https://github.com/Turoad/lanedet#Benchmark-and-model-zoo, the CondLaneNet model performed the best, which is why I recommended it be placed on the Roadmap, although it uses a heavy ResNet-101 backbone.
- Depth (Monocular/Stereo, ideally monocular)
- PyDNet
- MobileStereoNet
- MiDaS
- MonoDepth(2/Wavelet)
- FastDepth
- LapDepth
- HITNet
- 3D Object Detection (Maybe can replace Depth)
- FCOS3D
- FCOS3D++/PGD
- DD3D
- Bird's Eye View / Top View / Projection Transform
Finally, this is the best GitHub project that I've been able to find related to self-driving. All the models seem to come from the OpenVINO / PINO model library.
https://github.com/iwatake2222/self-driving-ish_computer_vision_system
I'm not sure what this model is trained on or what architecture it uses, but it seems to work well: https://docs.openvino.ai/2018_R5/_docs_Transportation_segmentation_curbs_release1_caffe_desc_road_segmentation_adas_0001.html
On the control side of things, an MPC solver is the solution to use for lateral/longitudinal control.
This is the extensive information I have been able to collect. I automatically discarded models that didn't have code implementations, but I think I could have made a mistake here or there. I was going to keep all this information to myself, but all I do is code non-stop all day, have no life, and still only make slow progress. My friends just work at companies like Facebook and Google and don't want to work on anything exciting. It's impossible to get them to give up stock to work with me, and there's also a learning curve. So hopefully, by giving back to the open-source community, they will give back to me and we can all improve science and also make money.
Well, my knowledge about self-driving is mostly at the research stage for now, and I certainly learned a lot from your comments. Though I'm still skeptical about deep learning's performance in actual self-driving applications, especially on RGB inputs. So what I do now at SenseTime is more about human-centric applications.
Btw, we'll release our own lane detector later, perhaps early '22 at the latest. It achieves reasonable performance end-to-end (~75 on CULane, ~95 on LLAMAS) at 150 FPS in PyTorch with a small model, which we believe could be beneficial for applications. But it needs to remain in a private repo for now.
@SikandAlex We have now pushed initial support for ONNX and TensorRT conversions, maybe it can be helpful for your applications? Refer to DEPLOY.md.
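For readers landing here later, a generic sketch of the Torch -> ONNX -> inference path is shown below. The actual commands and options for pytorch-auto-drive live in DEPLOY.md; the tiny stand-in network and the 288x800 input size here are assumptions.

```python
# Rough sketch: export a PyTorch model to ONNX and run it with ONNX Runtime.
# Not the repository's deployment script; the stand-in model is a placeholder.
import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()
dummy_input = torch.randn(1, 3, 288, 800)  # assumed input resolution

torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)

# TensorRT can consume model.onnx (e.g. via `trtexec --onnx=model.onnx`);
# for pure ONNX inference, ONNX Runtime works directly:
session = ort.InferenceSession("model.onnx")
outputs = session.run(None, {"input": dummy_input.numpy()})
print(outputs[0].shape)
```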