[Discussion, Call for contribution] GluonCV 0.5 Roadmap
Hi,
I am opening a discussion thread about the roadmap for the next release, open to everyone. Specifically, we are looking for:
- New features that are useful to your research and projects
- Improvements and patches to existing features
If you have any item that you'd like to propose to have in the roadmap, please do:
- Create (or locate an existing) issue for the item and note the issue number.
- Comment in this issue with: 1) the issue number from above, and 2) one sentence on what the item is about and why it's useful to you.
- Indicate whether you'd be willing to help out on the item.
Projects
- Video applications
- Action recognition, phase I: high-performance video loader for training (https://github.com/zhreshold/decord)
- Public slides added to the website (https://gluon-cv.mxnet.io/slides.html), with more expected to be added
- Better TVM integration (https://github.com/dmlc/tvm/pulls?utf8=%E2%9C%93&q=gluoncv)
Models
- AlphaPose (reference: https://github.com/MVIG-SJTU/AlphaPose/tree/mxnet)
- OctConv replacement (https://arxiv.org/abs/1904.05049)
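For the OctConv item, here is a rough, simplified sketch of what an Octave Convolution block could look like as a Gluon HybridBlock. This is just my own reading of the paper to scope the work (fixed alpha for input and output, no stride handling), not a proposed implementation:

```python
# Rough sketch only: simplified Octave Convolution in Gluon, assuming
# alpha_in == alpha_out (covers intermediate layers, not the first/last ones).
import mxnet as mx
from mxnet.gluon import nn

class OctConv(nn.HybridBlock):
    def __init__(self, channels, kernel_size, alpha=0.25, **kwargs):
        super(OctConv, self).__init__(**kwargs)
        ch_low = int(channels * alpha)   # low-frequency (half-resolution) channels
        ch_high = channels - ch_low      # high-frequency (full-resolution) channels
        with self.name_scope():
            # Four paths: high->high, high->low, low->high, low->low
            self.h2h = nn.Conv2D(ch_high, kernel_size, padding=kernel_size // 2)
            self.h2l = nn.Conv2D(ch_low, kernel_size, padding=kernel_size // 2)
            self.l2h = nn.Conv2D(ch_high, kernel_size, padding=kernel_size // 2)
            self.l2l = nn.Conv2D(ch_low, kernel_size, padding=kernel_size // 2)
            self.pool = nn.AvgPool2D(pool_size=2, strides=2)

    def hybrid_forward(self, F, x_high, x_low):
        # High-frequency output: conv(high) + upsampled conv(low)
        y_high = self.h2h(x_high) + F.UpSampling(self.l2h(x_low), scale=2, sample_type='nearest')
        # Low-frequency output: conv(pooled high) + conv(low)
        y_low = self.h2l(self.pool(x_high)) + self.l2l(x_low)
        return y_high, y_low

# Quick shape check with dummy two-resolution inputs
block = OctConv(channels=64, kernel_size=3, alpha=0.25)
block.initialize()
x_high = mx.nd.random.uniform(shape=(1, 48, 32, 32))  # full resolution
x_low = mx.nd.random.uniform(shape=(1, 16, 16, 16))   # half resolution
y_high, y_low = block(x_high, x_low)
print(y_high.shape, y_low.shape)  # (1, 48, 32, 32) (1, 16, 16, 16)
```

The actual roadmap item would also need the alpha_in=0 / alpha_out=0 cases from the paper for the first and last layers, plus drop-in wrappers for the existing model definitions.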
APIs
- Added API to visualize Gluon networks #722
Scripts
- Fix pose estimation #703
Improvements
- Faster/Mask-RCNN FPN enhancement #700
I'll collect scattered bug fixes in the release notes later.
@dmlc/gluon-vision @pengzhao-intel @xinyu-intel @husonchen @kevinthesun @Laurawly @yidawang @ThomasDelteil @chinakook
Dropping a few things that I would love to have in GluonCV, though I realize some might not be realistic for 0.5. They don't have issues associated with them because at this stage they are just ideas:
API:
- Migrating some GluonCV APIs to Gluon proper, e.g. the batchify functions
- Having dataset download handled at the API level rather than script execution
Object Detection:
- RetinaNet for object detection
- Fully convolutional object detector (no anchor boxes): https://github.com/tianzhi0549/FCOS
New Domains:
- Object detection and tracking: https://github.com/forschumi/siamfc-mxnet
- Deep autoencoder for defect detection
- Optical Character Recognition: https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet
The thing I'd like to raise here is the inconsistency across our APIs and scripts. Here are some examples:
- Example 1:
- Pre-trained classification models have a list of classes, but need additional code to get the top-5 predictions (see the sketch after these examples)
- Pre-trained detection models provide both the class list and the predicted class labels directly
- Example 2:
- Detection models produce different outputs in training and in inference mode
- Segmentation models need to call `.demo()` for further visualization
- Example 3:
- Classification and Segmentation models use loss functions in `utils`
- Detection models have their losses in the model definition
- Example 4:
- Segmentation training script has `DataParallel`, which is not used for classification and detection
The above examples may be vague or may have changed since. But generally, my point is that this inconsistency unnecessarily requires users to learn something new for every application, or to realize on their own that A from classification and B from detection are essentially the same concept. Of course, the models are quite different and special designs may be needed, but some of this inconsistency could still be handled in a better way.
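To make Example 1 concrete, here is a minimal sketch of the asymmetry. The model names are just examples from the model zoo, and the random inputs stand in for properly preprocessed images:

```python
# Minimal sketch of Example 1: classification needs extra glue code for top-5 labels,
# while detection returns class IDs directly. Dummy random inputs stand in for real images.
import mxnet as mx
import gluoncv as gcv

# Classification: the network returns raw logits only.
cls_net = gcv.model_zoo.get_model('resnet50_v1', pretrained=True)
img = mx.nd.random.uniform(shape=(1, 3, 224, 224))
logits = cls_net(img)
top5_idx = mx.nd.topk(logits, k=5)[0].astype('int').asnumpy()
top5_labels = [cls_net.classes[int(i)] for i in top5_idx]   # extra glue code

# Detection: the network already returns class IDs, scores and boxes
# that map directly onto det_net.classes.
det_net = gcv.model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
x = mx.nd.random.uniform(shape=(1, 3, 512, 512))
class_ids, scores, bboxes = det_net(x)
```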
CPU Int8 solution
- Enable more models
- Enhance the flexibility
- Improve performance for small batch sizes
Enhanced/Additional Tutorials
Note: the following points refer to tutorials.
- How to customize the network output (e.g. custom classes); see the sketch after this list
- How to fine-tune models (covering all models, since there are a few small but tricky differences between the networks)
- How to train models from scratch
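For the "customize the network output" tutorial, here is a minimal sketch of the two patterns it would need to cover today. The attribute and helper names (e.g. `reset_class` on detection models and the classification `output` layer) are my assumptions about the current model zoo and should be double-checked:

```python
# Minimal sketch of customizing model outputs for new classes.
# Assumes current GluonCV/Gluon model zoo conventions; verify attribute names.
import mxnet as mx
from mxnet.gluon import nn
import gluoncv as gcv

# Detection: SSD/YOLO models expose reset_class() to re-target the predictors
# to a new label set, optionally reusing weights for matching classes.
det_net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
det_net.reset_class(classes=['person', 'dog'], reuse_weights=['person', 'dog'])

# Classification: no equivalent helper, so the final Dense layer is replaced
# by hand and re-initialized for the new number of classes.
cls_net = gcv.model_zoo.get_model('resnet50_v1', pretrained=True)
with cls_net.name_scope():
    cls_net.output = nn.Dense(10)            # e.g. 10 custom classes
cls_net.output.initialize(mx.init.Xavier())
```

A fine-tuning tutorial could then reuse either pattern on a small custom dataset.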
EDIT:
Easier (or better documented) custom segmentation
One of the biggest things happening in the DL-CV field right now is surely self-supervised learning. So, more GANs: ProGAN and StyleGAN would both be very interesting, but they require custom weighted convolution layers.