Object Detection Papers

Avinash Kori

Ph.D Researcher on XAI and Causality

Paper Reviews on Object Detection Methods and Progression

RCNN
- Two-stage model
- Uses selective search to generate high-quality region proposals for an image
- Each proposed region is cropped from the input image and warped to a fixed size, giving one image patch per proposal
- Features are extracted from each warped patch using a pre-trained CNN
- These features are used for both bounding box regression and object classification
- Largely a brute-force approach: overlapping proposals are passed through the CNN separately, so much of the computation is repeated unnecessarily
- Overall procedure: proposals are obtained by selective search on the image -> each proposal is warped to the fixed input size of the CNN -> features are extracted with the pre-trained CNN for every warped proposal -> these features are used for classification and bounding box regression
- Results: 66% mAP at 0.02 fps
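
The crop-and-resize step that RCNN applies to every proposal can be sketched as below. This is a minimal illustration and not the original implementation: the function name `warp_proposal`, the toy image, the boxes, and the nearest-neighbour resizing are all assumptions made for the example.

```python
def warp_proposal(image, box, out_h, out_w):
    """Crop box = (x1, y1, x2, y2), end-exclusive, from a 2-D image given as a
    list of rows, and resize the crop to out_h x out_w (nearest neighbour)."""
    x1, y1, x2, y2 = box
    crop_h, crop_w = y2 - y1, x2 - x1
    warped = []
    for r in range(out_h):
        src_r = y1 + (r * crop_h) // out_h          # source row for output row r
        row = [image[src_r][x1 + (c * crop_w) // out_w] for c in range(out_w)]
        warped.append(row)
    return warped


# Toy 4x6 "image" whose pixel value encodes its position (10 * row + col).
image = [[10 * r + c for c in range(6)] for r in range(4)]

# Hypothetical proposals; a real pipeline would get these from selective search.
proposals = [(0, 0, 2, 2), (2, 1, 6, 3)]

# Every proposal becomes a patch of the same size, and in RCNN each patch
# would then go through the CNN on its own.
warped = [warp_proposal(image, box, 2, 2) for box in proposals]
```

Because each patch is processed independently, the cost grows with the number of proposals, which is where the very low frame rate comes from.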

Fast RCNN
- Two-stage model
- Still uses selective search on the input image to generate high-quality region proposals, but the proposals are applied to extracted CNN features
- Removes the repeated feature extraction step of RCNN by extracting features only once per image
- Proposals are mapped onto the resulting CNN feature map rather than cropped from the input image as in RCNN
- These features are used for both bounding box regression and object classification
- Still involves some repeated computational steps
- Overall procedure: image features are extracted once using a pre-trained CNN -> proposals obtained by selective search on the image are mapped onto these features -> the features of each proposal are used for classification and bounding box regression
- Results: 70% mAP at 0.4 fps
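
Mapping a proposal onto the shared feature map and pooling it to a fixed size can be sketched roughly as follows. This is a simplified take on RoI max pooling, assuming a single-channel feature map and a fixed `stride` between image and feature coordinates; `roi_max_pool` and the toy values are hypothetical.

```python
def roi_max_pool(feature_map, box, stride, out_size):
    """Project an image-space box (x1, y1, x2, y2) onto the feature map and
    max-pool the covered region into an out_size x out_size grid."""
    x1, y1, x2, y2 = (v // stride for v in box)      # image -> feature coords
    x2, y2 = max(x2, x1 + 1), max(y2, y1 + 1)        # keep at least one cell
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        r0 = y1 + (i * h) // out_size
        r1 = max(y1 + ((i + 1) * h + out_size - 1) // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = max(x1 + ((j + 1) * w + out_size - 1) // out_size, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map computed once for a 64x64 image (stride 16).
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]

# Two hypothetical proposals in image coordinates share the same features.
pooled_a = roi_max_pool(feature_map, (0, 0, 64, 64), stride=16, out_size=2)
pooled_b = roi_max_pool(feature_map, (16, 16, 48, 64), stride=16, out_size=2)
```

The CNN runs once to produce `feature_map`; only the cheap pooling step is repeated per proposal.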

Faster RCNN
- Two-stage model
- Uses a Region Proposal Network (RPN) for proposal extraction instead of selective search
- The RPN is a CNN that produces a reduced number of proposals while still keeping their quality high
- The RPN estimates proposals together with an objectness score, which indicates how likely a proposal is to contain an object
- Proposals with a high objectness score are kept, and their features are used for bounding box regression and classification
- Overall procedure: image features are extracted using a pre-trained CNN -> the features are used to estimate proposals and objectness scores -> the features of the proposals selected by objectness score are used for classification and bounding box regression
- Results: 73% mAP at 7 fps
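
Selecting proposals by objectness can be sketched as below. This is only an illustration under stated assumptions: the candidate boxes come from a hypothetical `generate_anchors` helper, and the objectness scores are written by hand, whereas a real RPN predicts both the scores and box refinements from the CNN features.

```python
def generate_anchors(feat_h, feat_w, stride, sizes):
    """Place one square candidate box per size at every feature-map location."""
    anchors = []
    for r in range(feat_h):
        for c in range(feat_w):
            cx, cy = (c + 0.5) * stride, (r + 0.5) * stride   # centre in image coords
            for s in sizes:
                anchors.append((cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2))
    return anchors


def select_proposals(anchors, objectness, top_k):
    """Keep the top_k candidate boxes with the highest objectness score."""
    order = sorted(range(len(anchors)), key=lambda i: objectness[i], reverse=True)
    return [anchors[i] for i in order[:top_k]]


anchors = generate_anchors(feat_h=2, feat_w=2, stride=16, sizes=[16, 32])

# Hand-written scores standing in for the output of the proposal network.
objectness = [0.1, 0.9, 0.2, 0.05, 0.8, 0.3, 0.6, 0.4]

proposals = select_proposals(anchors, objectness, top_k=3)
```

Only the selected boxes move on to the second stage, which is why far fewer proposals need to be classified than with selective search.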

YOLO
- Single-stage model
- Removes the two separate steps of objectness scoring and object classification by merging them
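- Directly estimates class scores for each grid cell instead of first filtering proposals by objectness. In the original YOLO, a per-box confidence score, rather than an additional no-object class, signals that a box contains no object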
- Uses a single pass to estimate classification scores and bounding boxes, so it is considered a single-stage model
- Overall procedure: image features are extracted using a pre-trained CNN -> these features are used directly for classification and bounding box estimation
- Results: 66% mAP at 21 fps
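
The single-pass idea can be sketched as decoding one grid of predictions into detections. This is a simplified illustration, not the YOLO implementation: the layout of `preds` (one box and one list of class scores per cell), the name `decode_grid`, and the score threshold are assumptions made for the example.

```python
def decode_grid(preds, grid, img_size, score_thresh):
    """Turn per-cell predictions (x, y, w, h, class_scores) into detections.

    x, y are offsets inside the cell in [0, 1]; w, h are fractions of the image.
    """
    cell = img_size / grid
    detections = []
    for r in range(grid):
        for c in range(grid):
            x, y, w, h, scores = preds[r][c]
            best = max(range(len(scores)), key=lambda k: scores[k])
            if scores[best] < score_thresh:
                continue                              # nothing confident in this cell
            cx, cy = (c + x) * cell, (r + y) * cell   # box centre in image coords
            bw, bh = w * img_size, h * img_size
            box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
            detections.append((best, scores[best], box))
    return detections


# Hand-written 2x2 grid of predictions standing in for the network output.
empty = (0.5, 0.5, 0.1, 0.1, [0.2, 0.1])
preds = [[empty, empty],
         [(0.5, 0.5, 0.5, 0.25, [0.1, 0.9]), empty]]

detections = decode_grid(preds, grid=2, img_size=100, score_thresh=0.5)
```

Boxes and class scores come out of the same forward pass, with no separate proposal stage in between.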

SSD
- Single-stage model
- The main drawback of YOLO was its failure to detect small objects
- SSD uses feature maps from multiple layers, combining the detailed features of earlier layers with the more aggregated features of deeper layers
- Including this multi-scale feature information boosts the performance of SSD compared to YOLO
- The method uses multiple classifier and detector blocks: each block estimates N (a hyperparameter) objects in the image along with their bounding boxes
- This method also uses an additional class for no object
- Overall procedure: multi-scale features are extracted using a pre-trained CNN -> the features at each scale are passed to a classifier and detector block -> detection and classification are performed at every feature scale -> non-maximum suppression is applied to merge the overlapping bounding boxes and obtain the final detections and classifications
- Results: 74% mAP at 46 fps
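
The final merging step can be sketched with a plain greedy non-maximum suppression. This is a generic version under simple assumptions (a single class, boxes as `(x1, y1, x2, y2)` with non-zero area); the function names, boxes, and threshold are illustrative and not taken from the SSD code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_thresh):
    """Greedy NMS: visit boxes by descending score and drop any box that
    overlaps an already kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Boxes pooled from all feature scales; the first two cover the same object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

keep = nms(boxes, scores, iou_thresh=0.5)
```

The first two boxes overlap with an IoU of about 0.68, so only the higher-scoring one survives alongside the third box.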

CornerNet
- Single-stage model
- A very interesting idea that eliminates the use of multiple anchor boxes by detecting each object as a pair of corners
- Proposes a new layer, known as corner pooling, that helps estimate whether pixel (i, j) is the top-left or bottom-right corner of an object
- Reduces the search space from O(w^2h^2) possible anchor boxes to O(wh) possible corner locations
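
Top-left corner pooling can be sketched as follows, together with a small count that illustrates the O(wh) versus O(w^2h^2) gap. This is a plain-Python illustration on single-channel maps, not the paper's implementation; the names `top_left_corner_pool`, `f_t`, `f_l`, and the toy values are assumptions.

```python
def top_left_corner_pool(f_t, f_l):
    """For every pixel, add the max of f_t over that pixel and all pixels below
    it to the max of f_l over that pixel and all pixels to its right."""
    h, w = len(f_t), len(f_t[0])
    vert = [row[:] for row in f_t]
    for i in range(h - 2, -1, -1):            # scan bottom to top
        for j in range(w):
            vert[i][j] = max(vert[i][j], vert[i + 1][j])
    horiz = [row[:] for row in f_l]
    for i in range(h):
        for j in range(w - 2, -1, -1):        # scan right to left
            horiz[i][j] = max(horiz[i][j], horiz[i][j + 1])
    return [[vert[i][j] + horiz[i][j] for j in range(w)] for i in range(h)]


def count_corner_locations(w, h):
    """Number of candidate corner positions: grows as O(wh)."""
    return w * h


def count_boxes(w, h):
    """Number of axis-aligned boxes with corners on pixels: grows as O(w^2 h^2)."""
    return (w * (w + 1) // 2) * (h * (h + 1) // 2)


# The same toy map is used for both inputs to keep the example short.
f = [[0, 0, 0],
     [0, 0, 3],
     [0, 2, 0]]
pooled = top_left_corner_pool(f, f)
```

In this toy map the pixel at row 1, column 1 has no response of its own, but it collects the 2 below it and the 3 to its right, which is the kind of evidence a top-left corner relies on.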

References

Selective Search: https://link.springer.com/article/10.1007/s11263-013-0620-5
SSD slides: http://faculty.iitmandi.ac.in/~aditya/cs671/index.html
