Paper Reviews on Object Detection Methods and Progression
RCNN
- Two-stage model
- Uses selective search to generate high-quality region proposals on the input image
- Each proposal is cropped from the input image and warped to the fixed input size of the CNN, giving one object image per proposal
- Features are extracted from every warped object image using a pre-trained CNN
- These features are used for both bounding-box regression and object classification
- A brute-force approach: the CNN is run separately on every proposal, so overlapping regions are processed again and again
- Overall procedure: proposals obtained by selective search on the image -> proposals are warped to the CNN's fixed input size -> features are extracted with the pre-trained CNN for each warped proposal -> these features are used for classification and bounding-box regression
- Results: 66% mAP at 0.02 fps
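A minimal pure-Python sketch of the RCNN-style per-proposal step, assuming a grayscale image stored as nested lists. The proposals are hard-coded rather than produced by selective search, nearest-neighbour sampling stands in for real image resizing, and `CountingCNN` is a hypothetical stand-in for the pre-trained network (it only averages pixels and counts its calls). The point is the call count: one CNN forward pass per proposal.

```python
from typing import List, Tuple

Image = List[List[float]]        # grayscale image, indexed image[y][x]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive pixel bounds


def warp_crop(image: Image, box: Box, out_size: int) -> Image:
    """Crop `box` and stretch it to out_size x out_size, ignoring aspect ratio.

    Nearest-neighbour sampling is used here only for brevity.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [[image[y1 + (r * h) // out_size][x1 + (c * w) // out_size]
             for c in range(out_size)]
            for r in range(out_size)]


class CountingCNN:
    """Hypothetical stand-in for a pre-trained CNN: mean intensity plus a call counter."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, patch: Image) -> List[float]:
        self.calls += 1
        return [sum(map(sum, patch)) / (len(patch) * len(patch[0]))]


def rcnn_style_features(image: Image, proposals: List[Box], cnn, out_size: int = 4):
    """RCNN-style extraction: every warped proposal gets its own CNN forward pass."""
    return [cnn(warp_crop(image, box, out_size)) for box in proposals]


if __name__ == "__main__":
    image = [[float(x) for x in range(8)] for _ in range(8)]  # pixel value = x coordinate
    cnn = CountingCNN()
    feats = rcnn_style_features(image, [(0, 0, 4, 4), (2, 2, 8, 6), (4, 0, 8, 8)], cnn)
    print(len(feats), cnn.calls)  # three proposals, three forward passes
```

With thousands of selective-search proposals per image, this loop is what makes RCNN slow.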
Fast RCNN
- Two-stage model
- Still uses selective search on the input image to generate high-quality region proposals
- Removes the repeated feature extraction of RCNN by extracting features from the whole image only once
- Proposals are mapped onto the extracted CNN feature map rather than cropped from the input image as in RCNN
- The mapped proposal features are used for both bounding-box regression and object classification
- Still not fully efficient: proposal generation by selective search runs separately from the CNN and remains a slow step
- Overall procedure: image features are first extracted with a pre-trained CNN -> proposals obtained by selective search on the image are mapped onto these features -> the proposal features are used for classification and bounding-box regression
- Results: 70% mAP at 0.4 fps
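The "extract once, then map proposals" idea can be sketched as below. In Fast RCNN, each mapped proposal is turned into a fixed-size feature with RoI max pooling; this pure-Python version assumes a hypothetical `CountingBackbone` (plain stride x stride average pooling with a call counter) in place of a real CNN, an assumed total stride of 2, and hard-coded proposals. Unlike the per-proposal loop of RCNN, the backbone runs exactly once per image.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive


class CountingBackbone:
    """Hypothetical stand-in for a pre-trained CNN: stride x stride average pooling."""

    def __init__(self, stride: int) -> None:
        self.stride = stride
        self.calls = 0

    def __call__(self, image: Grid) -> Grid:
        self.calls += 1
        s = self.stride
        return [[sum(image[y + dy][x + dx] for dy in range(s) for dx in range(s)) / (s * s)
                 for x in range(0, len(image[0]) - s + 1, s)]
                for y in range(0, len(image) - s + 1, s)]


def project_box(box: Box, stride: int) -> Box:
    """Map an image-space proposal onto feature-map cells (at least one cell wide)."""
    x1, y1, x2, y2 = box
    fx1, fy1 = x1 // stride, y1 // stride
    return fx1, fy1, max(math.ceil(x2 / stride), fx1 + 1), max(math.ceil(y2 / stride), fy1 + 1)


def _bins(start: int, length: int, n: int) -> List[Tuple[int, int]]:
    """Split [start, start + length) into n non-empty, possibly overlapping bins."""
    return [(start + (k * length) // n, start + math.ceil((k + 1) * length / n))
            for k in range(n)]


def roi_max_pool(fmap: Grid, cells: Box, out_h: int, out_w: int) -> Grid:
    """Max-pool the projected region into a fixed out_h x out_w grid."""
    fx1, fy1, fx2, fy2 = cells
    fx2, fy2 = min(fx2, len(fmap[0])), min(fy2, len(fmap))  # clip to the feature map
    return [[max(fmap[y][x] for y in range(ya, yb) for x in range(xa, xb))
             for xa, xb in _bins(fx1, fx2 - fx1, out_w)]
            for ya, yb in _bins(fy1, fy2 - fy1, out_h)]


def fast_rcnn_style_features(image: Grid, proposals: List[Box], backbone, out_size: int = 2):
    fmap = backbone(image)  # the CNN runs once for the whole image
    return [roi_max_pool(fmap, project_box(b, backbone.stride), out_size, out_size)
            for b in proposals]
```

Every proposal, whatever its size, ends up as an `out_size x out_size` grid, which is what lets the same classification and regression layers be applied to all of them.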
Faster RCNN
- Two-stage model
- Faster RCNN replaces selective search with a Region Proposal Network (RPN) for proposal extraction
- The RPN is a CNN that works on the shared image features and produces a smaller number of proposals while keeping them high quality
- The RPN predicts proposals together with an objectness score, which estimates how likely a proposal is to contain an object
- Proposals with the highest objectness scores are kept, and their features are used for bounding-box regression and classification
- Overall procedure: image features are first extracted with a pre-trained CNN -> the features are used to estimate proposals and objectness scores -> based on the objectness scores, the selected proposal features are used for classification and bounding-box regression
- Results: 73% mAP at 7 fps
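The RPN scores a fixed set of reference boxes (anchors) placed at every feature-map location; the Faster RCNN paper uses 3 scales and 3 aspect ratios, i.e. 9 anchors per location. The sketch below only generates anchors and keeps the top-N by objectness. The function names, the stride of 16, and the made-up objectness scores are assumptions for illustration; the RPN's anchor-offset regression and its non-maximum suppression are omitted.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int,
                 scales: Sequence[float] = (128, 256, 512),
                 ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """One anchor per (location, scale, ratio), centred on each feature-map cell."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # cell centre in image space
            for s in scales:
                for r in ratios:  # r = height / width; the area stays s * s
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def select_proposals(anchors: List[Box], objectness: List[float], top_n: int) -> List[Box]:
    """Keep the top_n anchors ranked by objectness score (NMS omitted here)."""
    order = sorted(range(len(anchors)), key=lambda k: objectness[k], reverse=True)
    return [anchors[k] for k in order[:top_n]]
```

For a 2 x 3 feature map this yields 2 * 3 * 9 = 54 anchors; only the highest-scoring few are passed on to the second stage.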
YOLO
- Single-stage model
- Removes the separate objectness and classification steps by merging them into one prediction
- Directly estimates class-specific confidence scores: each grid cell predicts bounding boxes with confidence scores together with class probabilities, and the two are multiplied at test time
- Uses just one network pass to estimate classification scores and bounding boxes, so it is considered a single-stage model
- Overall procedure: image features are first extracted with a pre-trained CNN -> these features are used directly for classification and bounding-box estimation
- Results: 66% mAP at 21 fps
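A sketch of how a YOLO (v1) output grid can be decoded. In the paper the image is divided into an S x S grid and the output is an S x S x (B*5 + C) tensor (7 x 7 x 30 for PASCAL VOC with S = 7, B = 2, C = 20); box centres are relative to their cell and widths/heights relative to the whole image. The per-cell ordering (B boxes first, then C class probabilities), the function name `decode_yolo_v1`, and the 0.2 score threshold are assumptions; the final non-maximum suppression is omitted.

```python
from typing import List, Tuple

Detection = Tuple[Tuple[float, float, float, float], int, float]  # (box, class_id, score)


def decode_yolo_v1(pred: List[List[List[float]]], S: int, B: int, C: int,
                   img_w: float, img_h: float, score_thresh: float = 0.2) -> List[Detection]:
    """pred[row][col] holds B boxes as (x, y, w, h, confidence), then C class probabilities."""
    detections = []
    for row in range(S):
        for col in range(S):
            cell = pred[row][col]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                cx = (col + x) / S * img_w           # (x, y) are offsets inside the cell
                cy = (row + y) / S * img_h
                bw, bh = w * img_w, h * img_h        # (w, h) are fractions of the image
                # class-specific score = Pr(class | object) * box confidence
                scores = [p * conf for p in class_probs]
                best = max(range(C), key=lambda c: scores[c])
                if scores[best] >= score_thresh:
                    box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
                    detections.append((box, best, scores[best]))
    return detections
```

Because box confidence and class probabilities come out of the same forward pass, scoring needs no separate proposal stage.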
SSD
- Single-stage model
- The main drawback of YOLO was that it struggled to detect small objects
- SSD uses feature maps from several layers of the network, combining earlier layers with fine, detailed features and deeper layers with more aggregated features
- Including this multi-scale feature information improves the performance of SSD compared with YOLO
- The method attaches a classifier and detector block to each feature scale: at every location of its feature map, each block predicts class scores and box offsets for a fixed number (a hyperparameter) of default boxes
- An additional background class represents "no object"
- Overall procedure: multi-scale features are extracted with a pre-trained CNN -> each feature scale is passed to its classifier and detector block -> detection and classification are performed at every feature scale -> non-maximum suppression merges the boxes from all scales to obtain the final detections and classes
- Results: 74% mAP at 46 fps
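The last step of the SSD procedure, merging boxes from all scales, can be sketched as greedy per-class non-maximum suppression. The 0.45 IoU threshold follows the value reported in the SSD paper; the function names and the `(box, class_id, score)` tuple layout are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.45) -> List[int]:
    """Greedy NMS: keep the best box, drop boxes overlapping a kept one too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


def merge_multiscale(per_scale: List[List[Tuple[Box, int, float]]]):
    """per_scale holds one detection list per feature map; NMS is run per class."""
    by_class: Dict[int, List[Tuple[Box, float]]] = {}
    for dets in per_scale:
        for box, cls, score in dets:
            by_class.setdefault(cls, []).append((box, score))
    final = []
    for cls, items in by_class.items():
        boxes = [b for b, _ in items]
        scores = [s for _, s in items]
        final.extend((boxes[k], cls, scores[k]) for k in nms(boxes, scores))
    return final
```

Two heavily overlapping boxes of the same class found at different scales collapse to the higher-scoring one, while boxes of other classes are left untouched.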
CornerNet
- Single-stage model
- Very interesting idea to eliminate the use of multiple anchor boxes by detecting each object as a pair of corners (top-left and bottom-right)
- Proposes a new layer called corner pooling, which helps estimate whether pixel (i, j) is the top-left or bottom-right corner of an object
- Reduces the complexity from O(w^2h^2) (the number of possible anchor boxes) to O(wh) (the number of possible corner positions)
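A small sketch of top-left corner pooling as described in CornerNet: one feature map is max-pooled over everything below each pixel, a second over everything to its right, and the two results are added. A reverse running-max scan keeps this O(wh). The function names are hypothetical, and `candidate_counts` is only an illustration of the O(wh) versus O(w^2h^2) gap, counting corner positions against boxes with x1 < x2 and y1 < y2 on a w x h grid.

```python
from typing import List, Tuple

Grid = List[List[float]]


def top_left_corner_pool(f_t: Grid, f_l: Grid) -> Grid:
    """out[i][j] = max(f_t[k][j] for k >= i) + max(f_l[i][k] for k >= j)."""
    H, W = len(f_t), len(f_t[0])
    below = [[0.0] * W for _ in range(H)]
    right = [[0.0] * W for _ in range(H)]
    for j in range(W):                 # scan each column bottom-up
        running = float("-inf")
        for i in range(H - 1, -1, -1):
            running = max(running, f_t[i][j])
            below[i][j] = running
    for i in range(H):                 # scan each row right-to-left
        running = float("-inf")
        for j in range(W - 1, -1, -1):
            running = max(running, f_l[i][j])
            right[i][j] = running
    return [[below[i][j] + right[i][j] for j in range(W)] for i in range(H)]


def candidate_counts(w: int, h: int) -> Tuple[int, int]:
    """(corner positions, axis-aligned boxes) on a w x h grid."""
    corners = w * h                                    # O(wh)
    boxes = (w * (w - 1) // 2) * (h * (h - 1) // 2)    # O(w^2 h^2)
    return corners, boxes
```

Even on a 4 x 4 grid there are 16 corner positions but 36 boxes, and the gap grows quadratically with image size.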
References
- Selective Search: https://link.springer.com/article/10.1007/s11263-013-0620-5
- SSD slides: http://faculty.iitmandi.ac.in/~aditya/cs671/index.html