Paper Reviews on Object Detection Methods and Progression
R-CNN
Two-stage model
Uses selective search to generate high-quality region proposals on the input image
Each proposed region is cropped from the input image and warped to a fixed size, yielding one object image per proposal
Features are extracted from each warped object image using a pre-trained CNN
These features are used for both bounding box regression and object classification
More of a brute-force approach: the CNN is run separately on every proposal, so much of the computation is repeated unnecessarily
Overall procedure: proposals are obtained by selective search on the image -> each proposal is warped to the fixed input size expected by the CNN -> features are extracted with a pre-trained CNN for every warped proposal -> these features are used for classification and bounding box regression
Results: 66% mAP at 0.02 fps
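To make the crop-and-warp step concrete, here is a minimal sketch in plain Python. The function name `warp_proposal`, the nested-list image, the nearest-neighbour sampling and the example boxes are all assumptions made for illustration; they are not taken from the R-CNN code.

```python
def warp_proposal(image, box, out_size):
    """Crop `box` from `image` and resize the crop to out_size x out_size.

    image: 2-D list of pixel values (a stand-in for a real image).
    box:   (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
    Nearest-neighbour sampling is used, so the aspect ratio is not preserved.
    """
    x1, y1, x2, y2 = box
    crop_h, crop_w = y2 - y1, x2 - x1
    warped = []
    for r in range(out_size):
        src_r = y1 + (r * crop_h) // out_size  # source row for output row r
        row = []
        for c in range(out_size):
            src_c = x1 + (c * crop_w) // out_size  # source column for output column c
            row.append(image[src_r][src_c])
        warped.append(row)
    return warped


# Toy 6x8 "image" and two made-up proposals of different shapes.
image = [[10 * r + c for c in range(8)] for r in range(6)]
proposals = [(0, 0, 4, 2), (2, 1, 8, 6)]

# Every proposal becomes a 4x4 crop, so the CNN would run once per crop.
crops = [warp_proposal(image, box, 4) for box in proposals]
```

Because each crop is pushed through the CNN on its own, the cost grows linearly with the number of proposals, which is where the very low frame rate comes from.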
Fast R-CNN
Two-stage model
Uses selective search to generate high-quality region proposals, which are then applied to the extracted CNN features rather than to image crops
Removes the repeated feature extraction of R-CNN by extracting features only once per image
The proposals are mapped onto the CNN feature map rather than onto the input image as in R-CNN
These features are used for both bounding box regression and object classification
Still involves some repeated computation
Overall procedure: image features are first extracted using a pre-trained CNN -> proposals obtained by selective search on the image are mapped onto these features -> the features of each proposal are used for classification and bounding box regression
Results: 70% mAP at 0.4 fps
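The mapping of a proposal onto the shared feature map can be sketched as follows. This is a simplified illustration in plain Python: the names `project_box` and `roi_max_pool`, the stride of 16, the 4x4 toy feature map and the way bins are rounded are assumptions, not the exact scheme of the paper.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def project_box(box, stride):
    """Map an image-space box (x1, y1, x2, y2) onto feature-map cells."""
    x1, y1, x2, y2 = box
    fx1, fy1 = x1 // stride, y1 // stride
    # Round the far edge up so the region always covers at least one cell.
    fx2 = max(fx1 + 1, ceil_div(x2, stride))
    fy2 = max(fy1 + 1, ceil_div(y2, stride))
    return fx1, fy1, fx2, fy2


def roi_max_pool(feature_map, roi, out_size):
    """Max-pool the feature-map region `roi` into an out_size x out_size grid."""
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for r in range(out_size):
        r0 = y1 + (r * roi_h) // out_size
        r1 = y1 + ceil_div((r + 1) * roi_h, out_size)
        row = []
        for c in range(out_size):
            c0 = x1 + (c * roi_w) // out_size
            c1 = x1 + ceil_div((c + 1) * roi_w, out_size)
            row.append(max(feature_map[i][j]
                           for i in range(r0, r1) for j in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map, as if computed once from a 64x64 image with stride 16.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
roi = project_box((16, 0, 64, 32), stride=16)
pooled = roi_max_pool(feature_map, roi, out_size=2)
```

The feature map is computed once, and every proposal only costs a cheap pooling step on top of it.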
Faster R-CNN
Two-stage model
Faster R-CNN uses a Region Proposal Network (RPN) for proposal extraction instead of selective search
The Region Proposal Network is a CNN that produces fewer proposals while still ensuring they are of high quality
The Region Proposal Network estimates proposals along with an objectness score, which indicates how likely the proposal is to contain an object
The features of proposals with a high objectness score are used for bounding box regression and classification
Overall procedure: image features are first extracted using a pre-trained CNN -> these features are used to estimate proposals and objectness scores -> the features selected based on objectness score are used for classification and bounding box regression
Results: 73% mAP at 7 fps
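A rough sketch of how candidate boxes can be laid out over the feature map and filtered by objectness is given below. The function names, scales, ratios and the made-up scores are assumptions for illustration. The real RPN also regresses box offsets and applies non-maximum suppression, both of which are left out here.

```python
import math


def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Place one anchor per (scale, ratio) pair at every feature-map cell.

    ratio is height / width; every anchor of scale s has area s * s.
    Boxes are (x1, y1, x2, y2) in image coordinates.
    """
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride
            for s in scales:
                for ratio in ratios:
                    w = s / math.sqrt(ratio)
                    h = s * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


def select_proposals(anchors, objectness, top_k):
    """Keep the top_k anchors with the highest objectness score."""
    order = sorted(range(len(anchors)),
                   key=lambda i: objectness[i], reverse=True)
    return [anchors[i] for i in order[:top_k]]


anchors = make_anchors(feat_h=2, feat_w=3, stride=16,
                       scales=[32, 64], ratios=[0.5, 1.0, 2.0])
# Made-up scores standing in for the output of the RPN objectness head.
objectness = [i / len(anchors) for i in range(len(anchors))]
proposals = select_proposals(anchors, objectness, top_k=2)
```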
YOLO
Single-stage model
Removes the two separate steps of objectness scoring and object classification by merging them
Directly estimates class scores in the same pass as the boxes rather than in a second stage. Instead of an extra no-object class, each predicted box carries a confidence score that reflects whether it contains an object
It needs only one pass to estimate the classification scores and bounding boxes, so it is considered a single-stage model
Overall procedure: image features are first extracted using a pre-trained CNN -> these features are used directly for classification and bounding box estimation
Results: 66% mAP at 21 fps
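The single pass produces a grid of predictions that only needs decoding. The sketch below assumes a simplified layout in the spirit of the YOLO paper: one box per cell stored as `[x, y, w, h, confidence, class probabilities...]`. The function name `decode_grid`, the 2x2 grid and all numbers are made up for illustration.

```python
def decode_grid(pred, grid_size, img_size, score_thresh):
    """Turn a grid of raw predictions into a list of detections.

    pred[row][col] = [x, y, w, h, conf, p_class0, p_class1, ...]
    x, y: box centre offset inside the cell, in [0, 1]
    w, h: box size relative to the whole image, in [0, 1]
    Returns tuples (x1, y1, x2, y2, class_index, score).
    """
    cell = img_size / grid_size
    detections = []
    for row in range(grid_size):
        for col in range(grid_size):
            x, y, w, h, conf, *class_probs = pred[row][col]
            best = max(range(len(class_probs)), key=class_probs.__getitem__)
            score = conf * class_probs[best]  # class-specific confidence
            if score < score_thresh:
                continue
            cx, cy = (col + x) * cell, (row + y) * cell
            bw, bh = w * img_size, h * img_size
            detections.append((cx - bw / 2, cy - bh / 2,
                               cx + bw / 2, cy + bh / 2, best, score))
    return detections


# 2x2 grid, two classes; only the top-right cell holds a confident box.
empty = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
pred = [[empty, [0.5, 0.5, 0.5, 0.25, 0.75, 0.25, 0.5]],
        [empty, empty]]
detections = decode_grid(pred, grid_size=2, img_size=100, score_thresh=0.3)
```

There is no proposal step anywhere in this flow: boxes and class scores come out of the same forward pass.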
SSD
Single-stage model
The main drawback of YOLO was that it failed to detect small objects
SSD uses feature information from multiple layers, combining fine-grained detail from earlier layers with more aggregated features from deeper layers
Including this multi-scale feature information boosts the performance of SSD compared to YOLO
The method uses multiple classifier and detector blocks: each block estimates N (a hyperparameter) objects in the image along with their bounding boxes
This method also uses an additional class for no object
Overall procedure: multi-scale features are extracted using a pre-trained CNN -> the features at each scale are fed to a classifier and detector block -> object detection and classification are performed at every feature scale -> non-maximum suppression is applied to merge the bounding boxes and obtain the final detections and classes
Results: 74% mAP at 46 fps
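The final merging step, non-maximum suppression, can be sketched in a few lines of plain Python. This is the common greedy variant for a single class. The function names, the IoU threshold of 0.5 and the example boxes are assumptions for illustration, not values taken from the SSD paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than iou_thresh.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Boxes pooled from several feature scales; the first two cover the same object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_thresh=0.5)
```

In a multi-class detector this would typically be run once per class, after dropping boxes whose best class is the no-object class.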
CornerNet
- Single-stage model
- Very interesting idea: it eliminates anchor boxes altogether by detecting each object as a pair of corner keypoints
- Proposes a new layer, corner pooling, to estimate whether pixel (i, j) is the top-left or bottom-right corner of an object
- Reduces the search space from O(w^2h^2) possible anchor boxes to O(wh) possible corner locations
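A minimal sketch of top-left corner pooling in plain Python is shown below. It assumes two input feature maps, here called `f_right` (scanned along each row towards the right) and `f_down` (scanned along each column towards the bottom). These names and the toy values are illustrative and do not follow the notation of the paper.

```python
def top_left_corner_pool(f_right, f_down):
    """Top-left corner pooling on two 2-D feature maps of equal size.

    For each pixel (i, j):
      - take the max of f_right over row i, from column j to the right edge
      - take the max of f_down over column j, from row i to the bottom edge
      - add the two maxima
    A top-left corner has the object's top edge to its right and the
    object's left edge below it, so both maxima are large there.
    """
    h, w = len(f_right), len(f_right[0])
    horiz = [row[:] for row in f_right]
    vert = [row[:] for row in f_down]
    for i in range(h):
        for j in range(w - 2, -1, -1):  # running max, right to left
            horiz[i][j] = max(horiz[i][j], horiz[i][j + 1])
    for i in range(h - 2, -1, -1):      # running max, bottom to top
        for j in range(w):
            vert[i][j] = max(vert[i][j], vert[i + 1][j])
    return [[horiz[i][j] + vert[i][j] for j in range(w)] for i in range(h)]


# Toy 4x4 maps: top-edge evidence at (row 1, col 3),
# left-edge evidence at (row 3, col 1).
f_right = [[0, 0, 0, 0], [0, 0, 0, 5], [0, 0, 0, 0], [0, 0, 0, 0]]
f_down = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 4, 0, 0]]
pooled = top_left_corner_pool(f_right, f_down)
```

The strongest response lands at (row 1, col 1), the top-left corner of the box implied by the two edges, even though neither input map has any evidence at that pixel. Bottom-right corner pooling works the same way with the scan directions reversed.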