MOGvision

Feature detection is the process of extracting salient feature points from an image. These points can be blobs, corners, or even edges. Feature detection has many applications in image processing and computer vision, such as visual localization and 3D reconstruction. A good feature detector must provide reliable interest points (keypoints) that are scale-invariant, highly distinguishable, robust to noise and distortions, highly repeatable, well localized, easy to implement, and fast to compute. For instance, a change in image intensity should not change the location of the keypoints (Figure 1).

**Figure 1**. The location of FFD's keypoints^[1] in (left) distortion-free and (right) distorted images.
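One way to make the repeatability requirement concrete is to count how many keypoints reappear at nearly the same location after a distortion. The sketch below is only an illustration: the function name `repeatability`, the 2-pixel tolerance, and the greedy nearest-neighbour pairing are assumptions made here, not the evaluation protocol used in [1].

```python
import math

def repeatability(keypoints_a, keypoints_b, tolerance=2.0):
    """Fraction of keypoints that reappear within `tolerance` pixels.

    Both inputs are lists of (x, y) locations in the same image frame.
    The tolerance and the greedy pairing are illustrative assumptions.
    """
    if not keypoints_a or not keypoints_b:
        return 0.0
    unmatched_b = list(keypoints_b)
    repeated = 0
    for ax, ay in keypoints_a:
        # Pair each keypoint with its nearest counterpart that is still unused.
        best = min(
            unmatched_b,
            key=lambda p: math.hypot(p[0] - ax, p[1] - ay),
            default=None,
        )
        if best is not None and math.hypot(best[0] - ax, best[1] - ay) <= tolerance:
            repeated += 1
            unmatched_b.remove(best)
    # Normalise by the smaller set so the score stays in [0, 1].
    return repeated / min(len(keypoints_a), len(keypoints_b))
```

A score close to 1 means that the distortion barely moved the detected keypoints.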

This work shows that robust and accurate keypoints exist in a specific scale-space domain. To this end, we first formulate the superimposition problem as a mathematical model and then derive a closed-form solution for multiscale analysis. The model is formulated via difference-of-Gaussian (DoG) kernels in the continuous scale-space domain, and it is proved that setting the scale-space pyramid's blurring ratio and smoothness to 2 and 0.627, respectively, facilitates the detection of reliable keypoints. To apply the proposed model to discrete images, we discretize it using the undecimated wavelet transform and the cubic spline function. Theoretically, the complexity of our method is less than 5% of that of the popular baseline, the Scale-Invariant Feature Transform (SIFT). Extensive experimental results show the superiority of the proposed feature detector over representative hand-crafted and learning-based techniques in both accuracy and computational time.
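To give a feel for the two parameters, here is a minimal sketch of an undecimated DoG stack. It is not the FFD implementation: it swaps the undecimated wavelet transform and cubic spline for plain separable Gaussian blurs, and it assumes that "smoothness" means the base standard deviation (0.627) and that "blurring ratio" means the factor (2) between successive scales. The names `gaussian_kernel`, `blur`, and `dog_stack` are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def blur(image, sigma):
    """Separable Gaussian blur with reflective borders; output keeps the input size."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(np.float64), radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def dog_stack(image, sigma0=0.627, ratio=2.0, levels=3):
    """Difference-of-Gaussian layers computed at full resolution (no downsampling).

    sigma0 and ratio follow the values quoted in the text; reading them as a
    base standard deviation and a per-level scale factor is an assumption.
    """
    sigmas = [sigma0 * ratio ** i for i in range(levels + 1)]
    blurred = [blur(image, s) for s in sigmas]
    # Each layer is the coarser blur minus the finer one.
    return np.stack([blurred[i + 1] - blurred[i] for i in range(levels)])
```

Keypoint candidates would then be local extrema of the returned stack across space and scale; that step is omitted here.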

**Table 1**. The computational time, in milliseconds, of different feature detectors on an image with 800×640 pixels (the average time over 20 runs is reported).

| Feature Detector | Category | Platform | Run Time (ms) |
|---|---|---|---|
| SIFT | Multiscale | CPU | 552 |
| SURF | Multiscale | CPU | 159 |
| BRISK | Multiscale | CPU | 147 |
| HarrisZ | Multiscale | CPU | 2700 |
| KAZE | Multiscale | CPU | 1500 |
| AKAZE | Multiscale | CPU | 438 |
| DNet | Deep learning | GPU | 1300 |
| TILDE | Deep learning | CPU | 12100 |
| TCDET | Deep learning | GPU | 4100 |
| SuperPoint | Deep learning | GPU | 54 |
| D2Net | Deep learning | GPU | 950 |
| FFD (ours) | Multiscale | CPU | 29 |
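An averaged timing like the one in Table 1 can be reproduced with a small harness such as the following. This is a sketch under assumptions: `average_runtime_ms` is a hypothetical helper, `detector` stands for any callable that takes an image, and the exact measurement procedure behind Table 1 is not described in this page.

```python
import time
from statistics import mean

def average_runtime_ms(detector, image, runs=20):
    """Average wall-clock time of detector(image) in milliseconds over `runs` calls."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        detector(image)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)
```

For GPU detectors, the device would also need to be synchronised before the timer is stopped, otherwise asynchronous kernels make the measurement look shorter than it is.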

Experimental results and a comparative study against state-of-the-art techniques, carried out on several publicly accessible datasets and example applications, show that FFD detects more reliable feature points in the shortest time among the compared detectors, which makes it well suited to real-time applications. A summary of the results is given below.

3D Reconstruction from Multiview images

**Figure 2**. The reprojection error (left) and the mean track length (right) of different feature detectors for 3D reconstruction over different datasets. The best result in each dataset is highlighted with an asterisk. Image courtesy of ^[1].
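For readers unfamiliar with the two metrics in Figure 2, the sketch below shows how they are commonly computed. It assumes a pinhole camera described by a 3×4 projection matrix `P` and represents each track as the list of image indices that observe one 3D point; both helper names are hypothetical, and the reconstruction pipeline used in [1] may compute them differently.

```python
import numpy as np

def reprojection_error(P, point_3d, observed_2d):
    """Pixel distance between the projection of a 3-D point and its observation."""
    homogeneous = P @ np.append(np.asarray(point_3d, dtype=np.float64), 1.0)
    projected = homogeneous[:2] / homogeneous[2]  # perspective division
    return float(np.linalg.norm(projected - np.asarray(observed_2d, dtype=np.float64)))

def mean_track_length(tracks):
    """Average number of images in which each reconstructed 3-D point is observed."""
    return sum(len(track) for track in tracks) / len(tracks)
```

A lower reprojection error indicates better-localized keypoints, while a longer mean track length indicates keypoints that are re-detected across more views.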

Visual Localization

**Table 2**. Visual localization results on the **Aachen** dataset.

| Feature Detector | (0.5 m, 2°) | (1 m, 5°) | (5 m, 10°) |
|---|---|---|---|
| SIFT | 42.9 | 56.1 | 80.6 |
| SURF | 38.8 | 55.1 | 73.5 |
| BRISK | 39.8 | 59.2 | 77.6 |
| HarrisZ | 41.8 | 57.1 | 75.5 |
| KAZE | 40.6 | 53.0 | 74.4 |
| LIFT | 35.6 | 53.1 | 67.3 |
| DNet | 37.2 | 54.1 | 68.4 |
| TILDE | 38.8 | 54.1 | 69.4 |
| TCDET | 39.8 | 55.1 | 72.5 |
| SuperPoint | 40.8 | 59.2 | 78.6 |
| D2Net | 40.8 | 56.1 | 75.5 |
| FFD | 44.9 | 60.2 | 81.6 |
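The column headers of Table 2 pair a position threshold with an orientation threshold. Assuming, as is usual for this kind of benchmark, that each entry is the percentage of query images whose estimated pose falls within both thresholds, the scores can be computed as sketched below. The helper `localization_rates` and the error-pair input format are hypothetical.

```python
# Thresholds taken from the Table 2 headers: (position in metres, rotation in degrees).
THRESHOLDS = [(0.5, 2.0), (1.0, 5.0), (5.0, 10.0)]

def localization_rates(errors, thresholds=THRESHOLDS):
    """Percentage of queries localized within each (position, rotation) threshold.

    `errors` is a list of (position_error_m, rotation_error_deg) pairs,
    one per query image.
    """
    total = len(errors)
    rates = []
    for max_pos, max_rot in thresholds:
        # A query counts only if it satisfies both limits at once.
        hits = sum(1 for pos, rot in errors if pos <= max_pos and rot <= max_rot)
        rates.append(100.0 * hits / total)
    return rates
```

Because the thresholds are nested, the three percentages can only grow from left to right, which matches every row of the table.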

Robustness

**Figure 3**. mAP results of different feature detectors over the images in the homography databases corrupted by additive white Gaussian noise (WGN) with a standard deviation ranging from 0.01 to 0.2. Image courtesy of ^[1].
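The corruption used in this experiment can be approximated as follows. The sketch assumes that image intensities are normalised to [0, 1], so that a standard deviation of 0.01 to 0.2 is meaningful, and that noisy values are clipped back into that range; the number of noise levels and the helper name `add_wgn` are illustrative choices, not details taken from [1].

```python
import numpy as np

def add_wgn(image, std, rng=None):
    """Add zero-mean white Gaussian noise to an image with intensities in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image + rng.normal(0.0, std, size=image.shape)
    # Clipping keeps the result a valid image; this is an assumption of the sketch.
    return np.clip(noisy, 0.0, 1.0)

# Five evenly spaced levels spanning the range quoted in the caption (the count is arbitrary).
NOISE_LEVELS = np.linspace(0.01, 0.2, 5)
```

Running a detector on `add_wgn(image, std)` for each value in `NOISE_LEVELS` and comparing the keypoints with those of the clean image gives a robustness curve of the kind plotted in Figure 3.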

Reference

[1] FFD: Fast Feature Detector, IEEE Transactions on Image Processing, 2021.
