Histogram of Oriented Gradients

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for object detection. The technique counts occurrences of gradient orientations in localized portions of an image. It is similar to edge orientation histograms, SIFT descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.

Navneet Dalal and Bill Triggs, researchers at INRIA, first described the histogram of oriented gradients in their paper at CVPR in June 2005. In that work they used the algorithm to detect pedestrians in static images, although they later extended it to detecting humans in video, as well as various animals and vehicles in static images.

Theory

The essential idea behind the algorithm is that the appearance and shape of an object within an image region can be described by the distribution of intensity gradients or edge directions. The descriptor is implemented by dividing the image into small connected regions, called cells, and computing for each cell a histogram of gradient directions or edge orientations over the pixels within the cell. The combination of these histograms is the descriptor. For improved accuracy, the local histograms are contrast-normalized: a measure of intensity is computed over a larger region of the image, called a block, and this value is used to normalize the cells within the block. The normalized descriptors have better invariance to changes in illumination.

The HOG descriptor has several advantages over other descriptors. Because it operates on local cells, it is invariant to geometric and photometric transformations, except for object orientation; such changes would only appear in larger spatial regions. Moreover, as Dalal and Triggs found, coarse spatial sampling, fine orientation sampling, and strong local photometric normalization allow the body movements of pedestrians to be ignored as long as they maintain an upright position. The HOG descriptor is therefore well suited to detecting humans in images. ^[1]

Algorithm implementation

Gradient computation

The first step of calculation in many feature detectors is color normalization and gamma correction. Dalal and Triggs found that this step can be omitted for the HOG descriptor, since the subsequent descriptor normalization achieves essentially the same result. The first step is therefore the computation of the gradient values. The most common method is to apply a one-dimensional centered derivative mask in the horizontal direction, the vertical direction, or both. This method requires filtering the color or intensity data of the image with the following filter kernels:

$[-1, 0, 1]$ and $[-1, 0, 1]^T.$

Dalal and Triggs also tested more complex masks, such as the 3x3 Sobel operator and diagonal masks, but these performed worse for this task. They also experimented with Gaussian smoothing before applying the derivative mask, but likewise found that omitting the smoothing worked better in practice. ^[2]
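The filtering step can be illustrated with a minimal sketch in plain Python. It applies the $[-1, 0, 1]$ mask and its transpose to a grayscale image stored as a list of rows, then derives the gradient magnitude and the unsigned orientation. The function name `gradients` and the clamping of indices at the image border are assumptions of this sketch; the text does not specify border handling.

```python
import math

def gradients(image):
    """Centered [-1, 0, 1] derivatives of a 2-D grayscale image (list of rows).

    Border pixels are handled by clamping indices, which is an assumption
    of this sketch rather than something the article specifies.
    """
    h, w = len(image), len(image[0])
    magnitude = [[0.0] * w for _ in range(h)]
    orientation = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Horizontal mask [-1, 0, 1] and its transpose for the vertical one.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            magnitude[y][x] = math.hypot(gx, gy)
            # Unsigned orientation folded into the 0 to 180 degree range.
            orientation[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, orientation

# A vertical step edge: intensity rises from left to right.
image = [[0, 0, 10, 10]] * 3
mag, ori = gradients(image)
```

In this toy image the two middle columns straddle the edge, so they receive a magnitude of 10 with a horizontal gradient direction, while the outer columns see no change.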

Orientation binning

The next step is to compute the cell histograms. Each pixel within a cell casts a weighted vote for an orientation histogram channel, based on the gradient values computed at that pixel. The cells can be rectangular or radial in shape, and the histogram channels are evenly spread over 0 to 180 degrees or 0 to 360 degrees, depending on whether the gradient is "unsigned" or "signed". Dalal and Triggs found that unsigned gradients used together with 9 histogram channels gave the best results for human detection. The vote weight of a pixel can be either the gradient magnitude itself or some function of it; in actual tests the gradient magnitude itself gave the best results. Other options for the vote weight include the square root, the square, or a clipped form of the gradient magnitude. ^[3]
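As a rough illustration of the voting scheme, the sketch below builds one cell histogram from per-pixel magnitudes and unsigned orientations. The function name `cell_histogram` is hypothetical, and each pixel votes for exactly one channel; this hard assignment is a simplification chosen for brevity, not a statement about how any particular implementation distributes votes.

```python
def cell_histogram(magnitudes, orientations, bins=9, max_angle=180.0):
    """Magnitude-weighted orientation histogram for one cell.

    `magnitudes` and `orientations` are flat lists over the pixels of the
    cell; orientations are in degrees within [0, max_angle).
    """
    hist = [0.0] * bins
    width = max_angle / bins
    for m, theta in zip(magnitudes, orientations):
        # Each pixel votes for a single channel; min() guards the upper edge.
        index = min(int(theta // width), bins - 1)
        hist[index] += m
    return hist

# Three pixels: orientations of 5, 25 and 170 degrees with different magnitudes.
hist = cell_histogram([2.0, 1.0, 3.0], [5.0, 25.0, 170.0])
```

With 9 channels over 180 degrees each channel is 20 degrees wide, so the three pixels land in the first, second, and last channel respectively. Passing `max_angle=360.0` would give the signed variant.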

Descriptor blocks

To account for changes in illumination and contrast, the gradient strengths must be locally normalized, which requires grouping the cells into larger, spatially connected blocks. The HOG descriptor is then the vector of the components of the normalized cell histograms from all of the block regions. These blocks typically overlap, meaning that each cell contributes more than once to the final descriptor. Two main block geometries exist: rectangular R-HOG blocks and circular C-HOG blocks. R-HOG blocks are generally square grids described by three parameters: the number of cells per block, the number of pixels per cell, and the number of channels per cell histogram. In the Dalal and Triggs experiment, the optimal parameters were found to be 3x3 cell blocks of 6x6 pixel cells with 9 histogram channels. Moreover, they found that some minor improvement in performance could be gained by applying a Gaussian spatial window within each block before tabulating histogram votes, in order to weight pixels around the edge of the blocks less. The R-HOG blocks appear quite similar to the scale-invariant feature transform (SIFT) descriptors; however, despite their similar formation, R-HOG blocks are computed in dense grids at a single scale without orientation alignment, whereas SIFT descriptors are computed at sparse, scale-invariant key image points and are rotated to align orientation. In addition, the R-HOG blocks are used in conjunction to encode spatial form information, while SIFT descriptors are used singly.
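The grouping of cells into overlapping R-HOG blocks can be sketched as a sliding window over the grid of cell histograms. The function name `block_descriptors`, the default block size of 2x2 cells, and the stride of one cell are assumptions made to keep the example small; they are not the parameters reported as optimal above.

```python
def block_descriptors(cell_hists, block_size=2, stride=1):
    """Group a 2-D grid of cell histograms into overlapping square blocks.

    Returns one flat, not yet normalized vector per block position.
    """
    rows, cols = len(cell_hists), len(cell_hists[0])
    blocks = []
    for r in range(0, rows - block_size + 1, stride):
        for c in range(0, cols - block_size + 1, stride):
            vector = []
            for dr in range(block_size):
                for dc in range(block_size):
                    vector.extend(cell_hists[r + dr][c + dc])
            blocks.append(vector)
    return blocks

# A 3x3 grid of cells; each 9-channel histogram is filled with its cell index.
grid = [[[float(3 * r + c)] * 9 for c in range(3)] for r in range(3)]
blocks = block_descriptors(grid, block_size=2, stride=1)
```

Here the grid yields four blocks of 2 x 2 x 9 = 36 values each, and the central cell (index 4) appears in all four of them, which is the overlap the paragraph describes.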

C-HOG blocks can be found in two variants: those with a single, central cell and those with an angularly divided central cell. In addition, these C-HOG blocks can be described with four parameters: the number of angular and radial bins, the radius of the center bin, and the expansion factor for the radius of additional radial bins. Dalal and Triggs found that the two main variants provided equal performance, and that two radial bins with four angular bins, a center radius of 4 pixels, and an expansion factor of 2 provided the best performance in their experimentation. Also, Gaussian weighting provided no benefit when used in conjunction with the C-HOG blocks. C-HOG blocks appear similar to Shape Contexts, but differ strongly in that C-HOG blocks contain cells with several orientation channels, while Shape Contexts only make use of a single edge presence count in their formulation. ^[4]

Block normalization

Dalal and Triggs explored four methods of block normalization. Let $v$ be the non-normalized vector containing all histograms in a given block, $\|v\|_k$ be its k-norm for $k = 1, 2$, and $e$ be some small constant (the exact value is unimportant). The normalized vector $f$ can then be obtained in one of the following ways:

L2-norm: $f = {v \over \sqrt{\|v\|^2_2+e^2}}$

L2-hys: the L2-norm followed by clipping (values of $v$ greater than 0.2 are set to 0.2) and renormalizing, as in ^[5]

L1-norm: $f = {v \over (\|v\|_1+e)}$

L1-sqrt: $f = \sqrt{v \over (\|v\|_1+e)}$

Dalal and Triggs found that the L1-norm gives less reliable results than the other three schemes, which perform about equally well; all four methods, however, improve significantly on the non-normalized data. ^[6]
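The four schemes translate almost directly into code. The sketch below is one possible reading of the formulas, assuming a block vector with non-negative entries (as histogram votes are) and an arbitrary small constant `e = 1e-5`; the function names are chosen for this example only.

```python
import math

def l2_norm(v, e=1e-5):
    # f = v / sqrt(||v||_2^2 + e^2)
    scale = math.sqrt(sum(x * x for x in v) + e * e)
    return [x / scale for x in v]

def l2_hys(v, e=1e-5, clip=0.2):
    # L2-norm, clip each component at 0.2, then renormalize.
    clipped = [min(x, clip) for x in l2_norm(v, e)]
    return l2_norm(clipped, e)

def l1_norm(v, e=1e-5):
    # f = v / (||v||_1 + e)
    scale = sum(abs(x) for x in v) + e
    return [x / scale for x in v]

def l1_sqrt(v, e=1e-5):
    # Element-wise square root of the L1-normalized vector.
    return [math.sqrt(x) for x in l1_norm(v, e)]

v = [3.0, 4.0, 0.0]
```

For this `v`, the L2-norm gives approximately (0.6, 0.8, 0). L2-hys clips both non-zero components to 0.2 before renormalizing, so they end up equal, which shows how clipping limits the influence of a single dominant gradient.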

SVM classifier

Main article: Support vector machine

The final step in object recognition using HOG descriptors is to classify the descriptors with a supervised learning system. Dalal and Triggs used a support vector machine (SVM).
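Once an SVM has been trained, classifying a descriptor reduces to evaluating its decision function. The sketch below assumes a linear kernel, which the text does not state, and uses made-up weights for a 4-dimensional descriptor; real weights come from training on labeled windows, and real HOG descriptors are far longer.

```python
def svm_decision(weights, bias, descriptor):
    """Signed score of a linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, descriptor)) + bias

def contains_object(weights, bias, descriptor, threshold=0.0):
    """True when the score exceeds the decision threshold."""
    return svm_decision(weights, bias, descriptor) > threshold

# Hypothetical weights for a 4-dimensional descriptor (real ones are learned).
weights = [0.5, -1.0, 0.25, 2.0]
bias = -0.5

window_a = [1.0, 0.0, 0.0, 0.5]   # score: 0.5 + 1.0 - 0.5 = 1.0
window_b = [0.0, 1.0, 0.0, 0.0]   # score: -1.0 - 0.5 = -1.5
```

Raising `threshold` trades missed detections for fewer false positives, which is the trade-off measured in the testing section.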

Testing

In their original human detection experiment, Dalal and Triggs compared their R-HOG and C-HOG descriptor blocks against generalized Haar wavelets, PCA-SIFT descriptors, and Shape Contexts. Generalized Haar wavelets are oriented Haar wavelets, and were used in 2001 by Mohan, Papageorgiou, and Poggio in their own object detection experiments. PCA-SIFT descriptors are similar to SIFT descriptors, but differ in that principal component analysis is applied to the normalized gradient patches. PCA-SIFT descriptors were first used in 2004 by Ke and Sukthankar and were claimed to outperform regular SIFT descriptors. Finally, Shape Contexts use circular bins, similar to those used in C-HOG blocks, but only tabulate votes on the basis of edge presence, making no distinction with regards to orientation. Shape Contexts were originally used in 2001 by Belongie, Malik, and Puzicha.

The testing commenced on two different data sets. The Massachusetts Institute of Technology pedestrian database contains 509 training images and 200 test images of pedestrians on city streets. The set only contains images featuring the front or back of human figures and contains little variety in human pose. The set is well-known and has been used in a variety of human detection experiments, such as those conducted by Papageorgiou and Poggio in 2000. The MIT database is currently available for research at http://cbcl.mit.edu/cbcl/software-datasets/PedestrianData.html. The second set was developed by Dalal and Triggs exclusively for their human detection experiment due to the fact that the HOG descriptors performed near-perfectly on the MIT set. Their set, known as INRIA, contains 1805 images of humans taken from personal photographs. The set contains images of humans in a wide variety of poses and includes difficult backgrounds, such as crowd scenes, thus rendering it more complex than the MIT set. The INRIA database is currently available for research at http://lear.inrialpes.fr/data.

The above site has an image showing examples from the INRIA human detection database.

As for the results, the C-HOG and R-HOG block descriptors perform comparably, with the C-HOG descriptors maintaining a slight advantage in the detection miss rate at fixed false positive rates across both data sets. On the MIT set, the C-HOG and R-HOG descriptors produced a detection miss rate of essentially zero at a $10^{-4}$ false positive rate. On the INRIA set, the C-HOG and R-HOG descriptors produced a detection miss rate of roughly 0.1 at a $10^{-4}$ false positive rate. The generalized Haar wavelets represent the next best approach: they produced roughly a 0.01 miss rate at a $10^{-4}$ false positive rate on the MIT set, and roughly a 0.3 miss rate on the INRIA set. The PCA-SIFT descriptors and Shape Contexts both performed fairly poorly on both data sets: both methods produced a miss rate of 0.1 at a $10^{-4}$ false positive rate on the MIT set and a miss rate of nearly 0.5 at a $10^{-4}$ false positive rate on the INRIA set. The results of the original Dalal and Triggs experiment are reported as Detection Error Tradeoff curves on a log-log scale, that is, the miss rate plotted against the false positive rate. ^[7]
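To make the two quantities concrete, the following sketch computes one point of such a trade-off curve from classifier scores. The function name `det_point` and the scores are invented for illustration; they are not data from the experiment.

```python
def det_point(positive_scores, negative_scores, threshold):
    """Return (miss rate, false positive rate) at one score threshold."""
    misses = sum(1 for s in positive_scores if s <= threshold)
    false_positives = sum(1 for s in negative_scores if s > threshold)
    return (misses / len(positive_scores),
            false_positives / len(negative_scores))

# Hypothetical scores: windows that contain a person, and background windows.
positives = [2.1, 0.7, -0.3, 1.5]
negatives = [-1.2, -0.8, 0.4, -2.0, -0.1]

point = det_point(positives, negatives, threshold=0.0)
```

At a threshold of 0 one of the four positives is missed and one of the five negatives is accepted. Sweeping the threshold and plotting the resulting pairs on logarithmic axes gives a curve of the kind described above.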

Further development

As part of the Pascal Visual Object Classes 2006 Workshop, Dalal and Triggs presented results on applying Histogram of Oriented Gradient descriptors to image objects other than human beings, such as cars, buses, and bicycles, as well as common animals such as dogs, cats, and cows. They included with their results the optimal parameters for block formulation and normalization in each case. The image in the below reference shows some of their detection examples for motorbikes. ^[8]

Then as part of the 2006 European Conference on Computer Vision, Dalal and Triggs teamed up with Cordelia Schmid to apply Histogram of Oriented Gradient detectors to the problem of human detection in films and videos. Essentially their technique involves the combination of regular HOG descriptors on individual video frames with new Internal Motion Histograms (IMH) on pairs of subsequent video frames. These Internal Motion Histograms use the gradient magnitudes from optical flow fields obtained from two consecutive frames. These gradient magnitudes are then used in the same manner as those produced from static image data within the HOG descriptor approach. When testing on two large datasets taken from several movie DVDs, the combined HOG-IMH method yielded a miss rate of approximately 0.1 at a $10^{-4}$ false positive rate. ^[9]

At the Intelligent Vehicles Symposium in 2006, F. Suard, A. Rakotomamonjy, and A. Bensrhair introduced a complete system for pedestrian detection based on HOG descriptors. Their system operates using two infrared cameras. Since human beings appear brighter than their surroundings on infrared images, the system first locates positions of interest within the larger view field where humans could possibly be located. Then normal Support Vector Machine classifiers operate on the HOG descriptors taken from these smaller positions of interest to formulate a decision regarding the presence of a pedestrian. Once pedestrians are located within the view field, the actual position of the pedestrian is estimated using stereovision. ^[10]

At the IEEE Conference on Computer Vision and Pattern Recognition in 2006, Qiang Zhu, Shai Avidan, Mei-Chen Yeh, and Kwang-Ting Cheng presented an algorithm to significantly speed up human detection using HOG descriptor methods. Their method uses HOG descriptors in combination with the cascade of rejecters algorithm normally applied with great success to the problem of face detection. Also, rather than relying on blocks of uniform size, they introduce blocks that vary in size, location, and aspect ratio. In order to isolate the blocks best suited for human detection, they applied the AdaBoost algorithm to select those blocks to be included in the rejecter cascade. In their experimentation, their algorithm achieved comparable performance to the original Dalal and Triggs algorithm, but operated at speeds up to 70 times faster. In April 2006, the Mitsubishi Electric Research Laboratories applied for the U.S. Patent of this algorithm under application number 20070237387. ^[11]
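The speed-up from a cascade comes from rejecting most windows after only a few cheap tests. The sketch below shows that control flow only; the stage functions, thresholds, and window fields are hypothetical placeholders and do not reproduce the block features or AdaBoost-selected stages of the method described above.

```python
def cascade_accepts(stages, window):
    """Run a window through a cascade; reject at the first failing stage.

    `stages` is a list of (score_function, threshold) pairs, ordered from
    cheapest to most expensive.
    """
    for score, threshold in stages:
        if score(window) < threshold:
            return False  # Rejected early; later stages are never evaluated.
    return True

# Hypothetical stages operating on a dict of precomputed window statistics.
stages = [
    (lambda w: w["edge_energy"], 0.3),
    (lambda w: w["block_score"], 0.0),
]

background = {"edge_energy": 0.1}                     # fails the first stage
candidate = {"edge_energy": 0.5, "block_score": 1.2}  # passes both stages
```

The `background` window has no `block_score` entry at all, yet it is handled without error because the second stage is never reached, which is exactly where the saving comes from.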

See also

Corner detection
Pedestrian detection
Feature (computer vision)
Feature detection (computer vision)
Feature extraction
Object recognition
SIFT

References

External links

http://www.mathworks.com/matlabcentral/fileexchange/33863 - Implementation for Matlab (mex file)
http://www.cs.cmu.edu/~yke/pcasift/ - Code for object detection using PCA-SIFT
http://lear.inrialpes.fr/software/ - Software toolkit for object detection using HOG (research group home page)
http://www.navneetdalal.com/software/ - Software toolkit for object detection using HOG (Navneet Dalal's home page)
http://pascal.inrialpes.fr/data/human/ - INRIA human image data set
http://cbcl.mit.edu/software-datasets/PedestrianData.html - MIT pedestrian image data set

Categories:

Computer vision
Pattern recognition

Wikimedia Foundation. 2010.
