CLC number: TN919-34  Document code: A  Article ID: 1004-373X(2015)06-0020-05
Dynamic texture classification method based on stacked denoising autoencoding model
WANG Cai-xia, WEI Xue-yun, WANG Biao
(School of Electronics and Information Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China)
Abstract: To overcome the shortcomings of manually designed feature descriptors and excessively high feature dimensionality in dynamic scene classification, a deep learning network model is proposed to extract dynamic texture features. First, slow feature analysis is applied to pre-learn the dynamic characteristics of each video sequence, and the learned features are used as the input data of the deep network to obtain a higher-level representation of the input signal. The stacked denoising autoencoding model is selected as the deep learning network model, and an SVM is used for classification. Experimental results show that the features extracted by this method have low dimensionality and describe dynamic textures effectively.
Keywords: dynamic texture classification; slow feature analysis; deep learning; stacked denoising autoencoding model
0 Introduction
Dynamic textures are visual patterns that exhibit spatial repetitiveness and vary over time, forming image sequences with a certain invariance in the temporal domain [1]. Different dynamic textures may have similar appearance yet different forms of motion, so appearance and motion are the two principal aspects of dynamic texture features. In current dynamic video analysis systems, the most critical step is extracting effective dynamic texture descriptors. Over the past few decades, most texture research has concentrated on static textures; research on dynamic textures started much later. It began in the early 1990s, when Nelson and Polana studied dynamic textures by building linear system models [2] and divided visual motion into three categories [3]: activities, motion events, and dynamic textures. Subsequently, Szummer and Picard modeled dynamic texture sequences with the spatio-temporal autoregressive (STAR) model [4]. Optical-flow-based recognition is currently a popular approach to dynamic texture recognition because it is computationally efficient and describes local dynamic texture features in an intuitive way; Fazekas and Chetverikov concluded that, compared with regularized complete flow, normal flow captures both dynamic and shape characteristics [5]. LBP-based dynamic texture methods are effective algorithms proposed only in recent years; typical examples are the two spatio-temporal descriptors introduced by Zhao et al., the volume local binary pattern (VLBP) [6] and the local binary pattern from three orthogonal planes (LBP-TOP) [7], which effectively combine motion and appearance features. The years 2007-2008 were the most active period of dynamic texture research, with major journals continually publishing related articles.
The denoising autoencoder (DAE) extends the basic autoencoder by adding noise to the training data; the encoder must learn to remove the noise and recover the uncorrupted input signal, and thereby obtains a more robust representation of the input. The stacked denoising autoencoding model (SDA) is a deep network formed by stacking multiple DAEs. Using the optimized parameters θ, the output y of the current layer (i.e., the input of the next layer) is computed; this y is then fed as the input data of a new layer, and the denoising autoencoding procedure is repeated layer by layer until the last hidden layer of the multilayer network is reached. The output of that layer is the final output feature, as shown in Fig. 3.
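The greedy layer-wise procedure described above can be sketched in code. The following is a minimal NumPy illustration, not the paper's implementation: it assumes tied weights, masking (zeroing) noise, sigmoid units, and illustrative hyperparameters (layer sizes, noise level, learning rate, epoch count), none of which are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenoisingAutoencoder:
    """One DAE layer: corrupt the input, then reconstruct the clean input."""
    def __init__(self, n_in, n_hidden):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))  # tied weights
        self.b = np.zeros(n_hidden)   # encoder bias
        self.c = np.zeros(n_in)       # decoder bias

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def decode(self, y):
        return sigmoid(y @ self.W.T + self.c)

    def train(self, X, noise=0.3, lr=0.1, epochs=30):
        for _ in range(epochs):
            # masking noise: zero a random fraction of each input
            Xc = X * (rng.random(X.shape) > noise)
            y = self.encode(Xc)
            z = self.decode(y)
            # gradients of the cross-entropy reconstruction loss
            dz = z - X                          # (n, n_in)
            dy = (dz @ self.W) * y * (1 - y)    # (n, n_hidden)
            self.W -= lr * (Xc.T @ dy + dz.T @ y) / len(X)
            self.b -= lr * dy.mean(axis=0)
            self.c -= lr * dz.mean(axis=0)

def train_stack(X, layer_sizes, **kw):
    """Greedy layer-wise training: each layer's clean-input encoding
    becomes the input data of the next layer, as described above."""
    layers, h = [], X
    for n_hidden in layer_sizes:
        dae = DenoisingAutoencoder(h.shape[1], n_hidden)
        dae.train(h, **kw)
        h = dae.encode(h)   # output of this layer = input of the next
        layers.append(dae)
    return layers, h        # h: output of the last hidden layer (the feature)
```

The returned feature `h` would then be passed to a classifier (an SVM in the paper's pipeline).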
[1] DORETTO G, CHIUSO A, WU Y, et al. Dynamic textures [J]. International Journal of Computer Vision, 2003, 51(2): 91-109.
[2] NELSON R C, POLANA R. Qualitative recognition of motion using temporal texture [J]. CVGIP: Image Understanding, 1992, 56(1): 78-89.
[3] POLANA R, NELSON R. Temporal texture and activity recognition [J]. Motion-Based Recognition: Computational Imaging and Vision, 1997, 9: 87-124.
[4] SZUMMER M, PICARD R W. Temporal texture modeling [C]// Proceedings of 1996 International Conference on Image Processing. [S.l.]: [s.n.], 1996: 11-16.
[5] FAZEKAS S, CHETVERIKOV D. Normal versus complete flow in dynamic texture recognition: a comparative study [C]// 2005 4th International Workshop on Texture Analysis and Synthesis (ICCV 2005). [S.l.]: [s.n.], 2005: 37-42.
[6] ZHAO G, PIETIKÄINEN M. Dynamic texture recognition using volume local binary patterns [C]// European Conference on Computer Vision. [S.l.]: [s.n.], 2006: 165-177.
[7] ZHAO G, PIETIKÄINEN M. Dynamic texture recognition using local binary patterns with an application to facial expressions [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(6): 915-928.
[8] THERIAULT C, THOME N, CORD M. Dynamic scene classification: learning motion descriptors with slow features analysis [EB/OL]. [2014-09-17]. http://.
[9] FRANZIUS M, WILBERT N, WISKOTT L. Invariant object recognition with slow feature analysis [C]// ICANN 18th International Conference. Berlin: Springer-Verlag, 2008: 961-970.
[10] WISKOTT L, SEJNOWSKI T. Slow feature analysis: unsupervised learning of invariances [J]. Neural Computation, 2002, 14: 715-770.
[12] DE VALOIS R, YUND E, HEPLER N. The orientation and direction selectivity of cells in macaque visual cortex [J]. Vision Research, 1982, 22: 531-544.
[13] HUBEL D, WIESEL T. Receptive fields of single neurones in the cat's striate cortex [J]. Journal of Physiology, 1959, 148: 574-591.
[14] DERPANIS K, LECCE M, DANIILIDIS K, et al. Dynamic scene understanding: the role of orientation features in space and time in scene classification [C]// International Conference on Computer Vision and Pattern Recognition. [S.l.]: [s.n.], 2012: 111-121.
[15] MARSZALEK M, LAPTEV I, SCHMID C. Actions in context [C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2009: 2-6.