Research on Parallelization Based on an Improved Canopy-Kmeans Algorithm

2021, Vol. 29, No. 2: 176-179

Authors: Wang Lin, Jia Junchen
Affiliation: School of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048
CLC number: TP301.6
Funding: Key Project of the Shaanxi Provincial Science and Technology Plan (2017ZDCXL-GY-05-03)

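Abstract:

With the rapid growth of Internet data, the original Kmeans algorithm is no longer adequate for clustering large-scale data. To address this, an improved Canopy-Kmeans clustering algorithm is proposed. First, to overcome the random selection of center points in the Canopy algorithm, the "max-min principle" is introduced to optimize the selection of Canopy centers. Next, the Kmeans algorithm is optimized using the triangle inequality, which eliminates redundant distance calculations and speeds up convergence. Finally, the improved Canopy-Kmeans algorithm is parallelized on the MapReduce framework. The optimized Canopy-Kmeans algorithm is evaluated on a purpose-built Weibo dataset. Experimental results show that, on Weibo datasets of different sizes, the accuracy of the optimized algorithm is about 15% higher than that of Kmeans and about 7% higher than that of the original Canopy-Kmeans algorithm, and its execution efficiency and scalability are also considerably improved.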
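The abstract does not spell out how the "max-min principle" picks Canopy centers. The sketch below assumes a common max-min distance formulation: seed with the first point, repeatedly promote the point that is farthest from its nearest existing center, and stop once that distance falls below a threshold `t2`. The names `maxmin_canopy_centers` and `canopies`, the first-point seed, and the stopping rule are illustrative assumptions, not the authors' exact method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_canopy_centers(points: List[Point], t2: float) -> List[Point]:
    """Select Canopy centers with an assumed max-min distance rule.

    Hypothetical formulation: start from points[0], then repeatedly add the
    point whose distance to its nearest chosen center is largest, stopping
    when that largest distance drops below t2.
    """
    if t2 <= 0:
        raise ValueError("t2 must be positive")  # otherwise the loop never stops
    if not points:
        return []
    centers = [points[0]]
    # distance from every point to its nearest chosen center, updated incrementally
    nearest = [dist(p, centers[0]) for p in points]
    while True:
        idx = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[idx] < t2:
            break  # every remaining point is already close to some center
        centers.append(points[idx])
        nearest = [min(d, dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers


def canopies(points: List[Point], centers: List[Point], t1: float) -> List[List[Point]]:
    """Put each point into every canopy whose center is within the loose threshold t1."""
    return [[p for p in points if dist(p, c) < t1] for c in centers]
```

In a Canopy-Kmeans pipeline, the number of returned centers would typically serve as k and the centers themselves as the initial Kmeans centroids. Choosing each new center by maximizing the minimum distance spreads the seeds out deterministically instead of relying on random picks, which is the weakness the abstract attributes to plain Canopy.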

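The triangle-inequality pruning and the MapReduce structure can be illustrated together. This sketch uses the bound known from Elkan's accelerated k-means: since d(x, c') ≥ d(c, c') − d(x, c), whenever d(c, c') ≥ 2·d(x, c) it follows that d(x, c') ≥ d(x, c), so the distance from x to c' need not be computed. The MapReduce job is simulated in-process: `map_split` acts as the mapper, a dictionary grouping stands in for the shuffle, and `reduce_cluster` acts as the reducer. All function names are hypothetical, and the paper's actual pruning rule and job layout may differ.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Vec = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def center_distances(centers: List[Vec]) -> List[List[float]]:
    """Pairwise center-to-center distances, computed once per iteration."""
    return [[dist(a, b) for b in centers] for a in centers]


def map_split(split: Iterable[Vec], centers: List[Vec],
              cc: List[List[float]], stats: Dict[str, int]):
    """Mapper: emit (cluster_id, (point, 1)) for every point in one input split.

    If d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
    so c_j cannot be closer and its distance computation is skipped.
    """
    for x in split:
        best, d_best = 0, dist(x, centers[0])
        stats["computed"] += 1
        for j in range(1, len(centers)):
            if cc[best][j] >= 2 * d_best:
                stats["skipped"] += 1  # pruned by the triangle inequality
                continue
            d = dist(x, centers[j])
            stats["computed"] += 1
            if d < d_best:
                best, d_best = j, d
        yield best, (x, 1)


def reduce_cluster(values: Iterable[Tuple[Vec, int]]) -> Vec:
    """Reducer: average all points assigned to one cluster."""
    total, count = None, 0
    for x, n in values:
        total = list(x) if total is None else [t + v for t, v in zip(total, x)]
        count += n
    return tuple(t / count for t in total)


def kmeans_mapreduce(splits: List[List[Vec]], centers: List[Vec],
                     max_iter: int = 20, tol: float = 1e-9):
    """Driver loop: one map -> shuffle -> reduce round per Kmeans iteration."""
    stats = {"computed": 0, "skipped": 0}
    for _ in range(max_iter):
        cc = center_distances(centers)
        groups = defaultdict(list)  # stands in for the shuffle phase
        for split in splits:
            for cid, value in map_split(split, centers, cc, stats):
                groups[cid].append(value)
        # an empty cluster keeps its previous center
        new_centers = [reduce_cluster(groups[i]) if groups[i] else centers[i]
                       for i in range(len(centers))]
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break  # converged: no center moved
    return centers, stats
```

On a real cluster, the current centers and the center-distance matrix would have to be made available to every mapper at the start of each iteration, and a combiner could pre-sum points per split to cut shuffle traffic. The `stats` counter shows how many point-to-center distances the pruning avoided.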


Cite this article:

Wang Lin, Jia Junchen. Research on parallelization based on improved Canopy-Kmeans algorithm[J]. Computer Measurement & Control, 2021, 29(2): 176-179.



History

Received: 2020-06-22
Last revised: 2020-07-07
Accepted: 2020-07-07
Published online: 2021-02-08

