网络安全与数据治理 (Cyber Security and Data Governance), Issue 4
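(Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China)
Abstract: Traditional intrusion detection systems cannot cope with increasingly frequent and sophisticated network attacks such as advanced persistent threats (APT), because stealthy threat events may go undetected for months and false-positive rates are high. Recent work proposes using provenance data for host-based intrusion detection; a provenance graph is a directed acyclic graph constructed from provenance data. However, earlier studies extracted features from the whole provenance graph and were therefore insensitive to the small number of anomalous attack entities (nodes) in the graph, so they could not accurately identify anomalous nodes. This paper proposes a real-time APT detection method at the node level of the provenance graph. K-Means combined with the silhouette coefficient is used to cluster the benign nodes of the training dataset into benign node clusters, and an anomaly is flagged by checking whether a new node belongs to a benign node cluster. The method is evaluated on two public datasets, Unicorn SC-2 and DARPA TC; the results show that it reaches an accuracy of 95.83% and can accurately identify and locate anomalous nodes.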
Detecting advanced persistent threats through provenance graph in node level
Luo Hanxin, Wang Jinshuang, Wu Wenchang
(Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China)
Abstract: Traditional intrusion detection systems cannot cope with the increasing number and sophistication of cyberattacks such as advanced persistent threats (APT), because they may fail to detect stealthy threat incidents for several months and have a high false-positive rate. Recent studies propose leveraging provenance data to detect threats in a host. A provenance graph is a directed acyclic graph constructed from provenance data. However, previous studies, which extracted features of the whole provenance graph, were not sensitive to the small number of threat-related entities (nodes), so it is still difficult to identify and locate the real attack entities. We propose a real-time detection method at the node level. The benign nodes in the training dataset are grouped into clusters using K-Means combined with the silhouette coefficient. A node is considered abnormal if it does not fit into any of the model's clusters. The Unicorn SC-2 and DARPA TC datasets are used to evaluate this method. The evaluation shows that the method achieves 95.83% accuracy and can accurately locate anomalous nodes.
Key words: advanced persistent threats (APT); intrusion detection; machine learning; provenance graph

0 Introduction

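In recent years, sophisticated attacks such as advanced persistent threats (APT) have posed growing challenges to cyberspace security. Attackers keep changing their attack patterns, look for new points of intrusion, and use obfuscation to stay undetected. Current intrusion detection systems, however, usually rely on system calls and network events, which carry only the sequential relationship between log entries; meaningful associations are hard to extract from them directly, so APT detection performs poorly. Recent studies propose using the rich contextual information of the provenance graph for intrusion detection. A provenance graph is a directed acyclic graph whose nodes represent the subjects (e.g., processes and threads) and objects (e.g., files, registry entries, and network sockets) in a system, and whose directed edges represent the control-flow and data-flow relationships between vertices. Compared with raw system audit data, provenance data offers strong semantic expressiveness and the ability to correlate historical attack activity.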
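To make the structure described above concrete, the following is a minimal sketch of a provenance graph as a typed, directed graph. The class and method names, the node identifiers, and the convention that edges point in the direction of information flow are all illustrative assumptions, not taken from any provenance capture system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # subject or object type, e.g. "process", "file", "socket"


@dataclass
class ProvenanceGraph:
    # Hypothetical container: node table plus adjacency lists of (dst, relation).
    nodes: dict = field(default_factory=dict)
    out_edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, src, dst, relation):
        # Assumed convention: an edge points the way control or data flows.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.out_edges[src].append((dst, relation))

    def descendants(self, start):
        """Return the ids of every node reachable from `start`."""
        seen, stack = set(), [start]
        while stack:
            cur = stack.pop()
            for dst, _relation in self.out_edges.get(cur, []):
                if dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen


def build_example():
    # A made-up four-node history: a shell forks a process that reads a
    # file and then sends data over a socket.
    g = ProvenanceGraph()
    g.add_node("p1", "process")
    g.add_node("p2", "process")
    g.add_node("f1", "file")
    g.add_node("s1", "socket")
    g.add_edge("p1", "p2", "fork")  # control flow
    g.add_edge("f1", "p2", "read")  # data flow: file -> process
    g.add_edge("p2", "s1", "send")  # data flow: process -> socket
    return g
```

Calling `build_example().descendants("f1")` follows the data flow forward from the file and reaches both the reading process and the socket. A flat log holds the same three events only as an ordered list, so this link between the file and the socket is not directly visible there.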

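Attackers now increasingly favor zero-day attacks, and signature-based methods cannot detect unknown threats. Anomaly-based graph kernel detection methods examine the whole provenance graph; however, the provenance graph produced by stealthy intrusion activity may resemble the one produced by benign behavior. It is therefore hard to detect anomalies between similar provenance graphs, and anomalous nodes cannot be identified or located.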
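The node-level alternative summarized in the abstract (cluster benign nodes with K-Means, choose the number of clusters by silhouette coefficient, and flag nodes that fit no cluster) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the two-dimensional feature vectors are made up, the deterministic farthest-point seeding replaces the usual random initialization, and the "fits a cluster" test (distance to a center within 1.5 times that cluster's radius) is a hypothetical threshold.

```python
import math

# Hypothetical benign node features: two tight groups of behavior.
BENIGN = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]


def kmeans(points, k, max_iter=100):
    # Assumption: deterministic farthest-point seeding instead of random init.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def fit_benign(points, k_max=4):
    """Try k = 2..k_max, keep the clustering with the best silhouette."""
    best = None
    for k in range(2, k_max + 1):
        centers, labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, centers, labels)
    _, centers, labels = best
    # Radius of a cluster: distance from its center to its farthest member.
    radii = [max((math.dist(p, centers[c])
                  for p, l in zip(points, labels) if l == c), default=0.0)
             for c in range(len(centers))]
    return centers, radii


def is_anomalous(model, vec, margin=1.5):
    # Assumed rule: anomalous if the node lies outside every benign cluster.
    centers, radii = model
    return all(math.dist(vec, c) > margin * r for c, r in zip(centers, radii))
```

With `model = fit_benign(BENIGN)`, a vector close to either benign group, such as `(0.5, 0.5)`, is accepted, while one between the groups, such as `(5.0, 5.0)`, is flagged. Because the decision is made per node, a flagged vector points directly at the entity that produced it, which a whole-graph comparison cannot do.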





