Mitigating Data Deficiencies and Poisoning in Network-Centric Cybersecurity Systems

Wang, Haofan

Metadata Field	Value	Language
dc.contributor.advisor	Kandah, Farah
dc.contributor.author	Wang, Haofan
dc.date.accessioned	2026-05-01T19:34:02Z
dc.date.available	2026-05-01T19:34:02Z
dc.date.issued	2026-05-01
dc.identifier.uri	https://etd.auburn.edu/handle/10415/10405
dc.description.abstract	Cybersecurity network detection systems rely heavily on datasets to train models that identify and classify malicious activities. However, real-world network data often suffer from deficiencies such as class imbalance, low feature discriminability, redundancy, and a lack of structural information. These limitations reduce the accuracy and stability of models, leaving systems vulnerable to both misclassification and manipulation. To address these issues, we conducted a series of studies focused on enhancing data quality and robustness. We employed Multi-Critic GANs and U-Net–based diffusion models to generate realistic synthetic traffic, alleviating class imbalance and data sparsity while preserving the original distribution. Feature extraction and selection methods were used to derive discriminative, non-redundant attributes that improved interpretability and efficiency. Despite these advances, dataset integrity remains threatened by deliberate poisoning attacks that inject malicious samples into training pipelines. With the rapid development of data-driven models, including large language models, the threat of poisoning attacks has become increasingly severe, as even a small proportion of malicious data can significantly alter a model’s behavior. To address this challenge, we propose the Counterfactual Incremental Defense against Poisoning Attacks (CIDPA), which combines counterfactual metrics with incremental updating to achieve continuous and robust defense. Specifically, for each new sample we compute the minimal counterfactual cost, an estimate of the smallest feasible change required to flip the model’s current prediction. The model is then updated progressively through a sliding window mechanism. Because the minimal counterfactual cost remains stable on clean data, CIDPA monitors window-level distributional statistics to detect potential contamination.
When a window is clean or only mildly poisoned, the system operates in the within-window mode, where counterfactual metrics are standardized and analyzed locally to filter subtle anomalies. When poisoning becomes severe and the window’s own distribution is no longer reliable, CIDPA automatically switches to the cross-window mode, using the statistical patterns of historical clean windows as a reference baseline. This switching mechanism prevents the model from contaminating itself and ensures stability even under large-scale poisoning. In our experiments, we deliberately manipulated the datasets to simulate three poisoning attack scenarios. Across varying proportions of poisoned data, CIDPA consistently achieved strong defense performance and outperformed two existing poisoning defense methods in all conditions.	en_US
dc.rights	EMBARGO_GLOBAL	en_US
dc.subject	Computer Science and Software Engineering	en_US
dc.title	Mitigating Data Deficiencies and Poisoning in Network-Centric Cybersecurity Systems	en_US
dc.type	PhD Dissertation	en_US
dc.embargo.length	MONTHS_WITHHELD:36	en_US
dc.embargo.status	EMBARGOED	en_US
dc.embargo.enddate	2029-05-01	en_US
dc.contributor.committee	Aakur, Sathyanarayanan
dc.contributor.committee	Mulder, Samuel
dc.contributor.committee	Dozier, Gerry
dc.contributor.committee	Tripp, Lucretia
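
Illustrative sketch (not part of the deposited record). The abstract describes CIDPA only at a high level, so the Python below is a minimal sketch of the two-mode filtering idea under loudly stated assumptions, not the author's implementation. Assumed here and not taken from the dissertation: a linear classifier, for which the minimal L2 counterfactual cost is the distance to the decision boundary; median/MAD standardization; a baseline supplied as a (center, scale) pair from earlier clean windows; and the hypothetical names counterfactual_cost, filter_window, z_thresh and shift_thresh.

```python
import numpy as np


def counterfactual_cost(X, w, b):
    """Smallest L2 change that moves each row of X onto the boundary w.x + b = 0.

    Assumption: a linear model, so the cost is the point-to-hyperplane distance.
    """
    return np.abs(X @ w + b) / np.linalg.norm(w)


def filter_window(costs, baseline, z_thresh=3.0, shift_thresh=3.0):
    """Return (keep_mask, mode) for one sliding window of counterfactual costs.

    baseline is a hypothetical (center, scale) pair summarizing earlier
    clean windows; both thresholds are illustrative values.
    """
    costs = np.asarray(costs, dtype=float)
    base_center, base_scale = baseline

    # Robust window statistics: median and scaled median absolute deviation.
    center = np.median(costs)
    scale = 1.4826 * np.median(np.abs(costs - center))

    # Window-level check: how far has the window drifted from the baseline?
    shift = abs(center - base_center) / max(base_scale, 1e-12)

    if shift > shift_thresh:
        # Window statistics are unreliable: score against the clean baseline.
        mode = "cross-window"
        z = (costs - base_center) / max(base_scale, 1e-12)
    else:
        # Window looks clean or mildly poisoned: standardize locally.
        mode = "within-window"
        z = (costs - center) / max(scale, 1e-12)

    return np.abs(z) <= z_thresh, mode
```

In this sketch, a window whose median cost stays near the baseline is filtered against its own statistics, so only isolated outliers are dropped. A window dominated by shifted costs trips the drift check and is scored against the baseline instead, which keeps a poisoned majority from redefining what counts as normal.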

Files in this item

Name: Mitigating Data Deficiencies and Poisoning in Network-Centric Cybersecurity Systems.pdf
Size: 4.404Mb
