This Is Auburn

Mitigating Data Deficiencies and Poisoning in Network-Centric Cybersecurity Systems


Metadata Field | Value | Language
dc.contributor.advisor | Kandah, Farah |
dc.contributor.author | Wang, Haofan |
dc.date.accessioned | 2026-05-01T19:34:02Z |
dc.date.available | 2026-05-01T19:34:02Z |
dc.date.issued | 2026-05-01 |
dc.identifier.uri | https://etd.auburn.edu/handle/10415/10405 |
dc.description.abstract | Network-based cybersecurity detection systems rely heavily on datasets to train models that identify and classify malicious activity. However, real-world network data often suffer from deficiencies such as class imbalance, low feature discriminability, redundancy, and a lack of structural information. These limitations reduce the accuracy and stability of models, leaving systems vulnerable to both misclassification and manipulation. To address these issues, we conducted a series of studies focused on enhancing data quality and robustness. We employed Multi-Critic GANs and U-Net–based diffusion models to generate realistic synthetic traffic, alleviating class imbalance and data sparsity while preserving the original distribution. Feature extraction and selection methods were used to derive discriminative, non-redundant attributes that improved interpretability and efficiency. Despite these advances, dataset integrity remains threatened by deliberate poisoning attacks that inject malicious samples into training pipelines. With the rapid development of data-driven models, including large language models, the threat of poisoning attacks has become increasingly severe, as even a small proportion of malicious data can significantly alter a model’s behavior. To address this challenge, we propose the Counterfactual Incremental Defense against Poisoning Attacks (CIDPA), which combines counterfactual metrics with incremental updating to achieve continuous and robust defense. Specifically, for each new sample we compute the minimal counterfactual cost, that is, the smallest feasible change required to flip the model’s current prediction. The model is then updated progressively through a sliding-window mechanism. Because the minimal counterfactual cost remains stable on clean data, CIDPA monitors window-level statistics of its distribution to detect potential contamination.
When a window is clean or only mildly poisoned, the system operates in within-window mode, where counterfactual metrics are standardized and analyzed locally to filter subtle anomalies. However, when poisoning becomes severe and the window’s distribution is no longer reliable, CIDPA automatically switches to cross-window mode, using the statistical patterns of historical clean windows as a reference baseline. This switching mechanism prevents the model from contaminating itself and ensures stability even under large-scale poisoning. In the experiments, we deliberately manipulated the datasets to simulate three poisoning attack scenarios. Under varying proportions of poisoned data, the proposed CIDPA framework consistently achieved strong defense performance and outperformed two existing poisoning defense methods across all conditions. | en_US
dc.rights | EMBARGO_GLOBAL | en_US
dc.subject | Computer Science and Software Engineering | en_US
dc.title | Mitigating Data Deficiencies and Poisoning in Network-Centric Cybersecurity Systems | en_US
dc.type | PhD Dissertation | en_US
dc.embargo.length | MONTHS_WITHHELD:36 | en_US
dc.embargo.status | EMBARGOED | en_US
dc.embargo.enddate | 2029-05-01 | en_US
dc.contributor.committee | Aakur, Sathyanarayanan |
dc.contributor.committee | Mulder, Samuel |
dc.contributor.committee | Dozier, Gerry |
dc.contributor.committee | Tripp, Lucretia |
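
The window-switching idea described in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's implementation. It assumes a linear classifier, so that the minimal counterfactual cost reduces to the distance from a sample to the decision boundary. The function names, the z-score filter, and the thresholds `z_max` and `drift_max` are hypothetical choices made only for illustration.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def counterfactual_cost(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """L2 distance from x to the hyperplane w.x + b = 0.

    Assumption: a linear classifier, for which this distance is the smallest
    change that flips the predicted label. Other models would need a search.
    """
    margin = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(margin) / norm


def filter_window(
    costs: Sequence[float],
    baseline_mean: float,
    baseline_std: float,
    z_max: float = 3.0,      # hypothetical per-sample cutoff
    drift_max: float = 2.0,  # hypothetical window-level cutoff
) -> Tuple[str, List[int]]:
    """Return the mode used and the indices of the samples kept."""
    win_mean = statistics.fmean(costs)
    win_std = statistics.pstdev(costs)
    # Window-level check: drift of the window mean from the clean baseline,
    # measured in units of the baseline standard deviation.
    drift = abs(win_mean - baseline_mean) / baseline_std
    if drift <= drift_max and win_std > 0:
        # Window looks trustworthy: standardize against its own statistics.
        mode, mu, sigma = "within-window", win_mean, win_std
    else:
        # Window looks unreliable: fall back to historical clean statistics.
        mode, mu, sigma = "cross-window", baseline_mean, baseline_std
    keep = [i for i, c in enumerate(costs) if abs(c - mu) / sigma <= z_max]
    return mode, keep


if __name__ == "__main__":
    clean = [1.0, 1.1, 0.9, 1.05, 0.95]
    poisoned = [0.1, 0.12, 0.09, 1.0, 0.11]
    print(filter_window(clean, baseline_mean=1.0, baseline_std=0.1))
    print(filter_window(poisoned, baseline_mean=1.0, baseline_std=0.1))
```

In this sketch the baseline mean and standard deviation are passed in by the caller and stand for statistics gathered from earlier windows judged clean. With very small windows a z-score cutoff of 3 cannot trigger in within-window mode, so a realistic window would hold many more samples than the five used here.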
