Investigating LLM vulnerabilities and building defensive systems against adversarial attacks
My research centers on enhancing the security and reliability of large language models (LLMs), especially in adversarial settings. I study how subtle modifications in scam messages can exploit LLM vulnerabilities and lead to misclassifications. To address this, I develop adversarial benchmarks, structured perturbation techniques, and hybrid detection systems that combine fine-tuned LLMs with traditional machine learning models. These contributions aim to strengthen the robustness of LLM-based systems against real-world manipulative attacks.
During my 2025 internship with TSMC’s AI R&D group, I applied these ideas to high-volume semiconductor manufacturing by deploying Qwen3 models on NVIDIA H100 GPUs and automating anomaly detection workflows. The real-world feedback loop from that work continues to inform my research directions.
Key research areas include:
Scam detection remains a critical challenge in cybersecurity, especially with the increasing sophistication of adversarial scam messages designed to evade detection. This work proposes a Hierarchical Scam Detection System (HSDS) that integrates multi-model voting with a fine-tuned LLaMA 3.1 8B Instruct model to improve detection accuracy and robustness against adversarial attacks. Our approach leverages a four-model ensemble for initial scam classification, where each model independently evaluates scam messages, and a majority voting mechanism determines preliminary predictions. The final classification is refined using a fine-tuned LLaMA 3.1 8B Instruct model, optimized through adversarial training to mitigate misclassification risks. Experimental results demonstrate that our hierarchical framework significantly enhances scam detection performance, surpassing both traditional machine learning models and larger proprietary LLMs, such as GPT-3.5 Turbo, while maintaining computational efficiency. The findings highlight the effectiveness of a hybrid voting mechanism and adversarial fine-tuning in fortifying LLMs against evolving scam tactics and strengthening the resilience of automated scam detection systems.
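The voting-plus-refinement flow can be sketched as follows. This is a minimal illustration: the four base classifiers and the refinement model are toy keyword rules standing in for the actual ensemble and the fine-tuned LLaMA 3.1 8B Instruct model.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label and whether the vote was unanimous."""
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes == len(predictions)

def classify(message, base_models, refine_model):
    """Hierarchical classification: ensemble vote first, refinement on disagreement."""
    preds = [model(message) for model in base_models]
    label, unanimous = majority_vote(preds)
    # Defer to the fine-tuned refinement model when the ensemble disagrees.
    return label if unanimous else refine_model(message)

# Toy stand-ins for the four ensemble members and the refinement model.
base = [
    lambda m: "scam" if "urgent" in m.lower() else "legit",
    lambda m: "scam" if "wire transfer" in m.lower() else "legit",
    lambda m: "scam" if "verify your account" in m.lower() else "legit",
    lambda m: "scam" if "prize" in m.lower() else "legit",
]
refine = lambda m: "scam" if "account" in m.lower() else "legit"

print(classify("URGENT: verify your account to claim your prize", base, refine))
```

The design choice illustrated here is that the expensive refinement model only runs when the cheap ensemble is uncertain, which is how the hierarchy keeps computational cost down.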
Can we trust large language models (LLMs) to accurately detect scams? This paper investigates the vulnerabilities of LLMs when facing adversarial scam messages in the scam detection task. We address this issue by creating a comprehensive dataset of scam messages with fine-grained labels, including both original and adversarial examples, extending the traditional binary classification scheme into more nuanced scam types. Our analysis shows how adversarial examples exploit the vulnerabilities of an LLM, leading to high misclassification rates. We evaluate the performance of LLMs on these adversarial scam messages and propose strategies to improve their robustness.
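One family of subtle modifications that adversarial scam messages can use is character-level substitution that preserves human readability while breaking exact token matches. The homoglyph table and perturbation rate below are illustrative examples of this idea, not the paper's actual perturbation set.

```python
import random

# Latin -> Cyrillic look-alike characters (illustrative, not exhaustive).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def perturb(message, rate=0.3, seed=0):
    """Randomly swap a fraction of characters for visually similar homoglyphs."""
    rng = random.Random(seed)
    out = []
    for ch in message:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)

original = "Verify your account to release the funds"
adversarial = perturb(original)
# The two strings render almost identically to a human reader,
# but they no longer match character-for-character.
print(original == adversarial)
```

A detector that relies on surface token patterns can be fooled by such inputs, which is why the evaluation covers both original and adversarially perturbed messages.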
RailEstate is a web-based analytics platform that fuses 25 years of Washington metropolitan housing data with WMATA transit infrastructure to surface metro-linked price trends. The system supports interactive spatial queries, time-series visualizations, forecasting, and a natural-language-to-SQL chatbot that transforms plain-English questions into optimized PostGIS queries. By unifying spatial databases, forecasting pipelines, and LLM-powered interfaces, RailEstate delivers actionable, transit-aware housing insights for planners, investors, and residents without requiring technical expertise.
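A chatbot layer of this kind can be sketched as routing a question to a fixed, parameterized PostGIS query template rather than executing free-form generated SQL. In the minimal sketch below, the table and column names are hypothetical, and the LLM router is stubbed with a keyword rule.

```python
# Parameterized query templates; placeholders are bound by the DB driver,
# so model output never reaches the database as raw SQL.
TEMPLATES = {
    "avg_price_near_station": (
        "SELECT AVG(h.price) FROM homes h "
        "JOIN stations s ON ST_DWithin(h.geom::geography, s.geom::geography, %s) "
        "WHERE s.name = %s"
    ),
}

def route_question(question):
    """Stand-in for the LLM router: pick a template and extract parameters."""
    if "near" in question.lower():
        # Hypothetical extraction: 800 m radius around a named station.
        return "avg_price_near_station", (800, "Metro Center")
    raise ValueError("unsupported question")

def build_query(question):
    name, params = route_question(question)
    return TEMPLATES[name], params

sql, params = build_query("What is the average price near Metro Center?")
print(params)
```

Constraining the model to template selection plus parameter extraction is one common way to keep an NL-to-SQL interface safe and its queries optimizable.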
My ongoing and future research agenda aims to address several critical challenges in AI security and LLM robustness:
I welcome collaboration opportunities in these research areas. Please contact me to discuss potential research partnerships.