Sen Fang

Ph.D. Candidate in Computer Science — NC State University · Seeking Research Internship
(+1) 984-379-1827
sfang9@ncsu.edu
tomasandersonfang.github.io · Google Scholar
Research Interests

My research lies at the intersection of large language models and software engineering, with a focus on three directions: (1) LLM-based agents for vulnerability detection — designing multi-agent systems that combine static analysis (CodeQL, Joern, CPGs) with LLM reasoning to automatically find and verify security flaws; (2) LLM robustness and reliability — evaluating and improving the consistency of code LLMs under quantization and diverse inputs; and (3) automated program repair and code optimization — leveraging fine-tuned LLMs (LoRA/QLoRA) and retrieval-augmented pipelines to fix bugs and improve code performance.

Research Highlights

ACM SIGPLAN OOPSLA 2026 Under Review
A multi-agent framework that decouples vulnerability detection into clue discovery and vulnerability reasoning. Uses Joern CPG queries to augment suspicious code lines with repository-level context, precisely controlling context length while preserving accuracy for scalable detection.
ACM TOSEM Major Revision
A self-consistency-centered framework for evaluating LLM robustness in code generation. Benchmarks 20+ LLMs across quantization levels and prompt perturbations, revealing systematic reliability gaps in code LLMs with practical implications for model deployment.

Education

NC State University Aug 2024 – Present
Ph.D. in Computer Science · Advisor: Prof. Bowen Xu (SoftMax Lab)
Focus: LLM-driven vulnerability detection, AI agents for software security, LLM robustness
Central China Normal University Sep 2018 – Jun 2020
M.Sc. in Electronics & Communication Engineering · Advisor: Prof. Shaocheng Qu
Wuhan Polytechnic University Sep 2014 – Jun 2018
B.Sc. in Electronic Information Engineering · Outstanding Student 2018

Research & Industry Experience

Hedra May 2025 – Aug 2025
Machine Learning Engineer Intern · Supervised by Hongwei Yi
Built a petabyte-scale video data processing pipeline; curated millions of high-quality samples for post-training generative models (text-to-video, image-to-video). Collaborated with research and infrastructure teams on data quality metrics and filtering strategies.
KTH Royal Institute of Technology Mar 2023 – Mar 2024
Research Engineer · Advisor: Prof. Martin Monperrus
Co-developed RepairLLaMA (LoRA fine-tuning for program repair) and Supersonic (LLM-driven C/C++ optimization); both published in IEEE TSE. Designed code representation strategies and training pipelines. Contributed to generative-AI-based test data generation.
Macau University of Science and Technology Sep 2020 – Nov 2022
Research Assistant · Advisor: Prof. Tao Zhang
Conducted research on deep-learning-based bug report understanding (ICSE '23), code clone detection via intermediate-code graphs (IEEE TR), and pull request description generation (JSS). Published 5 first-/co-first-author papers.

Selected Publications

Vulnerability Detection & Software Security
[1] S. Fang, W. Ding, B. Xu. "AEGIS: Multi-Agent CPG-Augmented Framework for Automated Vulnerability Detection." Submitted to ACM SIGPLAN OOPSLA, 2026. Under Review
[2] Y. Huang, S. Fang, J. Li, J. Tao, B. Hu, T. Zhang. "Deep Smart Contract Intent Detection." IEEE SANER 2025. A
[3] Y. Li, S. Fang, et al. "Enhancing Android Malware Detection: The Influence of ChatGPT on Decision-centric Tasks." ACM TOSEM, 2025. A*
LLM Robustness
[4] S. Fang, W. Ding, A. Mastropaolo, B. Xu. "Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation." IEEE TSE. Major Revision. A*
[5] S. Fang, W. Ding, B. Xu. "EVALOOOP: A Self-Consistency-Centered Framework for Assessing Large Language Model Robustness in Programming." ACM TOSEM. Major Revision. A*
[6] Y. Li, T. Zhang, X. Luo, H. Cai, S. Fang, D. Yuan. "Do Pre-trained Language Models Indeed Understand Software Engineering Tasks?" IEEE TSE. A*
Automated Program Repair & Code Optimization
[7] S. Fang*, A. Silva*, M. Monperrus. "RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair." IEEE TSE, 2025. A* (*equal contribution)
[8] Z. Chen, S. Fang, M. Monperrus. "Supersonic: Learning to Generate Source Code Optimizations in C/C++." IEEE TSE, 2024. A*
[9] B. Baudry, K. Etemadi, S. Fang, et al. "Generative AI to Generate Test Data Generators." IEEE Software.
Mining Software Engineering & Code Intelligence
[10] S. Fang, T. Zhang, Y. Tan, H. Jiang, X. Xia, X. Sun. "RepresentThemAll: A Universal Learning Representation of Bug Reports." ICSE 2023. A*
[11] D. Yuan*, S. Fang*, T. Zhang, Z. Xu, X. Luo. "Java Code Clone Detection by Exploiting Semantic and Syntax Information from Intermediate Code-Based Graph." IEEE TR, 2022. A (*equal contribution)
[12] S. Fang, T. Zhang, Y. Tan, Z. Xu, Z. Yuan, L. Meng. "PRHAN: Automated Pull Request Description Generation Based on Hybrid Attention Network." JSS, 2022. A
[13] S. Fang*, Y. Tan*, T. Zhang, Z. Xu, H. Liu. "Effective Prediction of Bug-Fixing Priority via Weighted Graph Convolutional Networks." IEEE TR, 2021. A (*equal contribution)
[14] S. Fang, Y. Tan, T. Zhang, Y. Liu. "Self-Attention Networks for Code Search." IST, 2021. A
[15] Y. Tan, J. Chen, W. Shang, T. Zhang, S. Fang, X. Luo, Z. Chen, S. Qi. "STRE: An Automated Approach to Suggesting App Developers When to Stop Reading Reviews." IEEE TSE. A*

Professional Service

Reviewer: IEEE TSC, ACM TOSEM, IST, EAAI, JSS, ASE, AIR, JCC, and others

Honors & Awards

Qualcomm Innovation Fellowship 2026 — Finalist
North America, Final Round

Technical Expertise

Vulnerability & Program Analysis
Joern, CodeQL, SpotBugs, FindSecBugs, Code Property Graphs, AST / CFG / DFG analysis
LLM Training & Inference
PyTorch, Transformers, LoRA / QLoRA / PEFT, vLLM, DeepSpeed, SLURM, JAX
Agent & Retrieval Systems
Multi-agent orchestration, RAG pipelines, tool-augmented LLM agents
Languages & Platforms
Python, C/C++, Java, LaTeX · Linux, Jupyter, Cursor, Claude Code