Xinchen Wang

[TSE'25] From Function to Repository: Towards Repository-Level Evaluation of Software Vulnerability Detection

Xin-Cheng Wen, Xinchen Wang, Yujia Chen, Ruida Hu, David Lo, Cuiyun Gao*.

IEEE Transactions on Software Engineering (CCF-A Journal)

We propose VulEval, a holistic multi-level evaluation system that assesses detection performance on inter- and intra-procedural vulnerabilities simultaneously. VulEval consists of three interconnected evaluation tasks: (1) Function-Level Vulnerability Detection, which detects intra-procedural vulnerabilities given a code snippet; (2) Vulnerability-Related Dependency Prediction, which retrieves vulnerability-related dependencies from call graphs to give developers explanations of the vulnerabilities; and (3) Repository-Level Vulnerability Detection, which detects inter-procedural vulnerabilities by incorporating the dependencies identified in the second task.

Paper Code
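As a rough illustration of the second VulEval task, the sketch below gathers the direct callers and callees of a target function from a call graph as candidate dependencies. This is a minimal sketch, not the paper's method: the dictionary-based call graph, the function name `candidate_dependencies`, and the example function names are all assumptions made for illustration, and the ranking of candidates by vulnerability relevance is left out.

```python
from typing import Dict, List, Set


def candidate_dependencies(
    call_graph: Dict[str, List[str]], target: str
) -> Dict[str, Set[str]]:
    """Collect direct callers and callees of `target`.

    `call_graph` maps a function name to the functions it calls
    (a hypothetical representation chosen for this sketch).
    """
    callees = set(call_graph.get(target, []))
    callers = {fn for fn, outs in call_graph.items() if target in outs}
    return {"callers": callers, "callees": callees}


# Toy call graph with made-up function names.
graph = {
    "parse_request": ["read_header", "copy_body"],
    "copy_body": ["alloc_buffer"],
    "handle_upload": ["copy_body"],
}

deps = candidate_dependencies(graph, "copy_body")
```

A retrieval model would then score these candidates, which is the part the task actually evaluates.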

[NeurIPS'25] Repo2Run: Automated Building Executable Environment for Code Repository at Scale

Ruida Hu, Chao Peng*, Xinchen Wang, Junjielong Xu, Cuiyun Gao*.

Annual Conference on Neural Information Processing Systems (CCF-A Conference)

We introduce Repo2Run, the first LLM-based agent that automates the building of executable test environments for arbitrary repositories at scale. Given a code repository, Repo2Run iteratively builds the Docker image, runs unit tests based on the build feedback, and synthesizes the Dockerfile until the entire pipeline executes successfully.

🏆 Spotlight Paper

Paper Code
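The build, test, and repair loop in the Repo2Run summary can be pictured with the toy sketch below. It is not Repo2Run's implementation: `run_pipeline` and `propose_fix` are hypothetical callables standing in for the Docker build plus test run and for the LLM agent, and the stand-ins here never touch Docker.

```python
from typing import Callable, List, Tuple


def synthesize_dockerfile(
    base_lines: List[str],
    run_pipeline: Callable[[List[str]], Tuple[bool, str]],
    propose_fix: Callable[[str], str],
    max_rounds: int = 10,
) -> List[str]:
    """Append one fix per round until the pipeline reports success."""
    lines = list(base_lines)
    for _ in range(max_rounds):
        ok, feedback = run_pipeline(lines)
        if ok:
            return lines
        lines.append(propose_fix(feedback))
    raise RuntimeError("pipeline still failing after max_rounds")


# Toy stand-ins (assumptions for illustration only).
def fake_pipeline(lines: List[str]) -> Tuple[bool, str]:
    """Fail with the name of the first missing package, if any."""
    for pkg in ("pytest", "requests"):
        if f"RUN pip install {pkg}" not in lines:
            return False, pkg
    return True, ""


def fake_fix(feedback: str) -> str:
    """Turn the feedback (a package name) into a Dockerfile line."""
    return f"RUN pip install {feedback}"


dockerfile = synthesize_dockerfile(["FROM python:3.10"], fake_pipeline, fake_fix)
```

In this toy run the loop adds one install line per failed round and stops on the third round, when the fake pipeline passes.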

[ASE'25] An Agent-based Evaluation Framework for Complex Code Generation

Xinchen Wang, Ruida Hu, Pengfei Gao, Chao Peng, Cuiyun Gao*.

The IEEE/ACM International Conference on Automated Software Engineering (CCF-A Conference)

We propose CodeVisionary, the first agent-based evaluation framework for complex code generation. CodeVisionary consists of two stages: (1) A requirement-guided multi-dimensional context distillation stage, which first formulates a detailed evaluation plan by decomposing the task requirements and then collects multi-dimensional contextual information for each requirement step by step; (2) A fine-grained scoring and summarization stage, which defines self-directed and negotiation-based actions that allow multiple judges to examine complex code from fine-grained and diverse viewpoints and reach a consensus through discussion. CodeVisionary also generates a comprehensive evaluation report for enhanced explainability.

🏆 Directly Accepted Without Revision (9.9%)

Paper Code

[FSE'25 Industry Track] AEGIS: An Agent-based Framework for Bug Reproduction from Issue Descriptions

Xinchen Wang, Pengfei Gao, Xiangxin Meng, Chao Peng, Ruida Hu, Yun Lin, Cuiyun Gao*.

The ACM International Conference on the Foundations of Software Engineering (CCF-A Conference)

We propose AEGIS, an automated framework for generating bug reproduction scripts. AEGIS consists of two main modules: (1) A bug-related context summarization module, which condenses the retrieved information into structured context through reranking and summarization; (2) A finite state machine (FSM)-guided script generation module, which guides the script modification process with an FSM that contains predefined modification rules.

Paper Code
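To make the idea of FSM-guided modification concrete, here is a minimal sketch of a state machine that rejects any step not permitted by a predefined transition table. The states (`DRAFT`, `EXECUTE`, `REVISE`, `DONE`) and the transition rules are invented for illustration and are not the states or rules defined in AEGIS.

```python
from enum import Enum, auto
from typing import Dict, List, Set


class State(Enum):
    DRAFT = auto()    # write an initial reproduction script
    EXECUTE = auto()  # run the script and observe the result
    REVISE = auto()   # modify the script based on the result
    DONE = auto()     # bug reproduced, stop


# Hypothetical rules: which state may follow which.
TRANSITIONS: Dict[State, Set[State]] = {
    State.DRAFT: {State.EXECUTE},
    State.EXECUTE: {State.REVISE, State.DONE},
    State.REVISE: {State.EXECUTE},
    State.DONE: set(),
}


def step(current: State, proposed: State) -> State:
    """Accept `proposed` only if the rules allow it after `current`."""
    if proposed not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {proposed.name}")
    return proposed


def run(proposals: List[State], start: State = State.DRAFT) -> List[State]:
    """Replay a sequence of proposed states, enforcing the rules."""
    trace = [start]
    for proposed in proposals:
        trace.append(step(trace[-1], proposed))
    return trace


trace = run([State.EXECUTE, State.REVISE, State.EXECUTE, State.DONE])
```

The point of the table is that an agent cannot, for example, revise a script it has never executed: `step(State.DRAFT, State.REVISE)` raises `ValueError`.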

[SIGIR'25 Short Paper] Understanding Large Language Model Performance in Software Engineering: A Large-scale Question Answering Benchmark

Ruida Hu, Chao Peng, Jingyi Ren, Bo Jiang, Xiangxin Meng, Qinyun Wu, Pengfei Gao, Xinchen Wang, Cuiyun Gao*.

The International ACM SIGIR Conference on Research and Development in Information Retrieval (CCF-A Conference)

We introduce CodeRepoQA, a large-scale benchmark specifically designed for evaluating repository-level question-answering capabilities in the field of software engineering. CodeRepoQA is a multi-turn question-answering benchmark with 585,687 entries. It covers a diverse array of software engineering scenarios, with an average of 6.62 dialogue turns per entry.

Paper Code

[ICSE'24 Industry Challenge Track] ReposVul: A Repository-Level High-Quality Vulnerability Dataset

Xinchen Wang, Ruida Hu, Cuiyun Gao*, Xin-Cheng Wen, Yujia Chen, Qing Liao.

The IEEE/ACM International Conference on Software Engineering (CCF-A Conference)

We propose an automated data collection framework and construct ReposVul, the first repository-level high-quality vulnerability dataset. The framework contains three main modules: (1) A vulnerability untangling module, which distinguishes vulnerability-fixing code changes from tangled patches; (2) A multi-granularity dependency extraction module, which captures the inter-procedural call relationships of vulnerabilities; (3) A trace-based filtering module, which filters out outdated patches.

🏆 Best Paper Award of the Track

Paper Code

[ASE'23] When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

Xin-Cheng Wen, Xinchen Wang, Cuiyun Gao*, Shaohua Wang, Yang Liu, Zhaoquan Gu.

The IEEE/ACM International Conference on Automated Software Engineering (CCF-A Conference)

We propose PILOT, a novel model for vulnerability detection. It contains two main modules: (1) A distance-aware label selection module, which generates pseudo-labels for selected unlabeled data using an inter-class distance prototype and progressive fine-tuning; (2) A mixed-supervision representation learning module, which further alleviates the influence of noise and enhances the discrimination of representations.

Paper Code
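The general intuition behind distance-based pseudo-labeling in a positive and unlabeled setting can be sketched as follows. This is a simplified stand-in, not PILOT's algorithm: it assumes a single prototype (the centroid of the known positive embeddings), Euclidean distance, and a fixed budget `k`, all of which are choices made only for this example.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def pseudo_label(
    positives: Sequence[Vector], unlabeled: Sequence[Vector], k: int
) -> Tuple[List[int], List[int]]:
    """Return indices of unlabeled samples to treat as positive / negative.

    The k samples nearest to the positive prototype become pseudo-positives;
    the k farthest become pseudo-negatives. Requires 2 * k <= len(unlabeled)
    so the two groups cannot overlap.
    """
    if 2 * k > len(unlabeled):
        raise ValueError("k is too large for the unlabeled pool")
    proto = centroid(positives)
    ranked = sorted(
        range(len(unlabeled)), key=lambda i: math.dist(unlabeled[i], proto)
    )
    return ranked[:k], ranked[len(ranked) - k:]


# Toy 2-D embeddings (made-up values).
known_positive = [[0.0, 0.0], [2.0, 0.0]]
pool = [[1.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]]

pseudo_pos, pseudo_neg = pseudo_label(known_positive, pool, k=1)
```

Samples in the middle of the ranking stay unlabeled, which is where a progressive scheme would pick up in later rounds.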

王欣辰

📝 Publications

📜 Technical Reports

Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling

MarsCode Agent: AI-native Automated Bug Fixing

🏆 Honors and Awards

💻 Internships

📖 Education