Jingyu (Jack) Zhang

I am a PhD student in Computer Science at Johns Hopkins University, proudly advised by Daniel Khashabi and Benjamin Van Durme. My research at JHU is supported by the Amazon AI PhD Fellowship. I am also a student researcher at Meta Superintelligence Labs collaborating with Hongyuan Zhan and Jason Weston.

My research centers on the post-training of foundation models and agents, with an emphasis on their alignment and robustness. I aim to develop adaptive, controllable methods that advance model capability together with safety and adversarial robustness. For instance, my recent work leverages reinforcement learning to enable multi-agent collaboration and adaptation to diverse values, and develops renewable evaluation benchmarks that scale with rapidly evolving model capabilities. My long-term goal is to build safe, collaborative agentic systems that reliably accomplish long-horizon, economically valuable tasks.

Previously, I was a research intern and student researcher at Microsoft from 2024 to 2025, working with Ahmed Elgohary Ghoneim. I have collaborated with Mark Dredze at JHU CLSP, Yulia Tsvetkov and Tianxing He at the University of Washington, and Jim Glass at MIT CSAIL. I also completed my B.S. at JHU, with majors in Computer Science, Mathematics, and Applied Mathematics and a minor in Economics. GO HOP! 💙🤍

I’m always excited about collaborations. If you are interested in working together, please feel free to drop me an email: jzhan237[at]jhu.edu!

Selected Works

	The Alignment Waltz: Jointly Training Agents to Collaborate for Safety. Jingyu Zhang, Haozhu Wang, Eric Michael Smith, Sid Wang, Amr Sharaf, Mahesh Pasupuleti, Benjamin Van Durme, Daniel Khashabi, Jason Weston, Hongyuan Zhan. arXiv preprint. We introduce WaltzRL, a multi-agent RL framework that frames LLM safety as a positive-sum game between a conversation agent and a feedback agent. We introduce a novel Dynamic Improvement Reward to jointly train two agents to collaborate, and give feedback adaptively at inference. WaltzRL improves safety and reduces overrefusals without degrading general capabilities.
	Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements. Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme. ICLR 2025. The current paradigm for safety alignment of large language models (LLMs) follows a one-size-fits-all approach and lacks flexibility in the face of varying social norms across cultures and diverse user needs. We propose Controllable Safety Alignment, a framework that adapts models to diverse safety requirements without re-training.
	Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data. Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi. NAACL 2025 (oral). To trust the fluent generations of large language models, humans must be able to verify their correctness against trusted external sources. We trivialize the verification process by developing models that quote verbatim statements from trusted sources in their pre-training data.
	SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation. Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, Yulia Tsvetkov. NAACL 2024*. Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences.

All Publications

Hoang Phan, Xianjun Yang, Kevin Yao, Jingyu Zhang, Shengjie Bi, Xiaocheng Tang, Madian Khabsa, Lijuan Liu, Deren Lei. Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models. arXiv preprint.

Jingyu Zhang, Haozhu Wang, Eric Michael Smith, Sid Wang, Amr Sharaf, Mahesh Pasupuleti, Benjamin Van Durme, Daniel Khashabi, Jason Weston, Hongyuan Zhan. The Alignment Waltz: Jointly Training Agents to Collaborate for Safety. arXiv preprint.

Jingyu Zhang, Ahmed Elgohary, Xiawei Wang, A S M Iftekhar, Ahmed Magooda, Benjamin Van Durme, Daniel Khashabi, Kyle Jackson. Jailbreak Distillation: Renewable Safety Benchmarking. Findings of EMNLP 2025.

Jingyu Zhang, Jiacan Yu, Marc Marone, Benjamin Van Durme, Daniel Khashabi. Certified Mitigation of Worst-Case LLM Copyright Infringement. EMNLP 2025.

Abe Bohan Hou, Hongru Du, Yichen Wang, Jingyu Zhang, Zixiao Wang, Paul Pu Liang, Daniel Khashabi, Lauren Gardner, Tianxing He. Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy. COLM 2025.

Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme. Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements. ICLR 2025.

Dongwei Jiang, Guoxuan Wang, Yining Lu, Andrew Wang, Jingyu Zhang, Chuyu Liu, Benjamin Van Durme, Daniel Khashabi. Rationalyst: Pre-training Process-Supervision for Improving Reasoning. ACL 2025.

Zhengping Jiang, Jingyu Zhang, Nathaniel Weir, Seth Ebner, Miriam Wanner, Kate Sanders, Daniel Khashabi, Anqi Liu, Benjamin Van Durme. Core: Robust Factual Precision Scoring with Informative Sub-Claim Identification. Findings of ACL 2025.

Dongwei Jiang, Jingyu Zhang, Orion Weller, Nathaniel Weir, Benjamin Van Durme, Daniel Khashabi. Self-(In)Correct: LLMs Struggle with Refining Self-Generated Responses. AAAI 2025.

Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi. Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data. NAACL 2025 (oral).

Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jingyu Zhang, Benjamin Van Durme, Daniel Khashabi. TurkingBench: A Challenge Benchmark for Web Agents. NAACL 2025.

Weiting Tan, Jingyu Zhang, Lingfeng Shen, Daniel Khashabi, Philipp Koehn. DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation. NeurIPS 2024.

Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, Tianxing He. k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text. Findings of ACL 2024.

Lingfeng Shen, Weiting Tan, Sihao Chen, Yunmo Chen, Jingyu Zhang, Haoran Xu, Boyuan Zheng, Philipp Koehn, Daniel Khashabi. The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts. Findings of ACL 2024.

Abe Bohan Hou*, Jingyu Zhang*, Tianxing He*, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, Yulia Tsvetkov. SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation. NAACL 2024.

Xiao Pu, Jingyu Zhang, Xiaochuang Han, Yulia Tsvetkov, Tianxing He. On the Zero-Shot Generalization of Machine-Generated Text Detectors. Findings of EMNLP 2023.

Tianxing He*, Jingyu Zhang*, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James Glass, Yulia Tsvetkov. On the Blind Spots of Model-Based Evaluation Metrics for Text Generation. ACL 2023 (oral).

Jingyu Zhang, Alexandra DeLucia, Chenyu Zhang, Mark Dredze. Geo-Seq2seq: Twitter User Geolocation on Noisy Data through Sequence to Sequence Learning. Findings of ACL 2023.

Jingyu Zhang, James Glass, Tianxing He. PCFG-based Natural Language Interface Improves Generalization for Controlled Text Generation. *SEM 2023. Preliminary version accepted at 2nd Workshop on Efficient Natural Language and Speech Processing (ENLSP), NeurIPS 2022. Best Paper Award.

Jingyu Zhang, Alexandra DeLucia, Mark Dredze. Changes in Tweet Geolocation over Time: A Study with Carmen 2.0. Proceedings of the 8th Workshop on Noisy User-generated Text (W-NUT), COLING 2022.

Abhinav Chinta*, Jingyu Zhang*, Alexandra DeLucia, Anna L. Buczak, Mark Dredze. Study of Manifestation of Civil Unrest on Twitter. Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT), EMNLP 2021.

*Equal Contribution

Teaching

I was a course assistant for EN.601.465/665: Natural Language Processing, taught by Jason Eisner, in Fall 2022 and Fall 2021.

I was a section leader for Code in Place 2021, hosted by Stanford University.

Service

Application Mentor, JHU CLSP pre-application support program
Curriculum Committee, Department of Computer Science, Johns Hopkins University
Recruitment Committee, Center for Language and Speech Processing, Johns Hopkins University

Misc

🏎️🏎️🏎️ In my free time, I enjoy go-karting and sim racing. I’m a big car enthusiast and love watching motorsports such as Formula 1. My favorite driver is Zhou Guanyu, the first-ever Chinese driver to compete in F1.