Wenxi Chen

📝 Publications

For the most up-to-date publication list, please visit my Google Scholar profile. (* indicates equal contribution)

ACL 2026 Main

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

Wenxi Chen, Xinsheng Wang, Ruiqi Yan, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiquan Li, Yuzhe Liang, Hanlin Wen, Shunshun Yin, Ming Tao, Xie Chen

Paper Demo Code

A dual-stream speech codec that achieves both high-quality reconstruction and rich semantic representations for advanced speech generation.

ACL 2025 Findings

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Wenxi Chen, Ziyang Ma, Ruiqi Yan, Yuzhe Liang, Xiquan Li, Ruiyang Xu, Zhikang Niu, Yanqiao Zhu, Yifan Yang, Zhanxun Liu, Kai Yu, Yuxuan Hu, Jinyu Li, Yan Lu, Shujie Liu, Xie Chen

Paper Demo Code

A high-efficiency, end-to-end voice interaction system that enables zero-shot timbre control and accelerated inference.

ICASSP 2025

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs

Wenxi Chen*, Ziyang Ma*, Xiquan Li, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Kai Yu, Xie Chen

Paper Code

Enhancing audio captioning through paraphrasing-based data augmentation and a plug-and-play CLAP-based rescoring strategy for LLM-driven generation.

IJCAI 2024

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen

Paper Code

An audio self-supervised learning model that achieves superior representation performance with extremely efficient pre-training.
EAT models have surpassed 1M total downloads on Hugging Face!

[ACL 2025 Main] SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation, Keqi Deng, Wenxi Chen, Xie Chen, Phil Woodland.
[EMNLP 2025 Findings] URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models, Ruiqi Yan, Xiquan Li, Wenxi Chen, Zhikang Niu, Chen Yang, Ziyang Ma, Kai Yu, Xie Chen.
[ICASSP 2025 Oral] DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning, Xiquan Li, Wenxi Chen, Ziyang Ma, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Qiuqiang Kong, Xie Chen.