📝 Publications

For the most up-to-date publication list, please visit my Google Scholar profile. (* indicates equal contribution)

ACL 2026 Main

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

Wenxi Chen, Xinsheng Wang, Ruiqi Yan, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiquan Li, Yuzhe Liang, Hanlin Wen, Shunshun Yin, Ming Tao, Xie Chen

Paper Demo Code

  • A dual-stream speech codec that achieves both high-quality reconstruction and rich semantic representations for advanced speech generation.

ACL 2025 Findings

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Wenxi Chen, Ziyang Ma, Ruiqi Yan, Yuzhe Liang, Xiquan Li, Ruiyang Xu, Zhikang Niu, Yanqiao Zhu, Yifan Yang, Zhanxun Liu, Kai Yu, Yuxuan Hu, Jinyu Li, Yan Lu, Shujie Liu, Xie Chen

Paper Demo Code

  • A high-efficiency, end-to-end voice interaction system that enables zero-shot timbre control and accelerated inference.

ICASSP 2025

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs

Wenxi Chen*, Ziyang Ma*, Xiquan Li, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Kai Yu, Xie Chen

Paper Code

  • Enhancing audio captioning through paraphrasing-based data augmentation and a plug-and-play CLAP-based rescoring strategy for LLM-driven generation.

IJCAI 2024

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen

Paper Code

  • An audio self-supervised learning model achieving superior representational performance with extreme pre-training efficiency.
  • EAT has surpassed 1M total model downloads on Hugging Face!