Xiuyu Li

I am a Ph.D. student affiliated with Berkeley AI Research (BAIR) at UC Berkeley, advised by Prof. Kurt Keutzer. Previously, I received a B.A. in Computer Science and Math from Cornell University. During my undergrad years, I was fortunate to work with Prof. Zhiru Zhang, Prof. Vitaly Shmatikov, and Prof. Song Han.

Email: xiuyu [at] berkeley [dot] edu


Research


My current research interests are in enhancing the reasoning capabilities of large language models (LLMs) and developing scalable AI agents. More broadly, my research focuses on improving the efficiency of LLMs, vision-language models (VLMs), and diffusion models from both algorithmic and systems perspectives.
Reasoning and Test-time Scaling: APR (arXiv'25) unlocks a new dimension of scaling in LLMs via parallel reasoning. S* (arXiv'25) is an effective test-time scaling framework for code generation.
Long-context LLMs/VLMs: STORM (arXiv'25) and NVILA (CVPR'25) propose efficient VLM architectures for long video understanding. LLoCO (EMNLP'24) improves long-context LLMs via context compression and parameter-efficient finetuning.
Efficient Generative Models (Quantization & Sparsity): SparseLoRA (ICML'25) speeds up LLM finetuning with contextual sparsity. Q-Diffusion (ICCV'23) and SVDQuant (ICLR'25) are pioneering works on diffusion model quantization. SVG (ICML'25) accelerates video generation by 2x via attention sparsity. SqueezeLLM (ICML'24) achieves near-lossless 3-bit quantization for LLMs.
ML Systems: LongVILA (ICLR'25) is a framework for distributed training of VLMs on hour-long videos. TorchSparse (MLSys'22, MICRO'23) is a high-performance CUDA library for sparse convolution.
Evaluation: RouterBench (ICML'24 Agentic Markets) is the first benchmark for LLM routing. ArtBench (arXiv'22) is a high-quality dataset for artwork generation. LINKX (NeurIPS'21) offers diverse large-scale non-homophilous graph datasets with a strong baseline.

Selected Publications

For the most up-to-date list of publications, please see Google Scholar.
* indicates co-first author; † indicates project lead

Learning Adaptive Parallel Reasoning with Language Models
Jiayi Pan*, Xiuyu Li*, Long Lian*, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr
Preprint, 2025
[abs]  [paper]  [code]

SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
Samir Khaki*, Xiuyu Li*, Junxian Guo*, Ligeng Zhu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu
ICML, 2025
[abs]
More details and links coming soon!

S*: Test Time Scaling for Code Generation
Dacheng Li*, Shiyi Cao*, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica
Preprint, 2025
[abs]  [paper]  [code]

Token-Efficient Long Video Understanding for Multimodal LLMs
Jindong Jiang*, Xiuyu Li*, Zhijian Liu, Muyang Li, Guo Chen, Zhiqi Li, De-An Huang, Guilin Liu, Zhiding Yu, Kurt Keutzer, Sungjin Ahn, Jan Kautz, Hongxu Yin, Yao Lu, Song Han, Wonmin Byeon
Preprint, 2025
[abs]  [paper]  [website]

LLoCO: Learning Long Contexts Offline
Sijun Tan*, Xiuyu Li*, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
EMNLP, 2024
[abs]  [paper]  [code]

Q-Diffusion: Quantizing Diffusion Models
Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer
ICCV, 2023
[abs]  [paper]  [code]  [website]  [talk]
Integration: NVIDIA TensorRT

SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim*, Coleman Hooper*, Amir Gholami*, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer
ICML, 2024
[abs]  [paper]  [code]
Integration: Intel oneAPI


TorchSparse: Efficient Point Cloud Inference Engine
Haotian Tang*, Zhijian Liu*, Xiuyu Li*, Yujun Lin, Song Han
MLSys, 2022
[abs]  [paper]  [code]  [website]


Talks


S*: Test Time Scaling for Code Generation
LLoCO: Learning Long Contexts Offline
Q-Diffusion: Quantizing Diffusion Models [slides]

Projects


Mamba-2.8B-Zephyr
🤗 Model

A Mamba-2.8B model finetuned with DPO. It is one of the most downloaded Mamba models on Hugging Face.