About
Welcome to the Efficient Deep Learning and Embodiment Group at MMLab@SIGS.
Our group focuses on developing efficient deep learning methods and their applications in embodied AI systems. We aim to bridge the gap between powerful deep learning models and real-world deployment on resource-constrained platforms, including robots and edge devices.
Our research spans the following core areas:
- Efficient Embodied AI: efficient Vision-Language-Action (VLA) models, efficient action diffusion models, efficient embodied world models
- Efficient Deep Learning: model compression, model pruning, sparse inference, neural architecture search, efficient training strategies
We are actively looking for self-motivated students and researchers to join our team. If you are interested, please feel free to contact us.
Selected Publications
Test-time Sparsity for Extreme Fast Action Diffusion
Kangye Ji, Yuan Meng, Jianbo Zhou, Ye Li, Chen Tang, Zhi Wang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A test-time sparsity method that accelerates action diffusion models for extremely fast inference.
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Ye Li, Yuan Meng, Zewen Sun, Kangye Ji, Chen Tang, Jiajun Fan, Xinzhu Ma, Shutao Xia, Zhi Wang, Wenwu Zhu
International Conference on Learning Representations (ICLR)
A joint model scheduling and token pruning approach for accelerating Vision-Language-Action models.
Block-wise Adaptive Caching for Accelerating Diffusion Policy
Kangye Ji, Yuan Meng, Hanyun Cui, Ye Li, Jianbo Zhou, Shengjia Hua, Lei Chen, Zhi Wang
International Conference on Learning Representations (ICLR)
A block-wise adaptive caching approach for accelerating diffusion policy inference.
PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
Ye Li, Chen Tang, Yuan Meng, Jiajun Fan, Zenghao Chai, Xinzhu Ma, Zhi Wang, Wenwu Zhu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
A joint token-optimization and structural channel-pruning approach for adaptive Vision Transformer (ViT) inference.
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Addresses the large channel/dimension-wise variance and step-dependent activation shifts in DiT quantization by proposing automatic granularity assignment and dynamic activation quantization. Achieves lossless W6A8 quantization and about 20% lower FID than the prior state of the art.
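The W6A8 scheme above combines static weight quantization with dynamic activation quantization. As a minimal illustration (not the Q-DiT implementation), the sketch below shows generic symmetric per-channel weight quantization at 6 bits and per-call dynamic activation quantization at 8 bits, where recomputing the activation scale at each call lets it track step-dependent shifts; all function names are hypothetical.

```python
import numpy as np

def quantize_per_channel(w, n_bits=6):
    # Symmetric per-channel weight quantization: one scale per output channel
    # (row of w), so channels with very different magnitudes are handled well.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dynamic_quantize_activations(x, n_bits=8):
    # Dynamic activation quantization: the scale is recomputed from the
    # current tensor, so it adapts to activation shifts across diffusion steps.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

# Dequantize and measure the worst-case reconstruction error.
w = np.random.randn(4, 8).astype(np.float32)
qw, sw = quantize_per_channel(w)
err = np.abs(qw * sw - w).max()
```

Using a per-channel scale for weights (rather than one scale for the whole tensor) is a standard way to cope with the large channel-wise variance the paper targets; the dequantization error per element is bounded by half the channel's scale.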
News
One paper accepted by CVPR 2026 🎉
Two papers accepted by ICLR 2026 🎉
One paper accepted by TPAMI 🎉
