Efficient Deep Learning and Embodiment Group

Research Group at MMLab@SIGS

Research Directions

Efficient Deep Learning
Efficient Embodied AI
Vision-Language-Action Models

About

Welcome to the Efficient Deep Learning and Embodiment Group at MMLab@SIGS.

Our group focuses on developing efficient deep learning methods and their applications in embodied AI systems. We aim to bridge the gap between powerful deep learning models and real-world deployment on resource-constrained platforms, including robots and edge devices.

Our research spans the following core areas:

  • Efficient Embodied AI: efficient Vision-Language-Action (VLA) models, efficient action diffusion models, efficient embodied world models
  • Efficient Deep Learning: model compression, model pruning, sparse inference, neural architecture search, efficient training strategies

We are actively looking for self-motivated students and researchers to join our team. If you are interested, please feel free to contact us.

Selected Publications

Test-time Sparsity for Extreme Fast Action Diffusion

Kangye Ji, Yuan Meng, Jianbo Zhou, Ye Li, Chen Tang, Zhi Wang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

A test-time sparsity method that accelerates action diffusion models for extremely fast inference.

SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration

Ye Li, Yuan Meng, Zewen Sun, Kangye Ji, Chen Tang, Jiajun Fan, Xinzhu Ma, Shutao Xia, Zhi Wang, Wenwu Zhu

International Conference on Learning Representations (ICLR)

A joint model scheduling and token pruning approach for accelerating Vision-Language-Action models.

Block-wise Adaptive Caching for Accelerating Diffusion Policy

Kangye Ji, Yuan Meng, Hanyun Cui, Ye Li, Jianbo Zhou, Shengjia Hua, Lei Chen, Zhi Wang

International Conference on Learning Representations (ICLR)

A block-wise adaptive caching approach for accelerating diffusion policy inference.

PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference

Ye Li, Chen Tang, Yuan Meng, Jiajun Fan, Zenghao Chai, Xinzhu Ma, Zhi Wang, Wenwu Zhu

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

A joint token-optimization and structural channel-pruning approach for adaptive ViT inference.

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Addresses the large channel- and dimension-wise variance and step-dependent activation shifts in DiT quantization, proposing automatic granularity assignment and dynamic quantization. Achieves lossless W6A8 and roughly 20% lower FID than the prior state of the art.

News

2026-02

One paper accepted by CVPR 2026 🎉

2026-01

Two papers accepted by ICLR 2026 🎉

2025-10

One paper accepted by TPAMI 🎉