Minghui Hu

胡明辉

Currently, I am a Ph.D. candidate at Nanyang Technological University (NTU) in Singapore, advised by Prof. P. N. Suganthan. Previously, I received my M.Sc. degree in Electrical and Electronic Engineering from NTU in 2019. In parallel with my doctoral studies, I hold a position as a researcher at Temasek Laboratories @ NTU, under the supervision of Dr. Sirajudeen s/o Gulam Razul.

I have had the distinct privilege of collaborating with Prof. Tat-Jen Cham from NTU, Dr. Chuanxia Zheng from VGG, University of Oxford, and Dr. Chaoyue Wang from the University of Sydney. I have also had memorable experiences as a research intern at SenseTime Research, JD Explore Academy, and MiniMax.

Email  /  Google Scholar  /  Github  /  LinkedIn

profile photo
Research

My research focuses on generative models, multi-modality learning, and their applications in many domains, particularly 2D image generation. Prior to this, I worked on randomized neural networks, a class of network models with simple topologies.

One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu, Jianbin Zheng, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham
arXiv, 2023  
project page / arXiv / code / HF Space

We develop a versatile plug-and-play module that rectifies schedule flaws in diffusion models.

Cocktail🍸: Mixing Multi-Modality Controls for Text-Conditional Image Generation
Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham
NeurIPS, 2023  
project page / arXiv / code

We develop a generalized HyperNetwork for multi-modality control based on a text-to-image generative model.

MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis
Jianbin Zheng, Daqing Liu, Chaoyue Wang, Minghui Hu, Zuopeng Yang, Changxing Ding, Dacheng Tao
arXiv, 2023  
project page / arXiv / PDF

We introduce a Mixture-of-Modality-Tokens Transformer (MMoT) that adaptively fuses fine-grained multimodal control signals for multi-modality image generation.

Versatile LiDAR-Inertial Odometry with SE(2) Constraints for Ground Vehicles
Jiayang Chen, Han Wang, Minghui Hu, P. N. Suganthan
RA-L & IROS, 2023  
IEEE

We propose a hybrid LiDAR-inertial SLAM framework that leverages both the on-board perception system and prior information such as motion dynamics to improve localization performance.

Class-Incremental Learning on Multivariate Time Series Via Shape-Aligned Temporal Distillation
Zhongzheng Qiao, Minghui Hu, Xudong Jiang, P. N. Suganthan, Ramasamy Savitha
ICASSP, 2023  
IEEE

We propose to exploit Soft-Dynamic Time Warping (Soft-DTW) for knowledge distillation, which aligns the feature maps along the temporal dimension before calculating the discrepancy.

Unified Discrete Diffusion for Simultaneous Vision-Language Generation
Minghui Hu, Chuanxia Zheng, Zuopeng Yang, Tat-Jen Cham, Chaoyue Wang, Dacheng Tao, P. N. Suganthan
ICLR, 2023  
project page / arXiv / PDF

We construct a unified discrete diffusion model for simultaneous vision-language generation.

Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation
Minghui Hu, Yujie Wang, Tat-Jen Cham, Jianfei Yang, P. N. Suganthan
CVPR, 2022  
arXiv / PDF

Instead of autoregressive Transformers, we use a discrete diffusion model to obtain better global context for image generation.

Academic Services

Conference Program Committee Member

CVPR    2022, 2023, 2024
ICCV    2023
NeurIPS    2023
ICLR    2023, 2024
ICASSP    2023, 2024
IJCNN    2020, 2021, 2022, 2023

Journal Reviewer

PR, TNNLS, TCyb, NeuNet, Neucom, ASOC, EAAI, IJCV


Yep, it's another Jon Barron website.
Last updated Sep. 2023.