Tan Wang

Tan Wang is an AI Research Scientist at Meta, specializing in video generative modeling, especially for creative and personalized advertising. He earned his Ph.D. from Nanyang Technological University (NTU), supervised by Prof. Zhang Hanwang. He also works closely with Prof. Qianru Sun from SMU. During his Ph.D., he interned at Microsoft, Google, and Meta. His research interests include, but are not limited to, generative models, causal inference, and vision & language. He is a recipient of the Google PhD Fellowship 2022.

Before his doctoral studies, he obtained an honours bachelor's degree from the Department of EIE at the University of Electronic Science and Technology of China (UESTC) in 2020. He was a research assistant at the Center for Future Media, supervised by Prof. Xing Xu and Prof. Yang Yang.

Email / CV / Google Scholar / Github / Linkedin

News

[2024/02] 2 papers (one first author) are accepted by CVPR 2024!

[2023/08] Our EqBen is accepted by ICCV 2023 (Oral)!

[2023/07] Join Google Research for a summer internship!

[2023/06] Propose a novel framework for human dance image/video synthesis with demo, code and a bunch of new research directions!

[2023/03] We release a new benchmark for diagnosing modern large VL models, with a one-stop toolkit.

[2022/09] Awarded Google PhD Fellowship! Thanks, Google!

[2022/08] Received the 2022 PREMIA Best Student Paper Award (The Gold Award)!

[2022/07] 1 paper is accepted by ECCV 2022.

[2022/06] Start summer research internship at Microsoft@Seattle on Vision & Language Pretrained Models.

[2022/04] We host the NICO Challenge 2022 on the real-world OOD (Out-of-Distribution) generalization problem. Stay tuned!

[2022/03] 1 paper is accepted by CVPR 2022.

[2021/10] Jury Prize in ICCV 2021 VIPriors Challenge.

[2021/09] 1 paper (Spotlight) is accepted by NeurIPS 2021.

[2021/07] 1 paper is accepted by ICCV 2021.

[2021/03] 1 paper on ZSL/OSR is accepted by CVPR 2021.

[2020/04] 2 Journal papers are accepted by TNNLS 2020.

[2020/02] 1 paper with Prof. Hanwang Zhang is accepted by CVPR 2020.

[2019/07] 1 paper with Prof. Alan Hanjalic is accepted by ACM MM 2019 Oral.

Education

University of Electronic Science and Technology of China (UESTC), China
Honours Degree in Electronic Information Engineering • Sep. 2016 - Jun. 2020
GPA: 92.98/100, Ranking: 2/284 (Overall) or 1/415 (first 2 years)
Supervisors: Prof. Xing Xu and Prof. Yang Yang. Collaborated with Prof. Alan Hanjalic

Chiba University, Japan
Exchange Program • Aug. 2017
Sakura Science Club Scholarship awardee. Funded by Japan Science and Technology Agency (JST).

Nanyang Technological University (NTU), Singapore
Ph.D. in MReaL Lab, School of Computer Science and Engineering • Aug. 2020 - Jun. 2024
Supervisor: Prof. Zhang Hanwang

Research Experience

Center For Future Media, UESTC
Research Assistant • Mar. 2018 - Jun. 2020
Advisors: Prof. Xing Xu and Prof. Yang Yang. Collaborated with Prof. Alan Hanjalic

MReaL Lab, NTU
Research Assistant • Jul. 2019 - Aug. 2020
Advisor: Prof. Hanwang Zhang

Microsoft Research, Redmond WA (Remote)
Research Intern • Jun. 2022 - Jun. 2023
Advisors: Kevin Lin, Lindsey Li

Google Research, Zurich
Research Intern • Jul. 2023 - Oct. 2023

Meta AI
Research Scientist Intern • Dec. 2023 - May 2024

Publication [Google Scholar]

DisCo: Disentangled Control for Referring Human Dance Generation in Real World
Tan Wang*, Lindsey Li*, Kevin Lin*, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2024
[Paperlink], [Code], [Project Page], [Demo], [Video]
Area: Human-centric Generative Model, ControlNet

We introduce DisCo, which includes a novel model architecture with disentangled control to improve the faithfulness and compositionality of dance synthesis, and an effective human attribute pre-training for better generalizability to unseen humans.

Enhance Image Classification Via Inter-Class Image Mixup With Diffusion Model
Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2024
Area: Generative Model, Image Classification

We propose Diff-Mix, a novel inter-class data augmentation method based on a diffusion model. Diff-Mix conducts image translation in an inter-class manner, significantly improving the diversity of synthetic data while maintaining faithfulness, resulting in a significant performance gain across various image classification settings.

Equivariant Similarity for Vision-Language Foundation Models
Tan Wang, Kevin Lin, Lindsey Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
IEEE International Conference on Computer Vision, ICCV 2023 (Oral)
[Paperlink], [Slides], [Code], [Eval Page]
Area: Vision-Language Model, Similarity Measure, New Benchmark

This study explores the concept of equivariance for the similarity measure of vision-language models (VLMs). We propose a novel benchmark named EqBen (Equivariant Benchmark) to benchmark VLMs with visual-minimal change samples, and a plug-and-play regularization loss EqSim (Equivariant Similarity Learning) to improve the equivariance of current VLMs.

Equivariance and Invariance Inductive Bias for Learning from Insufficient Data
Tan Wang, Qianru Sun, Sugiri Pranata, Karlekar Jayashree, Hanwang Zhang
European Conference on Computer Vision, ECCV 2022
(Final Rating: 122)
[Paperlink], [Code], [Poster], [Slides]
Area: Efficient Learning, Visual Inductive Bias; OOD Generalization

We show why insufficient data renders the model more easily biased to the limited training environment, and propose to impose two "good" inductive biases: equivariance and invariance for robust feature learning.

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Zhaozheng Chen, Tan Wang, Xiongwei Wu, Xian-Sheng Hua, Hanwang Zhang, Qianru Sun
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2022
[Paperlink], [Code]
Area: Weakly-Supervised Semantic Segmentation, CAM

We introduce an embarrassingly simple yet surprisingly effective method: Reactivating the converged CAM with Binary Cross Entropy loss (BCE) by using softmax cross-entropy loss (SCE), dubbed ReCAM.

Self-Supervised Learning Disentangled Group Representation as Feature
Tan Wang, Zhongqi Yue, Jianqiang Huang, Qianru Sun, Hanwang Zhang
Conference and Workshop on Neural Information Processing Systems, NeurIPS 2021
(Spotlight Presentation, Top 3%); (2022 PREMIA Best Student Paper)
[Paperlink], [Code], [Poster], [Slides], [Zhihu]
Area: Self-supervised Representation Learning, Group Theory, Invariant Risk Minimization

We presented an unsupervised disentangled representation learning method called IP-IRM, based on Self-Supervised Learning (SSL). IP-IRM iteratively partitions the dataset into semantic-related subsets, and learns a representation invariant across the subsets using SSL with an IRM loss.

Causal Attention for Unbiased Visual Recognition
Tan Wang, Chang Zhou, Qianru Sun, Hanwang Zhang
IEEE International Conference on Computer Vision, ICCV 2021
[Paperlink], [Code], [Poster], [Slides]
Area: Invariant Risk Minimization, OOD Generalization

We propose a causal attention module (CaaM) that self-annotates the confounders in an unsupervised fashion. In particular, multiple CaaMs can be stacked and integrated into conventional attention CNNs and self-attention Vision Transformers.

Counterfactual Zero-Shot and Open-Set Visual Recognition
Zhongqi Yue*, Tan Wang*, Hanwang Zhang, Qianru Sun, Xian-sheng Hua (* equal contribution)
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2021
[Paperlink], [Code], [Zhihu]
Area: Counterfactual, Zero-shot Learning, Open-set Recognition

We presented a novel counterfactual framework "Generative Causal Model" for Zero-Shot Learning (ZSL) and Open-Set Recognition (OSR) to provide a theoretical ground for balancing and improving the seen/unseen classification imbalance.

Visual Commonsense R-CNN
Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2020
[Paperlink], [Code], [Zhihu]
Area: Visual and Language, Causal Reasoning, Self-supervised Learning

In this paper, we present a novel un-/self-supervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for Vision & Language high-level tasks.

Visual Commonsense Representation Learning via Causal Inference (Abstract Version of VC R-CNN)
Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
IEEE International Conference on Computer Vision and Pattern Recognition MVM Workshop, CVPRW 2020
(Oral Presentation)
[Paperlink], [Code], [Zhihu]
Area: Visual and Language, Causal Reasoning, Self-supervised Learning

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen
ACM International Conference on Multimedia, MM 2019
(Oral Presentation, 4.96% acceptance rate)
[Paperlink], [Code]
Area: Visual and Language, Image-text matching

In this paper, we propose a novel framework for image-text matching that achieves remarkable matching performance with acceptable model complexity and much lower time consumption.

Cross-Modal Attention with Semantic Consistence for Image-Text Matching
Xing Xu*, Tan Wang*, Yang Yang, Lin Zuo, Fumin Shen, Heng Tao Shen (* equal contribution)
IEEE Transactions on Neural Networks and Learning Systems, TNNLS 2020
Area: Visual and Language, Image-text matching

In this paper, we propose a novel hybrid matching approach named Cross-modal Attention with Semantic Consistence (CASC) for image-text matching, which is a joint framework that performs cross-modal attention for local alignment and multi-label prediction for global semantic consistence.

Radial Graph Convolutional Network for Visual Question Generation
Xing Xu*, Tan Wang*, Yang Yang, Alan Hanjalic, Heng Tao Shen (* equal contribution)
IEEE Transactions on Neural Networks and Learning Systems, TNNLS 2020
Area: Visual and Language, Visual Question Generation

We propose an innovative answer-centric approach termed Radial Graph Convolutional Network (Radial-GCN) that focuses only on the relevant image regions to reduce the complexity of the VQG task.

Academic Service

Co-organizer: NICO Challenge 2022 (ECCV'22 Workshop)

PC Member: CVPR'22, ECCV'22, AAAI'23, CVPR'23, ICCV'23, NeurIPS'23

Journal Reviewer: IEEE TNNLS, ACM ToMM

Talk

"DisCo and Beyond: Innovations and Challenges in Human Dance Synthesis", University of Washington (UW), 2024.01

"Equivariant Similarity for Vision-Language Foundation Models (VLM)", Microsoft, Reading Group, 2023.06

"Equivariant Similarity for Vision-Language Foundation Models (VLM)", KAUST, Rising Stars in AI Symposium, 2023.03

"Equivariance and Invariance Inductive Bias for Learning from Insufficient Data", JiangMen Talk, 2022.10

"Self-Supervised Learning Disentangled Group Representation as Feature", PREMIA AGM 2022, 2022.08

"Towards Out-of-Distribution Generalization in Computer Vision", National University of Singapore (NUS), 2022.04

"Disentangled Group Representation Learning and its Potential in Causality", ZhiYuan Community, 2022.01

"Generalization Powered by Invariant Learning", Singapore Management University (SMU), 2021.11

"Applications and Development of Causal Inference (in Chinese)", AI Time, 2021.10 [Video]

"Visual Commonsense R-CNN", National University of Singapore (NUS), 2021.03

Honors & Scholarships

Google PhD Fellowship, 2022-2024

2022 PREMIA Best Student Paper Award (The Gold Award), 2022

Jury Prize in ICCV 2021 VIPriors Challenge, 2021

NTU Research Scholarship, 2020

Outstanding Graduates of Sichuan Province (Top 1% student), 2020 [Press Coverage]

Outstanding Undergraduate Thesis Award (Top 2% student), 2020

National Scholarship (Top 2% student), 2017, 2018

Tang Lixin Sponsored Elite Scholarship (Only 60 awardees per year in UESTC), 2017

Best Freshman Award (Top 1 student per year in Department), 2016

Honor Student Scholarship (Top 10 students per year in Department), 2018

Outstanding Student Scholarship (Top 10% student), 2017~2019

Project

Leadership Experience

Lecture Group of EE Department
Founder & President • Oct. 2017 - Sep. 2018

Organized more than 30 academic forums, sharing sessions, and Q&A meetings, serving over 1000 students on study and future planning.

The team grew to 30 people and won the Outstanding Student Organisation prize in 2018.

Innovative Entrepreneurship Project of UESTC
Team Leader • Sep. 2017 - Mar. 2018

This project focuses on pedestrian detection in low-light conditions and received an excellent final evaluation. We combine recent pedestrian detection models with a low-light image enhancement algorithm based on the Laplace operator.

Responsible for the code implementation and project promotion.

Personal Interests

DOTA1: My first and most-played PC game, which accompanied me throughout middle and high school. I reached a score of about 1350 on the '11' Battle Platform Ladder Tournament. :)

Running: During college, I often ran long distances to unwind. I participated in the Chengdu Shuangyi Marathon in 2018.

Last updated on Jul, 2023
