News
I'm on the job market and looking for a Research Scientist/Engineer or Postdoc position starting in summer 2024. Feel free to contact me if you have any openings!
[2024/02] 2 papers (one as first author) are accepted by CVPR 2024!
[2023/08] Our EqBen is accepted by ICCV 2023 (Oral)!
[2023/07] Joined Google Research for a summer internship!
[2023/06] Proposed a novel framework for human dance image/video synthesis, with demo, code, and a bunch of new research directions!
[2023/03] We released a new benchmark for diagnosing modern large VL models, with a one-stop toolkit.
[2022/09] Awarded Google PhD Fellowship! Thanks, Google!
[2022/08] Received the 2022 PREMIA Best Student Paper Award (The Gold Award)!
[2022/07] 1 paper is accepted by ECCV 2022.
[2022/06] Started a summer research internship at Microsoft@Seattle on Vision & Language Pretrained Models.
[2022/04] We host the NICO Challenge 2022 for the real-world OOD (Out-of-Distribution) generalization problem. Stay tuned!
[2022/03] 1 paper is accepted by CVPR 2022.
[2021/10] Jury Prize in ICCV 2021 VIPriors Challenge.
[2021/09] 1 paper (Spotlight) is accepted by NeurIPS 2021.
[2021/07] 1 paper is accepted by ICCV 2021.
[2021/03] 1 paper on ZSL/OSR is accepted by CVPR 2021.
[2020/04] 2 Journal papers are accepted by TNNLS 2020.
[2020/02] 1 paper with Prof. Hanwang Zhang is accepted by CVPR 2020.
[2019/07] 1 paper with Prof. Alan Hanjalic is accepted by ACM MM 2019 Oral.
|
|
University of Electronic Science and Technology of China (UESTC), China
Honours Degree in Electronic Information Engineering • Sep. 2016 - Jun. 2020
GPA: 92.98/100, Ranking: 2/284 (Overall) or 1/415 (first 2 years)
Supervisors: Prof. Xing Xu and Prof. Yang Yang. Collaborated with Prof. Alan Hanjalic
|
|
Chiba University, Japan
Exchange Program • Aug. 2017
Sakura Science Club Scholarship awardee. Funded by Japan Science and Technology Agency (JST).
|
|
Nanyang Technological University (NTU), Singapore
Second-year Ph.D. student in MReaL Lab, School of Computer Science and Engineering • Aug. 2020 - Jun. 2024
Supervisor: Prof. Hanwang Zhang
|
|
Center For Future Media, UESTC
Research Assistant • Mar. 2018 - Jun. 2020
Advisors: Prof. Xing Xu and Prof. Yang Yang. Collaborated with Prof. Alan Hanjalic
|
|
MReaL Lab, NTU
Research Assistant • Jul. 2019 - Aug. 2020
Advisor: Prof. Hanwang Zhang
|
|
Microsoft Research, Redmond WA (Remote)
Research Intern • Jun. 2022 - Jun. 2023
Advisors: Kevin Lin, Lindsey Li
|
|
Google Research, Zurich
Research Intern • Jul. 2023 - Oct. 2023
|
|
DisCo: Disentangled Control for Referring Human Dance Generation in Real World
Tan Wang*,
Lindsey Li*,
Kevin Lin*,
Yuanhao Zhai,
Chung-Ching Lin,
Zhengyuan Yang,
Hanwang Zhang,
Zicheng Liu,
Lijuan Wang
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2024
[Paperlink], [Code], [Project Page], [Demo], [Video]
Area: Human-centric Generative Model, ControlNet
We introduce DisCo, which includes a novel model architecture with disentangled control to improve the faithfulness and compositionality of dance synthesis, and an effective human attribute pre-training for better generalizability to unseen humans.
|
|
Enhance Image Classification Via Inter-Class Image Mixup With Diffusion Model
Zhicai Wang,
Longhui Wei,
Tan Wang,
Heyu Chen,
Yanbin Hao,
Xiang Wang,
Xiangnan He,
Qi Tian
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2024
Area: Generative Model, Image Classification
We propose Diff-Mix, a novel inter-class data augmentation method based on diffusion models. Diff-Mix conducts image translation in an inter-class manner, significantly improving the diversity of synthetic data while maintaining faithfulness, which results in a significant performance gain across various image classification settings.
|
|
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang,
Kevin Lin,
Lindsey Li,
Chung-Ching Lin,
Zhengyuan Yang,
Hanwang Zhang,
Zicheng Liu,
Lijuan Wang
IEEE International Conference on Computer Vision, ICCV 2023 (Oral)
[Paperlink], [Slides], [Code], [Eval Page]
Area: Vision-Language Model, Similarity Measure, New Benchmark
This study explores the concept of equivariance for the similarity measure of vision-language models (VLMs). We propose a novel benchmark named EqBen (Equivariant Benchmark) to benchmark VLMs with visual-minimal change samples, and a plug-and-play regularization loss EqSim (Equivariant Similarity Learning) to improve the equivariance of current VLMs.
|
|
Equivariance and Invariance Inductive Bias for Learning from Insufficient Data
Tan Wang,
Qianru Sun,
Sugiri Pranata,
Karlekar Jayashree,
Hanwang Zhang
European Conference on Computer Vision, ECCV 2022
(Final Rating: 122)
[Paperlink], [Code], [Poster], [Slides]
Area: Efficient Learning, Visual Inductive Bias, OOD Generalization
We show why insufficient data renders the model more easily biased to the limited training environment, and propose to impose two "good" inductive biases: equivariance and invariance for robust feature learning.
|
|
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Zhaozheng Chen,
Tan Wang,
Xiongwei Wu,
Xian-Sheng Hua,
Hanwang Zhang,
Qianru Sun
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2022
[Paperlink], [Code]
Area: Weakly-Supervised Semantic Segmentation, CAM
We introduce an embarrassingly simple yet surprisingly effective method, dubbed ReCAM: reactivating the CAM that has converged under binary cross-entropy (BCE) loss by using softmax cross-entropy (SCE) loss.
|
|
Self-Supervised Learning Disentangled Group Representation as Feature
Tan Wang,
Zhongqi Yue,
Jianqiang Huang,
Qianru Sun,
Hanwang Zhang
Conference and Workshop on Neural Information Processing Systems, NeurIPS 2021
(Spotlight Presentation, Top 3%); (2022 PREMIA Best Student Paper)
[Paperlink], [Code], [Poster], [Slides], [知乎]
Area: Self-supervised Representation Learning, Group Theory, Invariant Risk Minimization
We present an unsupervised disentangled representation learning method called IP-IRM, based on Self-Supervised Learning (SSL). IP-IRM iteratively partitions the dataset into semantically related subsets and learns a representation invariant across the subsets using SSL with an IRM loss.
|
|
Causal Attention for Unbiased Visual Recognition
Tan Wang,
Chang Zhou,
Qianru Sun,
Hanwang Zhang
IEEE International Conference on Computer Vision, ICCV 2021
[Paperlink], [Code], [Poster], [Slides]
Area: Invariant Risk Minimization, OOD Generalization
We propose a causal attention module (CaaM) that self-annotates the confounders in an unsupervised fashion. In particular, multiple CaaMs can be stacked and integrated into conventional attention CNNs and self-attention Vision Transformers.
|
|
Counterfactual Zero-Shot and Open-Set Visual Recognition
Zhongqi Yue*,
Tan Wang*,
Hanwang Zhang,
Qianru Sun,
Xian-Sheng Hua   (* equal contribution)
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2021
[Paperlink], [Code], [知乎]
Area: Counterfactual, Zero-shot Learning, Open-set Recognition
We present a novel counterfactual framework, the "Generative Causal Model", for Zero-Shot Learning (ZSL) and Open-Set Recognition (OSR), providing a theoretical grounding for reducing the seen/unseen classification imbalance.
|
|
Visual Commonsense R-CNN
Tan Wang,
Jianqiang Huang,
Hanwang Zhang,
Qianru Sun
IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2020
[Paperlink], [Code], [知乎]
Area: Visual and Language, Causal Reasoning, Self-supervised Learning
In this paper, we present a novel un-/self-supervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for Vision & Language high-level tasks.
|
|
Visual Commonsense Representation Learning via Causal Inference (Abstract Version of VC R-CNN)
Tan Wang,
Jianqiang Huang,
Hanwang Zhang,
Qianru Sun
IEEE International Conference on Computer Vision and Pattern Recognition MVM Workshop, CVPRW 2020
(Oral Presentation)
[Paperlink], [Code], [知乎]
Area: Visual and Language, Causal Reasoning, Self-supervised Learning
|
|
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang,
Xing Xu,
Yang Yang,
Alan Hanjalic,
Heng Tao Shen
ACM International Conference on Multimedia, MM 2019
(Oral Presentation, 4.96% acceptance rate)
[Paperlink], [Code]
Area: Visual and Language, Image-text matching
In this paper, we propose a novel framework for image-text matching that achieves remarkable matching performance with acceptable model complexity and much lower time cost.
|
|
Cross-Modal Attention with Semantic Consistence for Image-Text Matching
Xing Xu*,
Tan Wang*,
Yang Yang,
Lin Zuo,
Fumin Shen,
Heng Tao Shen   (* equal contribution)
IEEE Transactions on Neural Networks and Learning Systems, TNNLS 2020
Area: Visual and Language, Image-text matching
In this paper, we propose a novel hybrid matching approach named Cross-modal Attention with Semantic Consistence (CASC) for image-text matching, which is a joint framework that performs cross-modal attention for local alignment and multi-label prediction for global semantic consistence.
|
|
Radial Graph Convolutional Network for Visual Question Generation
Xing Xu*,
Tan Wang*,
Yang Yang,
Alan Hanjalic,
Heng Tao Shen   (* equal contribution)
IEEE Transactions on Neural Networks and Learning Systems, TNNLS 2020
Area: Visual and Language, Visual Question Generation
We propose an innovative answer-centric approach termed Radial Graph Convolutional Network (Radial-GCN), which focuses only on the relevant image regions to reduce the complexity of the VQG task.
|
Academic Service
Co-organizer: NICO Challenge 2022 (ECCV'22 Workshop)
PC Member: CVPR'22, ECCV'22, AAAI'23, CVPR'23, ICCV'23, NeurIPS'23
Journal Reviewer: IEEE TNNLS, ACM ToMM
|
Talk
"DisCo and Beyond: Innovations and Challenges in Human Dance Synthesis", University of Washington (UW), 2024.01
"Equivariant Similarity for Vision-Language Foundation Models (VLM)", Microsoft, Reading Group, 2023.06
"Equivariant Similarity for Vision-Language Foundation Models (VLM)", KAUST, Rising Stars in AI Symposium, 2023.03
"Equivariance and Invariance Inductive Bias for Learning from Insufficient Data", JiangMen Talk, 2022.10
"Self-Supervised Learning Disentangled Group Representation as Feature", PREMIA AGM 2022, 2022.08
"Towards Out-of-Distribution Generalization in Computer Vision", National University of Singapore (NUS), 2022.04
"Disentangled Group Representation Learning and its Potential in Causality", ZhiYuan Community, 2022.01
"Generalization Powered by Invariant Learning", Singapore Management University (SMU), 2021.11
"Applications and Development of Causal Inference" (in Chinese), AI Time, 2021.10  [Video]
"Visual Commonsense R-CNN", National University of Singapore (NUS), 2021.03
|
Honors & Scholarships
Google PhD Fellowship, 2022-2024
2022 PREMIA Best Student Paper Award (The Gold Award), 2022
Jury Prize in ICCV 2021 VIPriors Challenge, 2021
NTU Research Scholarship, 2020
Outstanding Graduates of Sichuan Province (Top 1% student), 2020  [Press Coverage]
Outstanding Undergraduate Thesis Award (Top 2% student), 2020
National Scholarship (Top 2% student), 2017, 2018
Tang Lixin Sponsored Elite Scholarship (Only 60 awardees per year in UESTC), 2017
Best Freshman Award (Top 1 student per year in Department), 2016
Honor Student Scholarship (Top 10 students per year in Department), 2018
Outstanding Student Scholarship (Top 10% student), 2017~2019
|
|
Lecture Group of EE Department
Founder & President • Oct. 2017 - Sep. 2018
Organized more than 30 academic forums, sharing sessions, and Q&A meetings, serving over 1000 students on study and future planning.
The team grew to 30 people and won the Outstanding Student Organisation prize in 2018.
|
|
Innovative Entrepreneurship Project of UESTC
Team Leader • Sep. 2017 - Mar. 2018
This project focuses on pedestrian detection in low-light conditions and was concluded with an excellent evaluation. We combine recent pedestrian detection models with a low-light image enhancement algorithm based on the Laplace operator.
Responsible for the code implementation and project promotion.
|
Personal Interests
DOTA1: My first and most-played PC game, which accompanied me throughout middle and high school. I reached a score of about 1350 on the '11' Battle Platform Ladder Tournament. :)
Running: During college, I often ran long distances to relieve stress. I also participated in the Chengdu Shuangyi Marathon in 2018.
|
Last updated on Jul, 2023
This awesome template is borrowed from this guy~
|
|