Tan Wang      

Currently, Tan Wang is a final-year Ph.D. student in the MReaL Lab at Nanyang Technological University (NTU), supervised by Prof. Zhang Hanwang. He also works closely with Prof. Qianru Sun from SMU. His research interests include, but are not limited to, Visual Reasoning, Causal Inference, and Vision & Language. He is a recipient of the Google PhD Fellowship 2022.

Before that, he obtained an honours bachelor's degree from the Department of EIE at the University of Electronic Science and Technology of China (UESTC) in 2020. He was a research assistant at the Center for Future Media, supervised by Prof. Xing Xu and Prof. Yang Yang. He also had a close research collaboration with Prof. Alan Hanjalic at TU Delft.

Email  /  CV  /  Google Scholar  /  GitHub  /  LinkedIn

News

  • I'm on the job market and looking for a Research Scientist/Engineer or Postdoc position starting in summer 2024. Feel free to contact me if you have any openings!
  • [2024/02]   2 papers (one as first author) are accepted by CVPR 2024!
  • [2023/08]   Our EqBen is accepted by ICCV 2023 (Oral)!
  • [2023/07]   Joined Google Research for a summer internship!
  • [2023/06]   Proposed a novel framework for human dance image/video synthesis, with a demo, code, and a number of new research directions!
  • [2023/03]   We released a new benchmark for diagnosing modern large VL models, with a one-stop toolkit.
  • [2022/09]   Awarded Google PhD Fellowship! Thanks, Google!
  • [2022/08]   Received the 2022 PREMIA Best Student Paper Award (The Gold Award)!
  • [2022/07]   1 paper is accepted by ECCV 2022.
  • [2022/06]   Started a summer research internship at Microsoft@Seattle on Vision & Language Pretrained Models.
  • [2022/04]   We are hosting the NICO Challenge 2022 on the real-world OOD (Out-of-Distribution) generalization problem. Stay tuned!
  • [2022/03]   1 paper is accepted by CVPR 2022.
  • [2021/10]   Jury Prize in ICCV 2021 VIPriors Challenge.
  • [2021/09]   1 paper (Spotlight) is accepted by NeurIPS 2021.
  • [2021/07]   1 paper is accepted by ICCV 2021.
  • [2021/03]   1 paper on ZSL/OSR is accepted by CVPR 2021.
  • [2020/04]   2 journal papers are accepted by TNNLS 2020.
  • [2020/02]   1 paper with Prof. Hanwang Zhang is accepted by CVPR 2020.
  • [2019/07]   1 paper with Prof. Alan Hanjalic is accepted by ACM MM 2019 (Oral).

  • Education

    University of Electronic Science and Technology of China (UESTC), China
    Honours Degree in Electronic Information Engineering      • Sep. 2016 - Jun. 2020
    GPA: 92.98/100,   Ranking: 2/284 (Overall) or 1/415 (first 2 years)
    Supervisors: Prof. Xing Xu and Prof. Yang Yang.    Collaborated with Prof. Alan Hanjalic

    Chiba University, Japan
    Exchange Program        • Aug. 2017
    Sakura Science Club Scholarship awardee. Funded by Japan Science and Technology Agency (JST).

    Nanyang Technological University (NTU), Singapore
    Final-year Ph.D. student in MReaL Lab, School of Computer Science and Engineering      • Aug. 2020 - Jun. 2024
    Supervisor: Prof. Zhang Hanwang

    Research Experience

    Center For Future Media, UESTC
    Research Assistant       • Mar. 2018 - Jun. 2020
    Advisors:   Prof. Xing Xu and Prof. Yang Yang.   Collaborated with Prof. Alan Hanjalic

    MReaL Lab, NTU
    Research Assistant       • Jul. 2019 - Aug. 2020
    Advisor:   Prof. Hanwang Zhang

    Microsoft Research, Redmond WA (Remote)
    Research Intern       • Jun. 2022 - Jun. 2023
    Advisors:   Kevin Lin, Lindsey Li

    Google Research, Zurich
    Research Intern       • Jul. 2023 - Oct. 2023

    Publications [Google Scholar]
    DisCo: Disentangled Control for Referring Human Dance Generation in Real World
    Tan Wang*, Lindsey Li*, Kevin Lin*, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
    IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2024
    [Paperlink], [Code], [Project Page], [Demo], [Video]
    Area: Human-centric Generative Model, ControlNet

    We introduce DisCo, which includes a novel model architecture with disentangled control to improve the faithfulness and compositionality of dance synthesis, and an effective human attribute pre-training for better generalizability to unseen humans.

    Enhance Image Classification Via Inter-Class Image Mixup With Diffusion Model
    Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian
    IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2024
    Area: Generative Model, Image Classification

    We propose Diff-Mix, a novel inter-class data augmentation method built on a diffusion model. Diff-Mix conducts image translation in an inter-class manner, significantly improving the diversity of synthetic data while maintaining faithfulness, which results in a significant performance gain across various image classification settings.

    Equivariant Similarity for Vision-Language Foundation Models
    Tan Wang, Kevin Lin, Lindsey Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
    IEEE International Conference on Computer Vision, ICCV 2023 (Oral)
    [Paperlink], [Slides], [Code], [Eval Page]
    Area: Vision-Language Model, Similarity Measure, New Benchmark

    This study explores the concept of equivariance for the similarity measure of vision-language models (VLMs). We propose a novel benchmark named EqBen (Equivariant Benchmark) to benchmark VLMs with visual-minimal change samples, and a plug-and-play regularization loss EqSim (Equivariant Similarity Learning) to improve the equivariance of current VLMs.

    Equivariance and Invariance Inductive Bias for Learning from Insufficient Data
    Tan Wang, Qianru Sun, Sugiri Pranata, Karlekar Jayashree, Hanwang Zhang
    European Conference on Computer Vision, ECCV 2022
    (Final Rating: 122)
    [Paperlink], [Code], [Poster], [Slides]
    Area: Efficient Learning, Visual Inductive Bias, OOD Generalization

    We show why insufficient data renders the model more easily biased to the limited training environment, and propose to impose two "good" inductive biases: equivariance and invariance for robust feature learning.

    Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
    Zhaozheng Chen, Tan Wang, Xiongwei Wu, Xian-Sheng Hua, Hanwang Zhang, Qianru Sun
    IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2022
    [Paperlink], [Code]
    Area: Weakly-Supervised Semantic Segmentation, CAM

    We introduce an embarrassingly simple yet surprisingly effective method, dubbed ReCAM: reactivating the CAM that has converged under binary cross-entropy (BCE) loss by using softmax cross-entropy (SCE) loss.

    Self-Supervised Learning Disentangled Group Representation as Feature
    Tan Wang, Zhongqi Yue, Jianqiang Huang, Qianru Sun, Hanwang Zhang
    Conference and Workshop on Neural Information Processing Systems, NeurIPS 2021
    (Spotlight Presentation, Top 3%); (2022 PREMIA Best Student Paper)
    [Paperlink], [Code], [Poster], [Slides], [知乎]
    Area: Self-supervised Representation Learning, Group Theory, Invariant Risk Minimization

    We present an unsupervised disentangled representation learning method called IP-IRM, based on Self-Supervised Learning (SSL). IP-IRM iteratively partitions the dataset into semantic-related subsets and learns a representation that is invariant across the subsets using SSL with an IRM loss.

    Causal Attention for Unbiased Visual Recognition
    Tan Wang, Chang Zhou, Qianru Sun, Hanwang Zhang
    IEEE International Conference on Computer Vision, ICCV 2021
    [Paperlink], [Code], [Poster], [Slides]
    Area: Invariant Risk Minimization, OOD Generalization

    We propose a causal attention module (CaaM) that self-annotates the confounders in an unsupervised fashion. In particular, multiple CaaMs can be stacked and integrated into conventional attention CNNs and self-attention Vision Transformers.

    Counterfactual Zero-Shot and Open-Set Visual Recognition
    Zhongqi Yue*, Tan Wang*, Hanwang Zhang, Qianru Sun, Xian-sheng Hua   (* equal contribution)
    IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2021
    [Paperlink], [Code], [知乎]
    Area: Counterfactual, Zero-shot Learning, Open-set Recognition

    We present a novel counterfactual framework, the "Generative Causal Model", for Zero-Shot Learning (ZSL) and Open-Set Recognition (OSR), which provides a theoretical grounding for balancing and improving seen/unseen classification.

    Visual Commonsense R-CNN
    Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
    IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2020
    [Paperlink], [Code], [知乎]
    Area: Visual and Language, Causal Reasoning, Self-supervised Learning

    In this paper, we present a novel un-/self-supervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for Vision & Language high-level tasks.

    Visual Commonsense Representation Learning via Causal Inference (Abstract Version of VC R-CNN)
    Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
    IEEE International Conference on Computer Vision and Pattern Recognition MVM Workshop, CVPRW 2020
    (Oral Presentation)
    [Paperlink], [Code], [知乎]
    Area: Visual and Language, Causal Reasoning, Self-supervised Learning

    Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
    Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen
    ACM International Conference on Multimedia, MM 2019
    (Oral Presentation, 4.96% acceptance rate)
    [Paperlink], [Code]
    Area: Visual and Language, Image-text matching

    In this paper, we propose a novel framework for image-text matching that achieves remarkable matching performance with acceptable model complexity and much lower time cost.

    Cross-Modal Attention with Semantic Consistence for Image-Text Matching
    Xing Xu*, Tan Wang*, Yang Yang, Lin Zuo, Fumin Shen, Heng Tao Shen   (* equal contribution)
    IEEE Transactions on Neural Networks and Learning Systems, TNNLS 2020
    Area: Visual and Language, Image-text matching

    In this paper, we propose a novel hybrid matching approach named Cross-modal Attention with Semantic Consistence (CASC) for image-text matching, which is a joint framework that performs cross-modal attention for local alignment and multi-label prediction for global semantic consistence.

    Radial Graph Convolutional Network for Visual Question Generation
    Xing Xu*, Tan Wang*, Yang Yang, Alan Hanjalic, Heng Tao Shen   (* equal contribution)
    IEEE Transactions on Neural Networks and Learning Systems, TNNLS 2020
    Area: Visual and Language, Visual Question Generation

    We propose an innovative answer-centric approach, termed Radial Graph Convolutional Network (Radial-GCN), that focuses only on the relevant image regions to reduce the complexity of the VQG task.

    Academic Service

  • Co-organizer:   NICO Challenge 2022 (ECCV'22 Workshop)
  • PC Member:   CVPR'22, ECCV'22, AAAI'23, CVPR'23, ICCV'23, NeurIPS'23
  • Journal Reviewer:   IEEE TNNLS, ACM ToMM

  • Talks

  • "DisCo and Beyond: Innovations and Challenges in Human Dance Synthesis", University of Washington (UW), 2024.01
  • "Equivariant Similarity for Vision-Language Foundation Models (VLM)", Microsoft, Reading Group, 2023.06
  • "Equivariant Similarity for Vision-Language Foundation Models (VLM)", KAUST, Rising Stars in AI Symposium, 2023.03
  • "Equivariance and Invariance Inductive Bias for Learning from Insufficient Data", JiangMen Talk, 2022.10
  • "Self-Supervised Learning Disentangled Group Representation as Feature", PREMIA AGM 2022, 2022.08
  • "Towards Out-of-Distribution Generalization in Computer Vision", National University of Singapore (NUS), 2022.04
  • "Disentangled Group Representation Learning and its Potential in Causality", ZhiYuan Community, 2022.01
  • "Generalization Powered by Invariant Learning", Singapore Management University (SMU), 2021.11
  • "Applications and Development of Causal Inference" (in Chinese), AI Time, 2021.10  [Video]
  • "Visual Commonsense R-CNN", National University of Singapore (NUS), 2021.03

  • Honors & Scholarships

  • Google PhD Fellowship,  2022-2024
  • 2022 PREMIA Best Student Paper Award (The Gold Award),  2022
  • Jury Prize in ICCV 2021 VIPriors Challenge,  2021
  • NTU Research Scholarship,  2020
  • Outstanding Graduates of Sichuan Province (Top 1% student),  2020  [Press Coverage]
  • Outstanding Undergraduate Thesis Award (Top 2% student),  2020
  • National Scholarship (Top 2% student),  2017, 2018
  • Tang Lixin Sponsored Elite Scholarship (Only 60 awardees per year at UESTC),  2017
  • Best Freshman Award (Top 1 student per year in Department),  2016
  • Honor Student Scholarship (Top 10 students per year in Department),  2018
  • Outstanding Student Scholarship (Top 10% student),  2017~2019

  • Project

    Leadership Experience
    Lecture Group of EE Department
    Founder & President       • Oct. 2017 - Sep. 2018

  • Organized academic forums, sharing sessions, and Q&A meetings more than 30 times, serving over 1000 students on study and future planning.
  • The team grew to 30 people and won the Outstanding Student Organisation prize in 2018.

  • Innovative Entrepreneurship Project of UESTC
    Team Leader       • Sep. 2017 - Mar. 2018

  • This project focused on pedestrian detection in low-light conditions and concluded with excellent results. We combined recent pedestrian detection models with a low-light image enhancement algorithm based on the Laplace operator.
  • Responsible for the code implementation and project promotion.

  • Personal Interests

    DOTA1: My first and most-played PC game, which accompanied me throughout middle and high school. I scored about 1350 on the '11' Battle Platform Ladder Tournament. :)

    Running: During college, I often ran long distances for pleasure and relaxation. I also participated in the Chengdu Shuangyi Marathon in 2018.


    Last updated on Jul, 2023
