wangpichao

Welcome to Pichao Wang's homepage

Pichao Wang, PhD, Amazon AGI Foundations

AI 2000 Most Influential Scholar

World’s Top 1% Scientist named by Stanford University

CVPR 2022 Best Student Paper Recipient

Email: pichaowang@gmail.com Google Scholar ResearchGate LinkedIn

Recent News

  1. 2024-09: One paper about text-to-video generation was accepted by NeurIPS 2024
  2. 2024-09: One paper about diffusion-based text-to-video retrieval was accepted by NeurIPS 2024
  3. 2024-09: One paper about MLLM for video segmentation was accepted by NeurIPS 2024
  4. 2024-07: One paper about interpretable image recognition was accepted by ACM MM 2024
  5. 2024-07: One paper about camouflaged instance segmentation was accepted by ACM MM 2024
  6. 2024-02: One paper about text-to-video retrieval was accepted by CVPR2024
  7. 2024-02: One paper about 3D pose estimation was accepted by CVPR2024
  8. 2024-01: One invited paper was accepted by TPAMI
  9. 2024-01: One paper about action recognition was accepted by ESWA
  10. 2023-09: One paper about large model finetuning was accepted by IJCV
  11. 2023-07: One paper about RGB-D action recognition was accepted by ACM MM 2023
  12. 2023-07: One paper about vision transformer was accepted by ICCV 2023
  13. 2023-07: One paper about text-to-video retrieval was accepted by ICCV 2023
  14. 2023-05: One paper about vision transformer was accepted by IJCV
  15. 2023-04: One paper about RGB+D action recognition was accepted by TPAMI
  16. 2023-04: One paper about 3D pose estimation was accepted by Pattern Recognition
  17. 2023-02: One paper about efficient vision transformer was accepted by CVPR2023
  18. 2023-02: One paper about long form video understanding was accepted by CVPR2023
  19. 2023-02: One paper about 3D pose estimation was accepted by CVPR2023
  20. 2022-12: Our TIP’21 paper received the IEEE Finland SP/CAS Best Paper Award
  21. 2022-11: One paper about Semantic Segmentation was accepted by AAAI2023
  22. 2022-11: One paper about Neural Style Transfer was accepted by AAAI2023
  23. 2022-09: One paper about skeleton action recognition was accepted by ACCV 2022
  24. 2022-09: One paper about vision transformer compression was accepted by NeurIPS 2022
  25. 2022-07: One paper about vision transformer was accepted by ECCV 2022
  26. 2022-07: One paper about unsupervised semantic segmentation was accepted by ECCV 2022
  27. 2022-06: Best Student Paper Award in CVPR 2022.
  28. 2022-03: One paper about 3D human pose estimation was accepted by CVPR 2022.
  29. 2022-03: One paper about 3D object detection was accepted by CVPR 2022.
  30. 2022-03: One paper about RGB+D motion recognition was accepted by CVPR 2022.
  31. 2022-01: One paper about knowledge distillation was accepted by ICASSP 2022.
  32. 2022-01: One paper about unsupervised domain adaptation was accepted by ICLR 2022.
  33. 2021-12: One paper about pose estimation was accepted by IEEE TMM.
  34. 2021-12: One paper about vision transformer training was accepted by AAAI 2022.
  35. 2021-07: One paper about Object ReID was accepted by ICCV 2021.
  36. 2021-07: One paper about Zero-Shot NAS was accepted by ICCV 2021.
  37. 2021-06: One paper about video object detection was accepted by IJCV.
  38. 2021-06: One paper about video object detection was accepted by IEEE TCSVT.

Biography

I am a senior research scientist at Amazon AGI Foundations. Before joining Amazon, I worked as a staff/senior engineer at DAMO Academy, Alibaba Group (U.S.) for more than 4 years. I received my Ph.D. in Computer Science from the University of Wollongong, Australia, in Oct. 2017, supervised by Prof. Wanqing Li and Prof. Philip Ogunbona. I received my M.E. in Information and Communication Engineering from Tianjin University, China, in 2013, supervised by Prof. Yonghong Hou, and my B.E. in Network Engineering from Nanchang University, China, in 2010.

Research Interests

Computer Vision · Multimedia · Deep Learning · Image Representation · Video Understanding

Selected Awards and Honors

  1. Apr. 2024, The Tony Stark Award of Prime Video

  2. Oct. 2023, World’s Top 1% Scientist

  3. Jun. 2022, Best Student Paper Award @CVPR2022

  4. Jan. 2022, AI 2000 Most Influential Scholars certificate

  5. Oct. 2021, World’s Top 2% Scientists

  6. Jun. 2020, Second Prize, Multiple Object Tracking and Segmentation@CVPR2020

  7. May 2018, EIS Faculty Postgraduate Thesis Award.

  8. Aug. 2017, Second Prize, Action, Gesture, and Emotion Recognition Workshop and Competitions: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions@ICCV2017

  9. Apr. 2017, First Prize (Winner), Large Scale 3D Human Activity Analysis Challenge in Depth Video@ICME2017

  10. Dec. 2016, Second Prize, Joint Contest on Multimedia Challenges Beyond Visual Analysis@ICPR2016

  11. Dec. 2016, Third Prize, Joint Contest on Multimedia Challenges Beyond Visual Analysis@ICPR2016

  12. Jan. 2013, Excellent Postgraduate Award

  13. Dec. 2011, Excellent Prize, National Campus CUDA Programming Contest. certificate

Publications

Ph.D. Dissertation

Action Recognition from RGB-D Data. The University of Wollongong, 2017. (Best Postgraduate Thesis Award) link

Preprint (selected papers, full paper list)

  1. Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, Rong Jin, “Self-Supervised Pre-Training for Transformer-Based Person Re-Identification”, arXiv 2021. paper. code

Conference Papers (selected papers, full paper list)

  1. Penghui Ruan, Pichao Wang, Divya Saxena, Jiannong Cao, Yuhui Shi, “Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning”, NeurIPS 2024.

  2. Jiamian Wang, Pichao Wang, Dongfang Liu, Qiang Guan, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao, “Diffusion-Inspired Truncated Sampler for Text-Video Retrieval”, NeurIPS 2024.

  3. Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou, “One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos”, NeurIPS 2024.

  4. Jiaqi Wang, Pichao Wang, Yi Feng, Huafeng Liu, Chang Gao, Liping Jing, “Align2Concept: Language Guided Interpretable Image Recognition by Visual Prototype and Textual Concept Alignment”, ACM MM 2024.

  5. Bo Dong, Pichao Wang, Hao Luo, Fan Wang, “Adaptive Query Selection for Camouflaged Instance Segmentation”, ACM MM 2024.

  6. Jiamian Wang, Guohao Sun, Pichao Wang, Dongfang Liu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao, “Text is MASS: Modelling as Stochastic Embedding for Text-to-Video Retrieval”, CVPR 2024 (Highlight). paper code

  7. Wenhao Li, Mengyuan Liu, Hong Liu, Pichao Wang, Jialun Cai, and Nicu Sebe, “Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation”, CVPR 2024 (Highlight). paper code

  8. Yujun Ma, Benjia Zhou, Ruili Wang, Pichao Wang, “Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition”, ACM MM 2023.

  9. Shuning Chang, Pichao Wang, Hao Luo, Fan Wang, and Mike Zheng Shou, “Revisiting Vision Transformer from the View of Path Ensemble”, Oral, ICCV 2023.

  10. Sarah Ibrahimi, Xiaohang Sun, Pichao Wang, Amanmeet Garg, Ashutosh Sanan, Mohamed Omar, “Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment”, Oral, ICCV 2023.

  11. Shuning Chang, Pichao Wang, Ming Lin, Fan Wang, David Junhao Zhang, Rong Jin, and Mike Zheng Shou, (first two authors make equal contributions), “Making Vision Transformers Efficient from A Token Sparsification View”, CVPR 2023. paper code

  12. Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen, “PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation”, CVPR 2023 paper code

  13. Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, and Raffay Hamid, “Selective Structured State-Spaces for Long-Form Video Understanding”, CVPR2023 paper

  14. Bo Dong, Pichao Wang@, Fan Wang, (@ Corresponding author), “Head-Free Lightweight Semantic Segmentation with Linear Transformer”, AAAI 2023. paper code

  15. Dongyang Li, Hao Luo, Pichao Wang, Zhibin Wang, Shang Liu, Fan Wang, “Frequency Domain Disentanglement for Arbitrary Neural Style Transfer”, AAAI 2023.

  16. Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, Hao Li, “VTC-LFC: Vision Transformer Compression with Low-Frequency Components”, NeurIPS 2022. paper code

  17. Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Hao Li, Rong Jin, (first two authors make equal contributions), “KVT: k-NN Attention for Boosting Vision Transformers”, ECCV 2022. paper. code

  18. Zhaoyuan Yin, Pichao Wang@, Fan Wang, Xianzhe Xu, Hanling Zhang, Hao Li, Rong Jin, (@ Corresponding author), “TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation”, ECCV 2022, Oral (2.7% of submitted papers). paper. code

  19. Benjia Zhou, Pichao Wang@, Jun Wan, Yanyan Liang, Fan Wang, Du Zhang, Zhen Lei, Hao Li, Rong Jin, (@ Corresponding author), “Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition”, CVPR 2022. paper. code

  20. Hansheng Chen, Pichao Wang@, Fan Wang, Wei Tian, Lu Xiong, Hao Li, (@ Corresponding author), “EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation”, CVPR 2022, Best Student Paper Award. paper code

  21. Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc Van Gool, “MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation”, CVPR 2022. paper. code

  22. Pichao Wang, Fan Wang, Hao Li, “Image-to-Video Re-Identification via Mutual Discriminative Knowledge Transfer”, ICASSP 2022. paper

  23. Tongkun Xu, Weihua Chen, Pichao Wang, Fan Wang, Hao Li, Rong Jin, “CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation”, ICLR 2022. paper. code

  24. Pichao Wang, Xue Wang, Hao Luo, Jingkai Zhou, Zhipeng Zhou, Fan Wang, Hao Li, and Rong Jin, (first two authors make equal contributions), “Scaled ReLU Matters for Training Vision Transformers”, AAAI 2022. paper. video

  25. Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, and Wei Jiang, “TransReID: Transformer-based Object Re-identification”, ICCV 2021. paper. code

  26. Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, and Rong Jin, “Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition”, ICCV 2021. paper. code

  27. Liang Han, Pichao Wang, Zhaozheng Yin, Fan Wang, and Hao Li, (first two authors make equal contributions) “Exploiting Better Feature Aggregation for Video Object Detection”, ACM MM 2020. paper

  28. Chang Tang, Xinwang Liu, Xinzhong Zhu, En Zhu, Kun Sun, Pichao Wang, Lizhe Wang and Albert Zomaya, “R2MRF: Defocus Blur Detection via Recurrently Refining Multi-scale Residual Features”, AAAI 2020. paper. code

  29. Pichao Wang, Wanqing Li, Jun Wan, Philip Ogunbona, and Xinwang Liu, “Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition”, AAAI 2018, ORAL paper. code

  30. Huogen Wang, Pichao Wang, Zhanjie Song, and Wanqing Li, (first two authors make equal contributions) “Large-scale Multimodal Gesture Recognition Using Heterogeneous Networks”, ICCV 2017. paper. code

  31. Huogen Wang, Pichao Wang, Zhanjie Song, and Wanqing Li, (first two authors make equal contributions) “Large-scale Multimodal Gesture Segmentation and Recognition based on Convolutional Neural Network”, ICCV 2017. paper. code

  32. Pichao Wang, Wanqing Li, Zhimin Gao, Yuyao Zhang, Chang Tang, and Philip Ogunbona, “Scene Flow to Action Map: A New Representation for RGB-D Based Action Recognition with Convolutional Neural Networks”, CVPR 2017. paper

  33. Pichao Wang, Zhaoyang Li, Yonghong Hou, and Wanqing Li, (first two authors make equal contributions) “Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks”, ACM MM 2016. paper. code

  34. Pichao Wang, Wanqing Li, Zhimin Gao, Chang Tang, Jing Zhang, and Philip Ogunbona, “ConvNets-Based Action Recognition from Depth Maps Through Virtual Cameras and Pseudocoloring”, ACM MM 2015. paper. code

Journal Articles (selected papers, full paper list)

  1. Hansheng Chen, Wei Tian, Pichao Wang, Fan Wang, Lu Xiong, Hao Li, “EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024, paper code

  2. Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou, “SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels”, International Journal of Computer Vision (IJCV), 2023. paper. code

  3. Jingkai Zhou, Pichao Wang@, Jiasheng Tang, Fan Wang, Qiong Liu, Hao Li, Rong Jin,(@project lead), “What limits the performance of local self-attention?”, International Journal of Computer Vision (IJCV), 2023. paper code

  4. Benjia Zhou, Pichao Wang@, Jun Wan, Liangliang Yan, and Fan Wang, (@ Corresponding author), “A Unified Multimodal De-and Re-coupling Framework for RGB-D Motion Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023. paper code

  5. Wenhao Li, Hong Liu, Hao Tang, and Pichao Wang, “Multi-Hypothesis Representation Learning for Transformer-Based 3D Human Pose Estimation”, Pattern Recognition, 2023

  6. Wenhao Li, Hong Liu, Runwei Ding, Mengyuan Liu, Pichao Wang, and Wenming Yang, “Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation”, IEEE Transactions on Multimedia, 2021. paper. code

  7. Liang Han, Pichao Wang, Zhaozheng Yin, Fan Wang, and Hao Li, (first two authors make equal contributions), “Context and Structure Mining Network for Video Object Detection”, International Journal of Computer Vision (IJCV), 2021. paper

  8. Liang Han, Pichao Wang, Zhaozheng Yin, Fan Wang, and Hao Li, (first two authors make equal contributions), “Class-aware Feature Aggregation Network for Video Object Detection”, IEEE Transactions on Circuits and Systems for Video Technology, 2021. paper

  9. Zitong Yu, Benjia Zhou, Jun Wan, Pichao Wang, Haoyu Chen, Xin Liu, Stan Z Li, and Guoying Zhao, “Searching Multi-Rate and Multi-Modal Temporal Enhanced Network for Gesture Recognition”, IEEE Transactions on Image Processing, 2021. paper. code

  10. Xiangyu Li, Yonghong Hou, Pichao Wang@, Zhimin Gao, Mingliang Xu, and Wanqing Li, (@ Corresponding author), “Trear: Transformer-based RGB-D Egocentric Action Recognition”, IEEE Transactions on Cognitive and Developmental Systems, 2021. paper

  11. Chang Tang, Xinwang Liu, Shan An, and Pichao Wang, “BR2NET: Defocus Blur Detection via Bidirectional Channel Attention Residual Refining Network”, IEEE Transactions on Multimedia, 2020. paper

  12. Chang Tang, Xinwang Liu, Pichao Wang, Changqing Zhang, Miaomiao Li and Lizhe Wang, “Adaptive Hypergraph Embedded Semi-supervised Multi-label Image Annotation”, IEEE Transactions on Multimedia, 2019. paper

  13. Chang Tang, Xinzhong Zhu, Xinwang Liu, Miaomiao Li, Pichao Wang, Changqing Zhang and Lizhe Wang, “Learning Joint Affinity Graph for Multi-view Subspace Clustering”, IEEE Transactions on Multimedia, 2019. paper

  14. Chuankun Li, Yonghong Hou, Pichao Wang@, and Wanqing Li, (@Corresponding author), “Multi-view Based 3D Action Recognition Using Deep Networks”, IEEE Transactions on Human Machine Systems, 2018. paper

  15. Chang Tang, Wanqing Li, Pichao Wang@, and Lizhe Wang, (@ Corresponding author), “Online Human Action Recognition Based on Incremental Learning of Weighted Covariance Descriptors”, Information Sciences, 2018. code

  16. Pichao Wang, Wanqing Li, Philip Ogunbona, Jun Wan and Sergio Escalera, “RGB-D-based Human Motion Recognition with Deep Learning: A Survey”, Computer Vision and Image Understanding, 2018.

  17. Pichao Wang, Wanqing Li, Zhimin Gao, Chang Tang, and Philip Ogunbona, “Depth Pooling Based Large-scale 3D Action Recognition with Deep Convolutional Neural Networks”, IEEE Transactions on Multimedia, 2018. paper. code

  18. Pichao Wang, Wanqing Li, Chuankun Li, and Yonghong Hou, “Action Recognition Based on Joint Trajectory Maps with Convolutional Neural Networks”, Knowledge-Based Systems, 2018. paper. code

  19. Yonghong Hou, Zhaoyang Li, Pichao Wang@ and Wanqing Li, (@ Corresponding author), “Skeleton Optical Spectra Based Action Recognition Using Convolutional Neural Networks”, IEEE Transactions on Circuits and Systems for Video Technology, 2016. code

  20. Jing Zhang, Wanqing Li, Philip Ogunbona, Pichao Wang and Chang Tang, “RGB-D based Action Recognition Datasets: A Survey”, Pattern Recognition, 2016.

  21. Pichao Wang, Wanqing Li, Zhimin Gao, Jing Zhang, Chang Tang, and Philip Ogunbona, “Action Recognition from Depth Maps Using Deep Convolutional Neural Networks”, IEEE Transactions on Human Machine Systems, 2016. code

Academic Activities

Editorial Works:

  1. Associate Editor, Computer Engineering («计算机工程», Chinese Journal), 2019-2024
  2. Editorial Board of Young Scientists, Journal of Computer Science and Technology (JCST) (Tier 1, CCF B), 2022.7.1-2024.6.30
  3. Area Chair, ICME, 2021,2022: Area Chair for Multimedia Analysis and Understanding (main area)

Selected Invited Journal Reviewer:

  1. IEEE Transactions on Pattern Analysis and Machine Intelligence
  2. IEEE Transactions on Image Processing
  3. IEEE Transactions on Circuits and Systems for Video Technology
  4. IEEE Transactions on Cybernetics
  5. IEEE Transactions on Neural Networks and Learning Systems
  6. IEEE Transactions on Industrial Informatics
  7. IEEE Transactions on Audio, Speech and Language Processing
  8. IEEE Transactions on Multimedia
  9. ACM Transactions on Interactive Intelligent Systems
  10. ACM Transactions on Multimedia Computing, Communications and Applications

Conference Technical Program Committee Member:

  1. ICCV 2017, 2019, 2021, 2023
  2. CVPR 2018, 2019, 2020, 2021, 2022, 2023, 2024
  3. ICME 2018, 2019, 2020, 2021, 2022
  4. IJCAI 2018, 2019, 2020, 2021
  5. ACCV 2018, 2020
  6. WACV 2019, 2020, 2021
  7. AAAI 2019, 2020, 2021, 2022
  8. ECCV 2020, 2022, 2024
  9. NeurIPS 2020, 2021, 2022, 2023
  10. ICML 2021, 2022
  11. ICLR 2022, 2023, 2024

Work Experience

  1. 2018.9-2022.10: I was employed as a staff/senior algorithm engineer at DAMO Academy, Alibaba Group (U.S.), and conducted research on various computer vision tasks.

  2. 2017.10-2018.6: I was employed as a researcher at Motovis Inc, where I was in charge of fixed-point quantization networks, pixel-level semantic labeling, and intelligent headlight control.

  3. 2013.07-2013.11: I was employed as a Software Engineer at Beijing Hanze Technology Co., Ltd., where I was in charge of developing video enhancement software, including FFmpeg video decoding, video enhancement algorithms, denoising algorithms, and H.264 encoding with CUDA.

  4. 2011.05-2011.12: I was employed as a Software Engineer at Beijing Maystar Information Technology Co., Ltd., where I was in charge of GPU-based decryption of Office documents.

  5. 2010.07-2011.06: I participated in a National High-tech R&D Program (863 Program) project at the Institute of Wideband Wireless Communication and 3D Imaging (IWWC&3DI): Multi-view video acquisition and demonstration system (2009AA011507). I was in charge of adaptive definition adjustment and format conversion in the 3D video network, and implemented the 3D video combination algorithm using parallel methods based on CUDA.

Datasets

  1. FT-HID Dataset: The dataset contains more than 38K RGB samples, 38K depth samples, and about 20K skeleton sequences. 30 classes of daily actions are designed specifically for multi-person interaction with a wearable device and three fixed cameras. The FT-HID dataset is comparable to other RGB-D action recognition datasets in the amount of data and the number of action classes and scenes. It is more complex because the data is collected from 109 distinct subjects with large variations in gender, age, and physical condition. More importantly, to the best of our knowledge, it is the first large-scale RGB-D dataset collected from both third-person view (TPV) and first-person view (FPV) perspectives for action recognition. Please cite the following paper if you use the dataset: Zihui Guo, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu and Wanqing Li, “FT-HID: A Large Scale RGB-D Dataset for First and Third Person Human Interaction Analysis”, Neural Computing and Applications, 2022. paper code

  2. UOW Online Action3D Dataset: This dataset consists of action sequences of skeleton videos; the 20 actions are taken from the original MSR Action3D Dataset. The action videos were recorded with Microsoft Kinect V.2 at an average frame rate of 20 frames per second. 20 participants performed the actions, each according to his/her personal habits. Each participant first repeated each action 3–5 times, then performed the 20 actions continuously in a random order. The continuous action sequences can be used for online action recognition testing, and the repeated action sequences are used for training. So that the dataset can be used for cross-dataset testing, the 20 participants performed the actions in 4 different environments. Please cite the following paper if you use the dataset:
    Chang Tang, Wanqing Li, Pichao Wang, Lizhe Wang, “Online Human Action Recognition Based on Incremental Learning of Weighted Covariance Descriptors”, Information Sciences, vol. 467, pp. 219-237, 2018. paper. code