Han Hu 胡瀚

Distinguished Scientist

Tencent Hy

ancientmooner AT gmail DOT com

Short Bio

Han Hu is a Distinguished Scientist at Tencent, serving as the head of Hy Vision-Language Model (VLM) and Embodied Foundation Model. He previously served as the Tech Lead of the Hy Language Model and founded the Hy speech team. Before joining Tencent Hy in December 2023, Han Hu was a principal researcher and research manager at Microsoft Research Asia (MSRA). He received his Ph.D. and B.S. degrees from Tsinghua University in 2014 and 2008, respectively, under the supervision of Prof. Jie Zhou, and his doctoral thesis won the China Association of Artificial Intelligence (CAAI)'s Doctoral Dissertation Award. He was a visiting student at the University of Pennsylvania under the supervision of Prof. Jianbo Shi from October 2012 to April 2013. Before joining MSRA in December 2016, he worked at the Institute of Deep Learning (IDL), Baidu Research. His paper Swin Transformer won the ICCV2021 Best Paper Award (Marr Prize) and has been widely used in academia and industry. In addition, 6 of his papers rank in the top 15 for their respective conferences (CVPR/ICCV) according to the PaperDigest Most Influential Papers. He is a recipient of the 2022 MIT TR35 China (35 innovators under 35) for his contribution to making machines see the visual world in the same way they understand language (see also a Chinese blog).

News

2023.10 FP8-LM saves up to 40% memory consumption compared to Nvidia's Transformer Engine (TE) framework.
2023.9 Xwin-LM ranked top-1 on the Alpaca-Eval benchmark.
2023.6 VALSE 2023 APR (Annual Progress Review) on Visual Self-supervised Learning [PDF (in Chinese)].
2023.4 Named a 2022 MIT TR35 China recipient (35 innovators under 35) for the contribution to making machines see the visual world in the same way they understand language (see also a Chinese blog).
2022.12 Will serve as an area chair of ICCV 2023
2022.8.22 VALSE 2022 APR (Annual Progress Review) on Vision Transformers (also including Transformer Decoders and Masked Image Modeling based Pre-training) [PDF (in Chinese)]. Co-organized the VALSE vision Transformer workshop.
2022.4.23 Co-organized the 2022 China CVPR pre-conference as the program chair. A total of 15,000 online viewers watched the live stream, with a cumulative viewing time of more than 9,100 hours. [Calendar (in Chinese)] [Recorded Videos (in Chinese)]
2022.3 Swin Transformer V2, SimMIM and Video Swin Transformer accepted by CVPR 2022.
2021.10 Swin Transformer won ICCV2021 Marr Prize (best paper award). Computer Vision News published a nice review in their BEST OF ICCV selection.
2021.10 A talk at VALSE2021 Hangzhou about Self-Supervised Learning in Computer Vision: Past, Present, Trends
2021.09 Three spotlight papers accepted by NeurIPS2021.
2021.08 Harry Shum Interview (in Chinese)
2021.07 Three papers with one oral accepted by ICCV2021.
2021.6.20 Co-organized the 3rd Tutorial on Learning Representations via Graph-structured Networks at CVPR2021. Talk title: Swin Transformer and Five Reasons to use Transformer/Attention in Computer Vision [recorded video] [longer version in Chinese 中文]
2021.6.2 A talk at 2021 BAAI: Self-Supervised Learning in Computer Vision: Past, Present, Trends.
2021.5 Code available for Self-Supervised Learning with Swin Transformer.
Will serve as an area chair of CVPR2022.
2021.4 Slides used in recent talks: Toward Universal Models with NLP in Computer Vision.
2021.4 Code and pretrained models for Swin Transformer (object detection, semantic segmentation) are released.
2021.03 Swin Transformer achieves new SOTA on COCO detection (58.7 box AP and 51.1 mask AP) and ADE20K semantic segmentation (53.5 mIoU).
2021.03 Two papers with one oral accepted by CVPR2021. Code available for PixPro.
2020.11 Chaired the session on "self-supervised learning and transfer learning in vision" at the China Pre-conference of NeurIPS2020, and gave a talk titled Recent Progress on Self-Supervised Visual Representation Learning.
2020.09 Three papers with one spotlight accepted by NeurIPS2020.
2020.07 Four papers accepted by ECCV2020.
2020.06.14 Co-organized the 2nd Tutorial on Learning Representations via Graph-structured Networks at CVPR2020. Talk title: Self-Attention Modeling for Visual Recognition [Recorded Video]
2020.01 Invited as an area chair of CVPR2021.
2019.07 A talk at a VALSE Webinar, titled Towards Universal Learning Machine: Self-Attention for Visual Modeling
GCNet received the best paper award at ICCV 2019 Neural Architects Workshop.
Code available for RepPoints.
Three papers accepted by ICCV 2019.
GCNet is merged into MMDetection.

Selected Publications

(^†Interns *Equal Contribution)

Full list: Google Scholar

Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu*^†, Han Hu*, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo
CVPR, 2022 [PDF] [Code@Github]
SimMIM: A Simple Framework for Masked Image Modeling
Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu*
CVPR, 2022 [PDF] [Code@Github]
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu*^†, Yutong Lin*^†, Yue Cao*, Han Hu*^‡, Yixuan Wei^†, Zheng Zhang, Stephen Lin, Baining Guo
ICCV, 2021 (^‡ Correspondence) [Arxiv] [Code@Github] Marr Prize (Best Paper Award)
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
Zhenda Xie*^†, Yutong Lin*^†, Zheng Zhang, Yue Cao, Stephen Lin, Han Hu
CVPR, 2021 [Arxiv] [Code@Github]
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
Yue Cao^†*, Jiarui Xu^†*, Stephen Lin, Fangyun Wei and Han Hu
In ICCV workshop on Neural Architects, 2019 [Arxiv] [Code] [@mmdet] Best Paper Award
Local Relation Networks for Image Recognition
Han Hu, Zheng Zhang, Zhenda Xie and Stephen Lin
In ICCV, 2019 [Arxiv]
RepPoints: Point Set Representation for Object Detection
Ze Yang^†*, Shaohui Liu^†*, Han Hu, Liwei Wang and Stephen Lin
In ICCV, 2019 [Arxiv] [Code]
Deformable ConvNets v2: More Deformable, Better Results
Xizhou Zhu^†, Han Hu, Stephen Lin and Jifeng Dai
In CVPR, 2019 [Arxiv]
Relation Networks for Object Detection
Han Hu*, Jiayuan Gu^†*, Zheng Zhang*, Jifeng Dai and Yichen Wei
In CVPR, 2018 [Arxiv] [PPT] [code] Oral
Deformable Convolutional Networks
Jifeng Dai*, Haozhi Qi^†*, Yuwen Xiong^†*, Yi Li^†*, Guodong Zhang^†*, Han Hu and Yichen Wei
In ICCV, 2017 [Arxiv] Oral
WordSup: Exploiting Word Annotations for Character based Text Detection
Han Hu*, Chengquan Zhang*, Yuxuan Luo, Yuzhuo Wang, Junyu Han and Errui Ding
In ICCV, 2017 [Arxiv]