The 19th International Conference on
Document Analysis and
Recognition
September 16-21, 2025 Wuhan, Hubei, China
Tutorial 1: Beyond Recognition: A Multidimensional Exploration of Characters
Organizer: Seiichi Uchida (Kyushu University, Japan), uchida@ait.kyushu-u.ac.jp
In the field of document image recognition and understanding, characters are an indispensable element. Traditionally, much research has focused primarily on recognizing characters — treating them simply as objects of recognition — and on extracting semantic information from text when integrating with natural language processing. However, we seldom pause to ponder a fundamental question: What exactly is a character?
This tutorial aims to investigate that question from multiple perspectives, shedding light on how character images differ from generic images, how fonts are used and why so many different fonts are necessary. We will also explore why we are able to identify certain shapes as specific characters (e.g., recognizing an “A” as the letter A) and examine the unique properties that distinguish handwritten text from printed text. Moreover, we will discuss how characters appear in various contexts — from printed documents to building surfaces and inscriptions — and the cultural and historical dimensions surrounding them.
By exploring these facets, participants will gain fresh insights into the fundamental nature of textual information. This broader perspective may lead to innovative approaches in document image analysis and open new avenues for research and practical applications within the ICDAR community. Rather than focusing on incremental improvements to existing tasks, I hope this tutorial will inspire entirely new lines of research on the nature of characters!
Seiichi Uchida
Dr. Seiichi Uchida is a Distinguished Professor and Executive Vice-President at Kyushu University, Japan. He has been a prominent figure in document image analysis and recognition, actively contributing to the ICDAR community and other IAPR-affiliated conferences for many years. His roles and achievements include:
● 2007 IAPR/ICDAR Best Paper Award
● 2012 DAS (Document Analysis Systems) Program Co-Chair
● 2015 ICDAR Workshop Chair
● 2017 ICDAR Executive Co-Chair
● 2019 ICDAR Workshop Chair
● 2021 ICDAR Program Co-Chair
● 2022 DAS Program Co-Chair
● 2023 ICDAR Keynote Speaker
Dr. Uchida’s research interests encompass document image analysis, handwriting recognition, pattern recognition, and related areas. He has published extensively in top-tier conferences and journals, supervised numerous graduate students, and served in various organizational and committee roles at international events.
Webpage: https://scholar.google.co.jp/citations?user=QMpdhysAAAAJ
Tutorial 2: How to train a Multi-modal Large Document Understanding Model?
Organizers:
Wei Shen (Shanghai Jiao Tong University), wei.shen@sjtu.edu.cn
Yu Zhou (Nankai University), yzhou@nankai.edu.cn
This tutorial will provide participants with a comprehensive understanding of multi-modal document understanding models, focusing on the integration of textual and visual data for improved document analysis.
Attendees will learn about model architecture, training techniques, and real-world applications. The tutorial is designed for researchers and practitioners interested in the latest advancements in document analysis, making it a perfect addition to the ICDAR conference program.
Wei Shen
Dr. Wei Shen is a professor at the Artificial Intelligence Institute, Shanghai Jiao Tong University. He was previously an Assistant Research Professor at the Department of Computer Science, Johns Hopkins University. He received his B.S. and Ph.D. degrees from Huazhong University of Science and Technology in 2007 and 2012, respectively. He has over 80 peer-reviewed publications in computer vision and machine learning, in venues including IEEE Trans. PAMI, IEEE Trans. Image Processing, IEEE Trans. Medical Imaging, NeurIPS 2023/2024/2025, CVPR 2022/2023, ICML 2025, and ICCV 2025. He is an Associate Editor for Pattern Recognition. He received the MICCAI Young Scientist Award and the NSFC Excellent Young Scientists Fund in 2023.
Webpage: https://shenwei1231.github.io/
Yu Zhou
Dr. Zhou holds BSc, MSc, and PhD degrees in computer science from Harbin Institute of Technology. He is a professor and PhD supervisor in the College of Computer Science, Nankai University. His research interests include computer vision and deep learning, with a special interest in visual text processing, detection, recognition, and understanding. He has served as an AC, SPC, and PC member for conferences including CVPR, ICCV, ECCV, NeurIPS, and ICDAR, and as a reviewer for journals including TPAMI and TIP. He has published over 80 papers in peer-reviewed journals and conferences, including CVPR, ICCV, NeurIPS, TMM, and TNNLS, and the paper PIMNet was selected as a best paper candidate at ACM MM 2021.
Webpage: https://intimelab.github.io/
Tutorial 3: General Introduction to Oracle Bone Scripts Processing
Organizers:
Qiufeng Wang (Xi’an Jiaotong-Liverpool University), Qiufeng.Wang@xjtlu.edu.cn
Yuliang Liu (Huazhong University of Science and Technology), ylliu@hust.edu.cn
Bang Li (Anyang Normal University), libang@aynu.edu.cn
Oracle Bone Script (OBS) is the earliest known form of Chinese writing, dating back over 3,000 years to the Shang Dynasty. It serves as a crucial resource for historical linguistics, epigraphy, and the study of ancient Chinese civilization. However, due to the complexity of OBS, including its vast character variations, irregular inscriptions, and fragmented artifacts, the automatic recognition, segmentation, and analysis of OBS remain highly challenging tasks in document analysis and recognition.
This tutorial aims to introduce participants to the latest advances in Oracle Bone Script processing, covering key techniques such as character recognition, inscription restoration, dataset construction, and deep learning-based approaches. We will also discuss the interdisciplinary collaboration between computer vision, natural language processing, and archaeology, which is essential for advancing research in this domain.
The motivation behind this tutorial is twofold: First, despite significant progress in optical character recognition (OCR) for modern scripts, ancient scripts like OBS require specialized methods due to their non-standard writing systems. Second, there is growing interest in leveraging artificial intelligence (AI) and deep learning for historical document analysis, making this an opportune moment to bring together researchers from diverse backgrounds to explore innovative solutions.
By providing a comprehensive overview of OBS processing, this tutorial will foster collaboration between researchers in document analysis, computational humanities, and epigraphy, ultimately contributing to the preservation and understanding of one of the world's most ancient writing systems.
Qiufeng Wang
Professor Qiufeng Wang is the Head of the Department of Intelligent Science at the School of Advanced Technology, Xi’an Jiaotong-Liverpool University (XJTLU), and the Director of the Suzhou Municipal Key Lab of Cognitive Computing and Applied Technology. He received his Ph.D. degree in Pattern Recognition and Intelligence Systems from the Institute of Automation, Chinese Academy of Sciences (CASIA), and won the Presidential Scholarship of the Chinese Academy of Sciences. He then worked at the National Laboratory of Pattern Recognition (NLPR) in CASIA, and later at Microsoft. Dr. Wang joined XJTLU in Feb. 2017. His research interests include pattern recognition and machine learning, especially document analysis and recognition. Dr. Wang has published around 100 papers, in venues including IEEE T-PAMI, Pattern Recognition, ICCV, and ICML, and has published one book on deep learning with Springer.
Email: Qiufeng.Wang@xjtlu.edu.cn
Webpage: https://scholar.xjtlu.edu.cn/en/persons/QiufengWang
Yuliang Liu
Dr. Yuliang Liu received his Ph.D. degree from SCUT in 2020. He is currently a professor with the Department of Artificial Intelligence and Automation, Huazhong University of Science and Technology, and a guest editor of Science China. He has authored or coauthored more than 50 papers in top conferences and journals, including CVPR, ACL, NeurIPS, IEEE TPAMI, and IJCV. His research interests include computer vision, document intelligence, and large-scale data analysis. In recent years, he has focused primarily on vision-language large multimodal models, pushing forward the state of the art in these areas. He received the Best Paper Award at ACL 2024, as well as multiple championship awards in CVPR-TextVQA, Tiny-ImageNet, ICDAR-MLT, and ICDAR-ReCTS.
Email: ylliu@hust.edu.cn
Webpage: https://github.com/Yuliang-Liu
Bang Li
Dr. Bang Li is an Associate Professor and the Director of the Big Data Analysis Research Laboratory at the Key Laboratory of Oracle Bone Script Information Processing of the Ministry of Education, Anyang Normal University. He received his Ph.D. degree in Electronic Science and Technology from Beijing University of Posts and Telecommunications in 2019. His research interests cover machine learning and oracle bone script information processing, especially the construction of datasets and oracle bone script-driven machine learning applications. He has participated in the organization of several oracle bone script datasets, such as Oracle-AYNU, OBC306, HWOBC, and OBIMD.
Email: libang@aynu.edu.cn
Webpage: https://scholar.google.com/citations?hl=zh-CN&user=PtKF2vkAAAAJ