ICDAR2025

The 19th International Conference on Document Analysis and Recognition

September 16-21, 2025 Wuhan, Hubei, China

Tutorial 1: Beyond Recognition: A Multidimensional Exploration of Characters

Date and time: September 16, 2025, 8:30-11:00

Speaker: Seiichi Uchida (Kyushu University, Japan), uchida@ait.kyushu-u.ac.jp

In the field of document image recognition and understanding, characters are an indispensable element. Traditionally, much research has focused primarily on recognizing characters — treating them simply as objects of recognition — and on extracting semantic information from text when integrating with natural language processing. However, we seldom pause to ponder a fundamental question: What exactly is a character?

This tutorial aims to investigate that question from multiple perspectives, shedding light on how character images differ from generic images, how fonts are used and why so many different fonts are necessary. We will also explore why we are able to identify certain shapes as specific characters (e.g., recognizing an “A” as the letter A) and examine the unique properties that distinguish handwritten text from printed text. Moreover, we will discuss how characters appear in various contexts — from printed documents to building surfaces and inscriptions — and the cultural and historical dimensions surrounding them.

By exploring these facets, participants will gain fresh insights into the fundamental nature of textual information. This broader perspective may lead to innovative approaches in document image analysis and open new avenues for research and practical applications within the ICDAR community. Rather than focusing on incremental improvements to existing tasks, I hope this tutorial will inspire entirely new lines of research on the nature of characters!

Seiichi Uchida

Dr. Seiichi Uchida is a Distinguished Professor and Executive Vice-President at Kyushu University, Japan. He has been a prominent figure in document image analysis and recognition, actively contributing to the ICDAR community and other IAPR-affiliated conferences for many years. His roles and achievements include:

● 2007 IAPR/ICDAR Best Paper Award

● 2012 DAS (Document Analysis Systems) Program Co-Chair

● 2015 ICDAR Workshop Chair

● 2017 ICDAR Executive Co-Chair

● 2019 ICDAR Workshop Chair

● 2021 ICDAR Program Co-Chair

● 2022 DAS Program Co-Chair

● 2023 ICDAR Keynote Speaker

Dr. Uchida’s research interests encompass document image analysis, handwriting recognition, pattern recognition, and related areas. He has published extensively in top-tier conferences and journals, supervised numerous graduate students, and served in various organizational and committee roles at international events.

Webpage: https://scholar.google.co.jp/citations?user=QMpdhysAAAAJ

Tutorial 2: How to train a Multi-modal Large Document Understanding Model?

Date and time: September 16, 2025, 13:30-16:00

Speakers: Wei Shen (Shanghai Jiao Tong University), wei.shen@sjtu.edu.cn

Yu Zhou (Nankai University), yzhou@nankai.edu.cn

This tutorial will provide participants with a comprehensive understanding of multi-modal document understanding models, focusing on the integration of textual and visual data for improved document analysis.

Attendees will learn about model architecture, training techniques, and real-world applications. The tutorial is designed for researchers and practitioners interested in the latest advancements in document analysis, making it a perfect addition to the ICDAR conference program.

Wei Shen

Dr. Wei Shen is a professor at the Artificial Intelligence Institute, Shanghai Jiao Tong University. He was previously an Assistant Research Professor at the Department of Computer Science, Johns Hopkins University. He received his B.S. and Ph.D. degrees from Huazhong University of Science and Technology in 2007 and 2012, respectively. He has over 80 peer-reviewed publications in computer vision, machine learning, and related areas, including IEEE Trans. PAMI, IEEE Trans. Image Processing, IEEE Trans. Medical Imaging, NeurIPS 2023/2024/2025, CVPR 2022/2023, ICML 2025, and ICCV 2025. He is an Associate Editor for Pattern Recognition. He received the MICCAI Young Scientist Award and the NSFC Excellent Young Scientists Fund in 2023.

Webpage: https://shenwei1231.github.io/

Yu Zhou

Dr. Zhou holds BSc, MSc, and PhD degrees in computer science from Harbin Institute of Technology. He is a professor and PhD supervisor in the College of Computer Science, Nankai University. His research interests include computer vision and deep learning, with a special interest in visual text processing, detection, recognition, and understanding. He has served as an AC, SPC, and PC member for CVPR, ICCV, ECCV, NeurIPS, ICDAR, and other conferences, and as a reviewer for TPAMI, TIP, and other journals. He has published over 80 papers in peer-reviewed journals and conferences, including CVPR, ICCV, NeurIPS, TMM, and TNNLS, and the paper PIMNet was selected as a best paper candidate at ACM MM 2021.

Webpage: https://intimelab.github.io/

Tutorial 3: General Introduction to Oracle Bone Scripts Processing

Date and time: September 16, 2025, 8:30-11:00

Speakers: Qiufeng Wang (Xi’an Jiaotong-Liverpool University), Qiufeng.Wang@xjtlu.edu.cn

Yuliang Liu (Huazhong University of Science and Technology), ylliu@hust.edu.cn

Bang Li (Anyang Normal University), libang@aynu.edu.cn

Oracle Bone Script (OBS) is the earliest known form of Chinese writing, dating back over 3,000 years to the Shang Dynasty. It serves as a crucial resource for historical linguistics, epigraphy, and the study of ancient Chinese civilization. However, due to the complexity of OBS, including its vast character variations, irregular inscriptions, and fragmented artifacts, the automatic recognition, segmentation, and analysis of OBS remain highly challenging tasks in document analysis and recognition.

This tutorial aims to introduce participants to the latest advances in Oracle Bone Script processing, covering key techniques such as character recognition, inscription restoration, dataset construction, and deep learning-based approaches. We will also discuss the interdisciplinary collaboration between computer vision, natural language processing, and archaeology, which is essential for advancing research in this domain.

The motivation behind this tutorial is twofold: First, despite significant progress in optical character recognition (OCR) for modern scripts, ancient scripts like OBS require specialized methods due to their non-standard writing systems. Second, there is growing interest in leveraging artificial intelligence (AI) and deep learning for historical document analysis, making this an opportune moment to bring together researchers from diverse backgrounds to explore innovative solutions.

By providing a comprehensive overview of OBS processing, this tutorial will foster collaboration between researchers in document analysis, computational humanities, and epigraphy, ultimately contributing to the preservation and understanding of one of the world's most ancient writing systems.

Qiufeng Wang

Professor Qiufeng Wang is the head of the Department of Intelligent Science at the School of Advanced Technology, Xi’an Jiaotong-Liverpool University (XJTLU), and the Director of the Suzhou Municipal Key Lab of Cognitive Computing and Applied Technology. He received his Ph.D. degree in Pattern Recognition and Intelligence Systems from the Institute of Automation, Chinese Academy of Sciences (CASIA), and won the Presidential Scholarship of the Chinese Academy of Sciences. He then worked at the National Laboratory of Pattern Recognition (NLPR) at CASIA, and later at Microsoft. Dr. Wang joined XJTLU in February 2017. His research interests include pattern recognition and machine learning, especially document analysis and recognition. Dr. Wang has published around 100 papers, in venues including IEEE T-PAMI, Pattern Recognition, ICCV, and ICML, as well as one book on deep learning with Springer.

Email: Qiufeng.Wang@xjtlu.edu.cn

Webpage: https://scholar.xjtlu.edu.cn/en/persons/QiufengWang

Yuliang Liu

Dr. Yuliang Liu received his Ph.D. degree from SCUT in 2020. He is currently a professor with the Department of Artificial Intelligence and Automation, Huazhong University of Science and Technology, and a guest editor of Science China. He has authored or coauthored more than 50 papers in top conferences and journals, including CVPR, ACL, NeurIPS, IEEE TPAMI, and IJCV. His research interests include computer vision, document intelligence, and large-scale data analysis. In recent years, he has focused primarily on vision-language large multimodal models, pushing forward the state of the art in these areas. He received the Best Paper Award at ACL 2024, as well as multiple championship awards in CVPR-TextVQA, Tiny-ImageNet, ICDAR-MLT, and ICDAR-ReCTS.

Email: ylliu@hust.edu.cn

Webpage: https://github.com/Yuliang-Liu

Bang Li

Dr. Bang Li is an Associate Professor and the Director of the Big Data Analysis Research Laboratory at the Key Laboratory of Oracle Bone Script Information Processing of the Ministry of Education, Anyang Normal University. He received his Ph.D. degree in Electronic Science and Technology from Beijing University of Posts and Telecommunications in 2019. His research interests cover machine learning and oracle bone script information processing, especially the construction of datasets and oracle bone script-driven machine learning applications. He has participated in the organization of several oracle bone script datasets, including Oracle-AYNU, OBC306, HWOBC, and OBIMD.

Email: libang@aynu.edu.cn

Webpage: https://scholar.google.com/citations?hl=zh-CN&user=PtKF2vkAAAAJ

Tutorial 4: Historical Documents in Focus: Visual and Computational Analysis from Papyri to Inscriptions

Date and time: September 16, 2025, 13:30-15:30

Speakers: Isabelle Marthot-Santaniello (University of Basel), i.marthot-santaniello@unibas.ch

Giuseppe De Gregorio (University of Basel), giuseppe.degregorio@unibas.ch

This tutorial offers a guided introduction to the computational analysis of historical documents, with a focus on methods from computer vision and pattern recognition. Participants will explore how ancient manuscripts, inscriptions, and other written artifacts, often degraded, fragmented, and highly variable, pose unique challenges for document analysis. We will discuss key tasks such as layout segmentation, handwriting recognition, and document reconstruction, while reflecting on the methodological and ethical complexities involved.

Designed together by a papyrologist/ancient historian/classicist (Isabelle) and a computer vision specialist (Giuseppe), the session is rooted in active interdisciplinary research, particularly the study of ancient Greek papyri from Egypt using digital and AI methods.

Drawing on real-world case studies, especially from ancient Greek papyri, the tutorial bridges the gap between humanities research and technical innovation. It is designed for an interdisciplinary audience and welcomes anyone interested in cultural heritage, historical data, or applying AI to complex real-world materials.

Isabelle Marthot-Santaniello

Prof. Dr. Isabelle Marthot-Santaniello is a specialist in Greek papyrology and digital palaeography. She earned her Master’s degree in Ancient History and Classics from the École Pratique des Hautes Études in Paris, where she also completed her PhD in Greek Papyrology. Following a postdoctoral fellowship at the University of Minnesota working on a crowd-sourcing project funded by the U.S. National Endowment for the Humanities, she joined the University of Basel in Switzerland. From 2018 to 2023, she was the Principal Investigator of the SNSF Ambizione project "D-Scribes", focusing on the digital palaeography of Greek and Coptic papyri. She is currently leading the SNSF Starting Grant project "EGRAPSA" (2023–2028), which investigates the evolution of handwriting in Graeco-Roman Egypt through computational methods. Her research combines traditional philological expertise with state-of-the-art computer vision, and she has published extensively on writer identification and papyrus image enhancement, including the release of specialized software tools. Dr. Marthot-Santaniello has co-organized several local and international scientific events dedicated to computational palaeography, including workshops at ICDAR 2023 and 2024. She also contributed to the organization of the ICDAR 2019 DIBCO Competition and the ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri, reinforcing her longstanding engagement with the document analysis and recognition community.

Email: i.marthot-santaniello@unibas.ch

Webpage: https://daw.philhist.unibas.ch/fr/persons/isabelle-marthot-santaniello/

Giuseppe De Gregorio

Dr. Giuseppe De Gregorio is a computer vision and pattern recognition researcher specializing in handwriting recognition and historical document analysis. He received his Master’s degree in Computer Engineering and earned his PhD in Information Engineering from the University of Salerno (Italy) in 2023. His doctoral research focused on "N-gram Retrieval for Word Spotting in Historical Handwritten Collections", contributing novel techniques to improve the indexing and retrieval of words in manuscript archives. Since 2023, Dr. De Gregorio has been a postdoctoral researcher in the SNSF Starting Grant project "EGRAPSA" at the University of Basel, where he applies deep learning and computer vision methods to the analysis of ancient Greek papyri. His work explores the detection and recognition of characters, palaeographic dating, and layout segmentation using both supervised and self-supervised approaches. Dr. De Gregorio is an active contributor to the ICDAR community and the broader document analysis field. He has published in leading conferences and journals. His research emphasizes the challenges posed by real-world historical data, including preservation variability, lack of ground truth, and the need for multimodal and interdisciplinary solutions.

Email: giuseppe.degregorio@unibas.ch

Webpage: https://daw.philhist.unibas.ch/fr/persons/de-gregorio-giuseppe/
