Date: Tuesday, 29.06.2021, 11:00-12:00
Speaker: Dawid Jurkiewicz (WMiI UAM)
Abstract: Understanding documents with rich layouts plays a vital role in digitization and hyper-automation but remains a challenging topic in the NLP research community. Additionally, the lack of a commonly accepted benchmark has made it difficult to quantify progress in the domain. To empower research in Document Understanding, we present a suite of tasks that fulfill the highest quality, difficulty, and licensing criteria. The benchmark includes Visual Question Answering, Key Information Extraction, and Machine Reading Comprehension tasks over various document domains and layouts featuring tables, graphs, lists, and infographics. The current study reports systematic baselines making use of recent advances in layout-aware language modeling. To support adoption by other researchers, both the benchmarks and reference implementations will be released shortly.