Research results
Working papers
Quantifying Page Segmentation Quality in Historical Job Advertisements Retrieval by Klára Venglařová, Raven Adam, Saranya Balasubramanian and Georg Vogeler
This paper addresses the challenge of evaluating page segmentation methods when extracting historical job advertisements from digitized newspapers. Accurate segmentation is essential for high-quality Optical Character Recognition (OCR) results, yet the methodology for comparing and evaluating segmentation algorithms has received limited attention in Digital Humanities. The paper presents an evaluation framework developed within the JobAds Project, focusing on textual congruence between predicted and ground-truth regions. Such a framework supports evidence-based selection of a segmentation algorithm and offers insight into the quality of the segmented data, which in turn affects research outcomes. The paper examines three evaluation features: intersection area, text similarity based on Levenshtein distance, and text presence/absence in the non-intersecting parts of the predicted region and its ground truth. Their effectiveness is assessed with logistic regression models. The method relies on manually created ground truth and aims for an automatic metric that quantifies textual congruence. Results show that combining the text presence/absence feature with Levenshtein-based text similarity achieves the highest performance, reaching an F1 score of 0.957 on the testing subset. The study emphasizes the need for tailored evaluation metrics in Digital Humanities, highlights the challenges posed by OCR errors and irregular layouts, and underscores the importance of transparency in research. The proposed evaluation framework offers insights for segmentation assessment in historical newspapers and can be applied beyond the specific dataset and use case.
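As a rough illustration of two of the features named in the abstract, the following Python sketch computes an intersection-area ratio for two rectangular regions and a Levenshtein-based text similarity. It is not the paper's implementation: the rectangular box format `(x_min, y_min, x_max, y_max)`, the normalization by ground-truth area and by the longer string, and all function names are assumptions made for this example.

```python
def intersection_ratio(pred, truth):
    """Share of the ground-truth box covered by the predicted box.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    width = min(pred[2], truth[2]) - max(pred[0], truth[0])
    height = min(pred[3], truth[3]) - max(pred[1], truth[1])
    if width <= 0 or height <= 0:
        return 0.0  # the boxes do not overlap
    truth_area = (truth[2] - truth[0]) * (truth[3] - truth[1])
    return (width * height) / truth_area


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def text_similarity(a, b):
    """Levenshtein distance rescaled to [0, 1]; 1.0 means identical text."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Values like these could serve as inputs to a classifier such as the logistic regression models mentioned in the abstract, with the OCR text of the predicted region compared against the text of its ground-truth region.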
Emergence of the Austrian labor market by Jörn Kleinert and Wiltrud Mölzer
In the last 200 years, the division of labor has increased drastically. Different skills and knowledge need to be combined for production. How is this dispersed knowledge brought to the place where it creates particularly large value? To assess this matching, we study labor markets as the device that facilitates such processes in a decentralized manner. We start our investigation in the middle of the 19th century, which marked the beginning of the 'modern' labor market, and follow the market for 100 years. We use job ads in newspapers as our major data source. The analysis is framed in terms of the emergence, development and functioning of markets as a means of facilitating this matching. The labor market was 'created' through the initiative of many actors, including, at a later stage, some public actors. The market changed over time without losing robustness or functionality. These changes increased the stability of matches and followed social preferences.