Annotating Entities with Fine-Grained Types in Austrian Court Decisions

Research & Innovation

The use of Named Entity Recognition tools on domain-specific corpora is often hampered by insufficient training data. We investigate an approach that produces fine-grained named entity annotations from a small training dataset and can later be applied to a large corpus of Austrian court decisions. We first apply a general-purpose Named Entity Recognition model to produce annotations of common coarse-grained types. Next, a small sample of these annotations is manually annotated by domain experts to produce an initial fine-grained training dataset. To use the small manually annotated dataset efficiently, we formulate named entity typing as a binary classification task: for each originally annotated occurrence of an entity and for each fine-grained type, we verify whether the entity belongs to that type. For this purpose we train a transformer-based classifier. We randomly sample 547 predictions and evaluate them manually. The incorrect predictions are then used to improve the classifier: their corrected annotations are added to the training set. The experiments show that adding even a very small number (5 or 10) of originally incorrect predictions can significantly improve classifier performance. Finally, we train the classifier on all available data and re-annotate the whole dataset. The resulting annotated dataset is available for training a new Named Entity Recognition model.
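The binary formulation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the type inventory, the `[SEP]`-joined input format, the 0.5 threshold, and all names (`Mention`, `encode`, `to_binary_examples`, `predict_types`) are assumptions made for illustration, and the transformer-based classifier is stood in for by an arbitrary scoring function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical type inventory; the abstract does not list the actual fine-grained types.
FINE_TYPES = ["court", "company", "public_authority"]


@dataclass(frozen=True)
class Mention:
    text: str     # entity surface form
    context: str  # sentence in which the entity occurs
    coarse: str   # coarse type assigned by the general-purpose NER model


def encode(mention: Mention, fine_type: str) -> str:
    """Build the (assumed) classifier input for one mention/type pair."""
    return f"{mention.context} [SEP] {mention.text} [SEP] {fine_type}"


def to_binary_examples(
    mention: Mention, gold_type: str, types: List[str] = FINE_TYPES
) -> List[Dict[str, Union[str, int]]]:
    """Expand one expert annotation into one yes/no example per candidate type."""
    return [
        {"input": encode(mention, t), "label": int(t == gold_type)}
        for t in types
    ]


def predict_types(
    mention: Mention,
    score: Callable[[str], float],
    types: List[str] = FINE_TYPES,
    threshold: float = 0.5,  # assumed decision threshold
) -> List[str]:
    """Ask the binary classifier about every type and keep the accepted ones."""
    return [t for t in types if score(encode(mention, t)) >= threshold]


mention = Mention(
    text="Oberster Gerichtshof",
    context="Der Oberste Gerichtshof hat entschieden.",
    coarse="ORG",
)
examples = to_binary_examples(mention, gold_type="court")
```

Under these assumptions, each manually annotated mention yields one training example per candidate type, so a small annotated sample produces a larger set of binary decisions. A manually corrected prediction can be fed back through `to_binary_examples` in the same way and appended to the training set.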

