Fabian Karl
Data Scientist · ML Engineer · PhD Researcher

Machine learning
for messy human text.

I build machine learning systems for medicine, language, and the spaces between research and practice.

Munich, Germany
TU München · sebis
2025 → present
Publications: live from DBLP
9 years coding (since 2017)
M.Sc. grade: 1.2 (Ulm '24)
B.Sc. grade: 1.3 (Ulm '22)
01 — About

About me.

I am a Data Scientist and PhD candidate at the Technical University of Munich, where I research Medical NLP at the Chair of Software Engineering for Business Information Systems (sebis).

My work sits at the intersection of information retrieval, synthetic data generation, and model evaluation, building systems that read, reason, and recommend across messy real-world text.

Nine years of programming, several first-author papers, one master's degree (1.2), and a stubborn belief that good engineering is what makes research useful.

Quick facts

Based in: Munich
Affiliation: TUM · sebis
Focus: Medical NLP
Languages: Python · LaTeX · Java

"Practical applications drive the necessity for better algorithms."

Current focus

Now.

  • Building · 01 / 03

    Synthetic data pipelines

    Generating, filtering, and grounding training data for clinical and scientific NLP tasks.

    #Synthetic Data · Clinical
  • Researching · 02 / 03

    Domain adaptation of LLMs

    Adapting large language models to the medical domain through targeted fine-tuning and evaluation.

    #Medical NLP · LLMs
  • Exploring · 03 / 03

    Information retrieval and RAG

    Building and benchmarking retrieval-augmented generation pipelines for knowledge-intensive tasks.

    #IR · RAG
02 — Academic journey

Academic timeline.

2025 → present

PhD Research, Medical NLP

Technical University of Munich · sebis

Continuing doctoral research with a focus on information retrieval, synthetic data, and evaluation of medical language models.

2024

PhD Research, NLP

Ulm University

Started doctoral research on advanced natural-language processing techniques for scientific and bibliographic text.

2024

M.Sc. Computer Science (1.2)

Ulm University

Thesis: "Retrieval Augmented Information Extraction: Enhancing Language Models with CRAWLDoc."

2022

Student Research Assistant

Data Science group

Co-authored work on transformer-based short-text classification and other NLP topics.

2022

B.Sc. Computer Science (1.3)

Ulm University

Thesis: "Transformers are Short Text Classifiers."

03 — Peer-reviewed & preprints

Selected publications.

Synchronized live with DBLP, supplemented with manually curated entries.

04 — Selected work

Selected projects.

Retrieval · LLM

CRAWLDoc

A robust ranking dataset for bibliographic web documents. Built to stress-test retrieval-augmented extraction across heterogeneous sources.

NLP · Topic Modeling

German Party Manifestos

Semi-automatic analysis of major German party manifestos using LDA, HDP, and BERT. Interactive demo ranks summaries and scores positions.

Small LMs

Efficient Inferencing

Distillation, pruning, and quantization for academic-writing feedback. Benchmarked on server CPUs, laptops, and SoC devices.

Graph ML

Graph-MLP Sampling

Empirical study of thirteen sampling strategies for Graph-MLP across six benchmarks. Sampling is a hyperparameter, not a default.

05 — Toolbox

The toolbox.

Research

Medical NLP · Information Retrieval · Synthetic Data · Model Evaluation · Topic Modeling

Models

Transformers · LLM Distillation · Quantization · Graph Neural Nets · Retrieval-Augmented Generation

Stack

Python · Java · LaTeX · Hugging Face · FastAPI · Docker · Git