Master’s thesis, Imperial College London, 2023
L. G. Stigliano, I. Rekik
Graph neural networks (GNNs) are powerful models for graph-structured data, but they face scalability challenges that hinder deployment in real-time and resource-constrained settings. This motivates knowledge distillation on graphs (KDG), which compresses models while preserving performance. Existing approaches, however, largely overlook reproducibility, a key requirement for trustworthy model interpretation.
In this work, we introduce the concept of reproducible offline knowledge distillation for GNNs and show that standard KD and KDG methods often degrade reproducibility. To address this, we propose reproducibility-aware knowledge distillation on graphs (RepKD), a two-stage framework that jointly trains multiple student models in a one-to-many teacher–student setup and then selects the most reproducible student. Across multiple datasets and GNN architectures, RepKD improves self-reproducibility while maintaining predictive performance, achieving over 95% parameter reduction with negligible memory overhead and comparable training times. We further explore the interpretability of the resulting distilled models.
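The selection stage can be illustrated with a minimal sketch. The abstract does not specify how reproducibility is scored, so the code below assumes a simple stand-in: each student yields a vector of per-feature importance scores, and reproducibility is the mean overlap of a student's top-k features with those of the other students. The function names, the top-k overlap measure, and the toy scores are all hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def top_k_indices(scores, k):
    """Return the set of indices of the k largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def select_most_reproducible(importances, k):
    """Pick the student whose top-k features agree most with the others.

    importances: one list of per-feature importance scores per student
    (hypothetical input; the source does not say how these are obtained).
    Requires at least two students.
    Returns (best_student_index, mean_overlap_per_student).
    """
    n = len(importances)
    tops = [top_k_indices(s, k) for s in importances]
    totals = [0.0] * n
    # Accumulate the pairwise top-k overlap ratio for every pair of students.
    for i, j in combinations(range(n), 2):
        overlap = len(tops[i] & tops[j]) / k
        totals[i] += overlap
        totals[j] += overlap
    means = [t / (n - 1) for t in totals]
    # Ties are broken in favour of the lowest student index.
    best = max(range(n), key=lambda i: means[i])
    return best, means


# Toy importance scores for three jointly trained students (made-up values).
students = [
    [0.9, 0.8, 0.1, 0.2, 0.05],
    [0.7, 0.9, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.0, 0.2],
]
best, means = select_most_reproducible(students, k=2)
print(best, means)
```

In this toy case the first two students share the same top-2 features, so they score higher than the third. Any other agreement measure could be substituted without changing the overall select-after-joint-training structure.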