AI-driven Experimental Design and Structure-Property Inference: A Lignin Case Study

Matthias Stosiek1,*, Patrick Rinke1

1Department of Physics, Atomistic Modelling Center, Munich Data Science Institute, Technical University of Munich

*Corresponding author email address: matthiasstosiek [at] web [dot] de (matthias[dot]stosiek[at]tum[dot]de)

The application of artificial intelligence (AI) to materials science is often limited by the scarcity and high cost of experimental data. In this talk, we present a closed-loop approach that combines AI-guided experimentation with data-driven structure–property analysis to accelerate discovery in complex materials systems. As a case study, we investigate lignin–carbohydrate complexes (LCCs), following the full pipeline from sample generation in the Aqua Solv Omni (AqSO) biorefinery process to chemical insight derived from the resulting dataset. LCCs are abundant, chemically diverse, and underutilized biopolymers, whose targeted use is hindered by the poorly understood relationship between structure and properties.

In the first part, we show how Bayesian optimization (BO) can guide experimental campaigns. By iteratively selecting process parameters, such as reactor temperature (T), reaction severity (P-factor), and liquid-to-solid (L/S) ratio, BO explores the design space efficiently with few samples. This enables targeted optimization of objectives such as biorefinery yield and carbohydrate content, while requiring less experimental effort than conventional design-of-experiments approaches. At the same time, the adaptive loop generates a structured, information-rich dataset, helping to mitigate data scarcity in experimental materials science. The BO campaign and resulting dataset have been reported previously [1,2].
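The adaptive loop described above can be illustrated with a minimal sketch. It is not the implementation used in the reported campaign [1,2]: the Gaussian-process surrogate with a fixed RBF kernel, the expected-improvement acquisition, the random candidate pool, and the function `toy_objective` (a stand-in for running an experiment and measuring, for example, yield) are all assumptions made for illustration. The three inputs are treated as normalized to the unit cube; mapping them to physical ranges of T, P-factor, and L/S ratio is left out.

```python
import math
import numpy as np


def rbf_kernel(A, B, length_scale=0.3):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(X, y, X_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v**2).sum(axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mu, sigma, best):
    """Expected improvement for a maximization problem."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def toy_objective(x):
    """Hypothetical stand-in for an experiment; peak location is arbitrary."""
    return -np.sum((x - np.array([0.6, 0.4, 0.5])) ** 2, axis=-1)


def run_bo(objective=toy_objective, n_init=5, n_iter=15, seed=0):
    """Pick n_init random settings, then n_iter settings by maximizing EI."""
    rng = np.random.default_rng(seed)
    candidates = rng.random((500, 3))  # normalized (T, P-factor, L/S)
    evaluated = np.zeros(len(candidates), dtype=bool)
    init = rng.choice(len(candidates), size=n_init, replace=False)
    evaluated[init] = True
    X, y = candidates[init], objective(candidates[init])
    for _ in range(n_iter):
        # Standardize observations so the unit-variance GP prior is reasonable.
        y_std = (y - y.mean()) / (y.std() + 1e-12)
        mu, sigma = gp_posterior(X, y_std, candidates)
        ei = expected_improvement(mu, sigma, y_std.max())
        ei[evaluated] = -np.inf  # never repeat an experiment
        nxt = int(np.argmax(ei))
        evaluated[nxt] = True
        X = np.vstack([X, candidates[nxt]])
        y = np.append(y, objective(candidates[nxt]))
    return X, y


if __name__ == "__main__":
    X, y = run_bo()
    print("best setting:", X[np.argmax(y)], "objective:", y.max())
```

In a real campaign, the call to `objective` would be replaced by running the process at the proposed setting and measuring the target, and the kernel hyperparameters would typically be fitted to the data instead of being fixed.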

In the second part, we use this dataset to uncover structure–property relationships in LCCs. The dataset comprises 95 samples, combining 2D nuclear magnetic resonance (NMR) spectra with measurements of key physicochemical properties, including antioxidant activity, glass transition temperature, molecular weight, surface tension, and degradation metrics. We construct chemistry-agnostic, interpretable structural descriptors and employ random forest regression (RFR) together with feature importance analysis to link structure to function. This approach enables chemical interpretation; for example, we find that a higher abundance of β-O-4 linkages correlates with lower surface tension, consistent with a more linear lignin structure.
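As a rough illustration of this analysis step, the sketch below fits a random forest regressor with scikit-learn and ranks descriptors by permutation importance on held-out samples. The abstract does not state which importance measure or validation scheme is used, so both choices are assumptions here. The data are synthetic: `make_synthetic` is a hypothetical helper that generates 95 samples in which one descriptor lowers the target, loosely mimicking a single structural feature that correlates with a lower property value. No real NMR descriptors or measurements are reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def make_synthetic(n_samples=95, n_features=6, seed=0):
    """Synthetic descriptors; only feature 0 influences the target."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_samples, n_features))
    y = -2.0 * X[:, 0] + 0.1 * rng.standard_normal(n_samples)
    return X, y


def rank_features(X, y, seed=0):
    """Fit a random forest and rank features by permutation importance."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    # Importance = drop in held-out R^2 when one feature column is shuffled.
    result = permutation_importance(
        model, X_te, y_te, n_repeats=20, random_state=seed
    )
    order = np.argsort(result.importances_mean)[::-1]
    return model, result.importances_mean, order


if __name__ == "__main__":
    X, y = make_synthetic()
    _, importances, order = rank_features(X, y)
    for i in order:
        print(f"descriptor {i}: importance {importances[i]:.3f}")
```

Importance scores show which descriptors the model relies on but not the direction of the effect; the sign of a relationship, such as a feature lowering the target, would have to be read from the data or from partial dependence plots.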

Overall, our approach demonstrates how AI can bridge data acquisition and scientific insight: Bayesian optimization accelerates and structures data generation, while machine learning models translate these data into interpretable structure–property relationships. The workflow is broadly applicable to complex materials systems where experiments are costly and mechanisms remain poorly understood.

[1] Diment, D., Löfgren, J., Stosiek, M., et al., ChemSusChem, p. e202401711.
[2] Alopaeus, M., Stosiek, M., et al., Sci Data 12, 996 (2025).