SCIEPublish

Large-Scale Language Model Assisted Construction of Multi-Source Heterogeneous Knowledge Graphs for Marine Renewable Energy

Article Open Access


Author Information
1 School of Ocean Energy, Tianjin University of Technology, Tianjin 300384, China
2 Technical College for the Deaf, Tianjin University of Technology, Tianjin 300384, China
3 Hualan Design and Consulting Group Company Ltd., Nanning 530011, China
* Authors to whom correspondence should be addressed.

Received: 10 December 2025 Revised: 17 December 2025 Accepted: 09 January 2026 Published: 14 January 2026

© 2026 The authors. This is an open access article under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Mar. Energy Res. 2026, 3(1), 10002; DOI: 10.70322/mer.2026.10002
ABSTRACT: Marine renewable energy systems, particularly offshore wind and photovoltaic (PV) installations, generate large volumes of heterogeneous maintenance texts. However, the knowledge they contain remains fragmented because of dispersed sources, diverse formats, and domain-specific terminology. To address these challenges, this study proposes a large-scale language model (LLM)-assisted methodology for constructing a multi-source heterogeneous knowledge graph for intelligent operation and maintenance (O&M). The method integrates unified document preprocessing, domain-oriented prompt engineering, LLM-based entity and relation extraction, and multi-level entity normalization. It systematically transforms unstructured documents (e.g., standards, procedures, manuals, inspection records, and environmental reports) into structured triples, enabling the construction of a dynamically evolving O&M knowledge graph. An ablation study on real-world offshore wind and PV datasets shows that the proposed workflow is highly robust to OCR noise (e.g., scanning artifacts, stamps, and signatures) and substantially improves extraction volume, accuracy, and coverage compared with traditional methods. In particular, combining high-quality preprocessing with optimized prompts yields the most reliable and semantically coherent results. The study provides a practical technical pathway for automated knowledge management in marine renewable energy and lays a foundation for future applications in intelligent diagnostics, predictive maintenance, and digital-twin systems.
Keywords: Knowledge graph construction; Operation and maintenance; Large-scale language models; Marine renewable energy
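The abstract describes a workflow of document preprocessing, domain-oriented prompting, LLM-based triple extraction, and multi-level entity normalization. The following is a minimal Python sketch of how those stages might fit together. It is not the authors' implementation. The OCR-noise patterns (`NOISE_PATTERNS`), the prompt text (`PROMPT_TEMPLATE`), the alias table (`ALIASES`), and every function name are hypothetical. The LLM call is replaced by a fixed JSON string, and normalization is reduced to two levels: surface cleanup, then alias lookup.

```python
import json
import re
import unicodedata
from collections import defaultdict

# Hypothetical OCR-noise rules (stamp/signature lines, page markers);
# the article does not publish its actual cleaning rules.
NOISE_PATTERNS = [
    re.compile(r"^\s*(signature|approved by|stamp)\b.*$", re.IGNORECASE),
    re.compile(r"^\s*page \d+( of \d+)?\s*$", re.IGNORECASE),
]

# Hypothetical alias table used for the dictionary level of normalization.
ALIASES = {
    "wtg": "wind turbine generator",
    "wind turbine": "wind turbine generator",
    "pv module": "photovoltaic module",
    "pv panel": "photovoltaic module",
}

# Hypothetical domain-oriented prompt; {text} is filled with cleaned input.
PROMPT_TEMPLATE = (
    "You are an offshore wind and PV O&M expert. Extract (head, relation, tail) "
    "triples about equipment, components, faults, causes and maintenance actions "
    'from the text below. Answer only with a JSON list of objects with keys "head", '
    '"relation" and "tail".\n\nText:\n{text}'
)


def preprocess(raw: str) -> str:
    """Unify character forms, drop OCR-noise lines, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    kept = [
        line for line in text.splitlines()
        if line.strip() and not any(p.match(line) for p in NOISE_PATTERNS)
    ]
    return " ".join(" ".join(kept).split())


def build_prompt(text: str) -> str:
    """Insert the cleaned document text into the extraction prompt."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Parse the model's JSON answer into triples, skipping malformed items."""
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    keys = ("head", "relation", "tail")
    return [
        tuple(item[k] for k in keys)
        for item in items
        if isinstance(item, dict) and all(isinstance(item.get(k), str) for k in keys)
    ]


def normalize_entity(name: str) -> str:
    """Level 1: surface cleanup (NFKC, case, spacing). Level 2: alias lookup."""
    surface = " ".join(unicodedata.normalize("NFKC", name).lower().split())
    return ALIASES.get(surface, surface)


def build_graph(triples):
    """Merge normalized triples into an adjacency map; sets remove duplicates."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        edge = (relation.strip().lower(), normalize_entity(tail))
        graph[normalize_entity(head)].add(edge)
    return graph


if __name__ == "__main__":
    raw = "Page 3 of 12\nWTG #07 gearbox showed oil leakage.\nSignature: ____"
    prompt = build_prompt(preprocess(raw))  # would be sent to an LLM in practice
    # Stand-in for a model response; no LLM is actually called here.
    fake_output = json.dumps([
        {"head": "WTG", "relation": "has_component", "tail": "Gearbox"},
        {"head": "Wind Turbine", "relation": "has_component", "tail": "gearbox"},
        {"head": "gearbox", "relation": "has_fault", "tail": "oil leakage"},
    ])
    graph = build_graph(parse_triples(fake_output))
    for head in sorted(graph):
        for relation, tail in sorted(graph[head]):
            print(head, relation, tail)
    # gearbox has_fault oil leakage
    # wind turbine generator has_component gearbox
```

In this sketch, "WTG" and "Wind Turbine" collapse onto one canonical node only after normalization. This is why the two `has_component` triples merge into a single edge. A production system would likely add embedding- or ontology-based matching on top of a fixed alias table.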