SCIEPublish

Multi-Agent Reinforcement Learning for Optimal Operation of PV-ES-EV Microgrids

Article Open Access

Author Information
1 College of Electrical Engineering, Sichuan University, Chengdu 610065, China
2 State Grid Sichuan Electric Power DC Company, Chengdu 610042, China
3 Anhui USEM Technology Co., Ltd., Wuhu 241000, China
4 School of Electrical Engineering, Anhui Polytechnic University, Wuhu 241000, China
5 State Grid Anhui Electric Power Corporation Research Institute, Hefei 340100, China
6 Qingdao Shanghe TELD Virtual Power Plant Technology Co., Ltd., Qingdao 266000, China
* Authors to whom correspondence should be addressed.

Received: 01 April 2026 Revised: 15 April 2026 Accepted: 28 April 2026 Published: 13 May 2026

© 2026 The authors. This is an open access article under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Smart Energy Syst. Res. 2026, 2(2), 10008; DOI: 10.70322/sesr.2026.10008
ABSTRACT: Grid-connected microgrids must balance economic efficiency against preparedness for secure islanded autonomy, and traditional multi-objective optimization handles this trade-off with fixed weights. To address both problems, this paper proposes a grid-connected interactive optimization strategy with dynamic autonomy weights. A microgrid autonomy index is defined to quantify islanding preparedness, and a lightweight prediction network generates online weights for three objectives (economy, security, and autonomy) so that the optimization focus adapts to operating conditions. The Multi-Agent Proximal Policy Optimization (MAPPO) algorithm coordinates photovoltaics, energy storage, electric vehicle chargers, various loads, and power purchasing and selling, enabling decentralized decision-making. Results show that, under grid-connected conditions without external faults, the proposed strategy achieves economic performance close to that of economic-only optimization (i.e., optimization that disregards islanding preparedness), while during islanding transitions caused by external grid faults it shortens the interruption duration of critical loads by more than 72%. The state of charge (SOC) remains strictly within the operational safety band of 20–90% throughout all dispatch cycles, complying with industry norms for battery cycle life preservation. Because the weights for economy, security, and autonomy are generated online from real-time system states alone rather than fixed a priori, these results verify that the proposed mechanism achieves a context-aware trade-off among conflicting objectives.
Keywords: Microgrid; Dynamic autonomy weight; Multi-agent reinforcement learning; Clipped proximal policy optimization algorithm; PV-ES-EV
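
The dynamic weighting idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' prediction network: it assumes a single linear layer followed by a softmax, and the state features, the matrix `W`, the bias `b`, and the objective values are all hypothetical placeholders chosen only to show how state-dependent weights could scalarize three objectives.

```python
import math


def softmax(logits):
    """Map raw scores to positive weights that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_weights(state, weight_matrix, bias):
    """Hypothetical one-layer 'prediction network': state -> 3 weights.

    Each row of weight_matrix produces the logit for one objective
    (economy, security, autonomy, in that order).
    """
    logits = [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(weight_matrix, bias)
    ]
    return softmax(logits)


def scalarized_reward(objectives, weights):
    """Weighted sum of the per-objective reward terms."""
    return sum(w * r for w, r in zip(weights, objectives))


# Assumed state features: SOC, normalized PV output, grid-fault risk flag.
state = [0.35, 0.6, 1.0]
# Assumed parameters; a trained network would learn these values.
W = [
    [0.0, 0.5, -1.0],   # economy logit falls when fault risk is high
    [-1.0, 0.0, 0.5],   # security logit rises when SOC is low
    [-0.5, 0.0, 1.5],   # autonomy logit rises when fault risk is high
]
b = [0.0, 0.0, 0.0]

weights = dynamic_weights(state, W, b)
# Assumed per-objective reward terms for one dispatch step.
reward = scalarized_reward([-1.2, -0.1, 0.4], weights)
print(weights, reward)
```

With these placeholder parameters the autonomy weight dominates when the fault-risk feature is high, which mirrors the adaptive shift of optimization focus described above. The softmax keeps the three weights positive and normalized, so the scalarized reward stays on a comparable scale across states.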