Author Information

College of Artificial Intelligence, Nankai University, Tianjin 300350, China

* Authors to whom correspondence should be addressed.

Received: 04 September 2024 Accepted: 14 October 2024 Published: 18 October 2024

© 2024 The authors. This is an open access article under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

ABSTRACT:
A new combat strategy of coordinated gliding aircraft cluster operations against multiple targets imposes higher demands on the coordination, real-time responsiveness, and strike accuracy of such clusters. Because gliding aircraft fly at high speed and carry large inertia, traditional trajectory planning methods often suffer from long computation times and struggle to respond to dynamic environments in real time when applied to large-scale clusters. To address this issue, this paper proposes a distributed cooperative trajectory planning method for multi-target strikes by gliding aircraft clusters. A multi-objective distributed real-time trajectory planning approach based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is introduced, and the gliding aircraft execute distributed cooperative trajectory planning with the trained model. Owing to its strong real-time performance, the cluster does not need to recalculate trajectories for different initial positions. Simulation results show that the average error between the gliding aircraft cluster and the target points is 2.1 km, with a minimum error of 0.06 km and a hit rate of 96.6%, verifying the significant advantages of this method in real-time planning capability and strike accuracy.

Keywords:
Gliding aircraft cluster; Trajectory planning; Multi-agent deep deterministic policy gradient; Distributed collaboration

Gliding aircraft are launched by rocket boosters or carried to a certain altitude or operational area by other vehicles. They then exploit aerodynamic lift to glide without power at the edge of the Earth's atmosphere, ultimately striking targets or landing. Gliding aircraft offer advantages in speed and maneuverability and can execute long-range missions and penetrate air defense systems. Early gliding aircraft generally performed single missions; however, as military technology has advanced, the traditional 'one-on-one' combat mode has become increasingly unable to handle more challenging missions [1]. The new combat mode employs gliding aircraft clusters engaging in coordinated point-to-point operations. Compared with the traditional 'one-on-one' mode, this approach offers several advantages: clusters can form rapidly and dynamically according to real-time battlefield conditions and execute different tasks against different targets, diversifying combat strategies; the aircraft can strike targets simultaneously, greatly improving strike efficiency and battlefield coverage; the large number of aircraft and their diverse routes complicate the enemy's defense; the deterrent power of a cluster attack far exceeds that of the 'one-on-one' method, creating greater strategic pressure on the enemy; and even if some aircraft are intercepted or malfunction, the others can continue the mission, ensuring operational continuity and reliability. Based on these advantages, real-time dynamic clustering together with an intra-cluster distributed space-time cooperative guidance architecture can raise the penetration probability of gliding aircraft clusters and achieve precise, synchronized strikes.
Gliding aircraft represent a category of vehicles, exemplified by missiles, spaceplanes, airdrop gliders, and re-entry vehicles. As shown in Figure 1, they differ from traditional powered aircraft in that they are unpowered and have high cruising speeds, making timely adjustments to their flight trajectories challenging. Therefore, real-time planning of feasible trajectories for clusters of gliding aircraft becomes an essential prerequisite for coordinated control and precise landing [2]. In the static task space, the multi-gliding aircraft cooperative problem is a typical trajectory planning problem under complex constraints [3]. In recent years, some scholars have conducted research on trajectory planning methods for clusters or formations. Reference [4] discusses the overall ‘resilience’ requirements of clusters and swarm-based agents such as UAVs. Reference [5] proposed a multi-aircraft collaborative trajectory planning method based on an improved Dubins-RVO method and symplectic pseudospectral method, which can generate feasible trajectories and achieve high-precision tracking in complex environments. Reference [6] addressed the path planning problem for multi-UAV formations in a known environment using an improved artificial potential field method combined with optimal control techniques. Reference [7] investigated the use of an improved grey wolf optimizer algorithm to solve the multi-UAV cooperative path planning problem in complex confrontation environments. It is evident that most of these studies focus on unmanned aerial vehicles, with few dedicated to trajectory planning for gliding aircraft clusters based on their specific characteristics. Moreover, traditional trajectory planning methods often entail long computation times and pre-set trajectories that struggle to adapt to dynamic environments, leading to delayed responses and low damage efficiency in gliding aircraft clusters. 
Given current onboard computing capacity limitations, trajectory optimization has typically been treated as an offline problem [8,9]. Deep reinforcement learning offers an innovative solution for multi-objective cooperative trajectory planning: a model trained offline can plan trajectories online, enhancing cluster coordination and combat capability.
**Figure 1.** Gliding aircraft.
Traditional deep reinforcement learning methods are mostly applied to individual learning tasks, such as value function-based reinforcement learning methods [10,11] and policy search-based reinforcement learning methods [12,13,14,15,16]. There is already a considerable amount of research applying deep reinforcement learning to trajectory planning tasks for individuals. Reference [17] addressed the three-dimensional path planning problem for UAVs in complex environments using a deep reinforcement learning approach. Reference [18] optimized UAV trajectory and UAV-TU association using a double deep Q-network algorithm to enhance system performance and quality of service in mobile edge computing networks. Reference [19] studied Artificial Intelligence methods and key algorithms applied to UAV swarm navigation and trajectory planning. Reference [20] studied a method based on explainable deep neural networks to solve the problem of autonomous navigation for quadrotor UAVs in unknown environments. However, this study focuses on trajectory planning for a cluster of multiple gliding aircraft. Some scholars have already applied deep reinforcement learning methods for multi-agent collaboration to trajectory planning. Reference [21] proposed a STAPP method based on a multi-agent deep reinforcement learning algorithm to simultaneously solve the target assignment and path planning problems for multiple UAVs in dynamic environments. Reference [22] proposed a multi-layer path planning algorithm based on reinforcement learning, which improves UAV path planning performance in various environments by combining global and local information. Reference [23] used a multi-agent reinforcement learning approach to solve the problem of flexible data collection path planning for UAV teams in complex environments.
Accordingly, this paper proposes a multi-objective cooperative trajectory planning method for gliding aircraft clusters based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG). The method models each aircraft with three degrees of freedom and designs a reward function that incorporates initial constraints, terminal constraints, real-time path constraints, and collision avoidance. It can plan feasible flight trajectories in real time for each unit in the cluster, achieving coordinated multi-aircraft strikes.

Figure 2 illustrates the working principle of a gliding aircraft, which is divided into the four main stages listed below.
**Figure 2.** The working principle of the gliding aircraft.
Among the flight phases of a gliding aircraft cluster, trajectory planning during the glide phase is the most critical. First, the glide phase typically involves long-distance flight, during which each aircraft in the cluster must adjust its trajectory according to the overall mission objectives and tactical requirements so as to avoid collisions and reach its target. Second, since the aircraft no longer relies on a propulsion system in this phase, trajectory planning must account for effective energy management. Finally, trajectory planning in the glide phase directly determines how close the aircraft is to its target at the terminal moment, thereby setting the conditions for success in the final strike phase (such as the dive phase).

- Boost Phase: In this stage, the aircraft relies on its propulsion system to accelerate and ascend. This phase typically involves the aircraft gaining initial speed and altitude, ensuring it can enter the subsequent gliding flight phase.
- Inertial Phase: The aircraft enters the inertial phase after the boost phase. During this stage, the propulsion system has ceased operation, and the aircraft continues to ascend or maintain speed through inertia.
- Glide Phase: The glide phase is the primary flight stage of the aircraft, where it utilizes its aerodynamic design to glide without power. This phase usually covers a long distance, and the aircraft can adjust its attitude to alter its trajectory, achieving greater maneuverability and stealth.
- Dive Phase: The dive phase begins with a rapid descent towards the target. This phase is typically aimed at increasing attack speed, enhancing the element of surprise, and reducing the likelihood of interception by enemy defense systems.

```latex\begin{cases}\dot{x}=\nu_{xy}\cos\varphi+\nu_{wx}\\\dot{y}=\nu_{xy}\sin\varphi+\nu_{wy}\\\dot{z}=\nu_z\\\ddot{\varphi}=u\end{cases}```
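As a concrete illustration, the kinematic model above can be stepped forward numerically. The symbols (horizontal speed ν_xy, wind components ν_wx and ν_wy, vertical speed ν_z, heading φ, and heading-acceleration control u) follow the equation; the step size, the default speeds, and the semi-implicit Euler scheme are illustrative assumptions, not values from the paper:

```python
import math

def step(state, u, dt=0.1, v_xy=680.0, v_z=-50.0, wind=(0.0, 0.0)):
    """Advance the 3DOF glide kinematics by one semi-implicit Euler step.

    state = (x, y, z, phi, phi_dot); u is the heading-acceleration control.
    The speeds v_xy, v_z and the wind vector are held constant over the step
    (v_xy = 680 m/s is only a rough stand-in for Mach 2 at altitude).
    """
    x, y, z, phi, phi_dot = state
    v_wx, v_wy = wind
    x += (v_xy * math.cos(phi) + v_wx) * dt   # x-dot = v_xy*cos(phi) + v_wx
    y += (v_xy * math.sin(phi) + v_wy) * dt   # y-dot = v_xy*sin(phi) + v_wy
    z += v_z * dt                             # z-dot = v_z (constant sink)
    phi_dot += u * dt                         # phi-ddot = u
    phi += phi_dot * dt
    return (x, y, z, phi, phi_dot)
```

Starting level at 15 km altitude and heading east with zero control, one 0.1 s step advances the aircraft 68 m downrange and drops it 5 m.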

```latex\begin{cases}x(t_0)=x_0\\y(t_0)=y_0\\z(t_0)=z_0\\\varphi(t_0)=\varphi_0\end{cases}```

```latex\begin{cases}t_f=\frac{z_0-z_f}{\nu_z}\\x(t_f)=x_f\\y(t_f)=y_f\\z(t_f)=z_f\end{cases}```

```latexD_t=\sqrt{\left(x_t-x_f\right)^2+\left(y_t-y_f\right)^2}```

```latexd=\left|\frac{z_t}{\nu_z}\times\nu_{xy}-D_t\right|```

```latexr_d=\begin{cases}2,&\text{if }d<1\text{ and }d^{\prime}<1\\0.1\times\left(d^{\prime}-d\right),&\text{otherwise}\end{cases}```

```latexr_u=1-\left|a_t\right|-\left|a_t-a_t^{\prime}\right|```

```latexr_t=K_1\times r_d+K_2\times r_u```

```latexD_f=\sqrt{\left(x_{t_f}-x_f\right)^2+\left(y_{t_f}-y_f\right)^2}```

```latexr_f=\begin{cases}K-D_f,&t=t_f,\ D_f\leq5\\M,&t=t_f,\ D_f>5\end{cases}```
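The reward terms above can be sketched as plain functions. Distances are in km, d′ and a′ denote the previous step's d and action, and the weights K1 = 1, K2 = 0.1 and terminal constants K = 10, M = −10 are illustrative assumptions (the paper leaves their values unspecified):

```python
import math

def shaping_reward(pos, prev_d, a_t, a_prev, target, v_xy, v_z, K1=1.0, K2=0.1):
    """Per-step reward r_t = K1*r_d + K2*r_u, returning (r_t, d) so the
    caller can feed this step's d back in as prev_d next step."""
    x, y, z = pos
    xf, yf = target
    D_t = math.hypot(x - xf, y - yf)        # horizontal distance to target
    d = abs(z / v_z * v_xy - D_t)           # mismatch: reachable vs required range
    if d < 1 and prev_d < 1:
        r_d = 2.0                           # bonus once the mismatch stays small
    else:
        r_d = 0.1 * (prev_d - d)            # reward shrinking the mismatch
    r_u = 1 - abs(a_t) - abs(a_t - a_prev)  # penalize large and jerky control
    return K1 * r_d + K2 * r_u, d

def terminal_reward(D_f, K=10.0, M=-10.0):
    """Terminal reward r_f at t = t_f: K - D_f on a hit (<= 5 km), else M."""
    return K - D_f if D_f <= 5 else M
```

With the mismatch already below 1 km and zero control, a step earns the full r_d bonus of 2 plus the smoothness bonus.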

```latexQ\left(S,a\mid\theta^Q\right)```

```latexa_t=\mu\left(S_t\mid\theta^\mu\right)```

```latexa_t\sim\mathrm{clip}\left(N\left(\mu\left(S_t\mid\theta^\mu\right),\sigma^2\right),a_{min},a_{max}\right)```
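The exploration rule above (Gaussian noise around the deterministic policy output, clipped to the actuator limits) can be sketched as follows; the noise scale and action bounds are illustrative:

```python
import random

def explore(mu_out, sigma=0.1, a_min=-1.0, a_max=1.0):
    """Sample a_t ~ clip(N(mu(S_t), sigma^2), a_min, a_max)."""
    a = random.gauss(mu_out, sigma)        # Gaussian perturbation of mu(S_t)
    return max(a_min, min(a_max, a))       # clip to the feasible action range
```

During training sigma is kept nonzero for exploration; at test time it can be set to zero, reducing the rule to the deterministic policy (clipped).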

```latex\begin{cases}\theta^{Q^{\prime}}\leftarrow\tau\theta^Q+(1-\tau)\theta^{Q^{\prime}}\\\theta^{\mu^{\prime}}\leftarrow\tau\theta^\mu+(1-\tau)\theta^{\mu^{\prime}}\end{cases}```
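The soft target update above is an elementwise blend of online and target parameters; a minimal sketch over flat parameter lists (a real implementation would iterate over network tensors, and τ = 0.005 is an illustrative value):

```python
def soft_update(target_params, online_params, tau=0.005):
    """theta' <- tau*theta + (1 - tau)*theta', applied elementwise."""
    return [tau * w + (1 - tau) * wt
            for w, wt in zip(online_params, target_params)]
```

Small τ makes the target networks trail the online networks slowly, which stabilizes the bootstrapped TD targets.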

```latexq_i=r_i+\gamma Q^{\prime}\left(S_{i+1},\mu^{\prime}\left(S_{i+1}\mid\theta^{\mu^{\prime}}\right)\mid\theta^{Q^{\prime}}\right)```

```latexL=\frac{1}{n}\sum_i^n\left(q_i-Q\left(S_i,a_i\mid\theta^Q\right)\right)^2```

```latex\nabla_{\theta^{\mu}}J=\frac{1}{n}\sum_{i}^{n}\nabla_{a}Q\left(S,a\mid\theta^{Q}\right)\big|_{S=S_i,a=\mu(S_i)}\nabla_{\theta^{\mu}}\mu\left(S\mid\theta^{\mu}\right)\big|_{S=S_i}```
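The critic side of the update above, the TD target q_i and the mean squared Bellman error L, reduces to the following computation once Q-values are treated as scalars; a real implementation would backpropagate through the critic network, which is omitted here, and γ = 0.99 is an illustrative value:

```python
def td_targets(rewards, next_qs, dones, gamma=0.99):
    """q_i = r_i + gamma * Q'(S_{i+1}, mu'(S_{i+1})), zeroing the
    bootstrap term at terminal transitions."""
    return [r + gamma * (0.0 if done else nq)
            for r, nq, done in zip(rewards, next_qs, dones)]

def critic_loss(targets, qs):
    """Mean squared Bellman error L = (1/n) * sum_i (q_i - Q(S_i, a_i))^2."""
    n = len(targets)
    return sum((t - q) ** 2 for t, q in zip(targets, qs)) / n
```

Minimizing L fits the critic to the bootstrapped targets; the actor is then updated along the policy gradient above using the fitted critic.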

- During the training phase, each gliding aircraft can utilize the positions and observations of other aircraft.
- During the testing phase, each gliding aircraft plans its trajectory based solely on its observations.

```latex\nabla_{\theta_i}J\left(\mu_i\right)=\mathrm{E}_{x,a\sim B}\left[\nabla_{\theta_i}\mu_i\left(a_i\mid o_i\right)\nabla_{a_i}Q_i^\mu\left(x,a_1,\ldots,a_N\right)\big|_{a_i=\mu_i(o_i)}\right]```

```latexq_i=r_i+\gamma Q_i^{\prime}\left(x^{\prime},a_1^{\prime},\cdots,a_N^{\prime}\right)\big|_{a_j^{\prime}=\mu_j^{\prime}(o_j)}```
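The centralized-training, decentralized-execution structure described by the two bullets and the equations above can be sketched as follows. The tiny linear "networks" are stand-ins for the actual actor and critic networks, whose architectures the paper does not specify; only the information flow is the point:

```python
class Agent:
    """Minimal MADDPG-style agent: the actor sees only its own observation,
    while the centralized critic scores the joint state and all actions."""

    def __init__(self, actor_w):
        self.actor_w = actor_w  # stand-in for the actor parameters theta^mu_i

    def act(self, obs):
        # Decentralized execution: action a_i = mu_i(o_i), local obs only.
        return max(-1.0, min(1.0, self.actor_w * obs))

    def critic(self, joint_state, joint_actions, w=1.0):
        # Centralized critic Q_i(x, a_1, ..., a_N): conditions on everything.
        return w * (sum(joint_state) + sum(joint_actions))

agents = [Agent(0.5), Agent(-0.5)]
obs = [0.4, 0.8]
actions = [ag.act(o) for ag, o in zip(agents, obs)]  # each aircraft plans alone
q = agents[0].critic(obs, actions)                   # training-time critic sees all
```

At test time only `act` is needed, which is why each aircraft can plan its trajectory from its own observations once training is done.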

The task space measures 105 km × 75 km × 30 km, and the initial altitude of the gliding aircraft cluster is set to 15 km. The horizontal and vertical speeds can be set according to the aircraft model and mission requirements; in this study the horizontal speed is Mach 2, while the vertical speed is held constant, fixing the flight time. The aircraft are required to arrive near their assigned targets at the designated time. A random wind field with a maximum speed of 30 m/s was added during testing. The trajectory planning model uses each aircraft's positional information to plan trajectories in real time, and this real-time adjustment capability effectively reduces positional deviations caused by wind disturbances.
At the beginning of each round, the gliding aircraft cluster is randomly initialized within the task area. Over 100 test rounds in which the aircraft operated in a distributed cooperative manner, the average error between the cluster and the target points was 2.1 km, with a minimum error of 0.06 km and a maximum error of 6.3 km. Treating an error of more than 5 km as a miss, the hit rate reaches 96.6%. Figure 6, Figure 7, and Figure 8 display different trajectory planning cases.
**Figure 6.** Simulation result I. (**a**) 3D trajectory; (**b**) 2D trajectory; (**c**) wind speed.
**Figure 7.** Simulation result II. (**a**) 3D trajectory; (**b**) 2D trajectory; (**c**) wind speed.
**Figure 8.** Simulation result III. (**a**) 3D trajectory; (**b**) 2D trajectory; (**c**) wind speed.
These simulation results demonstrate that the proposed multi-objective trajectory planning method successfully plans a feasible trajectory for each of the 12 aircraft in the cluster, enabling all of them to reach their respective target points concurrently at the designated time.

Traditional trajectory planning methods often rely on complex computational models, which substantially increase computation time in large-scale scenarios and make adaptation to dynamic environments difficult; for gliding aircraft clusters in particular, obtaining effective planning results within a short time is hard. To address this, this paper proposes a multi-objective trajectory planning method for gliding aircraft clusters based on the MADDPG algorithm. Each gliding aircraft executes its trajectory planning strategy in a distributed manner using a pre-trained model, eliminating the need to recalculate trajectories for different initial positions. In addition, a reward function tailored to multi-objective tasks is designed for each aircraft in the cluster, accounting for trajectory accuracy, energy minimization, and collision avoidance within the cluster. Simulation results show that the proposed method can plan optimal trajectories in real time for gliding aircraft at different positions.
To reduce the training time of the trajectory planning model and ensure real-time decision efficiency, this paper simplifies the gliding aircraft model to a 3DOF model and pre-assigns a target to each aircraft. Future research could first consider using a higher degree of freedom model while maximizing decision efficiency. Additionally, it could incorporate real-time target allocation based on the positions of different aircraft within the flight mission to improve landing accuracy and mission success rate.

This work was supported by the National Natural Science Foundation of China (Grant No.62103204).

Conceptualization, Q.S.; Methodology, H.S.; Validation, J.Y.; Writing—Original Draft Preparation, J.Y.; Writing—Review & Editing, J.Y.

Not applicable.

Not applicable.

This research received no external funding.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

1. Li Y, Wu Y, Qu X. Chicken Swarm-Based Method for Ascent Trajectory Optimization of Hypersonic Vehicles. *J. Aerosp. Eng.* **2017**, *30*, 04017043. [Google Scholar]
2. Shen Z, Lu P. Onboard Generation of Three-Dimensional Constrained Entry Trajectories. *J. Guid. Control Dyn.* **2003**, *26*, 111–121. [Google Scholar]
3. Xu X-P, Yan X-T, Yang W-Y, An K, Huang W, Wang Y. Algorithms and applications of intelligent swarm cooperative control: A comprehensive survey. *Prog. Aerosp. Sci.* **2022**, *135*, 100869. [Google Scholar]
4. Abhishek P, Medrano FA. Examining application-specific resiliency implementations in UAV swarm scenarios. *Intell. Robot.* **2023**, *3*, 453–478. [Google Scholar]
5. Liu J, Han W, Wang X, Li J. Research on Cooperative Trajectory Planning and Tracking Problem for Multiple Carrier Aircraft on the Deck. *IEEE Syst. J.* **2020**, *14*, 3027–3038. [Google Scholar]
6. Chen Y, Yu J, Su X, Luo G. Path Planning for Multi-UAV Formation. *J. Intell. Robot. Syst.* **2014**, *77*, 229–246. [Google Scholar]
7. Xu C, Xu M, Yin C. Optimized multi-UAV cooperative path planning under the complex confrontation environment. *Comput. Commun.* **2020**, *162*, 196–203. [Google Scholar]
8. Zhao J, Zhou R, Jin X. Progress in reentry trajectory planning for hypersonic vehicle. *J. Syst. Eng. Electron.* **2014**, *25*, 627–639. [Google Scholar]
9. Wei Z, Huang C, Ding D, Huang H, Zhou H. UCAV Formation Online Collaborative Trajectory Planning Using hp Adaptive Pseudospectral Method. *Math. Probl. Eng.* **2018**, *2018*, 1–25. [Google Scholar]
10. Wang Y, Liu H, Zheng W, Xia Y, Li Y, Chen P, et al. Multi-Objective Workflow Scheduling With Deep-Q-Network-Based Multi-Agent Reinforcement Learning. *IEEE Access* **2019**, *7*, 39974–39982. [Google Scholar]
11. Fahrmann D, Jorek N, Damer N, Kirchbuchner F, Kuijper A. Double Deep Q-Learning With Prioritized Experience Replay for Anomaly Detection in Smart Environments. *IEEE Access* **2022**, *10*, 60836–60848. [Google Scholar]
12. Xu Y-H, Yang C-C, Hua M, Zhou W. Deep Deterministic Policy Gradient (DDPG)-Based Resource Allocation Scheme for NOMA Vehicular Communications. *IEEE Access* **2020**, *8*, 18797–18807. [Google Scholar]
13. Meng W, Zheng Q, Shi Y, Pan G. An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning. *IEEE Trans. Neural Netw. Learn. Syst.* **2022**, *33*, 2223–2235. [Google Scholar]
14. Li B, Gan Z, Chen D, Sergey Aleksandrovich D. UAV Maneuvering Target Tracking in Uncertain Environments Based on Deep Reinforcement Learning and Meta-Learning. *Remote Sens.* **2020**, *12*, 3789. [Google Scholar]
15. Chikhaoui K, Ghazzai H, Massoud Y. PPO-based Reinforcement Learning for UAV Navigation in Urban Environments. In Proceedings of the 2022 IEEE 65th International Midwest Symposium on Circuits and Systems (MWSCAS), Fukuoka, Japan, 7–10 August 2022; pp. 1–4.
16. Wong C-C, Chien S-Y, Feng H-M, Aoyama H. Motion Planning for Dual-Arm Robot Based on Soft Actor-Critic. *IEEE Access* **2021**, *9*, 26871–26885. [Google Scholar]
17. Xie R, Meng Z, Wang L, Li H, Wang K, Wu Z. Unmanned Aerial Vehicle Path Planning Algorithm Based on Deep Reinforcement Learning in Large-Scale and Dynamic Environments. *IEEE Access* **2021**, *9*, 24884–24900. [Google Scholar]
18. Liu Q, Shi L, Sun L, Li J, Ding M, Shu FS. Path Planning for UAV-Mounted Mobile Edge Computing With Deep Reinforcement Learning. *IEEE Trans. Veh. Technol.* **2020**, *69*, 5723–5728. [Google Scholar]
19. Elfatih NM, Ali ES, Saeed RA. Navigation and Trajectory Planning Techniques for Unmanned Aerial Vehicles Swarm. In *Artificial Intelligence for Robotics and Autonomous Systems Applications*; Azar AT, Koubaa A, Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 369–404.
20. He L, Aouf N, Song B. Explainable Deep Reinforcement Learning for UAV autonomous path planning. *Aerosp. Sci. Technol.* **2021**, *118*, 107052. [Google Scholar]
21. Qie H, Shi D, Shen T, Xu X, Li Y, Wang L. Joint Optimization of Multi-UAV Target Assignment and Path Planning Based on Multi-Agent Reinforcement Learning. *IEEE Access* **2019**, *7*, 146264–146272. [Google Scholar]
22. Cui Z, Wang Y. UAV Path Planning Based on Multi-Layer Reinforcement Learning Technique. *IEEE Access* **2021**, *9*, 59486–59497. [Google Scholar]
23. Bayerlein H, Theile M, Caccamo M, Gesbert D. Multi-UAV Path Planning for Wireless Data Harvesting With Deep Reinforcement Learning. *IEEE Open J. Commun. Soc.* **2021**, *2*, 1171–1187. [Google Scholar]
24. Sumiea EH, Abdulkadir SJ, Alhussian HS, Al-Selwi SM, Alqushaibi A, Ragab MG, et al. Deep deterministic policy gradient algorithm: A systematic review. *Heliyon* **2024**, *10*, e30697. [Google Scholar]
25. Zheng S, Liu H. Improved Multi-Agent Deep Deterministic Policy Gradient for Path Planning-Based Crowd Simulation. *IEEE Access* **2019**, *7*, 147755–147770. [Google Scholar]

Yu J, Sun Q, Sun H. Multi-Objective Distributed Real-Time Trajectory Planning for Gliding Aircraft Cluster. *Advanced Materials & Sustainable Manufacturing* **2024**, *1*, 10008. https://doi.org/10.70322/amsm.2024.10008
