Research on unmanned aerial vehicle (UAV) swarms has developed rapidly in recent years; in particular, UAV swarms equipped with sensors are becoming a common means of achieving situational awareness. Because UAV swarms with complex control structures are currently under-researched, we propose a patrolling task planning algorithm for a UAV swarm with a double-layer centralized control structure operating in an uncertain and dynamic environment. The main objective of the UAV swarm is to collect as much environment information as possible. To summarize, the primary contributions of this paper are as follows. We first define the patrolling problem. After that, the patrolling problem is modeled as a Partially Observable Markov Decision Process (POMDP). Building upon this, we put forward a myopic and scalable online task planning algorithm. The algorithm comprises an online heuristic function, a sequential allocation method, and a mechanism of bottom-up information flow and top-down command flow, which together reduce the computational complexity effectively. Moreover, as the number of control layers increases, this algorithm guarantees the performance without increasing the computational complexity for the swarm leader. Finally, we empirically evaluate our algorithm in specific scenarios.
Citation: Zhou X, Wang W, Wang T, Li X, Jing T (2018) Continuous patrolling in uncertain environment with the UAV swarm. PLoS ONE 13(8): e0202328. https://doi.org/10.1371/journal.pone.0202328
Editor: Yong Deng, Southwest University, CHINA
Received: October 4, 2017; Accepted: July 9, 2018; Published: August 24, 2018
Copyright: © 2018 Zhou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: This research was supported by the National Natural Science Foundation of China (Grant No. 71373282) and the National Social Science Foundation of China (Grant No. 14BTQ034).
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
UAVs have developed rapidly in recent years [1, 2] and are applied in fields such as agricultural plant protection, pipeline inspection, fire surveillance and military reconnaissance. In August 2016, Vijay Kumar put forward the “5s” development trend of UAVs: small, safe, smart, speedy and swarm. In particular, swarm intelligence [3, 4] is the core technology of the UAV swarm and is attracting more and more researchers. The study of swarms began with Grasse’s study of the behavior of insect communities in 1953 [5]. For example, the behavior of a single ant is quite simple, but an ant colony composed of these simple individuals shows a highly structured social organization that can accomplish complex tasks far beyond an individual’s ability.
The UAV swarm considered here is a large-scale multi-agent system [6] with complex relationships. Complex relationships can generate complex behaviors, allowing the swarm to adapt to complex environments and accomplish complex missions. Compared to a small-scale multi-UAV system, the UAV swarm holds many new advantages, such as lower cost, higher decentralization, higher survival rate, and multi-functionality, making it popular with enterprises, governments and armies. Nevertheless, in order to realize these advantages, several difficulties need to be solved. Firstly, the swarm is difficult to manage [7, 8]: abnormalities such as collision, crash, and loss of communication emerge more easily when managing a large number of UAVs. Secondly, it is difficult to coordinate [9, 10]: there are many types of relationships in the swarm, such as the command and control relationship between inferior and superior, and negotiation relationships within the same layer or across different layers. Thirdly, it is difficult to make decisions [11]: the complexity of the uncertain environment and the dynamic relationships leads to complex decision making. Thus, the UAV swarm needs new command and control mechanisms.
In this paper, we consider a scenario where a UAV swarm aims to patrol an area [12, 13] continuously to gather as much information as possible. Such scenarios are quite common in reality: a UAV swarm searching for missing tourists in a mountain forest, reconnoitering a battlefield to obtain situation information, or patrolling farmland to protect crops. In these areas, the environment changes dynamically and uncertainly, while UAVs in the swarm only have local views; in other words, the environment is partially observable. Hence, when planning the sequence of locations to visit, the UAV swarm faces the difficult task of estimating the maximum information to be gained at these locations. To date, a number of approaches to patrolling an environment with teams of UAVs have been proposed. However, most research focuses on developing algorithms for UAVs with a single-layer control structure and does not take complex relationships into consideration. In light of this, the main challenge addressed in this paper is to use a UAV swarm with complex relationships to monitor a dynamic and uncertain environment. That is, the UAV swarm needs to patrol the environment and provide up-to-date information through a proper mechanism.
We model this challenge as a general class of information gathering problem. Based on the above considerations, this paper presents a new algorithm for the patrolling task planning of a UAV swarm with a double-layer centralized control structure in an uncertain environment, denoted as USPA (UAV Swarm Patrolling Algorithm). Firstly, we define the UAV swarm patrolling problem. Agents in different layers have different viewpoints. Although the granularity of time, action, information and physical layout graph differs between the upper-layer environment and the lower-layer environment, there are connections between them. Secondly, the patrolling problem is modeled as a POMDP [14]. Due to the exponential growth of the action space and observation space, existing POMDP solvers cannot deal with our POMDP formulation efficiently. Thus, we propose an online myopic algorithm to solve this formulation. To summarize, the primary contributions of this paper are as follows:
Firstly, we propose a computable double-layer centralized control structure for the UAV swarm. Our structure extracts the core interactive processes of controlling a large-scale multi-UAV system. Moreover, the control structure is scalable: it can be extended to a centralized control structure with more layers and thus manage more UAVs.
Secondly, we propose a myopic online task planning algorithm for the UAV swarm. This algorithm is scalable, adapting to centralized control structures with more layers. As the number of layers increases, the algorithm still guarantees the performance bound. Moreover, the decision process is distributed over all the decision-making nodes, without increasing the computational complexity for the swarm leader.
Thirdly, we construct scenarios to evaluate the performance of our algorithm, conducting a case experiment and a parameter sensitivity analysis. The experimental results show that, based on our algorithm, the UAVs gather as much information as possible while taking as little time as possible. Additionally, our algorithm has practical significance.
The remainder of the paper is organized as follows. In section 1, we introduce the background of our research. In section 2, the related literature is reviewed. In section 3, we formally define the UAV swarm patrolling problem. Given this, the UAV swarm patrolling problem is formulated as a POMDP in section 4, where the patrolling algorithm is provided to calculate policies for every sub-swarm leader and information gathering UAV. After that, we put forward a proof of the decision-making mechanism and corollaries about scalability and the performance bound in section 5. In section 6, our algorithm is evaluated empirically through simulation experiments, comparing against benchmark algorithms in the same problem setting. Finally, we conclude and point out further research directions in section 7.
2 Related work
In this section, we review related work on dynamic environment models, command and control structures, and approaches to the patrolling task planning problem.
Generally speaking, approaches to gathering situational awareness without considering threats are typically categorized as information gathering problems, where agents aim to continuously collect and provide up-to-date situational awareness. One of the challenges in this type of problem is to predict the information at other coordinates in the environment with limited data. As for the environment model, Gaussian Processes [15] have often been used in recent years, effectively describing the space-time relationship of the environment. Additionally, a topology graph is an abstract way to model the environment from different perspectives. Compared to a topology graph, Gaussian Processes display more details about the environment; however, a topology graph abstracts the core elements of the environment, which helps to concentrate on the research object. As for environmental dynamics, most environment models in previous work are static [16]. Presently, Markov chains are widely used to model non-static random environment objects, such as the physical behavior of a wireless network [17], the storage capacity of a communication system channel [18] and the communication channel sensing problem [19]. In these papers, different assumptions are added to the Markov model. As for the patrolling problem with a UAV swarm, the Markov chain is one of the most popular models. For instance, the ground target is modeled as an independent two-state Markov chain in paper [20]. Paper [21] models the patrolling environment with threat state and information state as a K-state Markov chain. Paper [22] uses a Markov chain to represent the hidden movement of targets. In this paper, we assume the UAV swarm patrolling environment is a topology graph that changes according to a K-state Markov chain.
Due to the large number of UAVs, the command and control structure of the UAV swarm should be taken into consideration. There are many control structures concerning the inner-loop controller of a single UAV [23, 24], which differ from our research; what we are concerned with is the relationship among UAVs in the swarm. Generally speaking, control structures of a swarm can be divided into general structures and computable structures. General structures are coarse-grained and can be applied to a variety of fields; in general structures, the research object is described qualitatively, lacking quantitative analysis. The AIR [25] model divides control structures into four basic patterns: the directed control structure, the acknowledged control structure, the virtual control structure and the collaborative control structure. The 4D/RCS [26] model provides a theoretical basis for unmanned ground vehicles on how their software components should be identified and organized; it is a hierarchical deliberative architecture that plans up to the subsystem level to compute plans for an autonomous vehicle driving over rough terrain. Paper [27] proposes a scalable and flexible architecture for real-time mission planning and dynamic agent-to-task assignment for the UAV swarm. Compared to general control structures, computable control structures are fine-grained and quantitative. For example, for the centralized and decentralized control structures, paper [28] introduces three methods to solve cooperative task planning for the multi-UAV system. Paper [29] proposes a task planning method for a single-layer centralized control structure in a dynamic and uncertain environment. However, most computable control structures are presently single-layer. Thus, in order to manage large-scale UAV swarms effectively, computable control structures with complex relationships should be taken into consideration.
There are many approaches to solving the task planning problem [30], such as mathematical programming, the Markov decision process (MDP) and game theory [28]. For the continuous information gathering problem, MDP-based algorithms are more appropriate due to their multi-step planning property. For instance, in fully observable environments, paper [31] proposes an MDP-based algorithm to compute policies for all the UAVs. Moreover, POMDP and decentralized POMDP (Dec-POMDP) [32] are widely applied to partially observable environments. However, most research on the patrolling problem of the UAV swarm considers single-layer control structures, and our work mainly extends to a double-layer control structure. Due to the exponential growth in the number of possible courses of action of the UAVs, solving this formulation using current POMDP solvers [33] is hard. Partially Observable Monte Carlo Planning (POMCP) [34] extends some benchmark algorithms to solve multi-agent POMDPs. POMCP breaks the curse of dimensionality and the curse of history, providing a computationally efficient best-first search that focuses its samples in the most promising regions of the search space. However, for the large-scale multi-agent patrolling problem, the state space is still too large to apply POMCP to the multi-POMDP problem directly.
3 The UAV swarm patrolling problem
In this section we present a general formalization of the patrolling problem for a UAV swarm with a double-layer centralized control structure. We introduce the patrolling problem in three aspects: an overview of the patrolling problem, the physical environment and the patrolling UAVs.
3.1 Overview of the patrolling problem
The environment is modeled as an upper-layer environment and a lower-layer environment for different decision makers; both correspond to the same real environment. Similarly, the control structure falls into an upper-layer control structure and a lower-layer control structure for different decision makers. There are three types of UAVs: the swarm leader, the sub-swarm leader and the information gathering UAV (I-UAV for short). The swarm leader and the sub-swarm leaders are decision makers. The upper-layer environment, lower-layer environment and the three types of UAVs are shown in Fig 1. The lower-layer environment provides information for sub-swarm leaders, and I-UAVs follow the sub-swarm leaders’ commands; in turn, the sub-swarm leaders provide information for the swarm leader and follow the swarm leader’s commands. The difference between the two layers lies in the granularity of time, layout graph, action, and information belief.
Fig 1. Overview of the environment and the UAV swarm.
The swarm leader is represented by a blue hexagon. There are several sub-swarms in a swarm, and every sub-swarm contains several I-UAVs. The leader of the UAV swarm is called the swarm leader, the leader of a sub-swarm is called the sub-swarm leader, and UAVs directly subordinate to a sub-swarm leader are called I-UAVs. In reality, the swarm leader may be a highly intelligent UAV in the swarm, a ground control station, or an early warning aircraft. The main function of the swarm leader is to allocate courses of action to each sub-swarm leader. Sub-swarm leaders are represented by yellow five-pointed stars. They play the role of actors in the upper-layer environment, while they are decision makers in the lower-layer environment, leading a sub-swarm and allocating courses of action to each I-UAV. I-UAVs are represented by red rhombuses and are directly controlled by their superior sub-swarm leader; their function is to collect environmental information. Additionally, the upper-layer control structure and lower-layer control structure are both centralized control structures, and there are no interactions between UAVs with a peering relationship. Here, let l denote the lower-layer environment, h the upper-layer environment, u the sub-swarm leader, and w the I-UAV. Moreover, the meanings of the symbols used in this paper are shown in Table 1.
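To make this structure concrete, the following Python sketch shows one possible way to encode the double-layer containment hierarchy and the bottom-up information flow; the class and attribute names (SwarmLeader, SubSwarmLeader, IUAV) are illustrative choices of ours and do not appear in the paper.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative containment hierarchy for the double-layer centralized
# control structure (class names are ours, not from the paper).

@dataclass
class IUAV:
    uid: int
    position: int          # current vertex in the lower-layer layout graph G_l

@dataclass
class SubSwarmLeader:
    uid: int
    position: int          # current vertex in the upper-layer layout graph G_h
    i_uavs: List[IUAV] = field(default_factory=list)

    def report(self):
        # Bottom-up information flow: aggregate what the I-UAVs observed.
        return [(w.uid, w.position) for w in self.i_uavs]

@dataclass
class SwarmLeader:
    sub_leaders: List[SubSwarmLeader] = field(default_factory=list)

    def collect_reports(self):
        # The swarm leader only interacts with sub-swarm leaders (centralized,
        # no peer-to-peer interaction), who in turn interact with their I-UAVs.
        return {u.uid: u.report() for u in self.sub_leaders}
```

In this reading, the swarm leader never communicates with I-UAVs directly, which mirrors the centralized, layered relationship described above.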
Table 1. A summary of the notation used throughout this paper.
3.2 The physical environment
The physical environment is defined by its spatial-temporal and dynamic properties, encoded as the lower-layer environment and the upper-layer environment based on the control structure, and specifying how and where UAVs can move. In practice, the physical environment is an area of interest such as a mountain forest, a battlefield, or farmland, where continuous and up-to-date intelligence information is urgently needed. Each vertex in the undirected graph refers to an area in reality, and an edge indicates that two vertices are connected.
Definition 1 (Layout graph) The layout graph is an undirected graph G = (V, E) that represents the layout of the physical environment, where the set of spatial coordinates V is embedded in Euclidean space, and the set of edges E denotes the movements that are possible.
Our model contains the upper-layer layout graph and the lower-layer layout graph, denoted as Gh and Gl respectively. The upper-layer layout graph and lower-layer layout graph correspond to the same physical environment, and there is a correspondence between Gh and Gl.
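As a minimal illustration of Definition 1 and the relationship between the two layout graphs, the sketch below encodes Gh and Gl as adjacency sets and maps each upper-layer vertex to a set of lower-layer vertices. The concrete graphs and the mapping are invented for illustration; the paper only states that such a correspondence exists.

```python
# Minimal layout-graph sketch (undirected graphs as adjacency sets).
# The mapping from upper-layer vertices to groups of lower-layer vertices
# is one plausible reading of the correspondence between G_h and G_l.

G_l = {                       # lower-layer layout graph
    0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2},
}
G_h = {                       # upper-layer layout graph (coarser)
    'A': {'B'}, 'B': {'A'},
}
upper_to_lower = {            # each upper-layer vertex covers several lower-layer vertices
    'A': {0, 1},
    'B': {2, 3},
}

def neighbors(graph, v):
    """Vertices reachable from v in one move (edges are undirected)."""
    return graph[v]
```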
Definition 2 (Information level) The information level qualitatively represents the content of the information of interest, denoted as Ik ∈ {I1, I2, …, IK}, where K is the number of levels. The information level vector is denoted as I = [I1, I2, …, IK].
Each vertex has a certain information level at each time. We regard the physical environment as dynamic and partially observable, so the information level of each vertex changes with time. Specifically, an I-UAV can only access its current location in Gl and gather the information there. When an I-UAV visits a vertex, the information level of this vertex is reset to I1; in other words, there is no new information immediately after the most recent visit.
Definition 3 (Information value) The information value is a quantification of the information level, denoted as f(Ik), Ik ∈ {I1, I2, …, IK}. The function f(⋅) maps an information level to an information value. The information value vector is denoted as F = [f(I1), f(I2), …, f(IK)].
In order to reduce the decision complexity of the swarm leader, the significant information of interest is extracted from the lower-layer layout graph. Moreover, the information value of vertices that UAVs have not visited for some time may increase. Thus, we assume that the function f(⋅) is monotonically increasing. The information value transition matrix P is as follows:
P = [pij]K×K, where pij = Pr(the level is Ij at time t+1 | the level is Ii at time t), 1 ≤ i, j ≤ K, and each row of P sums to 1. (1)
Assumption 1 The information value of each vertex changes according to an independent, discrete-time multi-state Markov chain, as given by Eq 1.
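As a sketch of Assumption 1 together with the visit-reset rule introduced earlier, the following Python fragment evolves one vertex’s information level as a K-state Markov chain. The concrete values of F and P are invented for this illustration; the paper only requires f(⋅) to be monotonically increasing and P to be known in advance.

```python
import numpy as np

# Illustrative K-state Markov model of a single vertex's information level.
K = 3
F = np.array([0.0, 1.0, 2.0])          # f(I_1) <= f(I_2) <= f(I_3): monotonically increasing
P = np.array([[0.7, 0.2, 0.1],         # row i: Pr(next level | current level I_i)
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)

def step(level, visited):
    """Advance one time step; a visit resets the vertex to I_1 (index 0)."""
    if visited:
        return 0
    return rng.choice(K, p=P[level])
```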
Here, we assume that the information value state transition matrix P is known in advance. Ph represents the upper-layer transition matrix, while Pl represents the lower-layer transition matrix. Additionally, the concept of stochastic dominance is widely used in many applications [35], such as economics, finance, and statistics. Specifically, the stochastic dominance of two K-dimension vectors x = {x1, x2, …, xK}, y = {y1, y2, …, yK} is defined as x ≻ y if:
∑j=k..K xj ≥ ∑j=k..K yj for every k ∈ {1, 2, …, K}. (2)
Assumption 2 Information value vector F follows stochastic dominance.
Intuitively, if a vertex v has a higher information value than other vertices currently, the vertex v might also have a higher information value at the next moment.
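Under the usual reading of Eq 2 as first-order stochastic dominance (comparison of tail sums), the check can be sketched as follows; this is an illustrative helper function, not code from the paper.

```python
import numpy as np

def dominates(x, y):
    """First-order stochastic dominance x > y over levels I_1..I_K:
    every tail sum of x is at least the corresponding tail sum of y (cf. Eq 2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    tail_x = np.cumsum(x[::-1])[::-1]   # tail_x[k-1] = sum of x_j for j >= k
    tail_y = np.cumsum(y[::-1])[::-1]
    return bool(np.all(tail_x >= tail_y))
```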
Assumption 3 Information value transition matrix P is a monotone matrix.
Generally, if no UAVs are gathering information in an area, the unknown information of this area may increase with time. The monotone matrix [36] P satisfies:
∑j=k..K p(i+1)j ≥ ∑j=k..K pij for every i ∈ {1, …, K−1} and every k ∈ {1, …, K}, (3)
that is, each row of P stochastically dominates the row above it.
For two compact information belief vectors (see Eq 13) bn and bn′, if bn ≻ bn′, then bn⋅P ≻ bn′⋅P [17]. That is, if no UAVs visit vertices vn and vn′ at the current moment, their information belief vectors will maintain stochastic dominance at the next moment. Additionally, if bn ≻ bn′, then bn⋅F ≥ bn′⋅F, which means that the dominating belief vector has at least as high an expected information value.
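The fragment below illustrates these two properties numerically, reusing the invented P and F from the earlier sketch: one-step belief propagation bn⋅P and the expected information value bn⋅F. It is a numerical illustration only, not part of the paper’s algorithm.

```python
import numpy as np

# Same illustrative monotone P and monotone F as in the earlier sketch.
P = np.array([[0.7, 0.2, 0.1],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
F = np.array([0.0, 1.0, 2.0])

b_n  = np.array([0.2, 0.3, 0.5])        # belief over {I_1, I_2, I_3}
b_np = np.array([0.5, 0.3, 0.2])        # stochastically dominated by b_n

next_b_n, next_b_np = b_n @ P, b_np @ P  # one-step propagation: dominance is preserved
value_n, value_np = b_n @ F, b_np @ F    # expected information values: value_n >= value_np
```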
Definition 4 (Time) Time is modelled by a discrete set of temporal coordinates t ∈ {0, 1, …}, henceforth referred to as time steps.
The lower-layer time step is denoted as tl, and the upper-layer time step is denoted as th. Here, a time step contains one OODA cycle (Observation, Orientation, Decision, Action) for all the agents in the same layer, and there is a correspondence between the time steps of the two layers.
Definition 5 (Time Step Ratio) Time step ratio is the ratio of the real time of one upper-layer time step to that of one lower-layer time step, denoted as M.
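A small sketch of Definition 5, assuming an illustrative value of M: each upper-layer time step simply spans M consecutive lower-layer time steps.

```python
# Time-step correspondence: one upper-layer step covers M lower-layer steps.
M = 4  # illustrative value; the paper leaves M as a parameter

def upper_step_of(t_l, M=M):
    """Upper-layer time step t_h that contains lower-layer step t_l."""
    return t_l // M

def lower_steps_of(t_h, M=M):
    """Lower-layer time steps covered by upper-layer step t_h."""
    return range(t_h * M, (t_h + 1) * M)
```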