By Xiaolin Hu, Yousheng Xia, Yunong Zhang, Dongbin Zhao

The volume LNCS 9377 constitutes the refereed proceedings of the 12th International Symposium on Neural Networks, ISNN 2015, held in Jeju, South Korea, in October 2015. The 55 revised full papers presented were carefully reviewed and selected from 97 submissions. These papers cover many topics of neural network-related research, including intelligent control, neurodynamic analysis, memristive neurodynamics, computer vision, signal processing, machine learning, and optimization.

In [5], a policy iteration algorithm for discrete-time nonlinear systems was developed. Many traditional iterative ADP algorithms require a model of the nonlinear system to be built before the ADP algorithm is performed to derive an improved control policy [11, 16, 18–22, 24, 27, 28]. In contrast, Q-learning, proposed by Watkins [14, 15], is a typical data-based ADP algorithm. In [10], Q-learning was named action-dependent heuristic dynamic programming (ADHDP). In Q-learning algorithms, Q functions are used instead of value functions in

This work was supported in part by the National Natural Science Foundation of China under Grants 61273140, 61304086, 61374105, and 61233001, and in part by the Beijing Natural Science Foundation under Grant 4132078.
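As a hedged illustration of what "data-based" means here, the sketch below runs tabular Q-learning in the cost-minimizing convention on a hypothetical five-state chain. The learner never reads the dynamics; it only calls `step`, which stands in for the unknown plant and returns the stage cost and next state. The toy system, the step size `alpha`, the discount `gamma`, the episode counts, and all function names are assumptions for illustration, not details taken from [10], [14], or [15].

```python
import random

# Hypothetical toy plant (not from the source): states 0..4, controls -1/+1,
# x_{k+1} = clip(x_k + u_k, 0, 4); state 0 is a cost-free terminal state.
STATES = range(5)
CONTROLS = (-1, 1)

def step(x, u):
    """Stands in for the unknown system: the learner only sees (cost, x_next)."""
    x_next = max(0, min(4, x + u))
    return 1.0, x_next          # unit stage cost while not yet at the goal

def q_learning(episodes=2000, alpha=0.5, gamma=1.0, seed=0):
    rng = random.Random(seed)
    Q = {(x, u): 0.0 for x in STATES for u in CONTROLS}   # Q(0, .) stays 0 (terminal)
    for _ in range(episodes):
        x = rng.randint(1, 4)                  # random non-terminal initial state
        for _ in range(1000):                  # safety cap on episode length
            u = rng.choice(CONTROLS)           # exploratory (uniformly random) behaviour policy
            cost, x_next = step(x, u)
            # Watkins-style update in cost-minimising form:
            # Q(x, u) <- Q(x, u) + alpha * [cost + gamma * min_v Q(x_next, v) - Q(x, u)]
            target = cost + gamma * min(Q[(x_next, v)] for v in CONTROLS)
            Q[(x, u)] += alpha * (target - Q[(x, u)])
            x = x_next
            if x == 0:
                break
    return Q

Q = q_learning()
# Greedy policy with respect to the learned Q function
greedy = {x: min(CONTROLS, key=lambda u: Q[(x, u)]) for x in range(1, 5)}
print(greedy)
```

Because each update uses only the sampled tuple (x, u, cost, x_next), replacing `step` with measurements from a real plant would leave `q_learning` unchanged; that is the contrast with the model-based ADP algorithms cited above. The random behaviour policy makes this off-policy learning, so the greedy policy can still converge to moving toward state 0.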

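It is known that there exists a unique solution $x(t, \xi)$ on $t \ge 0$ with initial data $\xi \in C^{b}_{\mathcal{F}_0}([-\tau, 0]; \mathbb{R}^n)$. Moreover, both $f(x, y, t)$ and $\sigma(x, y, t)$ are locally bounded in $(x, y)$ and uniformly bounded in $t$. For each $V \in C^{2,1}(\mathbb{R}^n \times \mathbb{R}_+; \mathbb{R}_+)$, we define an operator $LV$ from $\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}_+$ to $\mathbb{R}$ by

$$LV = \frac{\partial V}{\partial t} + \frac{\partial V}{\partial x} \cdot f + \frac{1}{2} \operatorname{trace}\left[\sigma^{T} \left(\frac{\partial^{2} V}{\partial x_i \, \partial x_j}\right) \sigma\right], \tag{3}$$

where $\partial V / \partial z = (\partial V / \partial z_1, \ldots, \partial V / \partial z_n)$.

Lemma 1 (Invariance principle [20]). Assume that there are functions $V \in C^{2,1}(\mathbb{R}^n \times \mathbb{R}_+; \mathbb{R}_+)$, $\beta \in L^{1}(\mathbb{R}_+; \mathbb{R}_+)$, and $\omega_1, \omega_2 \in C(\mathbb{R}^n; \mathbb{R}_+)$ such that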
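To make the operator $LV$ in (3) concrete, the sketch below evaluates it numerically with central finite differences and checks the result against a hand computation. The shapes are assumptions not stated in the text: `x` and `y` are length-n lists (with `y` playing the role of the delayed state) and `sigma` returns an n-by-m nested list. The particular `V`, `f`, and `sigma` are hypothetical examples chosen so that $LV$ can be computed by hand.

```python
# Minimal sketch: approximate LV = V_t + V_x . f + 1/2 trace(sigma^T V_xx sigma)
# by central finite differences (assumed shapes: x, y length n; sigma n-by-m).

def generator_LV(V, f, sigma, x, y, t, h=1e-3):
    n = len(x)

    def shift(i, di, j=None, dj=0.0):
        """Copy of x with di added to x_i and (optionally) dj added to x_j."""
        z = list(x)
        z[i] += di
        if j is not None:
            z[j] += dj
        return z

    # dV/dt by a central difference in t
    V_t = (V(x, t + h) - V(x, t - h)) / (2 * h)
    # gradient dV/dx = (dV/dx_1, ..., dV/dx_n)
    grad = [(V(shift(i, h), t) - V(shift(i, -h), t)) / (2 * h) for i in range(n)]
    # Hessian entries d^2V/(dx_i dx_j); for i == j this is a step-2h second difference
    H = [[(V(shift(i, h, j, h), t) - V(shift(i, h, j, -h), t)
           - V(shift(i, -h, j, h), t) + V(shift(i, -h, j, -h), t)) / (4 * h * h)
          for j in range(n)] for i in range(n)]

    fx = f(x, y, t)
    S = sigma(x, y, t)
    m = len(S[0])
    drift = sum(grad[i] * fx[i] for i in range(n))
    # trace(S^T H S) = sum_k sum_i sum_j S[i][k] * H[i][j] * S[j][k]
    diffusion = 0.5 * sum(S[i][k] * H[i][j] * S[j][k]
                          for i in range(n) for j in range(n) for k in range(m))
    return V_t + drift + diffusion

# Hypothetical example (not from the source): n = m = 2
def V(x, t):
    return x[0] ** 2 + x[1] ** 2

def f(x, y, t):
    return [-2.0 * x[0] + 0.5 * y[1], -3.0 * x[1] + 0.5 * y[0]]

def sigma(x, y, t):
    return [[0.1 * x[0], 0.0], [0.0, 0.2 * y[1]]]

def LV_exact(x, y):
    # Hand-computed LV for the V, f, sigma above (V_t = 0, Hessian of V is 2I)
    return (2 * x[0] * (-2.0 * x[0] + 0.5 * y[1])
            + 2 * x[1] * (-3.0 * x[1] + 0.5 * y[0])
            + (0.1 * x[0]) ** 2 + (0.2 * y[1]) ** 2)

print(generator_LV(V, f, sigma, [1.0, -1.0], [0.5, 2.0], 1.0),
      LV_exact([1.0, -1.0], [0.5, 2.0]))
```

For a quadratic $V$ the central differences are exact up to rounding, so both printed values agree at about $-8.33$; for general $V$ the step `h` trades truncation error against rounding error.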

A New Discrete-Time Iterative Adaptive Dynamic Programming Algorithm

Assumption 4. The utility function $U(x_k, u_k)$ is a continuous positive definite function of $x_k$ and $u_k$.

Define the control sequence set as $\underline{\mathcal{U}}_k = \{\underline{u}_k : \underline{u}_k = (u_k, u_{k+1}, \ldots),\ \forall u_{k+i} \in \mathbb{R}^m,\ i = 0, 1, \ldots\}$. Then, for a control sequence $\underline{u}_k \in \underline{\mathcal{U}}_k$, the optimal performance index function is defined as

$$J^*(x_k) = \min_{\underline{u}_k} \left\{ J(x_k, \underline{u}_k) : \underline{u}_k \in \underline{\mathcal{U}}_k \right\}. \tag{3}$$

According to [14] and [15], the optimal Q function satisfies the Q-Bellman equation

$$Q^*(x_k, u_k) = U(x_k, u_k) + \min_{u_{k+1}} Q^*(x_{k+1}, u_{k+1}).$$
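To make the Q-Bellman equation concrete, the sketch below iterates $Q_{i+1}(x, u) = U(x, u) + \min_{u'} Q_i(f(x, u), u')$ from $Q_0 = 0$ on a hypothetical seven-state system and then recovers $J^*(x) = \min_u Q^*(x, u)$. The dynamics, the state grid, the control set, and the utility $U(x, u) = x^2 + u^2$ (positive definite, as in Assumption 4) are assumptions chosen for illustration; this is a model-based fixed-point iteration on a finite grid, not the authors' algorithm.

```python
# Minimal sketch: iterate the Q-Bellman equation
#   Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(f(x, u), u')
# on a hypothetical finite system, starting from Q_0 = 0.

STATES = range(-3, 4)          # hypothetical state grid x in {-3, ..., 3}
CONTROLS = (-1, 0, 1)          # hypothetical control set

def f(x, u):
    """Hypothetical dynamics x_{k+1} = f(x_k, u_k), clipped to the grid."""
    return max(-3, min(3, x + u))

def U(x, u):
    """Positive definite utility: U(0, 0) = 0 and U > 0 elsewhere."""
    return x * x + u * u

def q_iteration(max_iter=100):
    Q = {(x, u): 0 for x in STATES for u in CONTROLS}    # Q_0 = 0
    for _ in range(max_iter):
        Q_next = {
            (x, u): U(x, u) + min(Q[(f(x, u), v)] for v in CONTROLS)
            for x in STATES for u in CONTROLS
        }
        if Q_next == Q:        # fixed point: Q satisfies the Q-Bellman equation
            return Q
        Q = Q_next
    return Q

Q_star = q_iteration()
J_star = {x: min(Q_star[(x, u)] for u in CONTROLS) for x in STATES}
print(J_star)  # {-3: 17, -2: 7, -1: 2, 0: 0, 1: 2, 2: 7, 3: 17}
```

There is no discount factor here; the iteration still settles at a finite fixed point because $U$ vanishes only at $(0, 0)$ and every state can reach $x = 0$, so the iterates increase monotonically from $Q_0 = 0$ and are bounded by the optimal cost.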