Equivalence of Forward and Backward Views
We prove that the forward view and the backward view are equivalent by expanding their update equations. Equivalence is shown by proving the identity below.
Equation 7.8:

$$\sum_{t=0}^{T-1} \Delta V_t^{TD}(s) \;=\; \sum_{t=0}^{T-1} \Delta V_t^{\lambda}(s_t)\, I_{s s_t}$$

Here, $\Delta V_t^{\lambda}(s_t) = \alpha \big[ R_t^{\lambda} - V_t(s_t) \big]$ is the update of the $\lambda$-return algorithm; its error term $R_t^{\lambda} - V_t(s_t)$ is the TD error of the forward view.

$\Delta V_t^{TD}(s) = \alpha\, \delta_t\, e_t(s)$, with $\delta_t = r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t)$, is the update of TD($\lambda$); $\delta_t$ is the TD error of the backward view.

$I_{s s_t}$ is the identity indicator function, which returns 1 if $s = s_t$ and 0 otherwise.
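A sketch of the key steps, following the offline-updating argument in Sutton and Barto: write the eligibility trace non-recursively, swap the order of summation, and use the fact that the $\lambda$-return error decomposes into a discounted sum of one-step TD errors (this decomposition is exact only when $V$ is held fixed during the episode, i.e., under offline updating):

$$
e_t(s) = \sum_{k=0}^{t} (\gamma\lambda)^{t-k} I_{s s_k},
\qquad
R_t^{\lambda} - V_t(s_t) = \sum_{k=t}^{T-1} (\gamma\lambda)^{k-t} \delta_k,
$$

so that

$$
\sum_{t=0}^{T-1} \alpha\, \delta_t\, e_t(s)
= \sum_{t=0}^{T-1} \alpha\, \delta_t \sum_{k=0}^{t} (\gamma\lambda)^{t-k} I_{s s_k}
= \sum_{k=0}^{T-1} \alpha\, I_{s s_k} \sum_{t=k}^{T-1} (\gamma\lambda)^{t-k} \delta_t
= \sum_{k=0}^{T-1} \alpha \big[ R_k^{\lambda} - V_k(s_k) \big] I_{s s_k}.
$$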