Everything You Need to Know About Reinforcement Learning

Yashwardhan Panwar

2 months ago

All About Reinforcement Learning

Do you know how dog owners train their dogs to sit when they say ‘sit’ or stand when they say ‘stand’? Part of this training is to reward the dog with a treat whenever it sits or stands at its owner’s command.

Take another example – parents want their child to do their homework regularly. So every time the kid finishes their homework, the parents often praise them or give them sweets. But whenever the kid yells or throws a tantrum, their parents scold them or punish them. In both cases, the parents try to encourage a certain action with rewards and discourage the other with punishment.

This is called reinforcement – defined by Google as “the process of encouraging or establishing a belief or pattern of behavior”. Interestingly, reinforcement is not limited to pets, children, or other living creatures. It can also be used to train AI software or machines, in the form of reinforcement learning.

Reinforcement learning (or RL) enables a machine or software to learn by itself, i.e. to self-teach and get better at a task without step-by-step human guidance. Let’s look at what exactly reinforcement learning is, how it works, its benefits, its applications, and more.

What is reinforcement learning?

Machine learning is broadly divided into three categories. The first two are supervised and unsupervised learning, for which humans need to feed data into the software. The third, reinforcement learning, does not begin with any predefined data. It gathers its own data through experimentation and exploration.

For example…

Let’s say we have a bot named Joe. We want Joe to move from its starting position (say A) to another point (say B) as quickly as possible. Through reinforcement learning, Joe tries out many possible paths and, over repeated attempts, settles on the fastest one. The next time Joe is asked to move from point A to B, it takes that path directly. This is called the trial-and-error method.

The software explores the available actions or sequences of actions to find the most desirable one, which in this case is the shortest path from A to B. But where is the reinforcement here?

Whenever Joe takes a shorter path, it receives a positive signal that acts as the reward. The shorter the path, the more positive the signal. In this way, rewards (positive signals) encourage Joe to take the most desirable actions, while punishments (negative signals) discourage it from taking undesirable ones.
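As a rough illustration of the reward signal described above, here is a minimal sketch that scores a few candidate paths for Joe. The path names, the lengths, and the `path_reward` and `best_path` helpers are all hypothetical, and "reward = negative path length" is just one simple assumption that makes shorter paths more rewarding.

```python
# Hypothetical path lengths (in metres) from A to B; invented for illustration.
PATH_LENGTHS = {"via_park": 120, "via_main_road": 90, "via_alley": 150}


def path_reward(length):
    """Assumed reward rule: the shorter the path, the higher the reward."""
    return -length


def best_path(path_lengths):
    """Return the name of the path with the highest reward."""
    return max(path_lengths, key=lambda name: path_reward(path_lengths[name]))


rewards = {name: path_reward(length) for name, length in PATH_LENGTHS.items()}
print(rewards)
print(best_path(PATH_LENGTHS))
```

A real agent would not be handed the lengths up front. It would discover them by walking the paths and observing the reward each time.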

Next, let’s look in more depth at how RL works and the algorithms behind it…

How does reinforcement learning work?

Firstly, there are 5 main elements or components of reinforcement learning:

The agent is the autonomous entity (machine or software) that makes the decisions and interacts with the environment.
The environment is what the agent interacts with. The agent can either interact directly with the environment or with an internal model of the environment to plan its course of action.
The policy is the agent’s strategy: a mapping from each situation or state to the action the agent takes in it.
The reward is the signal the agent receives after an action: positive for desirable actions and negative for undesirable ones. The agent may compare two or more actions and choose the one with the higher reward.
The value function estimates the cumulative reward the agent can expect from a particular state or course of action. When making a decision, the agent prioritizes the path with the maximum reward in the long run over the one with immediate benefits.
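To see how these five elements fit together, here is a minimal sketch of the agent-environment loop. Everything in it is an assumption made for illustration: the `CorridorEnv` class, the reward numbers (-1 per step, +10 at the goal), the `always_right` policy, and the discount factor `gamma`, which controls how much future rewards count.

```python
class CorridorEnv:
    """Environment: a corridor of cells; the goal is the last cell."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.size - 1)
        done = self.state == self.size - 1
        reward = 10 if done else -1  # assumed rewards: negative until the goal
        return self.state, reward, done


def always_right(state):
    """Policy: maps a state to an action (here, always step right)."""
    return +1


def run_episode(env, policy, gamma=0.9, max_steps=50):
    """Agent loop: act, observe the reward, accumulate the discounted return."""
    state = env.reset()
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


episode_return = run_episode(CorridorEnv(), always_right)
print(episode_return)
```

The number printed at the end is the discounted cumulative reward of one episode, which is the quantity a value function tries to estimate.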

Now, let’s move on to the 2 types of algorithms used by RL: model-based and model-free…

Model-based reinforcement learning (RL)

In model-based RL, the agent creates an internal model of the environment. This internal model serves as the testing grounds for various actions that the agent can take in the environment. Once it has decided on the best path based on its internal model, it executes it in the external environment.

Let’s go with the same bot, Joe, to understand this better…

We want Joe to travel from his current position at the post office to the nearby hospital. First, he creates an internal map of the area covering both places. Then, within his internal map, he tries every possible route from the post office to the hospital. He analyzes them and assigns a reward value to each route: a lower value for a longer route and a higher value for a shorter one. After assigning values to every route, he can easily identify the one with the highest reward value (that is, the shortest route). Now he actually takes this high-reward route in the real environment to reach the hospital. Moreover, if you next want Joe to go from the hospital to the convenience store in the same area, he can find the shortest path faster because he already has the internal map ready.

That said, this type of algorithm works best for a static and unchanging environment.
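Joe’s planning step can be sketched roughly as follows. The map, its place names, and its distances are invented for illustration. Note that this sketch plans with Dijkstra’s shortest-path algorithm, which finds the shortest route without listing every route one by one, so it is a stand-in for Joe’s approach rather than a literal copy of it.

```python
import heapq

# Hypothetical internal map: place -> {neighbour: distance}. Values are invented.
INTERNAL_MAP = {
    "post_office": {"market": 4, "school": 2},
    "school": {"post_office": 2, "market": 1, "hospital": 7},
    "market": {"post_office": 4, "school": 1, "hospital": 3, "store": 5},
    "hospital": {"school": 7, "market": 3, "store": 2},
    "store": {"market": 5, "hospital": 2},
}


def plan_route(model, start, goal):
    """Plan inside the model with Dijkstra's algorithm; return (distance, route)."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, place, route = heapq.heappop(queue)
        if place == goal:
            return dist, route
        if place in visited:
            continue
        visited.add(place)
        for neighbour, step in model[place].items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + step, neighbour, route + [neighbour]))
    return None  # goal not reachable in this model


# Planning happens entirely in the internal map, before Joe moves at all.
distance, route = plan_route(INTERNAL_MAP, "post_office", "hospital")
print(distance, route)

# The same map is reused for the next errand, with no new exploration needed.
print(plan_route(INTERNAL_MAP, "hospital", "store"))
```

The second call shows why an existing internal map pays off. It also hints at the limitation: if a road in the real area closes, the map and every plan built on it become stale.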

Model-free reinforcement learning (RL)

Unlike model-based RL, model-free RL does not ‘think’ through possible actions in advance to identify the best one. It tries actions directly in the environment, compares the results, and gradually favors the most desirable ones. This is because it does not create an internal model of its environment, hence the name ‘model-free’. You can think of it as an experiential learner that learns through trial and error. And although it may look dumb next to the smart model-based RL, model-free RL can work in dynamic and unknown environments.
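One well-known model-free algorithm is Q-learning, and a minimal tabular sketch of it is shown below. The corridor environment, the reward numbers, and the hyperparameters (`alpha`, `gamma`, `epsilon`, the episode count) are assumptions picked for illustration. The key point is that the agent never looks inside `step`. It only sees the state, the reward, and whether the episode ended.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Hypothetical environment dynamics, hidden from the agent."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (10.0 if done else -1.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error with tabular Q-learning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally; otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
greedy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)}
print(greedy)
```

After training, the greedy action in each cell is read straight from the learned table. No map of the corridor is ever built.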

Benefits

Unlike conventional machine learning, RL optimizes for long-term benefits, which makes it more human-like. It can sacrifice short-term benefits, or even accept negative rewards, to get the maximum benefit in the long run. Hence, it’s suitable for achieving long-term goals.
Unlike supervised or unsupervised learning, RL does not require a dataset to be fed in advance. The agent interacts with the environment first-hand to collect data. This is called self-teaching. This reduces
RL allows the machine or software to work in complex, changing, and unpredictable environments as in the case of model-free algorithms.
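The first benefit, trading short-term reward for long-term gain, can be made concrete with a small calculation. The two reward sequences and the discount factor `gamma` below are made-up numbers. The `discounted_return` helper implements the usual discounted sum, where a reward received t steps in the future is weighted by gamma to the power t.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma ** (time step)."""
    return sum(reward * gamma ** t for t, reward in enumerate(rewards))


# Hypothetical reward sequences for two courses of action.
quick_win = [5, 0, 0, 0]     # small reward now, nothing afterwards
patient = [-1, -1, -1, 20]   # negative rewards first, large reward at the end

print(discounted_return(quick_win))
print(discounted_return(patient))
```

Under these assumed numbers the patient course has the larger return, so an agent that maximizes long-term reward prefers it despite the early penalties.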

Challenges

Since RL can ignore short-term rewards for better long-term rewards, some of its actions or decisions can be difficult to interpret. This can cause any external observer to doubt the agent’s functioning.
Applying RL in real-world environments is often impractical. Since the machine keeps exploring and trying new actions to gather data, it can be impossible for it to consistently make the best decision.
RL training in complex environments can be time-consuming as it requires a lot of computation and processing.
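The exploration problem in the second challenge can be illustrated with epsilon-greedy action selection, a common and simple exploration rule. In this sketch, the value estimates, the exploration rate of 0.2, and the `epsilon_greedy` helper are assumptions made for illustration.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda i: estimates[i])


rng = random.Random(0)
estimates = [1.0, 5.0, 2.0]  # hypothetical value estimates for three actions

choices = [epsilon_greedy(estimates, 0.2, rng) for _ in range(1000)]
best_share = choices.count(1) / len(choices)
print(best_share)
```

Because some picks are deliberately random, the share of best-action choices stays below 100% for as long as the agent keeps exploring. That is exactly the trade-off described above.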

Examples of Reinforcement Learning

AlphaGo: Go is an ancient Chinese board game that, like chess, is a two-player strategy game, but is far more complex. AlphaGo is a computer program developed by DeepMind Technologies that defeated the European Go champion, Fan Hui, in 2015, and went on to beat the top professional player Lee Sedol in 2016. AlphaGo Zero is an even more powerful version of AlphaGo that was trained by playing against itself.
Self-driving cars: RL can help self-driving or autonomous cars navigate real-time traffic. The driving software is first trained in a variety of simulated environments and conditions. Even after that, it continues to gather data and learn from experience.
Recommendation systems: Features such as ‘frequently bought together’, ‘recommended reads’, or ‘recommended watch’ are places where RL can be applied. Recommendation systems analyze customer behavior to recommend products on online shopping platforms like Flipkart or the next movie to watch on streaming platforms like Netflix.

Conclusion

Reinforcement learning has narrowed the gap between machines and humans by allowing machines to self-teach and learn through exploration. It points to an age where humans no longer need to feed data to machines or software, and can instead let them explore, experiment, and gather data at their own pace. This opens up the possibility of independent machines and AI technologies that can perform far more complex tasks than humans ever can, with much more efficiency and adaptability.