Run for your lives!
The self-learning cars, drones, and games are here! The Terminator movie will soon become a reality!
Hold on for a sec before you wrap tin foil around your head. AI surely has power and potential, but right now it isn’t that scary, and it certainly isn’t going to wipe out humanity, least of all through reinforcement learning.
Here’s what it really is: a branch of machine learning based on “good behavior points.” Every time the algorithm does a good thing in a certain environment, it gets a little gold star. The more gold stars it collects, the more it favors the decisions that earned them. It’s a positive feedback loop for human benefit.
So how do reinforcement algorithms work and become smarter? What’s the reward system, and how do they learn no matter the environment?
Reinforcement learning – the ultimate expert’s overview
Reinforcement learning (RL) is a machine learning flavor dating back to the ’50s. But before taking off in the ’80s, reinforcement learning was just another computer science concept focused on simple artificial maze-solving and pole-balancing tasks.
The early days of reinforcement learning were uneventful and unexciting. You had a learning agent, a set environment, and a basic reward system, like a hamster in a maze that you feed every time it gets to the exit.
It screamed boredom!
It wasn’t anything superior like Boston Dynamics Atlas (the AI parkour robot) or DeepMind’s AlphaGo AI system. But that all changed with the work of Richard S. Sutton and Andrew G. Barto, which began in the ’80s and was later collected in their 1998 book “Reinforcement Learning: An Introduction.” Fundamentals like value-based, policy-based, and actor-critic methods paved the road for reinforcement learning and forever changed the machine-learning industry.
Over time, more sophisticated versions of RL with real-world applications appeared.
What’s under the reinforcement learning hood?
Reinforcement learning comes with three essential components:
- Agent — The learning algorithm, which has one goal and purpose: to maximize the total reward received by taking the right actions in the environment.
- Environment — The external system the agent interacts with. It is usually modeled as a Markov decision process (MDP) and comes in two types: stochastic or deterministic. In a stochastic environment, transitions and rewards are random, so the same action in the same state can lead to different outcomes. In a deterministic one, the same action in the same state always leads to the same outcome.
- Reward — A measure of how well the agent is performing, which also gives feedback about the consequences of its actions. The reward system also comes in two flavors, dense and sparse. In a sparse reward system, the agent gets a reward signal only rarely, for example only when it finally reaches the goal. With a dense system, brownie points are more frequent and consistent.
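To make the three components concrete, here is a minimal sketch of the agent-environment-reward loop. Everything in it is made up for illustration: the `Corridor` class, its `dense` flag, and the `always_right` policy are hypothetical names, not part of any RL library. It only shows how an agent acts, how the environment responds, and how a sparse reward differs from a dense one.

```python
class Corridor:
    """Hypothetical 1-D corridor: the agent starts at cell 0, the exit is the last cell."""

    def __init__(self, length=5, dense=False):
        self.length = length
        self.dense = dense
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        """action is -1 (left) or +1 (right). Returns (state, reward, done)."""
        old = self.position
        # Deterministic transition: same state + same action -> same next state.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        if self.dense:
            # Dense reward: feedback on every step (progress toward the exit).
            reward = float(self.position - old)
        else:
            # Sparse reward: a single gold star, only on reaching the exit.
            reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent (policy) act until the episode ends; collect its rewards."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


def always_right(state):
    """A trivial stand-in agent that always walks toward the exit."""
    return +1


sparse_rewards = run_episode(Corridor(dense=False), always_right)
dense_rewards = run_episode(Corridor(dense=True), always_right)
print(sparse_rewards)  # [0.0, 0.0, 0.0, 1.0]
print(dense_rewards)   # [1.0, 1.0, 1.0, 1.0]
```

In the sparse version the agent hears nothing until the very last step, which is why sparse-reward problems tend to be harder to learn.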
Reinforcement learning iceberg – how deep does it go?
Those components are just the tip of the reinforcement learning iceberg! Going deeper, there are two main types of RL algorithms: model-based and model-free.
Think of it this way: You’re navigating a complex maze. With a model-based approach, you have a detailed map of the maze. You know precisely where every dead end is and how to reach the exit. Armed with such knowledge, planning the exit route and reaching your goal is duck soup.
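Using such an approach, the agent has a model of the environment (either given up front or learned along the way) that predicts where each action leads, and it uses that model to plan its actions before taking them. Planning ahead usually means the agent needs fewer real interactions with the environment to find a good route.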
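Here is a rough sketch of what "planning with a map" can look like, using value iteration, one classic planning technique. The maze layout, the reward of 1 for stepping onto the exit, and the discount factor of 0.9 are all assumptions chosen for illustration. The `move` function plays the role of the map: the agent can ask it where any action would lead without actually moving.

```python
# Hypothetical maze: 'S' start, 'E' exit, '#' wall, '.' free cell.
MAZE = [
    "S.#E",
    ".#..",
    "....",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9  # discount factor (assumed value)


def find(symbol):
    for r, row in enumerate(MAZE):
        for c, cell in enumerate(row):
            if cell == symbol:
                return (r, c)
    raise ValueError(symbol)


def move(state, action):
    """The 'map': a known, deterministic transition model."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] != "#":
        return (nr, nc)
    return state  # bumping into a wall or the edge leaves the agent in place


def action_value(values, state, action, exit_cell):
    """Reward for the move plus the discounted value of where it leads."""
    nxt = move(state, action)
    return (1.0 if nxt == exit_cell else 0.0) + GAMMA * values[nxt]


def value_iteration(sweeps=50):
    exit_cell = find("E")
    states = [(r, c) for r, row in enumerate(MAZE)
              for c, cell in enumerate(row) if cell != "#"]
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            if s != exit_cell:  # the exit is terminal and keeps value 0
                values[s] = max(action_value(values, s, a, exit_cell)
                                for a in ACTIONS)
    return values


def plan(values):
    """Follow the greedy choice from start to exit using only the model."""
    state, exit_cell, path = find("S"), find("E"), []
    while state != exit_cell:
        action = max(ACTIONS,
                     key=lambda a: action_value(values, state, a, exit_cell))
        path.append(action)
        state = move(state, action)
    return path


route = plan(value_iteration())
print(len(route), route)  # a 7-move route that starts by going down twice
```

Note that the whole route is worked out before the agent takes a single real step. That is only possible because the model is available.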
But with a model-free approach, you have no map, no model of the environment, nothing!
You’re in a maze, and the only thing that’ll help you escape is your past experiences. As you explore, you remember which paths are dead ends and which ones are shortcuts to the exit. You’re putting this intuition-based information system at the helm of your actions.
With a model-free approach, the agent learns as it goes without a model, guided by its past experiences of the environment.
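For contrast, here is a small sketch of learning without a map, using tabular Q-learning on a five-cell corridor. The corridor, the hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`), and the seed are assumptions made for this example. The agent never looks inside `step`; it only tries actions, observes what happens, and keeps a table of how good each action has turned out to be.

```python
import random

N_STATES = 5                              # corridor cells 0..4; cell 4 is the exit
ACTIONS = ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2     # assumed hyperparameters


def step(state, action):
    """The environment. The agent cannot inspect it, only observe the results."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, rng):
    """Epsilon-greedy: mostly use the best-known action, sometimes explore."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    # Break ties randomly so an untrained agent does not always pick "left".
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward the observed reward
            # plus the discounted value of the best action in the next state.
            future = 0.0 if done else GAMMA * max(q[next_state].values())
            q[state][action] += ALPHA * (reward + future - q[state][action])
            state = next_state
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=q_table[s].get) for s in range(N_STATES - 1)]
print(greedy_policy)  # ['right', 'right', 'right', 'right']
```

The table `q_table` is the agent's memory of past experience: the "which paths are dead ends and which are shortcuts" knowledge from the maze analogy, written down as numbers.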
Some of the most popular model-based and model-free algorithms are:
- Q-Learning — The Top G of model-free reinforcement learning. It estimates the optimal action-value function by iteratively updating the Q-values based on observed rewards and actions.
- Deep Reinforcement Learning — A newer breed of (mostly model-free) algorithms that use deep neural nets to learn the value function or policy directly from high-dimensional inputs such as images or raw sensory data.
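- Monte Carlo Method — This model-free family of algorithms does not assume knowledge of the transition dynamics or the rewards of the environment. Instead, it averages the returns observed over complete sampled episodes, which keeps its value estimates unbiased.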
- Tree Search Method — This model-based algorithm builds a search tree to explore possible outcomes in the environment and picks the most promising action. It’s mostly used in gaming, where the state space is far too large for an exhaustive search.
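As a companion to the Monte Carlo entry in the list above, here is a minimal sketch of first-visit Monte Carlo value estimation. The slippery corridor, the cost of one point per step, and the 20% slip chance are assumptions invented for this example. The idea is simply to play many full episodes and average the total reward that followed the first visit to each cell, without ever reading the environment's probabilities.

```python
import random
from collections import defaultdict

EXIT = 4        # corridor cells 0..4; cell 4 is the exit
SLIP = 0.2      # assumed chance that a step fails and the agent stays put


def sample_episode(rng):
    """Walk right until the exit; return the (state, reward) pairs seen."""
    state, episode = 0, []
    while state != EXIT:
        next_state = state if rng.random() < SLIP else state + 1
        episode.append((state, -1.0))   # every step costs one point
        state = next_state
    return episode


def first_visit_mc(episodes=5000, seed=0):
    rng = random.Random(seed)
    returns = defaultdict(list)
    for _ in range(episodes):
        episode = sample_episode(rng)
        g = 0.0
        first_visit_return = {}
        # Walk backwards so g is the undiscounted return from each step onward.
        for state, reward in reversed(episode):
            g += reward
            # Overwritten until we reach the earliest visit to this state.
            first_visit_return[state] = g
        for state, g_first in first_visit_return.items():
            returns[state].append(g_first)
    # The value estimate is the plain average of the sampled returns.
    return {s: sum(v) / len(v) for s, v in sorted(returns.items())}


values = first_visit_mc()
print({s: round(v, 2) for s, v in values.items()})
```

With these assumed numbers, each cell takes 1 / 0.8 = 1.25 attempts to leave on average, so the estimate for the start cell should land close to -5.0. The agent gets there purely by averaging experience.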
What real-world examples tell us about reinforcement learning
Reinforcement learning is making its way into the healthcare, finance, and marketing industries. Don’t get me wrong, it won’t take away your job, but rather improve our decision-making!
The most famous real-world application is DeepMind’s AlphaGo, the learning algorithm that defeated world champion Lee Sedol at the ancient game of Go in 2016.
In the finance industry, Goldman Sachs reportedly developed AI software in 2017 for optimizing the execution of large equity trades. The algorithm was trained on historical trade data to execute trades by optimizing a reward function based on factors such as execution price, time, and market impact.
Waymo, a self-driving car company, pushed reinforcement learning even further. Their self-driving cars tackle dense city streets, with reinforcement learning helping them handle the complex environments of modern-day transportation.
Disease modeling, clinical trials, precision medicine, and medical imaging are other real-world use cases of reinforcement learning in the healthcare industry.
Does that mean new advanced AI algorithms will replace human jobs?
Not really. We are far from the dystopian Terminator future. The reinforcement learning vision is to create a collaboration between AI and humans. With RL, we could get closer to curing complex diseases, creating better-tailored medicine prescriptions, and making more accurate diagnoses.
Reinforcement learning – Are we close to a Terminator-like future?
Reinforcement learning is a trial-and-error approach to machine learning.
With reinforcement learning, the agent interacts with its environment to learn which actions are good or bad, just like how Arnold Schwarzenegger’s T-800 learns from its surroundings and completes the tasks ahead.
It’s one of the ways of making an unstoppable AI capable of learning almost anything.