Overview
Artificial intelligence. Neural networks. Deep learning. Reinforcement learning.
These are some of the most interesting, active topics in computer science, the physical sciences, engineering, design, and art, and they’re completely transforming how we as humans approach and solve problems. My goal in creating this page is to help people who are unfamiliar with AI build intuition about what it is, as well as feel more comfortable discussing some of its topics or deciding what to focus on if entering the field.
Without further ado, let’s get started. The first section of this page provides broad intuition about what AI is, and the second solidifies that understanding with some fascinating real-world examples of applied AI.
Building Intuition for Artificial Intelligence (AI)
AI Solves Two Problems
Really simply, there are two types of problems that AI solves:
Prediction/classification problems (what is something, what will something be)
Is this a picture of X or a picture of Y?
Given A, B, and C, what would X be?
Control problems (how to do something, optimally)
Playing chess
Navigating a maze
Self-driving
Prediction/classification problems are solved with neural networks and varieties thereof. Control problems are solved with reinforcement learning, which itself can use neural networks (deep reinforcement learning). Since we’re new to AI, saying that AI uses neural networks doesn’t really help us much. So, let’s start with an example to anchor some of these concepts.
Data & Relationships
Let’s say that we want to understand a potential correlation between the weather yesterday (temperature and humidity) and the price of energy at 12pm the next day. We have the below table of some historical data on this:
Now, if we had a table of all possible data and all the values, we wouldn’t need to try to understand this relationship, because we could simply look up the value from our complete table. In reality, we never have such a complete table, and we could also have rows like the below:
These two rows have the same inputs but a different value of the output. This tells us one of two things:
There is no relationship between these inputs and the outputs
There is a fundamental relationship, but we’re missing some inputs that are needed to establish it
Maybe if we had an extra column like the one below, the values would make more sense:
Making Sense of Things with Domain Knowledge
In the final table above, why does the day-of-week column make the values make more sense? Well, knowing a bit about grid load, we could posit that people use more power during the work week than on the weekend. So, all else constant (temperature and humidity), power will cost more during the work week because demand is higher.
So, let’s finish formalizing this relationship. Maybe we have a table of the below inputs and now we need to predict what the output is:
Using our domain knowledge, maybe we have something like:
if day of week is Monday through Friday
    use the average of weekday rows where temperature and humidity are close
if day of week is Saturday or Sunday
    use the average of weekend rows where temperature and humidity are close
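To make that rule-of-thumb concrete, here is a minimal Python sketch of the same idea. The function name, the data layout, and the “closeness” tolerance are all illustrative assumptions, not part of the original example:

```python
# A hand-built "model" for the grid-price example: branch on day of week, then
# average historical prices from rows with similar temperature and humidity.
WEEKDAYS = {"Mon", "Tue", "Wed", "Thu", "Fri"}

def predict_price(history, temperature, humidity, day_of_week, tol=5.0):
    """history: list of dicts with keys temperature, humidity, day_of_week, price."""
    is_weekday = day_of_week in WEEKDAYS
    similar = [
        row["price"]
        for row in history
        if (row["day_of_week"] in WEEKDAYS) == is_weekday   # same day type
        and abs(row["temperature"] - temperature) <= tol    # close temperature
        and abs(row["humidity"] - humidity) <= tol          # close humidity
    ]
    return sum(similar) / len(similar) if similar else None
```

Every rule in that sketch (weekday vs. weekend, what counts as “close”) came from our knowledge of this specific problem.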
This might work, but we had to use our understanding of the underlying system (domain knowledge) to come up with a specific formula/model to predict the value. What if we had to predict values for the below table:
We’d need to find a new domain expert (if one existed), and we’d also need a new formula/model. The domain expert for grid prices can’t work on this new problem and vice versa, and the formulas are specific to each application.
Let’s throw in another prediction problem that is equally unrelated to both of these:
We need yet another domain expert and another formula.
General Methods with Domain Knowledge
In the case above, we have 3 problems, with 3 domain experts, with 3 models — the methods we are using are not general, meaning they cannot be used “out-of-the-box” between problems. A general solution to this would look like 3 problems, with 3 domain experts, with 1 model.
I’ll use the simplest example: linear regression. One could use linear regression as a general method to fit a relationship in each of these unrelated problems.
However, how does one know what to regress against? What should the input x be to accurately predict y? That usually comes from our domain knowledge. Also, what if y is predicted by a, b, and c? We now have potentially non-linear relationships in the data that strain our simple general method. Further, what if y is predicted by a, b, c, d, …, n? We have now totally lost our general method for this prediction, and the dimensionality of the inputs has probably become too high for a domain expert to solve (at least in a reasonable amount of time).
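To see what “general” means here, below is a minimal scikit-learn sketch. The numbers are made-up placeholders; the point is that the same few lines work whether y depends on one input or several:

```python
# Linear regression as a general, reusable method: swap in a different X and y
# and the code does not change. All values below are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row of X is one observation of the inputs [a, b, c]; y is the value to predict.
X = np.array([[1.0, 20.0, 0.3],
              [2.0, 25.0, 0.1],
              [3.0, 18.0, 0.7],
              [4.0, 22.0, 0.5]])
y = np.array([10.0, 14.0, 11.0, 15.0])

model = LinearRegression().fit(X, y)
print(model.predict([[2.5, 22.0, 0.4]]))  # predicted y for a new row of inputs
```

Note that deciding which columns belong in X, and trusting that the relationship is roughly linear, still leans on domain knowledge, which is exactly the limitation described above.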
We want general methods that work with any number of features and that don’t need domain knowledge to find the relationship.
General Methods without Domain Knowledge
This is how we arrive at some of the key motivations of the prediction/classification branch of artificial intelligence:
highly general methods
minimal to no domain knowledge
the ability to solve relationships that are not easily modeled (or cannot be modeled at all) with traditional methods
So, what would an AI solution look like to our above three tables?
AI for Prediction/Classification: Neural Networks
The models and formulas get replaced with a neural network:
Well, what is a neural network? Technically speaking, it’s just a bunch of matrices that are multiplied into each other — it isn’t exactly well-represented by the icons people usually use. Let’s look at our first weather-price example. Here, the input is a vector of [temperature, humidity, day of week] and the output is price. The vector of the input is multiplied into the matrices of the neural network, and the final value is a scalar of price.
What is actually happening in that icon? Well, there are three things:
Input layer (gray)
Hidden layers (blue)
Output layer (green)
Mathematically, these are just vectors and matrices. To simplify lingo, we can use the term tensor to describe both of these. A scalar is a zeroth-order tensor, a vector is a first-order tensor, a 2D matrix is a second-order tensor, etc. So, mathematically, these are essentially just tensors multiplied into each other.
How does a group of tensors solve these complex relationships and accurately predict values? That’s a bit of the magic of AI, and I’ll start by saying that, yes, this actually works. We “train” the neural network, which essentially means fitting the values of the tensors such that when they’re multiplied into some input, the output is as close as possible to the expected value. When we hear words like “backprop” and “gradient descent”, they’re just describing how the tensors are updated during training.
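To make this concrete, here is a minimal sketch in plain NumPy for the [temperature, humidity, day of week] → price example. The input values are made up and pre-scaled to small numbers, and the network is tiny; real networks are much larger and train on many rows at once, but the mechanics are the same: multiply tensors, measure the error, and nudge the weights with gradient descent.

```python
# A tiny neural network, trained on a single illustrative row of data.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 8))   # input (3 features) -> hidden (8 units)
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output (1 price)

x = np.array([0.28, 0.60, 0.2])   # temperature, humidity, day of week (scaled, made up)
target = 0.41                     # the (scaled) price we observed for this row

learning_rate = 0.05
for step in range(200):
    h = np.maximum(0.0, x @ W1)       # hidden layer: matrix multiply + ReLU
    price = (h @ W2).item()           # output layer: the predicted price (a scalar)
    error = price - target
    # Backprop: gradients of the squared error with respect to each weight tensor.
    grad_W2 = np.outer(h, 2 * error)
    grad_h = (W2.flatten() * 2 * error) * (h > 0)
    grad_W1 = np.outer(x, grad_h)
    W1 -= learning_rate * grad_W1     # gradient descent: nudge the weights downhill
    W2 -= learning_rate * grad_W2

print(round(price, 3))                # should now be very close to the target
```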
There are many other more complex varieties of neural networks for more specific types of problems, but we now have the general idea of what a neural network is, why we’d use one, and how it anchors compared to traditional methods we may already be familiar with.
What about AI for Doing Things?
We can see how neural networks can learn to predict relationships between things on their own, but how do we make the leap to AI figuring out how to do and control things?
Let’s think conceptually about what “doing something” or “controlling something” means. Generally speaking, we could say that to do something or to control something optimally would be to do the right action at the right time, with “right” being defined by some task-specific score. If you can agree to that loose definition, then we can translate that to some more formal lingo:
“task-specific score” = reward
“right time” = state
“right action” = the action which produces the most reward in a state compared to the other actions
The terms (state, action, reward) constitute something called a Markov Decision Process (MDP). At risk of adding yet another acronym to the page, an MDP is essentially a process where taking some action in some state produces some reward (and brings us to some new state). It’s a really convenient way to mathematically map out the many processes in which humans (and AIs) have to make complex decisions.
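Written as data, one step of an MDP is just a (state, action, reward, next state) record. The Transition type and the maze values below are purely illustrative:

```python
# One step of a Markov Decision Process, spelled out as a small data record.
from typing import NamedTuple

class Transition(NamedTuple):
    state: str        # where we were
    action: str       # what we did
    reward: float     # the task-specific score we received
    next_state: str   # where that action took us

# e.g. navigating a maze: at a junction, we step left and hit a dead end (-1 reward).
step = Transition(state="junction", action="left", reward=-1.0, next_state="dead end")
print(step.reward)
```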
AI that does these types of things is the field of reinforcement learning.
AI for Optimal Control: Reinforcement Learning
Let’s look at an example of a task for optimal control. Say we have a metal ball that we’re trying to balance on a wooden surface. The metal ball is sent rolling from the left or right side of the surface, and we have to tilt the surface to prevent the ball from flying off one end.
We’ll say that the ball can either come in from the left side or the right side, and the surface can be tilted “up” 45° or “down” -45°. Let’s also say that if we keep the ball on the surface, we get 100 points, and if it falls off we get -100 points. How might we frame this for RL?
state: direction the ball is traveling (left or right)
actions: rotate 45°, rotate -45°
reward: +100 if ball stays on, -100 if ball falls off
You might be able to use your intuition about physics to figure out what the right thing to do is before even experimenting, but that’s not how RL solves problems. It learns from experience, and “remembers” the right thing to do from that experience. We could put the possible states and actions in a table, where each cell holds the value of doing that action in that state. (Since we haven’t explored yet, we’ll say all actions are equal and worth 0.)
Now let’s say we try a few rounds of the experiment: we ultimately learn that when the ball is traveling to the right we should tilt the surface up (+45°), and we should do the opposite when the ball is traveling to the left (-45°). Our action-values are now:
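In code, that learned table could be as simple as a nested dictionary mapping state → action → value. The exact numbers below are illustrative stand-ins for what the experiments would produce:

```python
# The learned action-values for the simple balancing problem: tilt up (+45°) when
# the ball rolls right, tilt down (-45°) when it rolls left.
action_values = {
    "ball moving right": {"tilt +45": +100, "tilt -45": -100},
    "ball moving left":  {"tilt +45": -100, "tilt -45": +100},
}

def best_action(state):
    # Act greedily: pick the action with the highest learned value in this state.
    return max(action_values[state], key=action_values[state].get)

print(best_action("ball moving right"))  # -> "tilt +45"
```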
This is a really simplified version of RL for a few reasons:
there is only one state feature (ball direction)
there are only two discrete actions (up or down)
there is no sequence of states and actions (the episode ends after one state, one action, and one reward)
To transition into the more advanced flavors of RL, let’s make the problem a little more complex.
Let’s now say that the wooden surface is made of five parts, labeled one through five. Let’s also say that the ball is sent in from either the left or right side, and it has four speed levels: “stopped”, “slow”, “medium”, and “fast”. For tilting the surface, let’s say we can set it to 0°, ±30°, or ±45°. Based on the last speed of the ball and the tilt of the surface, the speed will change (either increase or decrease). Our objective is to get the ball to reach the “stopped” speed in the center of the surface (part 3).
How might we frame this for RL?
state: segment the ball is on, ball direction, ball speed, rotation of the surface
actions: keep 0°, rotate 30°, rotate 45°, rotate -30°, rotate -45° (negative degrees here are clockwise)
reward: based on the distance from the center of the surface and the speed of the ball (the goal is “stopped” speed at the center)
The episode ends when the ball:
falls off the wood surface (failure)
reaches “stopped” speed on any part that is not the center part (3) (failure)
reaches “stopped” speed on the center part (3) (success)
Before we start experimenting, though, our RL agent still needs some way to store and remember the outcome in each state for the action it took. Let’s use a table again, where each row represents one of the unique combinations of state:
there are 5 segments of wood
there are 2 directions the ball can be rolling in
there are 4 speeds that the ball can have
there are 5 possible rotations of the surface
This ends up being a much larger table than in the example before, as there are now 5 × 2 × 4 × 5 = 200 rows instead of 2. Since each state has 5 possible actions, there are 1,000 action-values in the table. Some sample rows are shown below:
Now, for modern computing, this is an extremely small table. What we did in this slightly more complex version was discretize the states and actions. Discretization puts what are normally “continuous” values into boxes of sorts. For example, speed is continuous, but here, we say that there are simply 4 speeds the ball can have. We could divide the wooden surface up into 5 segments, 500 segments, or 5,000 segments. And for rotations, we could use 1° or 0.5° increments.
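Discretization itself is a one-liner. The bin edges below are made up; they simply map a continuous speed reading onto one of the 4 speed “boxes” used above:

```python
# Bucketing a continuous measurement into the discrete speeds used in the table.
import numpy as np

speed_bins = [0.1, 2.0, 5.0]                     # edges between the four buckets
labels = ["stopped", "slow", "medium", "fast"]

speed = 3.7                                      # a continuous speed measurement
print(labels[np.digitize(speed, speed_bins)])    # -> "medium"
```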
What happens if we used much finer discretizations? Let’s say that we:
divide the wooden surface up into 1,000 pieces
track ball speed on a scale from 0.1 to 10 at 0.1 intervals (100 speed options)
still only have 2 ball directions
can rotate the surface in the range of +/- 45° at 1° increments (90 options)
This now means that our table would have 1,000 × 100 × 2 × 90 = 18,000,000 (18 million) rows, and with 90 actions per row, 18 million × 90 = 1.62 billion action-values! Suddenly, the table is massive.
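The combinatorial growth is easy to check: the state combinations multiply together, and the action-value count multiplies again by the number of actions.

```python
# State combinations multiply, and each state row holds one value per action.
segments, speeds, directions, rotations = 1_000, 100, 2, 90
actions = 90

rows = segments * speeds * directions * rotations
print(rows)              # 18,000,000 state rows
print(rows * actions)    # 1,620,000,000 action-values
```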
Deep Reinforcement Learning
While the table mentioned above is very large, it is still feasible for a computer to hold in memory and utilize. There are two things that push us past this point:
continuous state features that aren’t discretized
continuing to add state features
In our rolling ball problem, say that we now also had to consider wind (which could be blowing in either direction at different speeds) and changing friction of the surface (which affects rolling speed). We could try to add these to the table above, but there may not be logical discretizations, and the table could grow massively depending on the number of discretizations we add. If we do try to discretize the state features, we also lose some generality of the approach, as we have to set the discretizations ourselves with our own domain knowledge.
How could we do this without the table? Well, remember how we got rid of the table at the top of the page? We used a neural network to approximate a table, and we can do the same thing here for the RL’s action-values. This is what we call deep reinforcement learning. “Deep” because of the deep neural network replacing the table, and “reinforcement learning” because we’re learning how to optimally control something still, not just predicting/classifying some value. Let’s look at this visually:
We have a representation of a hypothetical deep neural network for the problem above. The inputs are all the state features we mentioned, and the output is the estimate of the value for taking a certain action in the specific state from the input.
Utilizing a deep neural network to approximate action-values is a highly general approach that lets us tackle problems with continuous state features and high dimensionality (since we no longer use a table). This approach (deep RL) is what is enabling huge breakthroughs in the field right now, and almost all examples of RL in the second half of this page use deep RL.
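Here is a rough sketch of that idea in plain NumPy. Outputting one value per action is a common design choice, and the weights below are random, untrained stand-ins for what deep RL training would learn:

```python
# The "table replaced by a network" idea: state features in, action-value estimates out.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(6, 32))   # 6 state features -> 32 hidden units
W2 = rng.normal(scale=0.1, size=(32, 5))   # hidden -> 5 actions (0°, ±30°, ±45°)

# segment, direction, speed, surface rotation, wind, friction (illustrative values)
state = np.array([3.0, 1.0, 0.4, -30.0, 2.5, 0.8])

hidden = np.maximum(0.0, state @ W1)       # the same tensor multiplications as before
action_values = hidden @ W2                # one estimated value per possible action
print(action_values.argmax())              # index of the best-looking action
```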
Conclusion on Concepts
Here, we tried to introduce artificial intelligence at a very simple, conceptual level to hopefully provide some intuition about what AI solves, what deep learning / neural networks are, and what reinforcement learning / deep reinforcement learning are. Some aspects of reinforcement learning that we didn’t go into detail on are how an RL agent decides to try different actions and how it updates its values. If you’re interested in continuing your learning about these types of concepts, I have a slightly more technical page on RL here and a page for resources I used to learn RL here. My own demo RL project is linked below as the last example, and all code is open-source on GitHub.
With concepts out of the way, let’s move on to some examples of applied AI!
Examples of AI in the Real World
The first set of examples is related to deep learning, and the second set is related to reinforcement learning. Each example has a text link to the source page of the project, and the image preview also links to each project if clicked.
Deep Learning
AlphaFold
Task: AI solves a decades-long challenge of predicting the 3D structures of proteins
Implications: Rapidly accelerates new medicine discovery
Link: https://www.deepmind.com/research/highlighted-research/alphafold
Company: DeepMind
DALL·E 2
Task: AI generates art in previously unimaginable ways
Implications: Unlimited artistic possibilities for the masses. An AI wins a human art fair (already happened).
Link: https://openai.com/dall-e-2/
Company: OpenAI
Wind Forecasting
Task: AI improves power commitments from a wind farm by forecasting its power generation more accurately
Implications: Improved unit economics and grid utilization of wind energy
Link: https://www.deepmind.com/blog/machine-learning-can-boost-the-value-of-wind-energy
Company: DeepMind
Reinforcement Learning
Tesla Autopilot & Full Self-Driving
Task: AI learns how to drive cars without rigid instructions and programming from humans
This is a combination of both deep learning for object recognition and deep RL for driving control
Implications: The most advanced self-driving system we’ve created (yet), robotaxis.
Link: https://www.tesla.com/autopilot
Company: Tesla
AlphaGo/AlphaZero
Task: AI masters the game of Go, which is vastly more complex than Chess. The same AI beats the world’s best chess computer after 4 hours of training.
Implications: AI demonstrates creativity and long temporal decisions in complex strategy games.
Link: https://www.deepmind.com/research/highlighted-research/alphago
Company: DeepMind
Server Cooling
Task: AI reduces server cooling costs by 40% while preserving server performance.
Implications: AI solves optimal control problems with significant improvements over traditional methods and domain experts.
Link: https://www.deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-by-40
Company: DeepMind
Nuclear Fusion Control
Task: AI controls plasma in a simulation of a nuclear fusion reaction to improve plasma stability in a real reactor.
Implications: AI solves extremely complex fluid dynamics problems, potentially enabling breakthroughs needed for new forms of energy.
Link: https://www.deepmind.com/blog/accelerating-fusion-science-through-learned-plasma-control
Company: DeepMind
GT Sophy
Task: AI learns to play Gran Turismo and beat the world’s best human racers.
Implications: AI demonstrates success in a complex racing environment with stochastic, unpredictable behavior from other drivers.
Link: https://www.gran-turismo.com/us/gran-turismo-sophy/
Company: Sony AI
Solar Panel Control
(This one is one of my own projects that I built to demonstrate AI for renewable energy)
Task: AI learns to control a solar panel to maximize energy without a model of the lighting environment.
Implications: AI scales problem-solving capabilities by learning from experience in model-free environments.
Link: https://www.jackogrady.me/reinforcement-learning-solar/research-summary