Chapter 7: Learning
This supplementary video to Chapter 7 of The Rise of Artificial Intelligence covers various aspects of learning, and provides two examples: (1) programming a robot, and (2) programming a racing car. The latter example is illustrated by a simple neural network that serves as the "brain" of the car. The general issue of "adaptability of models" is discussed and explained, and in particular, learning processes related to prediction and optimization models are presented in the context of the car distribution example.
Some of the material in this video is based on a complex business problem that's used as a running example. The following article provides a full explanation of this problem as well as its complexities:
Michalewicz, Z., Schmidt, M., Michalewicz, M., and Chiriac, C., "A Decision-Support System Based on Computational Intelligence: A Case Study," IEEE Intelligent Systems, Vol. 20, No. 4, July–August 2005, pp. 44–49.
Click here to download Chapters 1 & 2 of The Rise of Artificial Intelligence: Real-world applications for revenue and margin growth, and please contact us to request a soft copy of any other chapter of the book.

Transcript (reading time: 18:45 min)
Hi, this is Zbigniew Michalewicz, I am one of the co-authors of The Rise of Artificial Intelligence, and this is a supplementary video to Chapter 7 of the book. The topic for today's presentation is learning. When we talk about learning, we mean several things: when we build prediction and optimization models, they have to learn; further, as new data comes in, these models get feedback on their performance, measuring outcomes, and it is very, very useful to incorporate this information into a continuous learning process.
As was the case for Chapters 4, 5 and 6, Chapter 7 is also based on the complex business problem of distributing cars, and we will continue using it as a running example. The recommended reading is one short article, which is also available from the same website as this video. And the outline of this presentation is straightforward: we'll start with two basic examples: programming a robot, and programming a racing car.
Then we'll explain the inner workings of a simple neural network as the brain for a racing car. After that, we will talk about models and adaptability, we will return to the car distribution example, and using this example, we will illustrate the learning process, whether it's for a predictive model or for optimization.
Let's start with the following question: how do we program a robot so it walks? From a high level, it seems there are two basic approaches: one is hard and the other is easy. The hard way would be to study the mechanics, weights, movement, balance, and surfaces – as well as many, many other issues – and then develop a set of rules that the robot would follow in order to walk. We can observe ourselves when we walk. We can try to reverse engineer our movement to arrive at a set of rules. But it would be very, very hard. The other approach would be just to train the robot to walk, to teach the robot to walk so that the robot can learn.
The first approach would work as follows: after studying the mechanics and weights and movement and balance, we can arrive at a program which says: raise the left leg by 10 centimeters, move your torso slightly forward and to the right, and keep going. We analyze the weight of the torso and the power of the leg and the overall arrangement of the robot, and we can arrive at some set of rules that will result in some kind of walk.
But the other way would be the AI way. So in this particular case, we can evolve a program which would control the robot. We can create 100 random programs, then track the performance of these programs, which could be measured as simply as how long the robot is capable of standing or moving on two legs before falling down. The longer the time, the better the program. Then, after checking all 100 programs, we select the best 5 programs.
Let's call them "parents." Then we modify these programs to generate, let's say, 95 new programs ("offspring"). We replace the old 95 programs that were not selected as parents with the offspring, and we repeat this loop all over again. So let's have a look at what may happen: at generation zero, we have a robot with a random program. It's not impressive. Probably the robot survives a couple of seconds and that's it. But five generations later, we can see some traces of improvement.
The best program in the population was capable of controlling the robot, so it successfully completed a few steps. Generation 10 is even better. I mean, the best program in generation 10: the robot was wobbling a little bit, moving a few steps and falling down, but look at the best program at Generation 20. The robot learned how to walk, which simply means that the evolutionary process took over and the weaker programs were eliminated and the strongest program would be capable of controlling the robot – at least on a flat surface.
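The generation loop just described – create a random population, score it, keep the best few as parents, refill with mutated offspring – can be sketched in a few lines. This is a toy version: the population size and parent count follow the video, but the "program", the fitness function, and the mutation step are stand-ins (a real setup would score each program by running it in a physics simulator).

```python
import random

random.seed(0)       # for reproducibility of this sketch

POP_SIZE = 100       # population size, as in the video
NUM_PARENTS = 5      # best programs kept as parents
GENERATIONS = 20

def random_program():
    # A "program" is a stand-in here: just a vector of control parameters.
    return [random.uniform(-1.0, 1.0) for _ in range(10)]

def fitness(program):
    # Stand-in objective. In the video this is the number of seconds the
    # robot stays upright, measured by running the program in simulation.
    return -sum(x * x for x in program)

def mutate(program):
    # Offspring = parent with small random perturbations.
    return [x + random.gauss(0.0, 0.1) for x in program]

population = [random_program() for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    # Rank all programs and keep the best few as parents.
    population.sort(key=fitness, reverse=True)
    parents = population[:NUM_PARENTS]
    # Replace everything else with mutated copies of the parents.
    offspring = [mutate(random.choice(parents))
                 for _ in range(POP_SIZE - NUM_PARENTS)]
    population = parents + offspring

best = max(population, key=fitness)
```

Each pass through the loop corresponds to one generation in the video; after enough generations, only programs descending from strong parents remain.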
Nothing fancy, but the robot is walking – a very, very confident walk. So let's look at very much the same approach for developing a program for self-driving cars. Again, the hard approach would be to study the acceleration, movement, turning, and many other issues, or we could pay more attention to the way we drive and translate our behavior into a set of rules, possibly fuzzy rules. As we indicated a couple of videos ago: if the speed is high and the coming turn is sharp, then slow down – something like that. Or, again, we can teach the car how to drive so the car will learn.
And in this experiment, let's assume that we have seven inputs coming to the car at very frequent, regular intervals. We have five sensors: one sensor pointing straight forward, two sensors angled a little bit to the left, and two additional sensors to the right, plus two additional inputs – current speed and current direction. The brain of the car should process this information, all seven values: the distance to the edge of the road in all five directions, as well as the current speed and current direction.
And the system should output a new speed and a new direction. Later, these two output numbers are connected to the gas pedal or brake to get a new speed – higher or lower – and a new direction: the turn of the wheel to go a little bit left, a little bit right, or straight forward. So if we stay with the rule-based program, we can ask: how many rules can we really think of? How would these rules interact with each other? And later, how can we analyze their performance?
Let's say we arrive at a set of some decent rules, but after a while, on some particular courses, the car crashes after 30 seconds or so. How do we know which rule was responsible? How do we know how to modify the rule? So even if we arrive at a set of rules, improving them or changing them might not be that straightforward.
In approach B, let's do the following: we would create 650 random programs – the larger the population, the better. Then we'll check the performance of these programs, and at this stage, we'll do it by visual inspection. We'll just look at their performance and select the cars that are "satisfying" from a performance point of view. It might be something very, very simple: it might be how long the car was able to drive before crashing into the wall. One way or the other, we should be able to evaluate the performance of all these cars.
Then we select the best five programs. We can call them "parents," in very much the same approach as for the robot. Then we modify these programs, their brains, to generate 645 new programs (we can call them "offspring"). Again, these offspring replace the non-parents, resulting in a new population of 650 programs. And we keep going: checking their performance, selecting the best few, modifying them, replacing the rest – again and again, generation by generation, the artificial evolutionary process takes off.
So let's have a look at the brain for each car. Each car will take seven inputs: input from camera 1, camera 2, 3, 4, and 5, plus the current speed and the direction. This information will be processed by a neural network, with a thicker line meaning a higher weight (because all these inputs will be mixed together in the next few layers), and finally the two output nodes would produce two values: one value would give us the new direction, which is later translated into movement of the steering wheel – steer left, right, how much, and so on – while the other is the new speed: whether to speed up, slow down, or maintain the current speed.
So we have this neural network. The car is on the road. We have these five yellow beams that measure the distance to the edge of the road – this knowledge is provided as input, along with the direction and the current speed. And so what we do is create 650 "brains" for 650 cars. Each car is controlled by a different neural network, and at this stage, we're ready to experiment.
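A minimal sketch of such a car brain, assuming one hidden layer with tanh activations – the video does not specify the architecture beyond seven inputs and two outputs, so the hidden-layer size and all numeric values below are illustrative:

```python
import math
import random

random.seed(42)

def make_brain(n_in=7, n_hidden=6, n_out=2):
    # Randomly initialised weights for one hidden layer. The exact
    # architecture (hidden-layer size, activation) is an assumption;
    # the seven inputs and two outputs follow the video.
    return {
        "w1": [[random.uniform(-1, 1) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "w2": [[random.uniform(-1, 1) for _ in range(n_hidden)]
               for _ in range(n_out)],
    }

def think(brain, inputs):
    # Forward pass: 7 inputs -> hidden layer -> (new speed, new direction).
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
              for row in brain["w1"]]
    return [math.tanh(sum(w * h for w, h in zip(row, hidden)))
            for row in brain["w2"]]

# One sensor reading: five distances to the road edge, then the car's
# current speed and current direction (all values are made up).
inputs = [0.9, 0.7, 1.0, 0.6, 0.8] + [0.5, 0.0]
new_speed, new_direction = think(make_brain(), inputs)
```

Evolving the cars then means mutating the weight matrices `w1` and `w2` of the parent brains, just as the robot's programs were mutated.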
Again, five inputs from the five cameras – the five beams – plus current speed and current direction, and that's all we're after. The system would produce an adjustment: whether to turn left or right, speed up or slow down. So let's enjoy this experiment. This is a single car. This is what we're trying to accomplish: we'd like to create a brain that controls the car, so the sensors give us the distance to the road edge and the car knows where to go to avoid crashes.
So, 650 cars, random programs, and we see that the first attempt was quite miserable. The cars didn't perform well, except for this car we just highlighted. This car was successful by pure chance in making a left turn. So let's look at generation 2. There are some offspring of the smart parent – the car that was successful in taking that turn – and this car is marked with yellow; it is here. Now, this is generation 2. And again, how to evaluate their performance, how to evaluate their fitness, is a little bit tricky: we can take something simple, such as the number of seconds the car survived the race without crashing, or we can look at some other characteristics, such as the car's ability to make turns, or its average speed. But if we look at generation number 3, the best cars are doing very well – a significant distance was covered by the best cars.
And actually, in three generations, we have pretty decent performance at this stage. So evolution takes care of the rest. We just keep creating new generations of cars, and the brains are smarter and smarter because we take only the best parents to reproduce, to modify. And at this stage, the main goal is to stay alive as long as possible. So as we can see, the speed is not that impressive because speed is not included in the evaluation function.
On the other hand, once we have brains developed so the cars avoid crashing, we can start paying more and more attention to speed. And after many, many generations, the fastest cars are really impressive: in a few seconds, they are capable of covering the whole course without any crashes. And this is without discovering any explicit rule. It's just through training – selection of the best, elimination of the weakest – and evolution takes care of the rest.
Now, let's talk about some real situations. Once we get some initial training data, very often we build a model and train it. We are operating in a static environment; still, it is possible to get some feedback loops, so we have elements of learning: we can take a first subset of the available data, train the model, and then check the performance of this model on another set of data. So there is some kind of feedback, some kind of adjustment, some type of learning. But everything happens in a static environment – whatever happens is based on an initial set of data, historical data. The moment comes when we have to deploy the model: the model goes live and suddenly starts to perform. At this point, a few things happen: apart from the initial set of data, we are getting new data coming in at regular intervals. For example, every quarter we get new sales results for the last several weeks.
Also, if the system is making recommendations, very often we can get feedback on those recommendations – feedback on performance, whether the recommendation had any merit, whether it was accepted or rejected, and so on. So learning comes from three directions: the initial learning happens when we train a model; then one type of learning might be connected with new datasets arriving at regular intervals; and another type of learning may be connected with getting direct feedback.

There are many things we can do at this stage. We can rebuild the model if we notice that its performance is not acceptable in the live environment. This may mean that we made some mistakes, some key variable was not included, or the amount of initial data wasn't sufficient. One way or the other, it's necessary to rebuild the model. In some other cases, it might be sufficient just to retrain the model: if, let's say, the problem is handled well by the predictive model but there is a significant number of outliers – the model is inaccurate on very unique special cases – then some retraining might be desirable, and very often a small adjustment of the model would be sufficient. So all the basic rules are right, and we're just adjusting or updating the model to include the newest set of data.

Whatever change we make, whatever modification is applied – whether to a predictive model or an optimization model – we call it adaptability. And this will be one of the requirements of most systems: to be adaptive. The system should be able to learn and change with respect to changes in the environment.

Let's return to our car distribution example. Again, a very quick summary: GMAC, a leasing company and part of General Motors, gets back a significant number of cars after the termination of leases and rentals.
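Returning for a moment to the rebuild / retrain / adjust options above: one way to picture the choice is as a threshold rule on live performance. This is purely illustrative – the thresholds and the error measure are assumptions, not something from the book:

```python
def maintenance_action(live_error, baseline_error,
                       rebuild_factor=2.0, retrain_factor=1.3):
    # Hypothetical decision rule: compare the model's live error with its
    # error at training time and choose how aggressively to react.
    # The two factors are made-up thresholds for illustration.
    if live_error > rebuild_factor * baseline_error:
        return "rebuild"   # performance collapsed: revisit variables and data
    if live_error > retrain_factor * baseline_error:
        return "retrain"   # structure is fine; refit on fresh data
    return "adjust"        # small drift: update the model with the newest data

# Example: a live error 50% above baseline triggers a retrain.
action = maintenance_action(live_error=1.5, baseline_error=1.0)
```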
This number is around 1.2 million cars, and this translates into between four and seven thousand off-lease cars being returned and consequently distributed every single business day.
And the team of 23 analysts in Detroit makes daily decisions on where to send these cars to maximize resale value. We have already talked about the prediction issues and the optimization issues, but there are also learning issues, because the problem is set in a dynamic environment. Petrol prices go up, and most likely prices of sport utility vehicles go the other way. Then suddenly different colors are popular in different states, and new makes and models enter the market.
Many, many things are happening, and the environment today is usually very different than it was three years ago, so ideally, we should create a system that can react to changes in the environment. We talked about this problem over the past few presentations – about distribution centers and the different auctions located in different states. We have some changes in demographics around auction sites. We have different demands because of new makes and models, so the popularity of different cars can change in a significant way.
We can look at this map as a representation of a dynamic environment. We also talked about the interaction between the optimizer and the predictive model. The optimizer searches through a vast number of possible distributions, and the predictive model evaluates all these distributions, saying "Hey, this distribution would provide an average lift of $313 per car." And we underlined the mechanics of the optimizer, as this loop of proposing a new distribution and evaluating it is repeated many, many times.
And this is really a daily activity: every single day the GMAC team runs the system, and in seven or eight minutes the system returns the recommended distribution; the final outcome is an average lift per car of $245. Every day the system gets a fresh input of between 4,000 and 7,000 cars to be distributed – this is their daily activity. However, every Monday morning the system also gets additional files: direct feedback on performance. The system receives a file of all cars that were sold during the last seven days, from last Monday to today, and for each car we have the date of sale and the auction location. There is also the predicted price and the actual price, so the system has an opportunity to compare predicted values with actual values and to make adjustments if necessary. So this is the learning component: we look at what happened, we know what we predicted, we know what the actual values are, and then we can start reasoning. We can look at all these deltas. We can look at means, averages, and deviations, wondering whether some discrepancies are justified – they just happened – or whether they signal a trend.
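The Monday-morning analysis of deltas, means, and deviations might look like this in miniature – the sales figures and the drift threshold are made up for illustration:

```python
import statistics

# Hypothetical weekly feedback file: (predicted price, actual sale price).
sales = [(21_500, 21_150), (18_900, 19_300), (24_000, 22_800),
         (15_400, 15_600), (19_700, 18_900)]

deltas = [actual - predicted for predicted, actual in sales]
mean_delta = statistics.mean(deltas)   # systematic bias, if any
delta_sd = statistics.stdev(deltas)    # spread of the errors

# A crude drift signal (the threshold is illustrative): a mean error well
# away from zero suggests a trend rather than random noise.
drift_suspected = abs(mean_delta) > delta_sd
```

Here the model over-predicts by $350 on average, but the spread of the errors is wide enough that this sample alone does not prove a trend.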
And so learning is also key. Without this feature, the system would probably be obsolete six months after going live. So apart from prediction and optimization, the learning component was also of major significance. We talked about the predictive model in the context of the car distribution system, which was an ensemble stacking model: one model making the base prediction, many other models making adjustments – which we covered in Chapter 5.
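In miniature, such a stack might look as follows. The base model, the adjustment, and all the numbers here are invented for illustration, and the real system uses a neural network rather than a weighted sum as the meta-model:

```python
def base_model(car):
    # Hypothetical base prediction, e.g. from make/model/year.
    return car["book_value"]

def mileage_adjustment(car):
    # Hypothetical adjustment: discount for mileage above an assumed
    # 12,000 miles per year of age, at an assumed $0.05 per excess mile.
    excess = max(0, car["mileage"] - 12_000 * car["age_years"])
    return -0.05 * excess

def meta_model(base, adjustments, weights):
    # Stand-in for the meta-model that combines the base prediction with
    # the adjustments; the real system uses a neural network here.
    return base + sum(w * a for w, a in zip(weights, adjustments))

car = {"book_value": 20_000, "mileage": 40_000, "age_years": 3}
prediction = meta_model(base_model(car),
                        [mileage_adjustment(car)], weights=[1.0])
```

Learning then means updating each layer separately: refitting the base model when a trend appears, re-estimating the adjustments, and retuning the meta-model's weights.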
So now, when the time comes for updates based on direct feedback, we may update the base model. It may happen that Toyota Corollas are not that popular any longer and there is a very clear trend – possibly because of a new model, or a competitor's model – and fewer and fewer people are interested in Toyota Corollas. So the base model should be updated accordingly, as there is a new trend in the environment. Then we have to update the models making adjustments based on mileage, color, season, and a variety of other things, because, again, it's simply a different environment.
It may be because of gas prices; some adjustments would look different. And on top of everything, we have to update the meta-model – the neural network which combines the base prediction and the adjustments into the final prediction. It may be necessary to tune the weights of this neural network a little bit further. And again, this is done in an automatic way.

There is also learning connected with optimization. One of the early experiments indicated that different optimization methods – such as ant systems, evolution strategies, evolutionary programming, genetic algorithms, simulated annealing, and so on – give different performance, and the reason is that no two days are the same.
It might be that on day one the majority of cars – 80% – were returned around the Washington and Boston areas. It may be that some big government department was returning its cars: the leases terminated on the same day, and a significant number of cars was returned in this particular location. So we are dealing with a situation where the distribution center around Washington, D.C. may be overloaded while other distribution centers have a minimal number of cars.
How do we distribute these cars? Distributing them might be a very different exercise than on some other (average) day, when every distribution center gets more or less the same number of cars and there is a variety of different cars. Also, on day 5, maybe one agency returned 300 white Pontiacs at one single location. What should we do with cases like that? Different optimization techniques have different characteristics. They may deal better or worse with different instances of the problem.
So the problem is very much the same – just distribute the cars – but the starting instance may vary widely, from a uniform distribution to cases where most cars are concentrated in one location. Then we observe that different algorithms give different performance: the genetic algorithm was the best algorithm for day number one, but the ant system was the best algorithm for day 3. What can we do in a situation like that? Actually, we can do a lot. We can learn, and the learning is the following: we can analyze the car distribution for each day and note which algorithm was the best for a particular instance of the problem. And then we can train a neural network: we can look at the number of cars to be distributed, the number of auctions which are open within the next two weeks, and some distribution indices that give us information on how these cars are distributed across the nation.
Also, it is very important to mention constraints. We have a constraints index. We may have so many constraints and so many business rules that we hardly have any choice about where to send a car. If we say, for example, that the maximum distance for transportation is 200 miles, we are basically directing cars to the closest auction site. So even very primitive optimization algorithms would do very well, because the number of choices is quite limited. If we relax the constraints and the feasible search space is much, much larger, then we would need some sophistication in optimization.
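A tiny sketch of this "learn which algorithm wins" idea, with a 1-nearest-neighbour lookup standing in for the neural network described here, and entirely made-up instance features and records:

```python
ALGORITHMS = {"genetic_algorithm", "ant_system", "simulated_annealing"}

# Hypothetical training records (all feature values are invented):
# (cars to distribute, open auctions, geographic-spread index,
#  constraint-tightness index) -> algorithm that was best in hindsight.
history = [
    ((5200, 40, 0.15, 0.8), "genetic_algorithm"),
    ((4100, 55, 0.90, 0.3), "ant_system"),
    ((6800, 48, 0.85, 0.4), "ant_system"),
    ((4500, 42, 0.20, 0.7), "genetic_algorithm"),
]

def recommend(features):
    # Pick the algorithm that was best on the most similar past instance.
    # A 1-nearest-neighbour stand-in for the neural network described in
    # the transcript; real features would be normalised first.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, best_algorithm = min(history,
                            key=lambda record: sq_dist(record[0], features))
    return best_algorithm

today = (5000, 45, 0.18, 0.75)  # a concentrated, tightly constrained day
algo = recommend(today)
```

Each week's feedback can append new (features, best algorithm) pairs to `history`, so the recommender itself keeps learning.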
So with this knowledge – all the constraints, the size of the feasible search space and its ratio to the whole search space, information on distribution, the number of cars, and so on – we can train a neural network which, for a new case, a new instance, would recommend the best algorithm, the one that would take responsibility for optimizing this particular instance.

So to conclude, learning is also a key component in the whole operation. Going back to our pyramid: we started with the problem, data, information, knowledge; we talked about prediction and optimization; now we are making decisions – but these decisions are supported by learning components and are made in a dynamic environment, and it is extremely important to keep this in mind. To summarize: the predictive model is key – this was the content of Chapter 5 and the presentation connected with it. Without a good predictive model, the optimizer would have an impossible task; without accurate prediction, the system wouldn't provide much value. Optimization is also key – this is what we covered in the presentation related to Chapter 6 – because without a clever way of searching through an enormous number of possible distributions, it would simply take too long to find an optimum solution.
And the learning component is key, because problems are set in dynamic environments. And this concludes the presentation for Chapter 7. Thank you.