Chapter 5: Prediction
This supplementary video to Chapter 5 of The Rise of Artificial Intelligence discusses a few key issues related to predictive modeling. An overview of AI-based prediction models is provided together with a demo of a particular simulation that can be used as a predictive model. The video concludes with an overview of ensemble models for the ongoing example of distributing cars to auction sites.
Some of the material in this video is based on a complex business problem that's used as a running example. The following article provides a full explanation of this problem as well as its complexities:
Michalewicz, Z., Schmidt, M., Michalewicz, M., and Chiriac, C., A Decision-Support System based on Computational Intelligence: A Case Study, IEEE Intelligent Systems, Vol.20 (4), July-August 2005, pp.44 – 49
Transcript (reading time: 18:51 min)
Hi, this is Zbigniew Michalewicz, I'm one of the co-authors of The Rise of Artificial Intelligence, and this is a supplementary video to Chapter 5 of the book. The topic of this presentation is prediction. Some of the material presented in this chapter – like the previous one and the next one – is based on the complex business problem of distributing cars that I will use as a running example. And the recommended reading is the article which is displayed here, which is available from the same website as this video.
The outline of this talk is straightforward: We'll start with general remarks on predictive modeling, talk a little bit about the granularity of predictive models and then about Artificial Intelligence-based predictive models. Then we move to simulation, we will talk about agent-based systems and simulation as a predictive model, and conclude by returning to the car distribution example – and in particular, we'll pay close attention to ensemble predictive models for this car distribution problem. Predictive modeling from a high-level perspective can be visualized in the following way: The central question is "What will happen next?"
And we answer this question based on some historical data. So, for example, we have some sales values for a product in some part of the country. And we're looking at some number of weeks or months back and we have to predict what will happen next: how much we will sell next week, and the following week, and so on? We can build a variety of predictive models for this problem. These models may be linear, quadratic, or exponential – this is not important. What is important is that each curve represents some prediction: what would happen in the future? We are making bets that the distribution of dots or sales values will follow a particular curve. And it may happen that this doesn't hold true, that these models were wrong to some extent, and the longer the prediction horizon, the larger the magnitude of errors. But if we look at this question: what will happen next? We shouldn't think in terms of just historical data.
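The idea of competing curves can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weekly sales figures are made up, and `fit_line`, `linear_model`, and `exponential_model` are hypothetical helper names, not anything from the book or the video. Both models are fitted to the same eight weeks of history and then asked about weeks further and further ahead.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical weekly sales for weeks 1..8 (made-up numbers).
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [102.0, 108.0, 111.0, 119.0, 124.0, 131.0, 135.0, 142.0]

# Linear model: sales = a + b * week
a, b = fit_line(weeks, sales)

def linear_model(week):
    return a + b * week

# Exponential model: sales = exp(c + d * week), fitted as a line in log space
c, d = fit_line(weeks, [math.log(s) for s in sales])

def exponential_model(week):
    return math.exp(c + d * week)

# The two "bets" agree closely for next week but drift apart further out.
for week in (9, 12, 20):
    print(week, round(linear_model(week), 1), round(exponential_model(week), 1))
```

Both curves describe the history about equally well, yet they disagree more and more as the horizon grows, which is one way to see why longer-range predictions carry larger errors.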
When we are making all these predictions, very often, we have additional datasets that can be used to make the predictive model much better, much more stable, much more accurate. We may have patterns from the last few years. So this set of historical data can be very often extended by several years back, which will be very, very useful because then it will be easier to study seasonality. It will be easier to study special events like Easter, Christmas, school holidays, whatever they might be.
Weather patterns may also be very important. We may have available data from the Bureau of Meteorology for the weather around that time. Then we may have information about new products that were introduced at that time, and these new products may have influenced sales of the product in question. Or we may have information about past and current promotions, which is very important. These promotions might explain these movements of dots up and down: there was a promotion, there wasn't a promotion.
By having this information and knowing which promotions are planned for the future, it would be easier to predict the future with a higher degree of accuracy. Or we may have information about competitors and their products, which might be also very, very useful. And if we combine all these data sources and build a predictive model very carefully, then we can get some meaningful results. The prediction of our model might be quite remarkable. One additional observation is the following: any problem-solving activity – and in particular a problem of prediction – is a two-step process: we first build a model of the problem and then we use this model to make a prediction. So this first step, model building, is of the greatest importance, and clearly, if the model is wrong, then the prediction is wrong, so we should pay special attention to this first step, how to look at the problem, how to extract the most important information, and how to combine this information, knowledge, and data into a model that will perform well.
So building a predictive model is quite challenging. Of course, we should have historical data available. And as I have just indicated, this historical data is not only for the variable in question – like sales volume in the future – but all historical data, like seasonality and weather and promotions and competitors and new products; everything which is relevant, which is available. Then we should identify the key variables. And some people may think the more variables, the better. But that needn't be the case: if the model is too complex, very often the performance would be very, very poor for a variety of reasons.
The next step would be the extraction of useful knowledge from data trends, seasonality, dependencies between variables, and so on, and then selection of an appropriate model, whether it will be a neural network, fuzzy rules, or even a simple statistical model. But sooner or later, we have to make a decision: which model would be the most appropriate for the problem at hand, before training, validation and testing of the model.
And on top of everything, we should have a mechanism for regular updates because very often we operate in dynamic environments. We are getting feedback, we are getting new data coming in and we have to have a method: what to do about this, how to include feedback or arrival of new data to update the model so it stays current. Again, if the model is wrong, then the answer is wrong. This is why it's so important to take very good care with this modeling effort. A few comments about the granularity of the model: Let's assume that we are modeling some transportation effort connected with delivery from warehouses to customers and so on.
And clearly, we can look at a map of Australia. This is for air travel, so it will be perfect. We have cities, we have a destination, you can estimate time, you can extend this data by timetables of flights and so on. But we would like to also deliver by trucks or trains, and in that case it's necessary to go a little bit deeper and identify major highways or secondary roads and get additional information about major cities and distribution of their neighborhoods.
And we can keep going deeper and deeper, taking care of intersections and information about peak hours and the average times of getting from one point to another all the way down to traffic lights and one-way streets and traffic densities and so on. So if we're trying to build a very accurate model, we might think we need all this information, thousands of variables, if not more, and every street and traffic light and peak hours and so on.
And the question is where to stop? How deep should we go to solve the problem? And there are no easy answers for that. Everything is based on the problem: how will we apply this predictive model? What is an acceptable accuracy of the model? Are we predicting delivery times in five-minute time windows or is a range of three hours sufficient? Perhaps just identifying the day of delivery would be okay?
Also, how will we update this model? If we go to this deep level, it will be very difficult to maintain this predictive model because any road work, road block, accident, and so on should be immediately introduced into the model, and this simply isn't feasible. How often will the model be used? And so on. There are many, many questions. So this decision of granularity relies on experience and expertise and is very often part of the "art" rather than the "science."
There are many Artificial Intelligence-based predictive models. One of the most popular is neural networks. Most people have heard the term, and here we have a network of nodes and connections, where we are presenting some input. This is the lowest level of the neural network. Let's say we provide the product ID and the sales of the last week, and the promotion period and seasonality and competitor behaviour or whatever it may be. And this information is processed from one level to another, and the information is passed further with different weights, and the output would be, for example, the best price and the ideal promotion period.
We can also use an agent-based system as a predictive model. We talked a little bit about this in the last video, and the general idea is that we can model a variety of variables and interactions of these variables to observe an "emergent behavior," which is a sort of prediction. If the variables interact in this and that way, then this is the most likely outcome. We'll have a look at such simulations in a few seconds.
Fuzzy logic is a very powerful tool that can serve as a predictive model here. The trick is that we can describe the problem as a set of rules, but the rules are expressed in fuzzy terms: high, low or medium, sharp, not sharp, high speed, low speed, medium speed, and so on. And this is how human beings reason: If we're driving too fast and approach a sharp turn, then we'll slow down rather than speed up. And if our speed is 70 km an hour and the terrain is 35 degrees, then we'll reduce the speed by 25 percent. There are some mechanisms, the fuzzy mechanism of processing information in a way where at the end we get a very precise recommendation, or a precise prediction.
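A minimal sketch of how fuzzy rules can end in a precise number is shown here. Everything specific in it is an assumption made for illustration: the membership ranges (40 to 100 km/h for "fast", 10 to 60 degrees for "sharp"), the four rules, and their recommended reductions were chosen so that the output mirrors the 70 km/h and 35 degree example above. The weighted-average step is one common way to defuzzify, not necessarily the mechanism the speaker has in mind.

```python
def ramp_up(x, low, high):
    """Membership that rises linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def speed_reduction_percent(speed_kmh, turn_degrees):
    """Combine four fuzzy rules into one crisp speed reduction (in percent)."""
    # Degrees of membership in the fuzzy terms (ranges are hypothetical).
    fast = ramp_up(speed_kmh, 40.0, 100.0)
    slow = 1.0 - fast
    sharp = ramp_up(turn_degrees, 10.0, 60.0)
    gentle = 1.0 - sharp

    # Each rule: (firing strength using min as fuzzy AND, recommended reduction).
    rules = [
        (min(fast, sharp), 60.0),   # fast AND sharp   -> slow down a lot
        (min(fast, gentle), 20.0),  # fast AND gentle  -> slow down a little
        (min(slow, sharp), 20.0),   # slow AND sharp   -> slow down a little
        (min(slow, gentle), 0.0),   # slow AND gentle  -> keep the speed
    ]

    # Defuzzify: weighted average of the rule outputs.
    total_strength = sum(strength for strength, _ in rules)
    return sum(strength * out for strength, out in rules) / total_strength

print(speed_reduction_percent(70, 35))
```

The inputs are vague terms with partial membership, but the final step turns them into a single number that can be acted on.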
We can also use random forest models. A decision tree is one of the most popular modeling tools, and all of us use decision trees one way or the other. If this happens, then turn left, if something else happens, turn right – and then after turning left or right, if this happens, then do something else, and so on. Very often, applications for getting credit in banks are organized as a decision tree. If the number of years of employment is less than 10, follow one branch. If it's more than 10 years, follow another branch. And then if the applicant's salary is lower or higher than such and such number, we follow the appropriate branches until we arrive at the final recommendation. Random forest combines many decision trees into one model to provide much better stability, efficiency, and accuracy.
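The credit example can be sketched as follows. The thresholds and the three hand-written trees are hypothetical; in a real random forest the trees are learned from random samples of the training data rather than written by hand. What the sketch does show is the combining step: each tree votes, and the majority decides.

```python
from collections import Counter

def make_tree(years_cutoff, salary_if_short, salary_if_long):
    """Build a two-level credit decision tree with the given thresholds."""
    def tree(years_employed, salary):
        # First branch: length of employment.
        if years_employed < years_cutoff:
            # Shorter employment: require a higher salary.
            return "approve" if salary >= salary_if_short else "decline"
        # Longer employment: a lower salary is enough.
        return "approve" if salary >= salary_if_long else "decline"
    return tree

# Three slightly different trees (all thresholds are made up).
forest = [
    make_tree(10, 80_000, 50_000),
    make_tree(8, 75_000, 55_000),
    make_tree(12, 85_000, 45_000),
]

def forest_vote(trees, years_employed, salary):
    """Return the recommendation that most trees agree on."""
    votes = Counter(tree(years_employed, salary) for tree in trees)
    return votes.most_common(1)[0][0]

# The trees disagree on this applicant; the majority decides.
print([tree(11, 52_000) for tree in forest])
print(forest_vote(forest, 11, 52_000))
```

With an odd number of trees and two possible answers there is never a tie, and one tree with an unlucky threshold cannot swing the result on its own.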
We can use genetic programming, which is similar to random forest. But the difference is that we're not dealing with trees, but with computer programs. So imagine that we create 100 computer programs randomly and these programs are responsible for giving us a prediction of some sort. And we evaluate these predictive models on historical data. Some of them predict better; some a bit worse. And we run an artificial evolutionary process.
We select the better programs by doing some genetic tricks, such as crossovers and mutation. We generate offspring, new programs, replace the weaker programs and we run it generation by generation. We will talk about evolutionary programming in the next supplementary video for Chapter 6. But this is the general idea. We evolve the best predictive model rather than build it from scratch. Also, we can take several models and combine them together into an ensemble model, and this ensemble model will consist of a few models that, for example, vote or calculate the average of the recommendations.
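The evolutionary loop described above (evaluate on historical data, keep the better candidates, create offspring by crossover and mutation, repeat) can be sketched in simplified form. This is a deliberate stand-in, not genetic programming proper: instead of evolving whole programs, each candidate here is just a pair of coefficients `(a, b)` for the model `sales = a + b * week`. The data, population size, and mutation scale are all assumptions.

```python
import random

# Hypothetical history: (week, sales) pairs, made-up numbers.
history = [(1, 12.0), (2, 14.1), (3, 15.9), (4, 18.2), (5, 19.8)]

def error(model):
    """Sum of squared errors of the candidate sales = a + b*week on the history."""
    a, b = model
    return sum((a + b * week - sales) ** 2 for week, sales in history)

def crossover(parent1, parent2):
    """Offspring takes the intercept of one parent and the slope of the other."""
    return (parent1[0], parent2[1])

def mutate(model, rng, scale=0.5):
    """Perturb both coefficients with small random noise."""
    return (model[0] + rng.gauss(0, scale), model[1] + rng.gauss(0, scale))

def evolve(generations=200, pop_size=100, seed=0):
    rng = random.Random(seed)
    # Start from 100 random candidate models.
    population = [(rng.uniform(-20, 20), rng.uniform(-20, 20))
                  for _ in range(pop_size)]
    initial_best = min(error(m) for m in population)
    for _ in range(generations):
        # Keep the better half; the weaker half is replaced by offspring.
        population.sort(key=error)
        survivors = population[: pop_size // 2]
        offspring = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors)), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + offspring
    return min(population, key=error), initial_best

best_model, initial_best = evolve()
print(best_model, error(best_model), initial_best)
```

Because the best candidate always survives into the next generation, the best error can only stay the same or improve from one generation to the next.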
So by playing with the different aspects of the problem, this ensemble approach may give us some very interesting, very good results. We'll return to ensemble models towards the end of this presentation. Let's now have a look at a particular simulation that can be used as a predictive model. We are trying to predict what would happen to the traffic during peak hours under particular circumstances. We have several variables like the rate of inflow of cars, the ramp flow, and even the politeness of drivers, because if it's very hot, they may be a little bit nervous, less polite, and so on.
So we start the simulation and watch the movement of cars, and suddenly we ask: what would happen if the inflow of cars is much higher, and the ramp flow is also much higher? We're approaching peak hour, more and more people are leaving work, and it's also very hot. So politeness goes down. We also increase the percentage of trucks entering the highway, and on top of that, we can introduce an accident blocking one lane and then observe what is happening, what would be the outcome of these circumstances. And we can look at this simulation as a predictive model. We can understand the delays and we can predict delays; we can do a variety of things. So let's return to the car distribution example. I introduced this problem last time, but this is a short repetition: GMAC Leasing Company, part of General Motors, has many cars coming back after completion of leases and rentals.
This number is around 1.2 million cars being returned annually, and each day a team of 23 analysts is making decisions on which auction site to send all these cars to. So for each individual car, they have to make a decision: From this distribution center, this particular car should be shipped to this and that location. And to do it in a meaningful way, we have to have a predictive model. We have to know that if we send this particular car to this particular location, then the most likely price is, let's say, $12,373. Without this information, without any predictive model, we would be making decisions in the dark.
And there are many prediction issues. First of all, we have many variables to consider: Car-related variables and auction-related variables like make, model, odometer reading, year of production, or location of the auction site and its neighbourhood demographic, other information around this auction site, and possibly weather patterns, changes in market conditions, and seasonality. For example, if we send convertibles to Chicago in winter then we'll be regretting it for the next few months.
Also, we have to take into account the "volume effect" because if we send too many similar cars to a single location, then we'll depress the price. So this is the overall situation: We have several distribution centers, we have several hundred cars sitting at each distribution center, and every single day, the team in Detroit is making up to 7,000 individual decisions. This car from this particular distribution center should be sent to this particular auction site, car by car, distribution center by distribution center, keeping all this information in their heads: the price of each car, if it's sent here, it would sell for such and such. But if it's sent over there, the price will be slightly different, not to mention transportation costs and other variables, which the optimizer will be dealing with.
The predictive model is still key for the following reasons: First of all, if the predictive model is wrong, there is nothing to optimize: Any decision we make might look OK, and we can only concentrate on minimizing transport costs because everything else is like a random guess. And the price prediction would involve some key variables.
Of course, the make, model, year, and odometer reading would be absolutely key variables for any car. Then we have other variables like the VIN number, colour, features, damage level, and so on. We can buy a VIN decoder, which would give us additional information on each car, which is why the VIN is listed here. We have to know about the distribution of other cars that would be sold together with the car in question, because we have to know the volume effect. We have to know the date and auction site because we're making a prediction for the future and the date of auction may be important.
We have to know the current inventory information for this particular auction. And of course, the seasonality, predicted weather, and so on. We may also have access to external data, such as petrol prices and current trends. Suddenly, the color yellow is very popular in California – a couple of thousand dollars premium on yellow Corvettes. And the optimizer interacts with the predictive model all the time. It works like this: The optimizer is trying to find the best possible distribution by asking the predictive model the value of each possible distribution. And the predictive model, let's say, would respond: this particular distribution would give you an average lift of $213 per car. By the way, we need some benchmark, we need some number to evaluate the quality of the distribution. Here, average lift was selected as this measure and the benchmark for comparison – we are measuring against the distribution where we send cars to the closest auction site, which is what happened in the past very, very often.
So with respect to the benchmark distribution, this new distribution proposed by the optimizer is better by $213 per car. But the optimizer is trying to find the best distribution. So the optimizer would modify this distribution a little bit and then ask the predictive model: what is the value of this particular distribution? And the predictive model would say: in this case, the average lift is a little bit lower, $209 per car. And the optimizer would then generate another distribution and again ask the predictive model: what about that? And the predictive model may say this is very good, this is $213 per car.
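The question-and-answer loop between the optimizer and the predictive model can be sketched as a simple hill climber. All numbers and names here are hypothetical: three auction sites, one car type, a made-up base price per site, and a crude volume effect of $40 per additional similar car at the same site. The real system's predictive model and optimizer are far richer; the sketch only shows the predictive model acting as the evaluation function, with average lift measured against the closest-site benchmark.

```python
import random

SITES = ["A", "B", "C"]
# Hypothetical base price of one car type at each auction site.
BASE_PRICE = {"A": 12_000.0, "B": 12_400.0, "C": 12_150.0}
# Hypothetical volume effect: each additional similar car lowers every price.
VOLUME_PENALTY = 40.0

def predicted_total(distribution):
    """Stand-in predictive model: total predicted sale price of a distribution."""
    counts = {site: distribution.count(site) for site in SITES}
    return sum(BASE_PRICE[site] - VOLUME_PENALTY * (counts[site] - 1)
               for site in distribution)

def average_lift(distribution, benchmark):
    """Average gain per car relative to the benchmark distribution."""
    gain = predicted_total(distribution) - predicted_total(benchmark)
    return gain / len(distribution)

def optimize(benchmark, iterations=2000, seed=0):
    """Hill climbing: change one car's site, keep the change if lift improves."""
    rng = random.Random(seed)
    best, best_lift = list(benchmark), 0.0
    for _ in range(iterations):
        candidate = list(best)
        candidate[rng.randrange(len(candidate))] = rng.choice(SITES)
        lift = average_lift(candidate, benchmark)  # ask the predictive model
        if lift > best_lift:
            best, best_lift = candidate, lift
    return best, best_lift

# Benchmark: every car goes to its closest site (10 cars near A, 5 near C).
benchmark = ["A"] * 10 + ["C"] * 5
best, best_lift = optimize(benchmark)
print(best, round(best_lift, 2))
```

Every candidate is judged only by what the predictive model says about it, which is why a wrong predictive model leaves the optimizer with nothing meaningful to optimize.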
This process of interaction between the optimizer and the predictive model, which serves as an evaluation function, is repeated many, many, many times, possibly a million times, if we have enough time to run the optimizer through millions of iterations. And this predictive model, which is of the greatest importance, was based on an ensemble model, with the general idea as follows: We provide the VIN number of the car for which we'd like to get a price prediction, but this is extended by additional input variables. So apart from all other variables for this particular car, we are looking at the distribution of other cars that are heading towards one particular auction site and we know the date of the auction at this auction site, we know also the current inventories at all auction sites, including cars in transit. And then we process all this information.
Many different models evaluate the different aspects of this problem with different weights. And finally, the system will converge to the answer. If we take this car and we ship it to that auction site, we will get $24,507 – which is very, very important information for the optimizer to make the optimal distribution. And to build this ensemble model, we split the training data into two subsets: one and two.
The first subset was used to train a variety of models, and the second subset was used to model and tune their predictions, because on the second data subset, we know all the historical values, we know all the prices these cars were sold at. So we are using these predictions to tune the system to arrive at one meta-model, which, by the way, was a neural network. So initially, when we start with this subset, number one, we train a variety of models, we are talking about 12 or 13 different models. We create some basic model for the make and model. And the model is making these predictions for one-year-old cars and then the remaining models are making adjustments for volume effect, for mileage, for the year, seasonality factors, and so on, with all these models working together to make the final prediction. So model one would give us the base prediction and all the other models would give some adjustment. And then we are training this meta-model, this neural network, to come up with the final price prediction.
So when we are considering a new case – a particular car for which we would like to get a price prediction – then we look at all attributes of this car, the distribution of other cars and auction dates, and where we plan to send this car, the inventory levels there, everything. And let's say the base prediction is $12,500 and we make a variety of adjustments and then the neural network would tell us: the final prediction is $11,250.
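The structure of this ensemble (one base model, several adjustment models, and a meta-model tuned on the second data subset) can be sketched as follows. This is a heavily simplified illustration, not the system described in the talk: the meta-model here is a single learned weight fitted by least squares rather than a neural network, there are three adjustment models rather than a dozen, and every rule and number is made up, chosen so that the output mirrors the $12,500 to $11,250 example above.

```python
# Model 1: base price for a one-year-old car by (make, model). Hypothetical.
BASE_PRICE = {("Honda", "Accord"): 12_500}

def base_model(car):
    return BASE_PRICE[(car["make"], car["model"])]

# Adjustment models (all rules are made up for illustration).
def mileage_adjustment(car):
    extra_km = max(0, car["odometer_km"] - 20_000)
    return -2 * (extra_km // 100)          # $2 less per 100 km over 20,000 km

def volume_adjustment(car):
    return -50 * car["similar_cars_at_site"]  # volume effect

def season_adjustment(car):
    return -100 if car["auction_month"] in (12, 1, 2) else 0

ADJUSTMENT_MODELS = [mileage_adjustment, volume_adjustment, season_adjustment]

def fit_meta_weight(holdout):
    """Least-squares weight w for: price = base + w * adjustment_sum.

    `holdout` plays the role of subset two: records of
    (base_prediction, adjustment_sum, actual_sale_price).
    """
    numerator = sum(adj * (price - base) for base, adj, price in holdout)
    denominator = sum(adj * adj for _, adj, _ in holdout)
    return numerator / denominator

def predict_price(car, weight):
    """Base prediction plus the meta-weighted sum of all adjustments."""
    adjustment = sum(model(car) for model in ADJUSTMENT_MODELS)
    return base_model(car) + weight * adjustment

# Subset two: known historical sale prices (made-up records).
holdout = [(12_000, -1_000, 10_900), (15_000, -2_000, 13_050), (9_000, -500, 8_500)]
weight = fit_meta_weight(holdout)

car = {"make": "Honda", "model": "Accord", "odometer_km": 65_000,
       "similar_cars_at_site": 5, "auction_month": 1}
final = predict_price(car, weight)
print(base_model(car), weight, final)
```

Keeping the meta-model's tuning data separate from the data used to build the individual models is the point of the two-subset split: the meta-model learns how far to trust the adjustments on cars the individual models have not seen.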
With this explanation, let's conclude by looking at a particular demo of the car distribution system. And in this demo screen we can switch to the distribution tab, and see thousands of cars listed in this file. These are cars which are ready for shipment. They are sitting at different distribution center locations. And we have only a handful of variables that are displayed in this demo: make, model, year, and trim level.
And needless to say, there are many, many additional variables which are not displayed here. And of course, the question is which auction sites these cars should be directed to? If we switch to visualization and we start the optimizer, the optimizer would look at these millions of different possible distributions, and for each distribution, ask the predictive model: if we do it this way, what is the quality measure of this proposed distribution? So let's stop this for a second and let's return to the distribution screen.
And you can see this particular column is already filled in. So we look at a particular car, Honda Accord, trim level, year of production, 2003, and the recommendation of the system is that this car should be sent to Birmingham in Alabama. And if we do it together with other cars, we will get $15,302. This is the role of the predictive model: without these values, the optimizer can't do a decent job. This piece of information is essential.
So this is how it works: cooperation between optimization and the predictive model. And during the next video for Chapter 6, we will talk about optimization. So you will see how these components work together, what the optimizer is actually doing, in which way the new distributions are generated. Thank you.