Chapter 3: Promotional Planning and Pricing
This supplementary video to Chapter 3 of The Rise of Artificial Intelligence discusses the complex business problem of promotional planning & pricing from the perspective of digitalisation, prediction & optimisation, and trade-off analysis, as well as from the broader view of optimising across multiple business functions or silos.
Chapters 1 & 2 of The Rise of Artificial Intelligence: Real-world applications for revenue and margin growth are available to download, and please contact us to request a soft copy of any other chapter of the book.
Transcript (reading time: 16 min)
Hi, this is Matt Michalewicz. I'm one of the co-authors of The Rise of Artificial Intelligence, and this is a supplementary video to Chapter 3 of the book. In this video we'll recap the Problem to Decision pyramid, and we'll discuss digitalization, prediction, optimization and trade-offs in the context of promotional planning. We'll also provide some additional thoughts and closing commentary. Now, recall from Chapter 2 that the Problem to Decision pyramid begins at its foundation with a clear definition of the problem that we're trying to solve.
And then each additional layer (Data, Information, Knowledge and so on) allows us to unlock additional value through improved decision making. But it all begins with a clear definition of the Problem. So in the case of promotional planning and pricing, we can describe the problem as the question "What should we promote, when, and at what price?" This seems like a very simple question to ask. But if we are a retailer of alcoholic products and we want to answer it, it's actually quite difficult and requires a lot of analysis.
Imagine that we are in charge of spirits and we want to put a particular vodka on promotion in a particular state at a particular point in time for 20% off. First of all, we would have to predict how much more of that product will sell in that geography, in that time period, at that price (called promotional lift or discount elasticity). But we would also have to predict cannibalization, or cross elasticity: if we place one product on promotion, many customers will switch from the product they regularly buy to the one on promotion, because it's cheaper and they view it as a substitute.
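To make the difference between lift and cannibalization concrete, here is a minimal Python sketch. The forecast figures, the `PromoForecast` class and the helper names are hypothetical; the elasticity is computed as a simple ratio of percentage volume change to percentage price change, which is one common convention and not necessarily the one used in the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoForecast:
    """Hypothetical forecasts (units per period) for one product, place and period."""
    baseline_units: float      # expected sales at the regular price
    promo_units: float         # expected sales at the promotional price
    cannibalized_units: float  # volume expected to be lost by substitute products


def promotional_lift(f: PromoForecast) -> float:
    """Relative uplift of the promoted product, e.g. 0.8 means +80%."""
    return f.promo_units / f.baseline_units - 1.0


def discount_elasticity(f: PromoForecast, discount: float) -> float:
    """% change in volume per % change in price; a 20% discount is a -20% price change."""
    return promotional_lift(f) / -discount


def net_category_units(f: PromoForecast) -> float:
    """Incremental units for the whole category once cannibalization is netted off."""
    return (f.promo_units - f.baseline_units) - f.cannibalized_units


# Invented numbers for a vodka at 20% off.
vodka = PromoForecast(baseline_units=1000, promo_units=1800, cannibalized_units=500)
print(promotional_lift(vodka), discount_elasticity(vodka, 0.20), net_category_units(vodka))
```

With these made-up inputs the promoted vodka gains 800 units, but the category as a whole gains only 300 once substitution is netted off. Basket effects (what else shoppers buy alongside the promoted item) would be one more predicted term on top of this.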
So we have to accurately predict the decline in volume of the other products in that category. And lastly, as a retailer, we would be interested in predicting basket sizes: how do particular promotions affect basket composition? Is vodka a good item to promote? When we promote vodka, do people come in and buy chips, Red Bull or a Chardonnay along with it, or do they just come in, buy a case of the product on promotion and leave, benefiting from the reduced price? In that case it is not a very good promotion for us as a retailer, because we've cannibalized future margin. So each decision that we contemplate in answering this question requires either significant prediction capability or a great deal of domain knowledge to determine whether that particular promotion is good or not.
Building or deploying an application with advanced capabilities begins with a digitalization process. That means removing all of the spreadsheets that are typically inherent in promotional planning workflows: spreadsheets that hold master data files, product descriptions or product codes, all the way to spreadsheets of which historical promotions were run and what results they achieved, along with the manual work of assembling all of these spreadsheets, tweaking them and analyzing them to create a future promotional plan. The first step in digitalization is removing all of this and implementing a digital environment that represents one version of the truth; in this case, a digital slotting board where we can slot promotions, the output of which is catalogs and advertisements showing which products are advertised at particular points in time and at what prices. Now, if we switch over to an application for promotional planning, we have here an electronic slotting board that represents the promotional periods P1, P2, P3 and P4; these are two-week time buckets.
But as we discussed in Chapter 3, these could be weekly or monthly depending on the environment. We also have a product category. In this case we were in Beer, but we can switch to Wine, and once we are in the wine category, we can select a particular geography, such as NSW, or all of Australia. We can also select particular suppliers. We can deselect everyone and then pick, for example, Pernod Ricard, a seller of fine wines, and have all of their products listed, along with the types of promotions and prices we might choose to run.
Notice that in this digital slotting board environment, we are tasked with manually going into promotional periods and setting up promotions, which might be national or state-based. For example, here I've set up a promotion in NSW for this particular Jacob's Creek product. I would have to associate it with a particular promotion type or mechanic, whether it is a catalog discount or an in-store promotion, and I would have to assemble this promotion as part of the overall promotion run during P4.
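At its core, a digital slotting board is a single shared data structure that replaces the scattered spreadsheets. The Python 3.9+ sketch below is illustrative only: `Promotion`, `Mechanic` and `SlottingBoard` are hypothetical names, not the application's actual schema. It records the details mentioned above: product, two-week period, mechanic, discount, and an optional state for state-based rather than national promotions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mechanic(Enum):
    """Hypothetical promotion mechanics; a real list is business-specific."""
    CATALOG_DISCOUNT = "catalog discount"
    IN_STORE = "in-store"


@dataclass(frozen=True)
class Promotion:
    product: str
    period: str                  # e.g. "P4", a two-week bucket
    mechanic: Mechanic
    discount: float              # 0.20 means 20% off
    state: Optional[str] = None  # None means a national promotion


@dataclass
class SlottingBoard:
    """One version of the truth: every planned promotion lives in one structure."""
    periods: list[str]
    promotions: list[Promotion] = field(default_factory=list)

    def slot(self, promo: Promotion) -> None:
        # Basic validation so bad rows can't silently enter the plan.
        if promo.period not in self.periods:
            raise ValueError(f"unknown period {promo.period!r}")
        if not 0.0 < promo.discount < 1.0:
            raise ValueError("discount must be between 0 and 1")
        self.promotions.append(promo)

    def in_period(self, period: str) -> list[Promotion]:
        return [p for p in self.promotions if p.period == period]


board = SlottingBoard(periods=["P1", "P2", "P3", "P4"])
board.slot(Promotion("Jacob's Creek (example SKU)", "P4",
                     Mechanic.CATALOG_DISCOUNT, 0.20, state="NSW"))
print(board.in_period("P4"))
```

Keeping validation inside `slot` is one way to ensure that every plan saved from the board is at least structurally valid before it is handed to prediction.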
This represents digitalization. I've moved away from spreadsheets and from the manual loading and unloading of data to one version of the truth, where everything happens in one environment. Within this environment, as the next level of the pyramid beyond Data, we also have access to reporting. For example, we can click on this particular product and see what kind of distribution it has in each state, meaning how many stores in the network actually stock it.
So if I promote it, will I have good enough distribution in the store network to make the promotion worthwhile? I can also access dashboards reporting on how past promotions performed. This is dummy data, obviously, but reporting on the business performance of promotions run in the past all represents reporting, or Information. If we go back to PowerPoint, the next level of the pyramid represents Knowledge, and in video 2 I used the example of overlaying various datasets to answer the question of why something happened. So just to recap, we have sales data here for a particular product, with a lot of variability in its sales. If we overlay this dataset with an external dataset, temperature, we suddenly see a correlation between falls in temperature and falls in demand for this product, which is further reinforced when we add rainfall.
So the product sells well when it's hot and dry and very poorly when it's cold and wet. This is a form of knowledge. It goes beyond information and reporting because it explains why things happened. It also forms the basis of what we do next, which is predicting what will happen. Once we have better knowledge or understanding of why things happened, we are better able to build or tune prediction models that give us more accurate outcomes, because we can consider all of the variables and factors that drove a certain outcome and are likely to drive outcomes in the future.
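The overlay of sales with temperature and rainfall can be quantified with a correlation coefficient. This Python sketch computes a Pearson correlation by hand; the weekly figures are invented purely for illustration and are not data from the book.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented weekly figures, for illustration only.
units_sold = [120, 135, 90, 60, 150, 80]
max_temp_c = [28, 31, 22, 15, 33, 19]
rainfall_mm = [2, 0, 10, 25, 1, 18]

print(f"sales vs temperature: {pearson(units_sold, max_temp_c):+.2f}")
print(f"sales vs rainfall:    {pearson(units_sold, rainfall_mm):+.2f}")
```

A strong positive value for temperature and a strong negative one for rainfall is the numerical version of "sells well when hot and dry". Correlation alone does not prove causation; it becomes knowledge when it is paired with a plausible explanation like the one given here.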
So note that not only in promotional planning, but in really any complex business problem, we take data, we analyze it and we create insight on the layers of the pyramid. This analysis of rainfall, temperature and sales of this product represents knowledge, and it is a form of insight. Yet insights represent the past; they represent what happened. And even though they might be interesting, it's very difficult to generate value from insights unless we base some kind of decision on them.
And there's a gap here between an insight, which represents the past, and a decision, which represents the future. These are the top layers of our pyramid. We have to predict what is likely to happen under certain scenarios or decisions that we might make, and we should optimize our decision for whatever our objective or KPI is (volume, margin and so on). And once a decision is made, the actual outcome should be fed back into the application so that the underlying models and algorithms can learn.
Just like the human brain learns from experience (we think something's going to happen, we make a decision, then we see what actually happens and we learn from that experience), systems can do exactly the same, as long as there is a feedback loop providing them with actual outcomes. So if we switch back to the application and go back to our planning environment, we can save whatever promotional plan we set up here and call it whatever we like.
We can call it promo plan 21. Then we can go to the predictive part of the application, pick up the promo plan that we've manually created, and ask an AI engine, in this case Larry, the Digital Analyst, to predict the market outcome of this plan. This is the clear separation between digitalization (manual planning and scheduling, reporting and knowledge) and the jump to prediction: a system actually telling us what will happen if we make a certain decision.
So you can see here the predicted performance: volume growth on last year. That's one metric. We could have another metric, such as net sales revenue, which is in dollars. We could look at it by state, and all of a sudden we have predicted values. As I said before, this is the difference between information and knowledge on one hand, and actually having an answer to what will happen if we make a decision on the other. Now, switching back to PowerPoint: prediction (and we have an entire chapter and video dedicated to this topic, Chapter 5) is notoriously difficult, because most models or software systems approach it by taking a historical dataset, such as historical sales, and fitting some classical mathematical models to it. Whichever model best fits the historical data becomes the prediction, the forecast for the future.
What invariably happens then is that the future differs from the predictions made by these models: we incur prediction or forecast error, and things don't happen the way we predicted. One way to overcome this, discussed in Chapter 5 and also in Chapter 7, is to combine internal and external data to improve prediction accuracy. We could overlay weather data; we could overlay census data to define the catchment area of each retail store; we could add the socioeconomic demographics of each suburb that stores serve; we could add competitive pricing, or whatever else is available externally, for the model to use to improve its accuracy. We can then run multiple prediction methods in parallel (statistical methods, neural networks, fuzzy logic and so on) and have a voting and selection mechanism that comes up with the final prediction.
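The idea of running several prediction methods and combining them can be sketched as follows. This is a minimal Python illustration, not the book's mechanism: three toy statistical forecasters stand in for the statistical, neural-network and fuzzy-logic methods, and the "vote" is assumed to be a weighting by the inverse of each method's backtest error.

```python
from typing import Callable, Sequence

Forecaster = Callable[[Sequence[float]], float]


def naive(history: Sequence[float]) -> float:
    return history[-1]


def moving_average(history: Sequence[float], window: int = 3) -> float:
    recent = history[-window:]
    return sum(recent) / len(recent)


def linear_trend(history: Sequence[float]) -> float:
    """Least-squares line through the history, extrapolated one step ahead."""
    n = len(history)
    if n < 2:
        return history[-1]
    xs = range(n)
    mx, my = (n - 1) / 2, sum(history) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, history))
             / sum((x - mx) ** 2 for x in xs))
    return my + slope * (n - mx)


def backtest_mae(method: Forecaster, series: Sequence[float], holdout: int) -> float:
    """Mean absolute error of one-step-ahead forecasts over the last `holdout` points."""
    errors = [abs(method(series[:t]) - series[t])
              for t in range(len(series) - holdout, len(series))]
    return sum(errors) / len(errors)


def ensemble_forecast(series: Sequence[float], methods: dict[str, Forecaster],
                      holdout: int = 4) -> tuple[float, dict[str, float]]:
    """Weight each method by the inverse of its backtest error (a simple 'vote')."""
    maes = {name: backtest_mae(m, series, holdout) for name, m in methods.items()}
    weights = {name: 1.0 / (mae + 1e-9) for name, mae in maes.items()}
    total = sum(weights.values())
    forecast = sum(weights[name] * m(series) for name, m in methods.items()) / total
    return forecast, maes


series = [10, 12, 14, 16, 18, 20, 22, 24]
methods = {"naive": naive, "moving_average": moving_average, "linear_trend": linear_trend}
forecast, maes = ensemble_forecast(series, methods)
print(round(forecast, 3), maes)
```

On this steadily trending toy series the trend method backtests almost perfectly, so it dominates the combined forecast; on a noisier or seasonal series a different method could earn most of the weight, which is the point of keeping several in play.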
So in effect, we're trying to improve the accuracy of predictions, because we know that the optimization, and most importantly the decision we make, will be based on them. If we have inaccurate predictions, we're more likely to make bad decisions; if we have highly accurate predictions, we're more likely to make good ones. Going back to the software application, going from prediction to optimization is probably one of the largest jumps that can be made, because even with an accurate predictive model, there is still a very manual process of creating new scenarios and then predicting their performance.
For example, we could create another plan by setting up a promotion for a different product here in South Australia, and save this new plan as "promo plan 22". Then we could go to predict performance, pick up "promo plan 21" and "promo plan 22", and predict their performance side by side to understand which is the better promotional plan. But imagine trying to go through thousands of combinations this way: not just two plans like 21 and 22, but 1,000 or even 10,000 plans.
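A quick count shows why comparing plans one at a time cannot scale. This Python sketch assumes, purely for illustration, that each product/period slot independently takes one of a few options and ignores all constraints; the sizes and the evaluation rate are hypothetical.

```python
def plan_count(products: int, periods: int, options_per_slot: int) -> int:
    """Plans possible if every product/period slot independently takes one option."""
    return options_per_slot ** (products * periods)


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Hypothetical sizes: 10 products, 4 periods, and per slot either "no promotion"
# or one of 3 discount levels (4 options in total).
plans = plan_count(products=10, periods=4, options_per_slot=4)
years = plans / 1_000_000 / SECONDS_PER_YEAR  # at an assumed 1 million predictions per second
print(f"{plans:,} plans, about {years:.1e} years to predict them all")
```

Even this tiny setup yields 4^40 = 2^80 candidate plans, which would take tens of billions of years to evaluate exhaustively at a million predictions per second. Real ranges span far more products, so a search algorithm has to decide which plans are worth predicting.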
It becomes extremely difficult, and you lose yourself in the details and differences between plans. This is where optimization comes in. Instead of manually building plans one by one, we can take a plan, which could be completely blank, last year's plan, or our best attempt at a promotional plan, and then select an objective like volume or net sales revenue. Within optimization, this is the objective.
This is an example of single-objective optimization, which we discussed in Chapters 2 and 3, and we'll talk about multi-objective optimization next. Also recall that business rules and constraints form a very important part of solving any complex business problem. We're interested in feasible solutions: executable plans that we can run in the marketplace without violating any commercial agreement or business rule that we might have.
These business rules and constraints can be anything we want them to be. They're driven by individual organizations and by their strategies for certain market segments, customers or products. But the important thing is to be able to change these constraints dynamically (for example, setting minimum customer margin to 11%, or setting a growth target over last year), to save them along with the objective, and then to run an optimization process that considers billions of possible plans, far more than we could ever come up with on our own, predicts their performance, and arrives at a plan that maximizes performance while satisfying our business rules and constraints.
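Here is a minimal Python sketch of single-objective optimization under a business rule: maximize volume while keeping overall margin at or above 11%. The actual algorithms are covered in Chapter 6 and are not reproduced here. This sketch uses a plain stochastic hill climber, chosen only for brevity, along with a hypothetical linear-lift demand model that ignores cannibalization. `Product`, `evaluate` and `hill_climb` are illustrative names.

```python
import random
from dataclasses import dataclass

DISCOUNTS = (0.0, 0.1, 0.2, 0.3)  # hypothetical discount levels per slot


@dataclass(frozen=True)
class Product:
    name: str
    price: float        # regular shelf price
    cost: float         # unit cost
    baseline: float     # units per period at the regular price
    sensitivity: float  # toy assumption: units = baseline * (1 + sensitivity * discount)


def evaluate(plan, products, periods):
    """Return (total units, overall margin %) predicted by the toy linear-lift model."""
    units = revenue = cost = 0.0
    for p in products:
        for t in range(periods):
            d = plan[(p.name, t)]
            q = p.baseline * (1 + p.sensitivity * d)
            units += q
            revenue += q * p.price * (1 - d)
            cost += q * p.cost
    return units, (revenue - cost) / revenue


def hill_climb(products, periods, min_margin=0.11, iterations=5000, seed=0):
    """Change one slot at a time; keep the change if the plan stays feasible and no worse."""
    rng = random.Random(seed)
    plan = {(p.name, t): 0.0 for p in products for t in range(periods)}
    best_units, margin = evaluate(plan, products, periods)
    if margin < min_margin:
        raise ValueError("starting plan already violates the margin constraint")
    for _ in range(iterations):
        slot = rng.choice(list(plan))
        candidate = dict(plan)
        candidate[slot] = rng.choice(DISCOUNTS)
        units, margin = evaluate(candidate, products, periods)
        if margin >= min_margin and units >= best_units:
            plan, best_units = candidate, units
    return plan, best_units


products = [Product("A", 20.0, 15.0, 100, 3.0), Product("B", 30.0, 22.0, 80, 2.0)]
plan, units = hill_climb(products, periods=4)
print(round(units, 1), evaluate(plan, products, 4))
```

With these invented costs, discounting every slot by 30% would push margin below zero, so the constraint actually binds and the search has to trade discounts across products and periods. Changing `min_margin` corresponds to editing the dynamic constraint described above.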
The optimization algorithms themselves are covered in detail in Chapter 6, so I'll skip over them in this video. I'll also interrupt the optimization process to get straight to the result. Not only does the application provide a predicted value for the plan, it also shows what changes have been made vs. the plan we started with, so we can see the changes the application made in each promotional period to achieve a better financial result.
The aim is to save the time spent manually creating plans, predicting their performance, tweaking them, predicting again, tweaking further and predicting again. Instead, the user tells the application what the objective and constraints are, and clever AI-based algorithms search through the search space and arrive at optimized solutions, which are fed back as plans with a side-by-side comparison. It's also important to note (as we've highlighted throughout the book) that in business settings we rarely have a single objective.
Usually there are a couple of objectives, and unfortunately they trade off against one another. For example, we could pick volume, but we're also interested in net sales revenue. In this case, the system creates a trade-off, a Pareto front of optimized plans or solutions. I'll interrupt the optimization process just to point out that each point on this trade-off represents a separate plan in itself, with its own set of changes. It's then up to human or domain experts to pick where they would like to be on the Pareto front in terms of the trade-off between one objective and the other.
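A Pareto front is simply the set of plans that no other plan beats on every objective. This Python sketch filters a handful of hypothetical (volume, net sales revenue) scores, with both objectives maximized; the plan names and numbers are invented.

```python
def pareto_front(plans: dict[str, tuple[float, float]]) -> list[str]:
    """Names of plans not dominated on (volume, net sales revenue); both are maximized."""
    def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
        # a dominates b if it is at least as good on both objectives and not identical.
        return a[0] >= b[0] and a[1] >= b[1] and a != b

    front = [name for name, score in plans.items()
             if not any(dominates(other, score) for other in plans.values())]
    return sorted(front, key=lambda name: plans[name][0])  # order by volume


# Hypothetical (volume, net sales revenue) scores for five optimized plans.
plans = {"A": (100, 50), "B": (120, 45), "C": (90, 40), "D": (130, 30), "E": (110, 44)}
print(pareto_front(plans))
```

Plans C and E drop out because another plan is better on both measures. Choosing among A, B and D is the business judgment the transcript leaves to domain experts.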
If we go back to the slide deck for a second and talk about optimization methods: just as there is a lot of benefit in running a set of prediction methods together to get a more accurate prediction, there is also a lot of benefit in running multiple optimization methods, or algorithms, in parallel to get a better overall result. Looking at some of these optimization techniques, we could ask which one of them is the best for any optimization problem. We can confidently answer that none of them is, because, as we describe in later chapters of The Rise of Artificial Intelligence, the instance of the problem changes from geography to retailer to product, and on these various instances, different techniques perform better or worse. We showed this in Chapter 6 with optimization results for twelve instances of the promotional planning problem and eight different algorithms. What's interesting in this experiment, which was a single-objective optimization run, is that on each instance of the promotional planning problem there was a clear algorithmic winner that achieved the highest volume, and that winner was not the same algorithm on every instance.
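Running several optimizers on the same instance and keeping the best result can be sketched in Python as follows. The three toy algorithms (random search, bit-flip hill climbing, greedy addition) stand in for the eight algorithms from Chapter 6, and the instance is a hypothetical "pick promotions within a margin-cost budget" problem. All names are illustrative.

```python
import random
from typing import Callable

Solution = tuple[int, ...]


def make_instance(gains, costs, budget) -> Callable[[Solution], float]:
    """Toy instance: run promotions (bit = 1) to maximize volume gain within a cost budget."""
    def objective(x: Solution) -> float:
        if sum(c for c, bit in zip(costs, x) if bit) > budget:
            return float("-inf")  # infeasible plan
        return float(sum(g for g, bit in zip(gains, x) if bit))
    return objective


def random_search(obj, n, rng, evals=300):
    return max((tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(evals)), key=obj)


def bit_flip_climb(obj, n, rng, evals=300):
    x = (0,) * n
    for _ in range(evals):
        i = rng.randrange(n)
        y = x[:i] + (1 - x[i],) + x[i + 1:]
        if obj(y) >= obj(x):
            x = y
    return x


def greedy_add(obj, n, rng, evals=None):
    """Repeatedly add the single best promotion; rng/evals unused, kept for a shared signature."""
    x = [0] * n
    while True:
        best_i, best_val = None, obj(tuple(x))
        for i in range(n):
            if not x[i]:
                x[i] = 1
                val = obj(tuple(x))
                x[i] = 0
                if val > best_val:
                    best_i, best_val = i, val
        if best_i is None:
            return tuple(x)
        x[best_i] = 1


def run_portfolio(obj, n, seed=0):
    """Run every algorithm on the same instance and report the winner."""
    algorithms = {"random_search": random_search,
                  "bit_flip_climb": bit_flip_climb,
                  "greedy_add": greedy_add}
    results = {name: obj(algo(obj, n, random.Random(seed))) for name, algo in algorithms.items()}
    winner = max(results, key=results.get)
    return winner, results


winner, results = run_portfolio(make_instance([5, 4, 3, 2], [4, 3, 2, 1], budget=5), n=4)
print(winner, results)
```

Because each instance is scored with every algorithm, the portfolio's result is never worse than its best member on that instance. This is the property the experiment above relies on when the winning algorithm changes from instance to instance.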
If we had implemented a single algorithm, our performance from instance to instance of the promotional planning problem would have been suboptimal compared with running multiple techniques in parallel. Hence, there's a lot of value in a hybrid approach to prediction and optimization algorithms. If we switch back to the application, a few additional points are worth making. First, it's important for the system to have a feedback loop that captures actuals from promotions.
Not only is the application predicting what will happen, but it also needs to know what did happen, so it can self-calibrate the weightings inside its models, continue making accurate predictions and recommend optimized decisions. Second, in addition to promotions, all pricing can be managed within the application, including functionality for discount elasticity and price elasticity curves: if we discount certain products in certain geographies, or permanently change their price, how will that affect demand volumes?
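The feedback loop and the elasticity curve fit together naturally: predict, observe the actual result, then nudge the model. This Python sketch assumes a constant-elasticity demand curve and a simple exponential-smoothing update. Both are illustrative stand-ins, not the application's actual calibration method, and all numbers are hypothetical.

```python
from math import log


def predict_units(base_units: float, base_price: float, new_price: float,
                  elasticity: float) -> float:
    """Constant-elasticity demand curve: Q = Q0 * (P / P0) ** e (e is usually negative)."""
    return base_units * (new_price / base_price) ** elasticity


def recalibrate(elasticity: float, base_units: float, base_price: float,
                new_price: float, actual_units: float, alpha: float = 0.3) -> float:
    """Move the elasticity estimate a fraction `alpha` toward the value implied by actuals."""
    if new_price == base_price:
        raise ValueError("price did not change, so the actuals carry no elasticity signal")
    observed = log(actual_units / base_units) / log(new_price / base_price)
    return elasticity + alpha * (observed - elasticity)


# Hypothetical: 100 units/week at $10, current estimate e = -2, price cut to $8.
predicted = predict_units(100, 10.0, 8.0, -2.0)  # about 156 units
updated_e = recalibrate(-2.0, 100, 10.0, 8.0, actual_units=180)
print(round(predicted, 2), round(updated_e, 3))
```

Actual sales of 180 units imply a more elastic response than the model assumed, so the estimate moves partway toward it. A small `alpha` keeps one noisy promotion from swinging the model too far.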
Promotional planning and pricing is also part of a larger problem or operation within most organizations, because pricing and promotions feed into the demand forecast. So again, we can use a multi-algorithmic approach to predict what demand is likely to be over different time horizons and for different suppliers, products or distribution centers. Based on this forward-predicted demand, which reflects promotions, pricing and any external data we feed into the model, we can then optimize our replenishment, our stock in warehouses and what we're producing. The application then considers the problem end to end.
For example, we could specify that we want our working capital to be no more than 100 million and our customer service level, measured as fill rate or DIFOT, to be no less than 94 percent. We pick a scenario, which might be a geography, a retailer, a distribution center, a product category or even an individual product, and then we are presented with a series of trade-offs between working capital and fill rate, showing where we can be on these KPIs if we are willing to hold a little more working capital or to sacrifice some fill rate.
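The working capital vs. service level trade-off can be sketched with textbook safety-stock arithmetic. This Python example assumes normally distributed demand and uses cycle service level (the probability of no stockout in a replenishment cycle) as a stand-in for fill rate or DIFOT, which are related but different measures. All parameters are hypothetical, scaled so that the 100 million and 94 percent limits from the transcript bite.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass(frozen=True)
class Option:
    z: float                # safety factor
    service_level: float    # cycle service level, a stand-in for fill rate / DIFOT
    working_capital: float  # average inventory value in dollars


def tradeoff_curve(order_qty, sd_per_period, lead_time_periods, unit_cost, z_values):
    """Working capital vs. service level for a range of safety factors (normal demand assumed)."""
    options = []
    for z in z_values:
        safety_stock = z * sd_per_period * sqrt(lead_time_periods)
        avg_inventory = order_qty / 2 + safety_stock  # half the cycle stock plus safety stock
        options.append(Option(z, NormalDist().cdf(z), avg_inventory * unit_cost))
    return options


def feasible(options, max_working_capital, min_service_level):
    return [o for o in options
            if o.working_capital <= max_working_capital
            and o.service_level >= min_service_level]


curve = tradeoff_curve(order_qty=6_000_000, sd_per_period=500_000, lead_time_periods=4,
                       unit_cost=20.0, z_values=[i * 0.25 for i in range(13)])
ok = feasible(curve, max_working_capital=100e6, min_service_level=0.94)
for o in ok:
    print(f"z={o.z:.2f}  service={o.service_level:.3f}  "
          f"working capital=${o.working_capital / 1e6:.0f}M")
```

Each point on the curve buys more service with more stock. The two constraints cut it down to a short feasible segment, and choosing within that segment is the trade-off decision described above.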
So promotional planning and pricing is part of a bigger operation and a bigger problem, and we'll discuss the concepts of global optimization later in The Rise of Artificial Intelligence. When we optimize one component of a business or operation, we can generally get a better result by considering more components together.