Data Mining

Friday, January 14, 2011


Data mining methodologies have been widely adopted in many business domains, such as database marketing, credit scoring, and fraud detection, to name only a few of the areas where data mining has become an indispensable tool for business success. Increasingly, data mining methods are also being applied to industrial process optimization and control. While the general approach is similar regardless of the application (finding "nuggets" of new information in data), some specific methodologies and techniques have proven particularly useful for optimizing continuous processes, such as boiler performance in a coal-burning power plant, and superior in those applications to traditional analytic approaches such as DOE (design of experiments), CFD (computational fluid dynamics), or statistical modeling. This paper provides an introduction to data mining and specifically contrasts the methods used in data mining with traditional optimization techniques.

Data mining, "knowledge discovery", or "machine learning" methods have many origins. They draw on research into how learning naturally occurs in humans (cognitive science), on advances in computer science and algorithm design for automatically detecting patterns in "unstructured" data, and on engineering and machine learning research (e.g., neural networks), to name a few. While traditional statistical methods for analyzing data, based on statistical theories and models, are now widely accepted throughout many industries, data mining methods have been widely embraced in business for only a decade or two. However, their effectiveness for modeling, optimizing, and improving "difficult" processes is making these techniques increasingly popular, and even necessary, in many real-world process applications.

What is Data Mining?

Suppose you wanted to optimize a cyclone furnace (an older type of coal-burning design, still in use in many power plants) for stable, high flame temperatures. Stable temperatures are necessary to ensure cleaner combustion and less build-up of undesirable slag, which can interfere with heat transfer. Most power plants are equipped with very effective data gathering and storage technologies, so it is easy to extract data describing the various parameter settings, as well as flame temperatures, at minute-by-minute intervals.
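With such data in hand, one simple way to shortlist the parameters most related to flame temperature is to rank each logged parameter by the absolute value of its correlation with temperature. This is a minimal sketch under stated assumptions: the column names (`primary_air`, `secondary_air`, `coal_flow`, `flame_temp`), the sample records, and the use of Pearson correlation as the screening measure are all hypothetical. Correlation only captures linear relationships, and the application described here does not say which importance measure was actually used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def rank_parameters(records, target="flame_temp", top_k=2):
    """Rank candidate parameters by |correlation| with the target column.

    `records` is a list of dicts, one per minute of historical data."""
    ys = [r[target] for r in records]
    names = [k for k in records[0] if k != target]
    scored = [(abs(pearson([r[k] for r in records], ys)), k) for k in names]
    scored.sort(reverse=True)  # strongest absolute correlation first
    return [name for _, name in scored[:top_k]]

# Hypothetical column names and values, for illustration only.
records = [
    {"primary_air": 10, "secondary_air": 5, "coal_flow": 3, "flame_temp": 2080},
    {"primary_air": 12, "secondary_air": 5, "coal_flow": 4, "flame_temp": 2100},
    {"primary_air": 14, "secondary_air": 6, "coal_flow": 3, "flame_temp": 2120},
    {"primary_air": 16, "secondary_air": 5, "coal_flow": 4, "flame_temp": 2140},
]
print(rank_parameters(records, top_k=2))  # ['primary_air', 'coal_flow']
```

In practice many more parameters and far more rows would be screened, and a nonlinear importance measure may be preferable; the point here is only the shape of the "extract important parameters" step.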

How Effective is Data Mining?

Shown below is an example of a real data mining application to a furnace, with the goal of achieving stable flame temperatures above 2,100F. After going through the steps described above (extracting important parameters, then building and optimizing data mining models), careful validation experiments were performed to verify that the recommendations from the data mining models indeed improve flame temperatures:
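A validation run of this kind can be summarized by comparing a baseline period with a trial period run at the recommended settings. The sketch below is hypothetical: the function names and readings are illustrative, and only the 2,100F target comes from the text. It is a plain before/after summary and does not control for other factors that change between periods, such as load or coal quality.

```python
from statistics import mean, pstdev

TARGET_F = 2100.0  # flame temperature target quoted in the text

def period_summary(temps, target=TARGET_F):
    """Mean, spread (population std dev), and share of minutes at or above target."""
    return {
        "mean": mean(temps),
        "stdev": pstdev(temps),
        "frac_above": sum(t >= target for t in temps) / len(temps),
    }

def compare_periods(baseline, trial, target=TARGET_F):
    """Summarize a baseline period and a trial period at recommended settings."""
    before, after = period_summary(baseline, target), period_summary(trial, target)
    return {
        "before": before,
        "after": after,
        "mean_change": after["mean"] - before["mean"],
        "stdev_change": after["stdev"] - before["stdev"],  # negative = more stable
    }

# Hypothetical minute-by-minute readings (deg F), not data from the study.
baseline = [2060, 2140, 2090, 2150, 2070, 2110]
trial = [2120, 2130, 2115, 2125, 2135, 2120]
result = compare_periods(baseline, trial)
print(result["before"]["frac_above"], result["after"]["frac_above"])  # 0.5 1.0
```

A higher mean together with a smaller standard deviation corresponds to the "stable and higher" outcome the validation experiments were meant to confirm.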

In short, there is little doubt that data mining methods were useful and successful in stabilizing the flame temperatures in this application. This was achieved by making relatively small adjustments to a subset of the parameters that operators routinely adjust to control the furnace. Specifically, minor adjustments were made to parameters such as coal flow stoichiometric ratios and primary, secondary, and tertiary airflows, resulting in much more robust operation and consistently higher flame temperatures.
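The idea of making only small adjustments to a few key parameters can be illustrated with a hypothetical sketch. A k-nearest-neighbour model built from historical records estimates the flame temperature a candidate setting would produce, and a local search tries only -1, 0, or +1 step on each parameter around the current operating point. The k-NN model, the "mean minus spread" score (rewarding high and consistent temperatures), the parameter names, and the data are all assumptions for illustration; the application described here does not say which model type or optimizer was used.

```python
from itertools import product
from math import dist
from statistics import mean, pstdev

def nearest_records(history, settings, params, k):
    """Return the k historical records whose parameter values are closest to `settings`.

    Distances are unscaled Euclidean; parameters with different units
    would normally be standardized first."""
    point = [settings[p] for p in params]
    return sorted(history, key=lambda r: dist(point, [r[p] for p in params]))[:k]

def score(history, settings, params, k=3, target="flame_temp"):
    """Stability-aware score: mean neighbour temperature minus its spread."""
    temps = [r[target] for r in nearest_records(history, settings, params, k)]
    return mean(temps) - pstdev(temps)

def best_small_adjustment(history, current, params, steps, k=3):
    """Try -step, 0, +step on every listed parameter and keep the best-scoring setting."""
    best, best_score = dict(current), score(history, current, params, k)
    choices = [(-steps[p], 0, steps[p]) for p in params]
    for deltas in product(*choices):
        candidate = dict(current)
        for p, d in zip(params, deltas):
            candidate[p] = current[p] + d
        s = score(history, candidate, params, k)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Hypothetical historical records and operating point, for illustration only.
history = [
    {"primary_air": 10, "coal_flow": 3, "flame_temp": 2060},
    {"primary_air": 11, "coal_flow": 3, "flame_temp": 2080},
    {"primary_air": 12, "coal_flow": 3, "flame_temp": 2110},
    {"primary_air": 12, "coal_flow": 4, "flame_temp": 2120},
    {"primary_air": 13, "coal_flow": 4, "flame_temp": 2130},
    {"primary_air": 13, "coal_flow": 3, "flame_temp": 2125},
]
current = {"primary_air": 11, "coal_flow": 3}
params = ["primary_air", "coal_flow"]
steps = {"primary_air": 1, "coal_flow": 1}
best, best_score = best_small_adjustment(history, current, params, steps)
print(best)  # {'primary_air': 12, 'coal_flow': 4}
```

Restricting the search to one step per parameter mirrors the "relatively small adjustments" described above: the recommendation stays close to settings operators already use, where the historical data (and therefore the model) is most reliable.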

To continue the metaphor of the furnace "talking" to the operators through the data: data mining techniques succeeded in "interpreting" this language and in using the information to identify the "sweet spot" where the furnace delivers consistent and robust performance.