Dr. John MacGregor: The Power of Analyzing Big Data

As a chemical engineer and statistician who has been analyzing big data in the process industries for over 50 years, I thought this blog post might offer some insight and perspective to those of you who are just now facing this task. From my early career in the late 1960s, analyzing pilot plant and analytical data at Monsanto, through 36 years of research on data analytic technologies at McMaster University and then 16 years as president/CEO/chairman of ProSensus Inc., I have watched big data analytics grow exponentially.

Until the mid-1990s, progress was very slow because good hardware and software for collecting and extracting large volumes of data did not exist. Data historians of that era did not let engineers easily extract data for analysis; they only collected, displayed and stored the data in compressed form (which, in most cases, effectively rendered the data useless).

Today, there is fabulous hardware and software that makes all these tasks much easier. The data may be bigger, but so is the computing power. There is also a great selection of data analytic tools, ranging from machine learning (ML) methods such as deep learning neural networks, statistical regression and decision trees (e.g., random forests) to multivariate statistical methods such as principal component analysis (PCA) and partial least squares (PLS).

However, one must pay attention to the objectives of the data analysis when selecting any of these tools. In general, the former methods (neural networks, regression and decision trees) are very well suited to building models for passive applications such as event detection, process monitoring and inferentials. But if one is interested in building models for active use, such as process optimization, then the latter multivariate approaches are much preferred.

Since process data usually involve hundreds of highly correlated input/regressor variables (X), ML and regression-based methods that model only the output variables (Y) do not yield unique models: many different combinations of the correlated X variables predict Y almost equally well. Therefore, 100 people using the same data set will get 100 different models, all of them providing essentially equally good predictions of Y.
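To make this non-uniqueness concrete, here is a small pure-Python sketch on hypothetical synthetic data (not plant data), in which `x2` is simply `x1` plus tiny noise, so the two regressors are almost perfectly correlated. Three reasonable analysts might fit ordinary least squares, ridge regression, or a model that drops `x2` altogether. The function names and the ridge penalty `lam=1.0` are illustrative assumptions.

```python
import random

# Hypothetical synthetic data: x2 is x1 plus tiny noise, so the two
# regressors are almost perfectly correlated; y depends on both.
random.seed(1)
n = 200
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [v + random.gauss(0.0, 0.01) for v in x1]
y = [a + b + random.gauss(0.0, 0.1) for a, b in zip(x1, x2)]

def fit_two_regressors(lam=0.0):
    """Least squares (lam=0) or ridge (lam>0) fit of y ~ b1*x1 + b2*x2, no intercept."""
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * v for a, v in zip(x1, y))
    s2y = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations (X'X + lam*I) b = X'y by Cramer's rule.
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)

def fit_x1_only():
    """Least squares using x1 alone, dropping the 'redundant' x2."""
    return (sum(a * v for a, v in zip(x1, y)) / sum(a * a for a in x1), 0.0)

def rmse(coefs):
    """Root-mean-square prediction error of a coefficient pair on the data."""
    b1, b2 = coefs
    return (sum((v - b1 * a - b2 * b) ** 2 for a, b, v in zip(x1, x2, y)) / n) ** 0.5

models = {"OLS": fit_two_regressors(),
          "ridge": fit_two_regressors(lam=1.0),
          "x1 only": fit_x1_only()}
for name, (b1, b2) in models.items():
    print(f"{name:8s} b1={b1:6.2f} b2={b2:6.2f} RMSE={rmse((b1, b2)):.3f}")
```

The x1-only model puts roughly all the weight on `x1`, ridge splits it roughly evenly, and OLS lands wherever the noise pushes it, yet all three RMSE values sit close to the noise level of 0.1. Nothing in the fit to Y alone says which coefficient pair is "right".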

At ProSensus, we focused on the multivariate statistical methods (PCA and PLS), since by simultaneously modeling both the regressor (X) and output (Y) spaces, they provide unique models that are causal and interpretable, and can be used to adjust or optimize the process. Furthermore, the results are displayed graphically and are easily used and interpreted by process engineers — all key objectives of the ProMV software we developed (now Aspen ProMV™). Here’s an overview of some of the key features:

Aspen ProMV is a tool for the analysis of big data from both batch and continuous processes. It allows for easy detection and diagnosis of abnormal process operation and for the optimization of operating conditions.
Aspen ProMV’s online capabilities provide real-time monitoring of continuous and batch systems and support the deployment of inferentials, or soft sensors, for both types of process. For batch processes, the inferred final batch quality and endpoint are updated at every time point during the batch.
The Batch APC capability is the equivalent of multivariate process control for batch processes. It is a supervisory control technology that sits on top of the batch automation system and provides adjustments to the batch operating conditions at a few selected times during the batch in order to optimize the final product quality and minimize batch time.
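As a rough illustration of why modeling the X space matters, here is a minimal pure-Python sketch of a one-component PLS model (PLS1 via NIPALS, which needs no iteration for a single y) on hypothetical synthetic data in which `x2` nearly duplicates `x1`. It is not how Aspen ProMV is implemented; the data, the helper names `score_and_spe` and `predict_y`, and the empirical 99th-percentile limit are assumptions made for the example. The weights split evenly across the correlated pair, the score t = Xw serves as a simple inferential for y, and the squared prediction error (SPE) of the X residual flags an observation that breaks the usual correlation structure, something a model of Y alone has no way to signal.

```python
import math
import random

# Hypothetical training data: x2 nearly duplicates x1 (highly correlated
# process variables) and the quality variable y depends on both.
random.seed(1)
n = 200
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [v + random.gauss(0.0, 0.01) for v in x1]
y = [a + b + random.gauss(0.0, 0.1) for a, b in zip(x1, x2)]

mx = [sum(x1) / n, sum(x2) / n]
my = sum(y) / n
X = [(a - mx[0], b - mx[1]) for a, b in zip(x1, x2)]  # mean-centered X rows
yc = [v - my for v in y]                               # mean-centered y

# One-component PLS1 (NIPALS with a single y converges in one step).
w = [sum(row[j] * v for row, v in zip(X, yc)) for j in range(2)]  # X'y
norm = math.hypot(w[0], w[1])
w = [wj / norm for wj in w]                                       # unit X weights
t = [row[0] * w[0] + row[1] * w[1] for row in X]                  # scores t = Xw
tt = sum(s * s for s in t)
p = [sum(row[j] * s for row, s in zip(X, t)) / tt for j in range(2)]  # X loadings
q = sum(s * v for s, v in zip(t, yc)) / tt                            # y loading

def score_and_spe(obs):
    """Return the PLS score and the squared prediction error (SPE) in X space."""
    xc = (obs[0] - mx[0], obs[1] - mx[1])
    s = xc[0] * w[0] + xc[1] * w[1]
    resid = (xc[0] - s * p[0], xc[1] - s * p[1])
    return s, resid[0] ** 2 + resid[1] ** 2

def predict_y(obs):
    """Inferential (soft-sensor) prediction of y from a new X observation."""
    s, _ = score_and_spe(obs)
    return my + q * s

# Empirical 99th-percentile SPE limit from the training data (a simplification;
# monitoring software typically uses distribution-based control limits).
spe_train = sorted(score_and_spe(r)[1] for r in zip(x1, x2))
spe_limit = spe_train[int(0.99 * n)]

print("X weights:", [round(v, 3) for v in w])
for label, obs in [("consistent", (1.0, 1.0)), ("broken correlation", (1.0, -0.5))]:
    _, spe = score_and_spe(obs)
    status = "ABNORMAL" if spe > spe_limit else "ok"
    print(f"{label:18s} y_hat={predict_y(obs):6.2f} SPE={spe:.5f} -> {status}")
```

With this synthetic data the weights come out close to (0.707, 0.707), so anyone running the same PLS fit on the same data gets the same answer. The consistent observation should pass, while the broken-correlation observation should be flagged, which warns that its soft-sensor prediction should not be trusted.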

Fortunately, it is very easy for process engineers to get started using this technology. ProSensus partners with AspenTech customers to implement the Aspen ProMV technologies, providing consulting services, in-depth training and our popular “accelerated modeling sessions” that combine training and analysis on the client’s own data.

To learn more about how you can leverage these powerful, but easy-to-use data analytic tools to make major improvements to your processes, please join us for our upcoming live webinar on 6 February.