Investigating use of aggregated data: does it compromise information gleaning?
2014, Vol. 66, No. 6, pp. 60-65
Arku, D.; Ganguli, R.
The increased use of sensors to facilitate automation has led to an explosion of data. As datasets grow in size, their use becomes more complex. There is therefore an incentive to reduce the size of datasets, as long as doing so does not compromise information gleaning. With this in mind, this project explored basic aggregation as a viable alternative to amassing large amounts of data. The goal was to determine whether one would arrive at different conclusions by analyzing aggregated data instead of raw data. In a mine, operational data is typically used for tracking process statistics, correlations between factors, autocorrelation and multivariate models (for a factor of choice). This project therefore compared the results of such analyses conducted with aggregated data (AD) against those conducted with raw data (RD). The data consisted of readings taken one minute apart from a semi-autogenous grinding (SAG) mill: horsepower, feed rate (tons per hour), rotations per minute (rpm), noise, recycle (tons per hour), density (% solids) and bearing pressure (psig). The data was aggregated by averaging five and 10 consecutive one-minute observations. Statistical measures computed from RD and AD were typically within 5% of each other, as were the correlation coefficients. Autocorrelations present in RD were preserved in AD, as long as the time lags were scaled by the amount of aggregation. Regression and neural network models were also developed for power consumption. The prediction performance of these models was similar whether raw or aggregated data was used. Overall, it appeared that aggregating the data over 5- or 10-minute time spans would not have led to conclusions (from simple or complex analyses) different from those reached using raw data.
In other words, the conclusions would have been the same with either raw or aggregated data, suggesting that time-domain aggregation, a common industry practice, is a good alternative to storing high-density data in this case.
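
The aggregation step and the lag scaling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the synthetic series, the helper names `aggregate` and `autocorr`, and the choice to drop a trailing partial block are all assumptions made for illustration, and the numbers it produces say nothing about the SAG mill data.

```python
import random
import statistics


def aggregate(values, window):
    """Average consecutive, non-overlapping blocks of `window` readings.

    A trailing partial block is dropped (an assumption of this sketch).
    """
    n_blocks = len(values) // window
    return [
        statistics.fmean(values[i * window:(i + 1) * window])
        for i in range(n_blocks)
    ]


def autocorr(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    mean = statistics.fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    num = sum(
        (values[i] - mean) * (values[i + lag] - mean)
        for i in range(len(values) - lag)
    )
    return num / denom


# Synthetic one-minute readings (hypothetical, NOT the mill data):
# an autocorrelated series standing in for mill horsepower.
random.seed(0)
raw = []
level = 0.0
for _ in range(6000):
    level = 0.98 * level + random.gauss(0, 1)
    raw.append(3000 + 20 * level)

agg5 = aggregate(raw, 5)    # 5-minute averages
agg10 = aggregate(raw, 10)  # 10-minute averages

# Compare a simple statistic across raw and aggregated data.
for name, series in (("raw", raw), ("agg5", agg5), ("agg10", agg10)):
    print(name, len(series), statistics.fmean(series), statistics.stdev(series))

# A 10-minute lag is lag 10 in raw data, lag 2 in 5-minute averages
# and lag 1 in 10-minute averages (time lag divided by the window).
print(autocorr(raw, 10), autocorr(agg5, 2), autocorr(agg10, 1))
```

Block averaging leaves the overall mean unchanged when the series length is a multiple of the window, while measures of spread generally shrink somewhat. How much they shrink depends on how strongly consecutive readings are correlated.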