Mining the data mountain

Nov, 2011

Every day, companies and governments collect enormous amounts of data. Many of them just watch these ‘data mountains’ pile up.

But for companies as diverse as Google and Goldman Sachs, the ability to exploit this data provides an enviable competitive edge. And in the public sector, the empirical insights gleaned from these data mountains enable more informed public policy development and better infrastructure provision.

Melbourne Business School Professor of Econometrics Michael Smith says organisations that are passive about data management are in all likelihood passing up valuable opportunities.

"You can't get away from the fact that organisations are collecting more and more data, and their effective use is getting increasingly important," he says.

"When it comes to real-life problems, relationships between variables are typically quite complex. Many existing analysis techniques don't do justice to this complexity, and organisations often run into difficulties when they try to identify and measure relationships between key variables."

A new tool, the copula model (after the Latin for ‘link’ or ‘tie’), overcomes this problem, allowing identification and measurement of complex relationships in large databases.

"Copulas are used widely in finance to identify how investment returns are inter-related" Professor Smith says. "Consider the returns from investing in Rio and BHP. In a rising market, these often rise together, but at different rates because investors focus on differences between each company and their activities. But in a falling market, both stocks tend to fall together at the same rate, because people simply panice and bail out of investments in Australian mining stocks altogether. Copulas can capture these differences in ways previous models could not."

Professor Smith has been at the forefront of developing powerful copula modelling techniques, and applying them to solve a variety of problems.

His recent research has been published in peak international academic journals in both statistics and marketing, including the Journal of the American Statistical Association.

 "It hasn't taken long for copula models to become very popular," Professor Smith says. "They're changing the way analysts deal with information-rich data in fields as diverse as marketing, finance and transport studies".

He says copula modelling has taken off because, from a statistical point of view, it breaks a difficult problem down into two much simpler ones.

"Copula models allow you to focus on modelling variables in which you are interested individually. Then, in a second, separate step, you use a copula to capture the inter-relationships between them.

 "This is much simpler than previous methods, which modelled variables and their inter-relationships all together in one go."

Professor Smith's work with Monash University Marketing Professor Peter Danaher establishes how modelling multiple variables together using copulas provides both improved accuracy in forecasting customer behaviour and useful insights for developing advertising strategies.

In a beleaguered retail environment, careful use of copula models offers great potential, he says.

 "Using these methods on large databases of customer online retail activity can lead to improved design of websites. This is something that will prove crucial for retailers as they move increasingly into the online space."

More:

Peter J. Danaher and Michael S. Smith. 2011. "Modeling Multivariate Distributions using Copulas: Applications in Marketing". Marketing Science, 30(1): 4-21.

Michael S. Smith and Mohamad A. Khaled. 2011. "Estimation of Copula Models with Discrete Margins via Bayesian Data Augmentation". Forthcoming in Journal of the American Statistical Association (Theory and Methods).