Data Mining in Construction

Data mining, the practice of discovering patterns or trends by searching large volumes of data, is one of the ways in which new technology is set to transform construction. In an industry that generates large amounts of ‘big data’, data mining has significant potential to streamline processes and uncover efficiencies. These methods are already familiar from their application to accident data to inform health and safety practices [1], and they are increasingly being used to predict cost overruns [2] and to provide early warnings about structural health [3].

Construction data typically comprises a mixture of formats, such as numeric, text, spatial and image files, which may be stored in several systems across a business group or company. Discovering patterns in this mixed, largely unstructured data by hand is a time-consuming and complex task. By applying machine learning and statistical analysis, data mining makes it feasible for businesses to extract value from data they already own and manage.

While machine learning may already be transforming some areas of construction, how might it revolutionise the field of construction claims? Predicting cost overruns is an obvious first step: if you can identify the factors that contribute to overruns, you can address their main causes, avoid costly litigation and set more realistic timescales for projects.
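As a rough illustration of what identifying contributing factors can look like, the sketch below ranks candidate factors by how much more often overruns occur when each factor is present. It is a minimal example under stated assumptions, not a method taken from the cited research: the project records, the factor names and the helper functions `overrun_lift` and `rank_factors` are all hypothetical, and a real analysis would need far more data and proper statistical validation.

```python
# Hypothetical project records. Every field name and value here is invented
# for illustration and does not come from any real dataset.
PROJECTS = [
    {"late_design_changes": True,  "adverse_weather": False, "subcontractor_turnover": True,  "overrun": True},
    {"late_design_changes": True,  "adverse_weather": True,  "subcontractor_turnover": False, "overrun": True},
    {"late_design_changes": True,  "adverse_weather": False, "subcontractor_turnover": False, "overrun": True},
    {"late_design_changes": False, "adverse_weather": True,  "subcontractor_turnover": False, "overrun": False},
    {"late_design_changes": False, "adverse_weather": False, "subcontractor_turnover": True,  "overrun": False},
    {"late_design_changes": False, "adverse_weather": True,  "subcontractor_turnover": True,  "overrun": True},
    {"late_design_changes": False, "adverse_weather": False, "subcontractor_turnover": False, "overrun": False},
    {"late_design_changes": True,  "adverse_weather": False, "subcontractor_turnover": False, "overrun": False},
]

FACTORS = ["late_design_changes", "adverse_weather", "subcontractor_turnover"]


def overrun_lift(projects, factors):
    """Return {factor: lift}, where lift is the overrun rate among projects
    with the factor divided by the overrun rate across all projects."""
    if not projects:
        return {}
    base_rate = sum(p["overrun"] for p in projects) / len(projects)
    if base_rate == 0:
        return {}  # no overruns at all, so lift is undefined
    lifts = {}
    for factor in factors:
        with_factor = [p for p in projects if p[factor]]
        if not with_factor:
            continue  # factor never occurs, so nothing to compare
        rate = sum(p["overrun"] for p in with_factor) / len(with_factor)
        lifts[factor] = rate / base_rate
    return lifts


def rank_factors(projects, factors):
    """Return (factor, lift) pairs sorted from highest to lowest lift."""
    lifts = overrun_lift(projects, factors)
    return sorted(lifts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, lift in rank_factors(PROJECTS, FACTORS):
        print(f"{name}: lift = {lift:.2f}")
```

A lift above 1 means overruns are more common when the factor is present than across the portfolio as a whole. It indicates association only, not cause, so it is best read as a prompt for closer investigation.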

What about the most common causes of claims: additional work, delays and reduced productivity? Beyond mitigating some of their worst consequences, data mining has the potential to determine costs efficiently and to identify the true causes of delays and budget overruns.

One feature of construction claims that serves to complicate the use of data mining techniques is the unique nature of each claim. Construction spans a wide range of sectors, and there are so many points at which projects may encounter a problem that it is difficult to conceive of a universal technique that would apply across the board. However, there is scope for adapting a general model to suit each situation.

Recent research has summarised the current status of data mining in construction and described how it is increasingly employed within the industry [4]. The authors' observations on the current challenges of data mining in construction include:

  • Data security – a large proportion of construction data is privately owned and inaccessible for large-scale analysis.
  • Poor data quality in construction industry databases – multiple data collection methods and human error contribute to databases that are challenging to prepare for analysis.
  • Knowledge interpretation – a lack of domain experts and skilled practitioners able to translate the results into actions.
  • Limited case studies – current case studies are few and those that exist are limited geographically and/or in context.

They conclude that, in spite of these challenges, the popularity of data mining techniques in construction is increasing dramatically.

As construction claims experts and leaders in data-driven claims analysis, we at 53Quantum are always looking to the future of data use in construction. We expect to see an increase in data mining being used to support claims analysis, alongside other transformative technologies derived from the ‘big data’ and AI landscape.

  1. Construction Safety Clash Detection: Identifying Safety Incompatibilities among Fundamental Attributes using Data Mining (2017) Automation in Construction, Vol 74.
  2. Predicting construction cost overruns using text mining, numerical data and ensemble classifiers (2014) Automation in Construction, Vol 43.
  3. Data mining algorithms for bridge health monitoring: Kohonen clustering and LSTM prediction approaches (2020) The Journal of Supercomputing, Vol 76.
  4. Data mining in the construction industry: Present status, opportunities, and future trends (2020) Automation in Construction, Vol 119.