
Any time Data Thinking Prototype for Construction Project Managers

Project Team

Martin Fischer, Ram Rajagopal, Parisa Nikkhoo


Big data and machine learning (ML) techniques have provided novel insights in various fields. Generating such insights requires integrating datasets, exploring the data, and developing an explanatory model that yields actionable insights and supports an ongoing cycle of learning.

We propose to collect data that are available over the course of projects, such as schedule updates (pre-construction activity types, durations, crews), monthly status reports (fee analysis, project information), and 3D models (material quantities) to explore whether there are patterns that correlate with project success or failure based on schedule and fee performance.

First, we will integrate these datasets into a large data frame. After cleansing the data and identifying existing patterns, we will use ML methods and tools such as VISDOM to examine significant relationships among schedule updates, BIM development, staffing, procurement and project permitting processes, and change orders that lead to short or long schedules and to fee gain or erosion.

Project Background

Research Motivation

Project and construction managers have little more than their experience to judge whether their project is on the road to success. Advances in technology and the availability of historical data motivated us to explore past construction projects and provide a tool that assists the project team in its decision-making process.

Industry Example

It would be interesting, for instance, to know whether the following items correlate with successful or unsuccessful projects:
- A particular juxtaposition of design and schedule changes for a particular staffing level and a particular type of contract
- The number of milestones and their locations on the project timeline 
- The number of schedule updates based on the project type and duration
- The pattern of critical path and the number of critical activities
- Change orders and RFIs and their impact on activity delays

Research Objectives

The ultimate goal is to develop a tool that helps every general contractor (GC) apply lessons learned from past projects. We propose to collect schedule updates (pre-construction activity types, durations, crews), monthly status reports (fee analysis, project information), and 3D models (material quantities) to explore whether there are patterns that correlate with project success or failure based on schedule and fee performance.
The outcome of this study will benefit the key project stakeholders. The owner would not encounter surprises or sudden changes in the cost of work or the project duration due to poor decisions, and the GC would have a more reliable schedule of work and profit.

Research Update

We started the study by visualizing milestone and critical activity patterns in the data we received from our collaborator (DPR Construction). We developed two algorithms to map the milestones and the critical path of construction project schedules. The results of the first study suggest a correlation between project success and the number of milestones included in the project schedule. Further analysis could calculate the exact number of milestones per month, since the project managers of the successful projects in this study highly recommend including at least one milestone every month. Future studies could also focus on the reasons behind the observed inconsistencies in the milestones.

The results of the second study revealed that the scheduling tool, P6, removes the critical activity tag when an activity is marked as completed. We therefore highly recommend that researchers and data scientists who work with scheduling data exported from P6 tailor their analysis around this limitation.

Next, we visualized the finish dates of schedule activities for 10 projects using ‘spaghetti diagrams’, a method that we developed. The patterns we observed encouraged us to investigate the causes of schedule variations by exploring, (1) qualitatively and (2) quantitatively, the correlation between change orders and the activities’ finish dates. Examples of the spaghetti diagrams are included below; the patterns suggest that a correlation exists between the date on which change orders were submitted and the date on which the activities’ finish dates changed. In an ongoing effort, we are developing a method to explore this correlation quantitatively.

Scheduling Data Analysis: Milestones and Critical Path

The goal of this report is to share the results of two studies. In the first study, by mapping the finish milestones in the scheduling data of 13 successful and 9 unsuccessful projects, we found that the successful projects had a more consistent spread of finish milestones in their schedules than the unsuccessful projects. In the second study, by mapping the changes to the activities on the critical path, we noticed that the number of critical activities changed dramatically from one schedule update to the next. Investigating further, we discovered that the P6 software used to develop these schedules removed the critical tag from the activities that were completed in each schedule update.

In the first part of this report, we present the methodology we used for mapping the milestones, along with the results. In the second part, we present the methodology we used for developing the critical path graphs, along with the results, to emphasize the importance of identifying the limitations of the scheduling software from which the data are exported.

Study 1: Milestones
Data collection

To start, we selected the scheduling data for 22 large construction projects (i.e., projects with a contract amount of $50M or more) from different core markets. We selected large projects mainly because they often have full-time schedulers who are responsible for maintaining the quality of the scheduling data and updating it regularly. Based on whether these projects were completed on time and on budget, we labeled 13 projects as successful and 9 as unsuccessful. Table 1 lists these projects (for confidentiality reasons, the projects are renamed with letters of the alphabet).

Data Analysis

We conducted exploratory data analysis (EDA) on the collected scheduling data. Start and finish milestones are zero-duration activities that serve as checkpoints for controlling the project. The algorithm developed in this study took the finish milestone dates from each schedule update of each project and mapped them against the total duration of that project. To integrate all the projects into one graph and compare them, we normalized the project durations.
As graphs 1 and 2 show, almost all the successful projects have a consistent spread of finish milestones throughout the project. In the unsuccessful projects, however, there were many inconsistencies among the finish milestones, meaning that for large portions of the project no finish milestones were included in the schedules. This is especially visible in projects D, K, N, and Q. Next, we explored the project types and found that all data center projects (i.e., projects I, L, P, and S) had finish milestones included in their schedules on a regular basis, whereas the schedules of the hospital projects (i.e., projects A, N, and U) included no finish milestones for long periods of the project.
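The milestone mapping described above can be illustrated with a minimal sketch. It is not the algorithm used in the study: the record layout (`duration_days`, `kind`, `finish`), the function names, and the example dates are all hypothetical. It selects zero-duration finish milestones, places them on a normalized 0 to 1 project timeline, and reports the longest stretch without a finish milestone.

```python
from datetime import date

def finish_milestones(activities):
    """Keep the finish dates of zero-duration activities flagged as finish milestones."""
    return [a["finish"] for a in activities
            if a["duration_days"] == 0 and a["kind"] == "finish"]

def normalized_positions(finish_dates, project_start, project_end):
    """Map finish dates onto a 0-1 timeline so projects of different length compare."""
    total_days = (project_end - project_start).days
    return sorted((d - project_start).days / total_days for d in finish_dates)

def largest_gap(positions):
    """Longest share of the project timeline that contains no finish milestone."""
    points = [0.0, *sorted(positions), 1.0]
    return max(later - earlier for earlier, later in zip(points, points[1:]))

# Hypothetical schedule update for a 100-day project (illustrative values only).
start, end = date(2020, 1, 1), date(2020, 4, 10)
activities = [
    {"duration_days": 0, "kind": "finish", "finish": date(2020, 1, 26)},
    {"duration_days": 0, "kind": "start", "finish": date(2020, 2, 1)},
    {"duration_days": 14, "kind": "task", "finish": date(2020, 2, 20)},
    {"duration_days": 0, "kind": "finish", "finish": date(2020, 3, 21)},
]

positions = normalized_positions(finish_milestones(activities), start, end)
gap = largest_gap(positions)
print(positions, round(gap, 2))
```

A large value of `gap` would correspond to the long periods without finish milestones observed in the unsuccessful projects, although any threshold for "large" would have to come from the data.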



Study 2: Critical Activities and P6 Software Limits

Data collection

This study uses the same data as the Milestones study.

Data Analysis

The algorithm developed in this study takes the latest schedule update for each project, filters the activities with a critical tag, and maps their start and finish dates. Using this algorithm, we mapped the critical path for the first schedule update of successful project W (figure 3). As the graph shows, the activities span the project duration consistently, with no overlaps or gaps.
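As a rough illustration of the filtering and mapping step, the sketch below keeps the activities tagged as critical in one schedule update, orders them by start date, and measures the days between each activity and the one before it. The field names (`id`, `critical`, `start`, `finish`) and the sample dates are assumptions made for this example, not the format of the P6 export.

```python
from datetime import date

def critical_gaps(activities):
    """Return (activity_id, gap_days) pairs for consecutive critical activities.

    gap_days > 0 means a gap, gap_days < 0 means an overlap, and 0 means the
    activity starts on the day the previous critical activity finishes.
    """
    critical = sorted((a for a in activities if a["critical"]),
                      key=lambda a: a["start"])
    gaps = []
    for previous, current in zip(critical, critical[1:]):
        gaps.append((current["id"], (current["start"] - previous["finish"]).days))
    return gaps

# Hypothetical schedule update (illustrative values only).
schedule_update = [
    {"id": "A", "critical": True, "start": date(2021, 1, 1), "finish": date(2021, 1, 10)},
    {"id": "B", "critical": True, "start": date(2021, 1, 10), "finish": date(2021, 1, 20)},
    {"id": "C", "critical": False, "start": date(2021, 1, 12), "finish": date(2021, 1, 25)},
    {"id": "D", "critical": True, "start": date(2021, 2, 1), "finish": date(2021, 2, 10)},
]

gaps = critical_gaps(schedule_update)
print(gaps)
```

Comparing each activity only with its immediate predecessor is a simplification; a schedule with parallel critical activities would need a running "covered until" date instead.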

In the next phase, we ran the algorithm on the last schedule updates of unsuccessful projects N and D to observe the final pattern of their critical paths and compare them with the pattern shown in figure 3. As shown in figures 4 and 5, the critical activities of the unsuccessful projects did not span the entire project duration, and for some parts of the projects critical activities were missing. While we expected this inconsistency for the unsuccessful projects, we expected the opposite pattern for the successful projects. To test this, we ran the algorithm on two successful projects; surprisingly, the patterns were similar to those of the unsuccessful projects.

Talking to the scheduling teams of these projects, we discovered that the scheduling software (P6) removes activities from the critical path once they are marked as completed. This is why the number of activities shown on the critical path dropped dramatically in the last schedule updates.

To confirm that the critical activities were missing because P6 removes the critical tag, we ran the algorithm on the mid-schedule updates of two selected projects. Figure 6 shows the graphs for both the last and the mid-schedule update of successful project M, and Figure 7 shows both graphs for unsuccessful project D. The graphs for project M confirm that P6 removes the critical tag from completed activities. Interestingly, in project D the critical path was not well defined even halfway through the project; this might correlate with the project's lack of success.
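One possible way to work around this behavior, offered here only as a sketch and not as the approach taken in the study, is to carry the critical tag forward from earlier schedule updates for activities that have since been completed. The sketch assumes that each update records, per activity, a `critical` flag and a `completed` flag; both field names are hypothetical.

```python
def restore_critical_tags(updates):
    """Re-tag completed activities that were critical in an earlier update.

    `updates` is a chronological list of schedule updates, each a dict that
    maps an activity ID to {"critical": bool, "completed": bool}.
    Returns one dict per update mapping activity ID to the restored tag.
    """
    ever_critical = set()
    restored = []
    for update in updates:
        restored.append({
            activity_id: a["critical"] or (a["completed"] and activity_id in ever_critical)
            for activity_id, a in update.items()
        })
        # Remember every activity tagged critical up to and including this update.
        ever_critical.update(aid for aid, a in update.items() if a["critical"])
    return restored

# Hypothetical history: A1 loses its tag in the second update once completed.
updates = [
    {"A1": {"critical": True, "completed": False},
     "A2": {"critical": False, "completed": False}},
    {"A1": {"critical": False, "completed": True},
     "A2": {"critical": False, "completed": False},
     "A3": {"critical": True, "completed": False}},
]

restored = restore_critical_tags(updates)
print(restored[-1])
```

The tag is restored only for completed activities, so an activity that leaves the critical path for other reasons, such as a change in float, is not re-tagged.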



(Ongoing) Spaghetti Diagrams

Data Collection

This study used the same projects as the milestone and critical activities studies. The idea is to eventually use change order data; however, because of data quality issues, we started the analysis with RFI data.

Data Analysis

The algorithm we developed to automate the production of these spaghetti diagrams combines the finish dates of schedule activities across schedule updates, based on each activity’s unique ID, and creates a vector of dates for each activity. In the mapping process, we used the X-axis for the planned finish dates of the activities and the Y-axis for the dates on which the schedule was updated. Figure 1 shows a simple version of a spaghetti diagram with three activities. As shown in figure 1.1, the algorithm first represents the planned finish dates with dots, then draws a line connecting the dots for each activity (figure 1.2), and then removes the dots (figure 1.3). We keep the lines because they make changes to the activities’ finish dates easier to track. To make this easier still, we smoothed the lines in the plotting section of the algorithm, as shown in figure 1.4.

In these figures, the activity in green was supposed to finish on the 15th of February and finished as planned. The activity in purple was planned to be completed on the same date but was delayed and completed two weeks later, on the 28th of February. The blue activity, which had the same planned finish date as the other two, was pulled forward and completed earlier than planned. In these diagrams, the straighter the line, the more predictable the activity; the more curved the line, the more the activity’s projected finish date fluctuated.
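The step that builds a vector of dates for each activity can be sketched as follows. This is a minimal illustration under assumed inputs, not the study's implementation: each schedule update is represented as a pair of an update date and a mapping from activity ID to planned finish date. The year, the update dates, and the blue activity's early finish date are invented for the example.

```python
from collections import defaultdict
from datetime import date

def finish_date_vectors(updates):
    """Collect, per activity ID, its planned finish date in each schedule update."""
    vectors = defaultdict(list)
    for update_date, finishes in sorted(updates, key=lambda u: u[0]):
        for activity_id, planned_finish in finishes.items():
            vectors[activity_id].append((update_date, planned_finish))
    return dict(vectors)

def total_shift_days(vector):
    """Days between the first and last planned finish date of one activity."""
    return (vector[-1][1] - vector[0][1]).days

# Three activities modeled on the example in the text (dates are illustrative).
updates = [
    (date(2021, 1, 15), {"green": date(2021, 2, 15),
                         "purple": date(2021, 2, 15),
                         "blue": date(2021, 2, 15)}),
    (date(2021, 2, 1), {"green": date(2021, 2, 15),
                        "purple": date(2021, 2, 22),
                        "blue": date(2021, 2, 8)}),
    (date(2021, 3, 1), {"green": date(2021, 2, 15),
                        "purple": date(2021, 2, 28),
                        "blue": date(2021, 2, 8)}),
]

vectors = finish_date_vectors(updates)
shifts = {activity_id: total_shift_days(v) for activity_id, v in vectors.items()}
print(shifts)
```

Each vector holds the points that one line of the diagram connects; a shift of zero corresponds to a straight line, while positive and negative shifts correspond to activities that were delayed or pulled forward.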

As depicted in figure 2, we added change order/RFI data to the spaghetti diagrams as vertical lines positioned at their submission dates.

The following figures show the spaghetti diagrams for three large projects (>$50M). We color-coded the RFIs by discipline. The horizontal line at 100% shows the planned completion date of the project. As depicted in the figure below, architectural RFIs are followed by many structural RFIs, which coincide with changes in the slope of the lines. These RFIs seem to correlate with the project delay.

The figure below shows that the critical activities had almost zero slope until architectural RFIs were introduced; these RFIs seem to correlate with some of the activities' slope changes.

Finally, the figure below shows that one specific activity’s finish date was pushed out by 200 days and then pulled back in later schedule updates. This pattern might correlate with the civil and/or structural RFIs. Also, some activities were added at 12% of the project duration and were planned to finish outside the initial project completion timeframe. These activities were also delayed when a piping RFI was submitted.
In an ongoing effort, we aim to quantify the correlation between the change order/RFI data and the project activities' finish dates. The ultimate goal is to develop a method that gives project managers insights for minimizing the delay caused by change orders/RFIs.


