Automated Spatiotemporal Semantic Understanding of Buildings in Construction and Use Phases

Project Team

Martin Fischer, Silvio Savarese, Iro Armeni


Understanding a building’s state in space and through time is essential for many processes in the Architecture, Engineering, Construction and Facilities Management (AEC-FM) domain. Currently, laser scans provide the most accurate information about space, and when collected repeatedly they also capture its transformation over time. However, point clouds lack semantic information about the depicted area: there is no knowledge of the type of building elements present (e.g. walls, columns), their location, or the relationships between them. As a result, spatial and temporal changes in building structures must be inferred by humans, which takes significant time. We propose to develop a method based on Computer Vision and Machine Learning that automates the creation of spatial meaning through time by generating a spatiotemporal 3D semantic model of the facility. Such a model has immediate applications during the facility’s construction, operation and use/re-use phases.

Project Background

Research Motivation

Designers, project managers, superintendents, suppliers, owners and operators are all affected by the availability of accurate and up-to-date documentation of a building’s state in space and through time, which essentially constitutes a spatiotemporal 3D semantic model. Such a model includes information about a facility’s components and the relationships between them across lifecycle phases. For example, during construction this information enables quality control, progress determination, incorporation of design changes, verification that a prefabricated system fits on site, and accurate supply scheduling. At handover to the client, detailed as-built documentation can be delivered. During operation and maintenance, it can support activities, especially when major maintenance must happen fast (e.g. plant turnarounds). During refurbishment, it can drive design decisions and the reconfiguration of spaces. Hence, spatiotemporal semantic information can benefit most of a facility’s lifecycle phases and the professionals involved in them.

According to the US National Building Information Model Standard Project Committee, “Building Information Modeling (BIM) is a digital representation of physical and functional characteristics of a facility. A BIM is a shared knowledge resource of information about a facility forming a reliable basis for decisions during its life-cycle” [4]. Currently, the vast majority of BIMs reflect the design status and ignore the construction and operation phases. Although some physical characteristics related to the as-built status are captured manually, this is not enough for BIMs to form a reliable basis for decision making throughout a facility’s lifecycle. One reason for this incomplete documentation is the time and cost of keeping information current and accurate across lifecycle phases. A report from the National Institute of Standards and Technology (NIST) [6] states that “an inordinate amount of time is spent locating and verifying specific facility and project information from previous activities”. The estimated cost of inadequate interoperability across the facility lifecycle in the U.S. capital facilities industry is around $15.8 billion per year, most of which is incurred in the operation ($9 billion) and construction ($4 billion) phases [5].

In recent years, monitoring a building’s state has been facilitated by off-the-shelf technologies that accurately capture its current status (e.g. laser scanners). State-of-the-art practice involves iterative laser scanning and manual post-processing of the acquired data. The cost and time benefits of using scanners to automate the surveying process have been demonstrated in numerous industry cases. These examples report not only capturing more information than traditional approaches, but also a cost reduction of around 75% and a project duration reduction of approximately 60% [1,2].

Despite its potential, laser scanning is still not widely employed. For example, construction progress continues to be recorded manually on tablets or hard copies. This indicates that the output of laser scanners is not immediately useful to the AEC-FM industry: it requires substantial post-processing to obtain information that is easier to record with pen and paper, even though the scan-based information would be more accurate and richer in content. The reason is that 3D point clouds contain no high-level information about the building elements present (type, location, quantity, shape, etc.) and are temporally fragmented (data from successive scans are not correlated with each other). Since extracting this information is not automated, it is identified manually, which is a time-consuming and error-prone task [3], especially in large-scale buildings. For example, it took 200 hours to model the stainless steel pipes in the point cloud of a 14,000 sq. ft. chemical plant [7] and 180 hours to model the steel components in the point cloud of a ⅓-mile-long pipe rack [8]. Although the modeling process can be accelerated with tools such as IMAGINiT’s ‘Scan-to-BIM’ and ClearEdge3D’s EdgeWise, a significant amount of manual work is still involved.
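To make the notion of temporal fragmentation concrete, the Python sketch below shows about the simplest way to correlate two scans of the same site: voxelize both and compare which voxels are occupied. It assumes the scans are already registered in a common coordinate frame; the function names and the 0.1 m voxel size are illustrative choices, not part of this project. Note what it cannot tell you: an "added" or "removed" voxel carries no element type, and occlusion or clutter produce spurious changes, which is exactly the semantic gap described above.

```python
import numpy as np


def occupied_voxels(points: np.ndarray, voxel_size: float) -> set:
    """Integer (i, j, k) indices of the voxels that contain at least one point."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    return set(map(tuple, idx.tolist()))


def voxel_changes(scan_before: np.ndarray, scan_after: np.ndarray,
                  voxel_size: float = 0.1) -> dict:
    """Compare two (N, 3) scans that are already registered in a common frame.

    Hypothetical helper: occupancy only, no semantics, no occlusion handling.
    """
    v_before = occupied_voxels(scan_before, voxel_size)
    v_after = occupied_voxels(scan_after, voxel_size)
    return {
        "added": v_after - v_before,      # newly occupied space
        "removed": v_before - v_after,    # space no longer observed as occupied
        "unchanged": v_before & v_after,
    }


# Toy example: one point stays, one "element" disappears, a new one appears.
before = np.array([[0.05, 0.05, 0.05], [1.05, 0.05, 0.05]])
after = np.array([[0.05, 0.05, 0.05], [2.05, 0.05, 0.05]])
changes = voxel_changes(before, after)
print({k: sorted(v) for k, v in changes.items()})
```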

There is therefore a gap between point cloud representations and the type of information that workflows require; together with the extensive manual inference involved, this gap impedes the direct use of point clouds as input. Hence, automatic methods are needed that generate data with the structure and representational power to serve as input to a variety of AEC-FM processes, and whose benefits are demonstrated clearly enough for them to become standard practice.

Research Objectives

We propose to develop a method that, within a single framework, builds a seamless bridge through time between the captured virtual environment and the real world by understanding the spatial state of a facility during its lifecycle, as depicted in 3D point cloud data. At a higher level, we want to automatically and quantitatively answer questions about the building elements (which, how many, where, what size, etc.) at any point in time, specifically from the stage of construction at which the main structural frame has been erected onward.
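The kinds of questions listed above (which, how many, where, what size, at what time) can be illustrated with a minimal sketch of a queryable spatiotemporal element model. The schema is hypothetical: each element is reduced to a category, an axis-aligned bounding box in metres and the range of scan indices in which it is observed, whereas the model this research targets would carry richer geometry and relationships.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Element:
    """One building element tracked across scans (hypothetical schema)."""
    element_id: str
    category: str                    # e.g. "wall", "column", "duct"
    bbox_min: Vec3                   # axis-aligned bounding box corner ("where"), metres
    bbox_max: Vec3
    first_seen: int                  # index of the first scan showing the element
    last_seen: Optional[int] = None  # None means still present in the latest scan

    def size(self) -> Vec3:
        """Extent along x, y, z ("what size")."""
        return tuple(hi - lo for lo, hi in zip(self.bbox_min, self.bbox_max))

    def present_at(self, t: int) -> bool:
        return self.first_seen <= t and (self.last_seen is None or t <= self.last_seen)


def elements_at(model: List[Element], t: int,
                category: Optional[str] = None) -> List[Element]:
    """Answer "which" (and, via len(), "how many") elements exist at scan index t."""
    return [e for e in model
            if e.present_at(t) and (category is None or e.category == category)]


model = [
    Element("c1", "column", (0.0, 0.0, 0.0), (0.4, 0.4, 3.0), first_seen=0),
    Element("w1", "wall", (0.0, 0.0, 0.0), (5.0, 0.2, 3.0), first_seen=1),
    Element("w2", "wall", (5.0, 0.0, 0.0), (5.2, 4.0, 3.0), first_seen=1, last_seen=2),
]

walls_now = elements_at(model, 3, "wall")  # w2 was removed after scan 2
print([e.element_id for e in walls_now], walls_now[0].size())
```

Using scan indices instead of timestamps keeps the sketch small; mapping each index to its scan date would let the same query answer questions about a specific date.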

The proposed research aims to structure collected raw data and give the virtual world a close-to-human-level understanding of the visible real world. The main goal is to use this information, directly or after post-processing, to improve current workflows.

Expected Results: Findings, Contributions and Impact on Practice

We anticipate that automatically structuring such data will (a) allow raw data to be fed directly into workflows without human intervention, (b) provide a more comprehensive understanding of the depicted environment, and (c) spread the use of depth sensors throughout a building’s lifecycle, since their applicability will be immediate and broad. Expected benefits to a few selected workflows are:

Construction progress monitoring: Reduce the time and cost of monitoring activities; facilitate more frequent monitoring if needed, thus allowing managers to catch potential scheduling issues, improve day-to-day operations and report more current information to project stakeholders.

Construction inspection: Automatically identify deviations between as-designed and as-built elements.

Facility refurbishment/renovation: Provide a more complete understanding of existing conditions, including hidden elements, thus improving productivity and safety.
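For construction inspection, one simple way to express a deviation check, assuming the as-built points have already been segmented and matched to an as-designed element, is to measure how far those points lie from the designed geometry. The sketch below models the designed element as an axis-aligned box; the function names, the 2 cm tolerance and the 95th-percentile statistic are hypothetical choices for illustration, not project parameters.

```python
import numpy as np


def distance_to_box_surface(points: np.ndarray, lo, hi) -> np.ndarray:
    """Unsigned distance from each (N, 3) point to the surface of an axis-aligned box."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    # Points outside the box: Euclidean distance to the nearest point of the box.
    excess = np.maximum(np.maximum(lo - points, points - hi), 0.0)
    d_outside = np.linalg.norm(excess, axis=1)
    # Points inside (or on) the box: distance to the closest face.
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    d_inside = np.minimum(points - lo, hi - points).min(axis=1)
    return np.where(inside, d_inside, d_outside)


def flag_deviation(points, lo, hi, tolerance=0.02, quantile=0.95):
    """Return (deviates, q), where q is a high quantile of point-to-surface
    distances in metres; a quantile instead of the maximum tolerates a few
    stray points (hypothetical rule)."""
    q = float(np.quantile(distance_to_box_surface(points, lo, hi), quantile))
    return q > tolerance, q


design_lo, design_hi = (0.0, 0.0, 0.0), (5.0, 0.2, 3.0)           # designed wall
on_design = np.array([[1.0, 0.2, 1.0], [2.0, 0.2, 1.5], [3.0, 0.2, 2.0]])
shifted = on_design + np.array([0.0, 0.05, 0.0])                  # face built 5 cm out
print(flag_deviation(on_design, design_lo, design_hi))
print(flag_deviation(shifted, design_lo, design_hi))
```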

The main contribution of this work is to further close the gap between the real and virtual worlds and to facilitate a better understanding of our buildings throughout their lifecycle. By measuring its impact on a number of existing processes, we will demonstrate the direct benefits to builders, operators, designers and clients. In addition, this research will serve as the point of departure for more in-depth studies of built environments, such as those related to space optimization and decision making. Beyond the civil engineering domain, other communities will benefit as well, such as computer vision, robotics and augmented reality.

Summary of Performed Tasks

During the past few months the following tasks were performed:

  • Data Collection: We collected data in two buildings under construction and are continuing to acquire additional data. The data collection covers the later stages of construction, when changes happen in the interior layout. Both new construction and renovation projects are included.

  • Data Pre-processing: We pre-processed the data into high-density registered point clouds in a format suited to the requirements of our research.

  • Data Annotation: We annotated a large part of the data captured so far for different object categories, including structural, MEP and other building elements.

  • Prior work review: We performed a thorough review of the existing literature in both Automation in Construction and Computer Vision, and identified the gaps and limitations of existing methods for spatiotemporal semantic understanding of 3D data in construction sites and indoor spaces.

  • First experiments: We have started exploring methods for learning and representation. To better define our approach, we will incorporate newly collected and annotated data as it becomes available, in order to gain a good understanding of the strengths and limitations of these methods.

  • Additional components: Because iterative laser scanning is labor-intensive and time-consuming, during the course of this project we started exploring an autonomous scanning system to create a fully automatic pipeline from scan to semantic understanding. To quantify the limitations of current practice in terms of time spent: scanning a 32,000 sq. ft. facility under construction can take 6 to 10 hours*, depending on how far along construction is and the type of sensor used. Modeling the scanned output can take from 3 days up to 4 weeks, depending on scene complexity and the amount of clutter. Since this is an iterative process, this amount of time** and the associated cost must be allocated several times over the construction timeline. To reduce the resources spent, we are exploring ways to automate these tasks.
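The time figures above can be turned into a rough estimate of cumulative manual effort. The sketch below assumes 8-hour working days and 5-day working weeks and uses the medium-quality sensing figures; the number of scan cycles is a free, illustrative parameter, since the number of iterations in a real construction timeline is project-dependent.

```python
HOURS_PER_DAY = 8    # assumption: "days" are 8-hour working days
DAYS_PER_WEEK = 5    # assumption: "weeks" are 5-day working weeks

SCAN_HOURS = (6, 10)                                 # one scan of the 32,000 sq. ft. site
MODEL_HOURS = (3 * HOURS_PER_DAY,                    # 3 days of modeling
               4 * DAYS_PER_WEEK * HOURS_PER_DAY)    # 4 weeks of modeling


def manual_hours(n_cycles: int) -> tuple:
    """(low, high) scan + modeling hours for n_cycles scan-to-model iterations.

    Registration and pre-processing time is excluded, as in footnote **.
    """
    low = n_cycles * (SCAN_HOURS[0] + MODEL_HOURS[0])
    high = n_cycles * (SCAN_HOURS[1] + MODEL_HOURS[1])
    return low, high


print(manual_hours(1))   # (30, 170)
print(manual_hours(10))  # (300, 1700)
```

Even a single cycle ranges from about 30 to 170 working hours under these assumptions, and the spread grows linearly with every additional scan.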

A project website has been developed and will reflect future updates.

* These numbers reflect the use of a medium-quality sensing system; with a higher-quality one, scanning could take up to 60 hours.

** Pre-processing and data registration time is not included in these numbers, since even with an automatic system there might be some overhead related to this task.


[1] Laser Scanning vs. Conventional Surveying during Post-Construction, [Accessed on 4/28/2016]

[2] Reality Computing for Construction at McCarthy, [Accessed on 4/28/2016]

[3] Brilakis, I., Lourakis, M., Sacks, R., Savarese, S., Christodoulou, S., Teizer, J. & Makhmalbaf, A., 2010, ‘Toward automated generation of parametric BIMs based on hybrid video and laser scanning data’, Advanced Engineering Informatics, 24 (4), pp. 456-465

[4] WHAT IS A BIM?, [Accessed on 2/28/2016]

[5] East, W.E. Construction Operations Building Information Exchange (COBie), 2013, [Accessed on 4/28/2016]

[6] NIST, Cost Analysis of Inadequate Interoperability in the U.S. Capital Facilities Industry, 2004, [Accessed on 2/27/2016]

[7] Advanced Extraction Algorithms Reduce False Pipe Identification in Laser Scan Point Cloud, [Accessed on 4/30/2016]

[8] Truescan3D Revisits Ohio Pipe Rack, Reduces Structural Modeling Time by 60%, [Accessed on 4/30/2016]

Last modified Fri, 23 Jun, 2017 at 12:33