## Project Team

## Research Overview

### Observed Problem:

### Primary Research Objective:

### Potential Value to CIFE Members and Practice:

- This methodology enables designers and engineers to support robust design and operation of natural ventilation systems in buildings.
- The capability to implement accurate model-predictive control systems will facilitate performance-based contracting (PBC) for these buildings.
- Research will enable the more widespread use of natural ventilation in buildings and also support routine use of computational fluid dynamics (CFD) for the design of sustainable and resilient buildings and cities.

### Research provides relevant insights for:

Design and Operations

### Research and Theoretical Contributions:

Previous research has contributed to a working model for buoyancy-driven ventilation. The current research will focus on wind-driven ventilation, where the flow rates are characterized by the pressure coefficients on the openings. Since the flow around the building is highly turbulent, the calculation of the pressure coefficients requires computationally expensive large-eddy simulations (LES). The objective of this research is to provide a methodology to develop machine learning models that can correct errors in less expensive Reynolds-averaged Navier-Stokes (RANS) predictions, such that an accuracy similar to LES can be achieved at a dramatically lower computational cost.

### Industry and Academic Partners

We are interested in collaboration with CIFE members to connect our modeling frameworks to software strategies that combine computational building representations with a range of prediction tools. Building owners and design-build teams interested in high-fidelity predictions of building performance can help the research team connect the multi-fidelity modeling framework to current practice to speed up its widespread adoption. Given our focus on existing test cases for which we have data available, the proposed research does not depend on data provided by CIFE members, but additional datasets of naturally ventilated buildings would be helpful to validate our findings in different configurations.

## Research Updates & Progress Reports

### Progress Report - July 22, 2020

Initial work has focused on the prediction of the root-mean-square (rms) pressure coefficient Cp' on a high-rise building and has been submitted for publication. A multi-fidelity machine learning approach has been proposed to combine a large number of computationally efficient RANS simulations with a smaller number of LES. The model has been trained to relate the Cp' obtained from LES to 5 non-dimensional, Galilean-invariant features calculated from RANS.
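As a point of reference, the rms pressure coefficient is commonly defined as the standard deviation of the surface pressure divided by a reference dynamic pressure. The sketch below illustrates that definition for a single pressure tap. The function name, the air density, the reference velocity, and the synthetic sine signal are assumptions made for illustration; the report does not state which reference quantities were used.

```python
import math

def rms_pressure_coefficient(pressure, rho, u_ref):
    """Return Cp' = std(p) / (0.5 * rho * u_ref**2) for one pressure tap."""
    n = len(pressure)
    mean_p = sum(pressure) / n
    # Population variance of the pressure fluctuations about the mean.
    var_p = sum((p - mean_p) ** 2 for p in pressure) / n
    q_ref = 0.5 * rho * u_ref ** 2  # reference dynamic pressure (assumed definition)
    return math.sqrt(var_p) / q_ref

# Synthetic signal (not study data): 10 Pa mean plus a sine of 5 Pa amplitude,
# sampled over 10 full periods.
samples = [10.0 + 5.0 * math.sin(2.0 * math.pi * k / 100) for k in range(1000)]
cp_rms = rms_pressure_coefficient(samples, rho=1.225, u_ref=10.0)
```

For this synthetic signal the fluctuation rms is the sine amplitude divided by the square root of 2, so `cp_rms` depends only on the amplitude and the assumed reference dynamic pressure.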

The full data set consists of RANS and LES simulations at a 10^{o} wind direction resolution. The data for the 10^{o}, 30^{o}, 50^{o}, 70^{o} and 90^{o} wind directions are used to train the model, while the LES for the 0^{o}, 20^{o}, 40^{o}, 60^{o} and 80^{o} wind directions are only used to evaluate the model performance. Based on a model search and hyperparameter tuning, performed by employing a left-out simulation at 45^{o}, a 5-layer neural network with 10 hidden units per layer and ReLU activation function is found to achieve the lowest root mean square error (RMSE) on the test set. A bootstrap technique that samples the training data with replacement is used to produce an ensemble of 1000 models, which supports reporting a mean and confidence interval for the model predictions. Subsequently, the model is re-trained to predict each of the 5 wind directions in the test set.
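The bootstrap ensemble described above can be sketched as follows. This is a minimal illustration, not the study's implementation: a least-squares line stands in for the 5-layer neural network so that the example stays self-contained, and the function names, the percentile-based interval, and the synthetic data are all assumptions.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit y = a + b*x (a stand-in for the neural network)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0  # degenerate resample: fall back to a constant model
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def bootstrap_predict(xs, ys, x_new, n_models=1000, level=0.95, seed=0):
    """Return the ensemble mean and a percentile interval at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        # Resample the training data with replacement, then refit.
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a + b * x_new)
    preds.sort()
    lo = preds[int((1.0 - level) / 2.0 * (n_models - 1))]
    hi = preds[int((1.0 + level) / 2.0 * (n_models - 1))]
    return sum(preds) / n_models, lo, hi

# Synthetic illustration data (not from the study): a noisy linear trend.
data_rng = random.Random(42)
xs = [i / 49 for i in range(50)]
ys = [0.1 + 0.2 * x + data_rng.gauss(0.0, 0.02) for x in xs]
mean_pred, ci_low, ci_high = bootstrap_predict(xs, ys, x_new=0.5)
```

Each ensemble member sees a different resample of the same training set, so the spread of the predictions reflects sensitivity to the training data rather than the error of the underlying simulations.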

When training a universal model on all 5 wind directions in the training set and using it to predict all 5 wind directions in the test set, the RMSE is on average 2 times smaller than the RMSE of a standard empirical model. When training a targeted model on a select subset of 2 wind directions in the training set to predict a specific wind direction in the test set, this RMSE is further reduced by 20%. In this case, the 2 wind directions used for training are selected by considering the similarity between the joint distributions of the first two principal components of the features for the wind directions in the training and test sets; these first two principal components explain 99.7% of the variance in the dataset, and the similarity between their distributions can be quantified using the Kullback-Leibler divergence. Comparison to models trained on various combinations of wind directions in the training set indicates that this strategy for selecting the training data results in optimal performance for 3 of the 5 test wind directions. For the remaining wind directions, the best model still relies on training data from wind directions with a low Kullback-Leibler divergence, but it uses data from either only 1 or 3 wind directions. Figure 1 visualizes the performance of the machine learning model by comparing the predictions to the LES data directly, and to the results obtained with a standard empirical model for the 40^{o} wind direction.

Figure 1: Contours of rms pressure coefficient at 40^{o} wind direction
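The selection of training wind directions by Kullback-Leibler divergence can be sketched as follows. The report does not say how the divergence is estimated, so this example makes several assumptions: the scores on the first two principal components are taken as already computed, each joint distribution is approximated by a 2-D Gaussian so that the divergence has a closed form, and the divergence is evaluated from the test distribution to each training distribution. All function names and the synthetic point clouds are hypothetical.

```python
import math
import random

def gaussian_fit_2d(points):
    """Return (mean, (sxx, sxy, syy)) of a set of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def kl_gaussian_2d(p, q):
    """Closed-form KL(p || q) for two 2-D Gaussians from gaussian_fit_2d."""
    (m0, (a0, b0, c0)), (m1, (a1, b1, c1)) = p, q
    det0 = a0 * c0 - b0 * b0
    det1 = a1 * c1 - b1 * b1
    # trace(S1^-1 S0), using the analytic inverse of the 2x2 covariance S1.
    trace = (c1 * a0 - 2.0 * b1 * b0 + a1 * c0) / det1
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Mahalanobis distance between the means, measured with S1.
    maha = (c1 * dx * dx - 2.0 * b1 * dx * dy + a1 * dy * dy) / det1
    return 0.5 * (trace + maha - 2.0 + math.log(det1 / det0))

def select_training_directions(train_scores, test_scores, n_select=2):
    """Rank training directions by KL(test || train) and keep the closest."""
    test_fit = gaussian_fit_2d(test_scores)
    divergences = {
        angle: kl_gaussian_2d(test_fit, gaussian_fit_2d(points))
        for angle, points in train_scores.items()
    }
    ranked = sorted(divergences, key=divergences.get)
    return ranked[:n_select], divergences

# Synthetic principal-component scores (not study data): one cloud per wind
# direction, shifted along the first component in proportion to the angle.
rng = random.Random(1)

def cloud(center_x, n=200):
    return [(rng.gauss(center_x, 0.1), rng.gauss(0.0, 0.1)) for _ in range(n)]

train_scores = {angle: cloud(angle / 90.0) for angle in (10, 30, 50, 70, 90)}
test_scores = cloud(40 / 90.0)
selected, divergences = select_training_directions(train_scores, test_scores)
```

With these synthetic clouds the two neighbouring directions have the lowest divergence. A consistently high divergence for every training direction would be the signal, noted in the report, that a test case lies outside the region covered by the training data.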

The test case with the worst agreement is the 0^{o} wind direction. This wind direction has the highest average Kullback-Leibler divergence, and it is shown that the model breaks down in regions where it is extrapolating in the space defined by the first two principal components. In these regions, the lack of data also implies that the bootstrap method cannot provide accurate information on the uncertainty in the model. At the remaining wind directions, the 95% confidence interval predicted by the bootstrap procedure encompasses between 70% and 84% of the data, and the maximum absolute error remains limited to 0.08.
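The two evaluation metrics quoted above, interval coverage and maximum absolute error, can be computed as in the following sketch. The function names and the five sample values are hypothetical and chosen only to make the example runnable; they are not results from the study.

```python
def interval_coverage(reference, lower, upper):
    """Fraction of reference values that fall inside [lower, upper]."""
    inside = sum(1 for r, lo, hi in zip(reference, lower, upper) if lo <= r <= hi)
    return inside / len(reference)

def max_abs_error(reference, predicted):
    """Largest absolute difference between reference and prediction."""
    return max(abs(r - p) for r, p in zip(reference, predicted))

# Made-up values for illustration only.
reference = [0.10, 0.15, 0.22, 0.30, 0.18]   # e.g. LES values of Cp'
predicted = [0.11, 0.14, 0.25, 0.27, 0.18]   # e.g. ensemble-mean predictions
lower = [p - 0.02 for p in predicted]        # lower bound of the interval
upper = [p + 0.02 for p in predicted]        # upper bound of the interval

coverage = interval_coverage(reference, lower, upper)
worst_error = max_abs_error(reference, predicted)
```

A coverage below the nominal 95% level, as reported above, indicates that the bootstrap interval captures only part of the total prediction error.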

In summary, the proposed multi-fidelity framework has the potential to significantly reduce the number of LES needed for design, while retaining a considerably higher accuracy than standard empirical models. The findings from this study also have broader relevance to machine learning for turbulence modeling. First, the use of principal component analysis to select training data with optimal similarity to test data is shown to improve model performance; it could also be used to identify a need for additional training data, or to identify regions where the model should not be trusted. Second, the use of a bootstrap procedure to create an ensemble of machine learning models provides useful confidence intervals, as long as the model is not extrapolating beyond the training data in the space of the first two principal components. Future work will focus on further customization of the selection of optimal training data, on applying the procedure to the mean and peak pressure coefficients as the quantities of interest, and on comparing the performance of this method with polynomial-chaos-based multi-fidelity simulation methods.