Visual Analysis using Synthetic Datasets


Visual testing is a nondestructive testing method in which an object is inspected for flaws. The inspection is often performed on recorded images or video, ranging from microscopic to drone imagery (Exhibit 1A). In one nondestructive testing engagement, SFL Scientific used 6,000 drone-recorded images to build a machine learning (ML) algorithm that automatically detects and labels eight types of concrete damage (Exhibit 2) in utility plants. The aim was to identify, automatically and accurately, both the location and the type of damage found in the drone imagery (e.g., Exhibit 1).

Exhibit 1. (A) Drone imagery of a concrete structure with damage (B) overlaid with the ML-generated mask of damaged area (C) overlaid with a hand-drawn mask of the damaged area.

Category            Sample Size
Abrasion                     25
Corrosion                   775
Crack                       300
Efflorescence              1750
Grease Stain               1300
Honeycomb                   150
Pattern Cracking           1200
Spall                       500

Exhibit 2. Sample sizes for the eight concrete damage categories within the dataset. Note that the images are multilabel, so different damage types may appear in the same image.



Each raw image sample (e.g., Exhibit 1A) provided to SFL Scientific was accompanied by a hand-annotated mask (e.g., Exhibit 1C) where damage location was indicated and labeled. The hand-annotated masks were used as ground truth to train the ML models.

Two ML models with distinct pipelines were explored. The first model was based on ShapeMask (Exhibit 3). Training was performed on 95% of the data and consisted of two steps: creating shape priors and segmenting the images. Shape priors were first obtained by clustering the mask labels in the training set and were then used to guide image segmentation into the distinct concrete damage categories. The model was then validated on the remaining 5% of the dataset, which was reserved for evaluation. Three shape-prior dimensions, 32×32, 48×48, and 64×64, were explored and compared for performance.

Exhibit 3. ShapeMask ML Pipeline
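The shape-prior step can be illustrated with a minimal sketch, assuming binary ground-truth masks that are already cropped to individual damage instances. The helper names (resize_mask, shape_priors), the nearest-neighbor resize, the toy masks, and the plain k-means loop are illustrative assumptions, not the pipeline used in the engagement.

```python
import numpy as np

def resize_mask(mask, size):
    """Nearest-neighbor resize of a 2D binary mask to size x size."""
    rows = np.arange(size) * mask.shape[0] // size
    cols = np.arange(size) * mask.shape[1] // size
    return mask[np.ix_(rows, cols)].astype(float)

def shape_priors(masks, size=32, k=2, iters=20, seed=0):
    """Cluster resized masks with k-means; each centroid is one shape prior."""
    rng = np.random.default_rng(seed)
    x = np.stack([resize_mask(m, size).ravel() for m in masks])
    # Start from k distinct masks chosen at random (fancy indexing copies them).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign every mask to its nearest centroid (squared Euclidean distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Recompute each centroid as the mean of the masks assigned to it.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean(axis=0)
    return centroids.reshape(k, size, size)

# Toy data (hypothetical shapes): thin horizontal streaks vs. filled squares.
streak = np.zeros((64, 64), dtype=bool)
streak[30:34, :] = True
blob = np.zeros((64, 64), dtype=bool)
blob[16:48, 16:48] = True

priors = shape_priors([streak, blob, streak, blob], size=32, k=2)
print(priors.shape)  # (2, 32, 32)
```

Changing size to 48 or 64 produces priors at the other two dimensions that were compared. Each prior is an average of masks, so its values lie between 0 and 1 and can be read as a soft template of where damage pixels tend to fall.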

The second model was a multilabel classifier built on Xception, a deep convolutional neural network that uses depthwise separable convolutions. The raw images from 95% of the dataset were provided as input and their corresponding labels as the target output, and a single model was trained to learn the relationship between the two. The remaining 5% of the dataset was used for model evaluation.
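A minimal sketch of the data preparation this implies, assuming each image comes with a list of damage-type names. The function names, the random seed, and the shuffled split are hypothetical; the source does not say how the 95%/5% split was drawn. A multilabel network commonly ends in one sigmoid output per class trained with binary cross-entropy, which is why the labels are encoded as 0/1 vectors here.

```python
import numpy as np

CLASSES = ["Abrasion", "Corrosion", "Crack", "Efflorescence",
           "Grease Stain", "Honeycomb", "Pattern Cracking", "Spall"]

def multi_hot(labels_per_image):
    """Encode each image's list of damage types as a 0/1 vector over CLASSES."""
    index = {name: i for i, name in enumerate(CLASSES)}
    y = np.zeros((len(labels_per_image), len(CLASSES)), dtype=np.float32)
    for row, names in enumerate(labels_per_image):
        for name in names:
            y[row, index[name]] = 1.0
    return y

def train_eval_split(n_samples, eval_fraction=0.05, seed=0):
    """Shuffle sample indices and hold out eval_fraction for evaluation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_eval = int(round(n_samples * eval_fraction))
    return order[n_eval:], order[:n_eval]

# Two hypothetical images: one with two damage types, one with a single type.
y = multi_hot([["Crack", "Spall"], ["Efflorescence"]])
train_idx, eval_idx = train_eval_split(6000)
print(y[0], len(train_idx), len(eval_idx))
```

With 6,000 images, this split leaves 5,700 for training and 300 for evaluation. For a class as small as Abrasion (25 samples), a purely random 5% holdout may contain only one or two examples, so a stratified split may be worth considering.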



The ShapeMask model produced distinctly different patterns of shape priors for each concrete damage class (Exhibit 4), and these appear visually consistent with patterns seen in the images. However, the validation results showed that performance varied across concrete damage classes (Exhibit 5). The ShapeMask model was evaluated both on its accuracy in labeling the damage (Accuracy and Shape Recall) and on Shape IoU (intersection over union), which is the amount of spatial overlap of the mask defined during segmentation. The model performed well for the corrosion, crack, efflorescence, and grease stain classes, but did poorly at locating and identifying the remaining damage classes. This result is consistent with small classes performing poorly; the exception is pattern cracking, which performed poorly despite a relatively large sample size (Exhibit 2). Comparison across mask sizes showed that classification accuracy and recall were relatively better with the 48×48 mask size, whereas Shape IoU was relatively better with the 64×64 mask size.

Exhibit 4. Selected feature-based shape priors for eight of the nine concrete damage types (not shown: Tendon cap)

Exhibit 5. Results for the different mask sizes using the ShapeMask model. Green shading indicates the best performance among the three mask sizes for that class and performance metric.
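For readers unfamiliar with the overlap metric, the following is a minimal sketch of intersection over union between a predicted mask and a hand-annotated mask. It assumes both masks are boolean arrays of the same shape; the function name mask_iou and the handling of two empty masks are illustrative choices, and the exact Shape IoU computation used in the engagement is not specified in the source.

```python
import numpy as np

def mask_iou(pred, truth):
    """Intersection over union of two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(pred, truth).sum() / union)

# Two 4x4 squares offset by one pixel: 9 shared pixels, 23 pixels in the union.
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 3:7] = True
print(mask_iou(pred, truth))  # 9 / 23, about 0.39
```

Even a one-pixel offset on a small region lowers the score substantially, which is one reason a label can be correct for a class while its overlap score stays modest.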

The Xception model was evaluated on its ability to correctly label the type(s) of damage present in each image. Performance was good (80% to 95.5% accuracy) for five of the eight classes (Exhibit 6). Notably, the Spall class, which the ShapeMask model could not classify correctly, had very good performance with the Xception model. However, as with the ShapeMask model, performance was poor for the Abrasion, Honeycomb, and Pattern Cracking classes.

Comparison between the two models shows that performance varies with the damage class. In addition, the ShapeMask model has the advantage of producing masked segments that help pinpoint the approximate location of damage within the image. Thus, depending on the purpose, it can be beneficial to compare predictions from both models during nondestructive testing.

Class               Accuracy (%)
Abrasion                     0.0
Corrosion                   84.5
Crack                       80.2
Efflorescence               95.5
Grease Stain                88.0
Honeycomb                    0.0
Pattern Cracking             0.0
Spall                       88.9

Exhibit 6. Results for the Xception model
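The source does not state how per-class accuracy was computed for the multilabel classifier. As a hedged illustration only, the sketch below computes two common per-class scores from multi-hot labels and predicted probabilities: binary accuracy over all images, and recall over the images that actually contain the class. The function name, the 0.5 threshold, and the toy data are assumptions.

```python
import numpy as np

def per_class_scores(y_true, y_prob, threshold=0.5):
    """Return (accuracy, recall) per class for multi-hot labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold
    # Accuracy: share of images where the 0/1 decision for the class is right.
    accuracy = (y_pred == y_true).mean(axis=0)
    positives = y_true.sum(axis=0)
    hits = (y_pred & y_true).sum(axis=0)
    # Recall is undefined when a class has no positive examples; report NaN.
    recall = np.where(positives > 0, hits / np.maximum(positives, 1), np.nan)
    return accuracy, recall

# Four hypothetical images, two classes; class 1 is never predicted.
y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
y_prob = [[0.9, 0.1], [0.4, 0.2], [0.2, 0.3], [0.1, 0.1]]
acc, rec = per_class_scores(y_true, y_prob)
print(acc, rec)
```

In this toy case the class that is never predicted still scores 0.75 on binary accuracy but 0.0 on recall, so when comparing models on rare classes it helps to state which definition a reported number uses.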



Visual testing often requires laborious and time-consuming manual effort to locate and identify potential damage or flaws. This process can be automated using ML. As shown in this use case, a labeled dataset from historical visual testing efforts can be very useful in building an automated ML model for future visual testing. Although the model detects some damage types more accurately than others, performance can potentially be improved using various strategies, including increasing and/or balancing the dataset sizes for the different classes of damage or flaw, or using different feature-based shape prior mask sizes. These can be explored during an ML engagement.
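One simple way to approach the balancing strategy mentioned above, short of collecting more data, is to weight rare classes more heavily during training. The sketch below applies an inverse-frequency heuristic to the sample sizes from Exhibit 2. The weighting formula and the function name are assumptions for illustration; they are not something the engagement is reported to have used.

```python
# Sample sizes copied from Exhibit 2.
counts = {
    "Abrasion": 25, "Corrosion": 775, "Crack": 300, "Efflorescence": 1750,
    "Grease Stain": 1300, "Honeycomb": 150, "Pattern Cracking": 1200,
    "Spall": 500,
}

def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * count); rare classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * c) for name, c in counts.items()}

weights = inverse_frequency_weights(counts)
for name, w in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name:<18}{w:6.2f}")
```

Under this heuristic Abrasion receives a weight of 30.0 and Efflorescence about 0.43, a 70-fold difference that mirrors the imbalance in the dataset. Weights this extreme can destabilize training, so capping them or combining them with oversampling may be preferable.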

