Guided wave (GW) nondestructive testing is used to inspect engineering structures and to detect wall thinning and discontinuities in piping in the electrochemical, oil, gas, and other industries. Ultrasonic guided waves can travel long distances with little energy loss, so they can inspect hidden and inaccessible regions, including structures that are underwater or covered by coatings, insulation, or concrete. Signal frequencies can be selected according to the material and thickness of the object to improve defect resolution and scan coverage. The equipment is also easy to install, which helps reduce cost.
In this project, an acoustic wave is introduced into one end of the elongated structure and measured by a sensor on the other end. Any flaws tend to manifest as amplitude attenuations in the measured waveform. In one nondestructive testing engagement, SFL Scientific utilized 300 data samples obtained from experiments on steel objects with known surface flaws. The labeled waveforms were used to build a machine learning model to automatically detect fault presence in piping structures.
Modeling with Simulation Data
Because the dataset obtained from guided wave experiments was relatively small, the initial idea was to train an ML model on simulation data and evaluate it on experimental data. Simulation data offer the following advantages:
- It is much easier to generate a large dataset with simulation than with experiments.
- It makes it easy to create flaws of different shapes and depths.
- It reduces cost.
However, several issues observed during modeling made this approach infeasible:
- A long list of features generated from the simulation data was tested during exploratory data analysis (EDA), but none of them was representative of the experimental data.
- The random noise simulated in the nominal data was not sufficient to represent the flaw-free experimental data. The experimental data show larger variations, which may stem from factors such as imperfections in the plate surface, uncertainty in the plate length, and uncertainty related to the sensor.
- A multi-class classification model was built with the simulation data. It reached a macro-average F1 score of about 85% when tested on simulation data, but only about 10% when tested on experimental data.
Work on the simulation data was suspended because resolving the discrepancy between simulation and experimental data requires further study.
Modeling with Experimental Data
The machine learning pipeline consisted of feature engineering, splitting the data into training and testing sets, model training, and model validation (Exhibit 1).
Exhibit 1. ML Pipeline: splitting the dataset into training and testing; building an ML model with training data; Model evaluation with testing data. This procedure is repeated multiple times to estimate the performance metric values and uncertainties with bootstrapping.
Of the 300 data samples, 120 have varying amounts of flaws, while the remaining 180 have none. Based on prior knowledge that flaw presence manifests as amplitude attenuation in the data, features were engineered by measuring amplitude across separate bins in the raw time series. To avoid dependence on the absolute value of the waveform amplitudes, the lowest-frequency signal is taken as the reference, and ratios are taken of the signals at higher frequencies to this reference. A comparison of flaw-free and flawed data at a frequency of 200 kHz is shown in Exhibit 2.
Exhibit 2. Binned features in time series with signal frequency at 200kHz comparing the schematic waveforms when a flaw is absent (Top) and present (Bottom). A significant amplitude attenuation can be seen in the waveform where a flaw is present.
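The binned-ratio features described above can be sketched as follows. This is a minimal illustration and not the project's actual code: the function names, the number of bins, the use of peak absolute amplitude per bin, and the dictionary layout keyed by excitation frequency are all assumptions made for the example.

```python
import numpy as np

def binned_amplitudes(waveform, n_bins):
    """Peak absolute amplitude in each of n_bins consecutive time bins.

    Peak amplitude is an assumption; RMS or mean amplitude would also fit
    the description "measuring amplitude across separate bins".
    """
    waveform = np.asarray(waveform, dtype=float)
    # array_split tolerates lengths that are not a multiple of n_bins
    bins = np.array_split(np.abs(waveform), n_bins)
    return np.array([b.max() for b in bins])

def ratio_features(waveforms_by_freq, n_bins=8, eps=1e-12):
    """Ratios of binned amplitudes at higher frequencies to the lowest one.

    waveforms_by_freq: hypothetical dict mapping excitation frequency (Hz)
    to a 1-D waveform recorded at that frequency.
    """
    freqs = sorted(waveforms_by_freq)
    # The lowest-frequency signal serves as the reference
    reference = binned_amplitudes(waveforms_by_freq[freqs[0]], n_bins)
    features = [
        binned_amplitudes(waveforms_by_freq[f], n_bins) / (reference + eps)
        for f in freqs[1:]
    ]
    return np.concatenate(features)

# Synthetic stand-in waveforms, for illustration only (not project data)
t = np.linspace(0.0, 1e-3, 800)
waves = {f: np.sin(2 * np.pi * f * t) for f in (100e3, 200e3, 300e3)}
features = ratio_features(waves, n_bins=8)
print(features.shape)
```

Because every feature is a ratio, multiplying all waveforms by a common gain leaves the features essentially unchanged, which is the property the ratio construction is meant to provide.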
Modeling and Bootstrapping
The engineered features from the training dataset were used to train a model. This model was then evaluated using the testing dataset. An example of a model testing confusion matrix is found in Exhibit 3.
Exhibit 3: Schematic confusion matrix of results from one iteration of model evaluation, where the X-axis shows the number of occurrences for data predicted as a flaw (1) or no flaw (0), while the Y-axis presents the corresponding values for flaws in the ground truth. The color scale indicates the number of samples.
The following performance metrics are calculated to evaluate the validity of the model:
- Accuracy: What percent of the predictions are correct? (e.g. (34+19) / 60 = 88.3% in Exhibit 3)
- Precision: What percent of the detected flaws are real? (e.g. 19 / (19+3) = 86.4% in Exhibit 3)
- Recall: What percent of the real flaws are detected? (e.g. 19 / (19+4) = 82.6% in Exhibit 3)
- F1: Harmonic mean of precision and recall, i.e., the inverse of the average of 1/precision and 1/recall (e.g. 2 / (22/19 + 23/19) = 38/45 = 84.4% in Exhibit 3)
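As a worked check of these definitions, the four metrics can be computed directly from confusion-matrix counts. The sketch below uses the counts quoted for Exhibit 3 (19 true positives, 3 false positives, 4 false negatives, 34 true negatives); the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # fraction of predicted flaws that are real
    recall = tp / (tp + fn)      # fraction of real flaws that are detected
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Counts taken from the Exhibit 3 example
metrics = binary_metrics(tp=19, fp=3, fn=4, tn=34)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```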
To avoid an unstable performance estimate caused by a single random split of a small dataset, bootstrapping via resampling with replacement is used to estimate the uncertainty of the performance metric values.
Specifically, the random train-test-split, model training, and model evaluation steps were repeated 500 times. The mean and standard deviations of each performance metric were calculated across the 500 iterations, summarizing the distribution of results for each metric.
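The repeated split, train, and evaluate loop can be sketched as below. This is an illustration under stated assumptions rather than the project's implementation: it uses repeated random train/test splits as described in the preceding paragraph (it does not resample with replacement), a 20% test fraction chosen to match the 60 test samples in Exhibit 3, a toy one-feature threshold classifier, and synthetic data. All function names are hypothetical.

```python
import random
import statistics

def repeated_split_scores(samples, fit, score, n_iter=500,
                          test_fraction=0.2, seed=0):
    """Repeat random split -> fit -> score; return (mean, std) of the scores."""
    rng = random.Random(seed)
    n_test = max(1, round(len(samples) * test_fraction))
    scores = []
    for _ in range(n_iter):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)
        scores.append(score(model, test))
    return statistics.mean(scores), statistics.stdev(scores)

def fit_threshold(train):
    """Toy model: threshold halfway between the two class means."""
    flawed = [x for x, y in train if y == 1]
    clean = [x for x, y in train if y == 0]
    return (statistics.mean(flawed) + statistics.mean(clean)) / 2

def accuracy(threshold, test):
    """A sample is predicted as flawed when its amplitude ratio is below threshold."""
    correct = sum((x < threshold) == (y == 1) for x, y in test)
    return correct / len(test)

# Synthetic (feature, label) pairs: 180 flaw-free and 120 flawed samples.
# The distributions are invented for illustration only.
data_rng = random.Random(42)
data = [(data_rng.gauss(1.0, 0.15), 0) for _ in range(180)]
data += [(data_rng.gauss(0.6, 0.15), 1) for _ in range(120)]

mean_acc, std_acc = repeated_split_scores(data, fit_threshold, accuracy)
print(f"accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```

Passing `fit` and `score` as callables keeps the loop independent of the model, so the same routine can report precision, recall, or F1 by swapping the scoring function.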
Despite the limited amount of data in this project, average precision and recall of over 85% were achieved, as shown in Exhibit 4. The results are strong and indicate that the engineered features separate signals with and without flaws almost decisively. This is consistent with the physics: wall loss attenuates the amplitude of a guided wave.
| Performance Metric | Estimated mean with standard deviation |
|---|---|
| Accuracy | 90.2 ± 9.9% |
| Precision | 89.6 ± 11.4% |
| Recall | 87.1 ± 10.6% |
| F1 | 88.3 ± 7.7% |
Exhibit 4: Estimated mean and standard deviations of the performance metrics using bootstrapping technique.
This model’s performance was obtained from measurements on artificially generated flaws in lab-maintained plates, so it must be verified before being applied to real-world cases. Guided wave measurements on real flaws in real heat exchangers may differ from those on laboratory plates. However, the deviation of the flawed data in the features is expected to show up in a similar fashion in real-world data. In addition, the ratio calculation in the feature engineering minimizes the dependence on the absolute amplitude of the signal, so the features are expected to remain valid to a certain extent even when the experimental setup (e.g. the material of the object or the filter used with the receiving sensor) is different. Overall, these results raise confidence in using guided wave data for flaw detection.
Machine learning offers a quick, automatic alternative to the current laborious and expensive manual inspections. It could potentially be implemented as a continuous, automated monitoring solution.