
Methods to Remove Noisy Data in Surfer 8

Introduction

Recently, a customer wrote for help filtering a data set. He took measurements of a sample of corroded steel and gridded the data using the Inverse Distance to a Power method. He saw that there were points in the data set with anomalous Z values and wondered what would be the best way to remove the anomalies.

Filter Data

First, we tried a data filter with a constant Z value, set through the Filter Data button in the Grid | Data dialog. This method worked well for anomalies that met the constant Z criterion. Unfortunately, not all of the anomalous values were conveniently located above a threshold Z value, so we sought other methods.
 

Specifying a data exclusion filter

1. In the Grid Data dialog box, click on the Filter Data button
to display the Filter dialog.
2. Enter the constant data filter in the
Data Exclusion Filter edit box.
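
For readers who want to see the logic outside of Surfer, a constant Z filter can be sketched in a few lines of Python. This is only an illustration of the idea, not Surfer's implementation; the sample points, the threshold Z_MAX, and the function name exclude_above are all hypothetical.

```python
# Hypothetical (x, y, z) samples; the 9.7 reading stands in for a spike.
points = [
    (0.0, 0.0, 1.2),
    (1.0, 0.0, 1.4),
    (0.0, 1.0, 9.7),
    (1.0, 1.0, 1.1),
]

Z_MAX = 5.0  # assumed threshold, chosen only for this illustration


def exclude_above(points, z_max):
    """Return the points whose Z value does not exceed z_max."""
    return [p for p in points if p[2] <= z_max]


kept = exclude_above(points, Z_MAX)
print(kept)
```

A filter like this only catches anomalies that sit above the chosen threshold. A spike in an area of generally low Z values can stay below the threshold and pass through untouched.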

Grid Smoothing

Grid smoothing works on a previously created GRD file by using nearby grid nodes to recalculate the Z values.

The Grid | Spline Smooth menu command recalculates the grid when a different number of grid rows and columns is specified.
 

Spline smoothing

The Spline Smooth method can help eliminate the effect of spiky grid nodes
by recalculating the grid when you specify a number of rows and
columns different from that of the original grid file.

The Grid | Filter | Linear Convolution Filters | Low Pass Filters (either User Defined or Predefined Filters) are another method that can be used. These filters calculate a weighted average of the values from nearby grid nodes to achieve the smoothing.
 

Digital Filtering

The Low-pass Filters in the Digital Filtering dialog box use values from nearby grid nodes to
smooth the grid file.
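
The weighted average behind a low-pass filter can be sketched in Python as follows. This is a minimal illustration, assuming a 3x3 window with equal weights and edge nodes left unchanged; Surfer's predefined filters use their own kernels and edge handling, and the function name low_pass_3x3 is hypothetical.

```python
def low_pass_3x3(grid):
    """Replace each interior node with the equal-weight mean of its 3x3 window.

    Edge nodes are copied unchanged, a simplification for this sketch.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = sum(window) / 9.0
    return out


# Hypothetical 5x5 grid: flat at 1.0 with a single spike of 10.0 in the center.
grid = [[1.0] * 5 for _ in range(5)]
grid[2][2] = 10.0

smoothed = low_pass_3x3(grid)
for row in smoothed:
    print(row)
```

In this example the spike drops from 10.0 to 2.0, but the eight nodes around it rise from 1.0 to 2.0 as well, because each of their windows includes the spike.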

Both smoothing methods can be valid ways to eliminate spiky data, but they also smooth the grid in areas that do not have anomalous data points. We decided to investigate another method that is more localized to the spiky data.

Cross Validation

Cross validation calculates the Z value at a data point by using only Z values from the surrounding data points, omitting the Z value at the data point in question. This method is normally used to assess the quality of a particular gridding method. In addition, points that are poorly estimated by surrounding data may be indicative of anomalous Z values.
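
The leave-one-out idea can be sketched in Python. This is an illustration under stated assumptions, not Surfer's algorithm: it uses a plain inverse distance weighting estimate with a power of 2 and no search radius, takes the residual as estimate minus measured value, and the function names and sample points are hypothetical.

```python
import math


def idw_estimate(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate of Z at (x, y) from (x, y, z) samples."""
    num = 0.0
    den = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return sz  # coincident sample: use its value directly
        w = 1.0 / d ** power
        num += w * sz
        den += w
    return num / den


def cross_validate(points, power=2.0):
    """For each point, estimate Z from all the other points and record the residual."""
    results = []
    for i, (x, y, z) in enumerate(points):
        others = points[:i] + points[i + 1:]
        est = idw_estimate(x, y, others, power)
        results.append((x, y, z, est, est - z))  # residual = estimate - measured (assumed)
    return results


# Hypothetical data: four consistent corner points and one spiky center point.
points = [
    (0.0, 0.0, 1.0),
    (2.0, 0.0, 1.0),
    (0.0, 2.0, 1.0),
    (2.0, 2.0, 1.0),
    (1.0, 1.0, 9.0),
]

results = cross_validate(points)
for x, y, z, est, res in results:
    print(x, y, z, round(est, 3), round(res, 3))
```

The center point is estimated from four neighbors that all read 1.0, so its residual is by far the largest in absolute value. That is the signal we use later to flag anomalous points.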

The Cross Validation option is accessed by clicking the Cross Validate button in the Grid Data dialog box.
 

Cross Validate

Click the Cross Validate button in the Grid Data dialog box.

In the Cross Validation dialog, specify all the data points to validate, save the results to a DAT file, and click OK. In the Grid Data dialog box, click Cancel to bypass the gridding process.
 

Cross Validation

Specify all the data points to validate,
and save the results to a file.

Open the cross validation data file in the worksheet.
 

Cross validation data

Format of the cross validation data file.

Calculate a new column H as the absolute value of the residuals: click the Residuals column, choose the Data | Transform menu command, and specify the equation H = fabs(F), where H is the destination column, F is the original column, and fabs() is the absolute value function. The values for the First row and Last row are taken from the selected column automatically.
 

Absolute value of the residual

Calculate the absolute value of the cross validation residuals
with the worksheet Data | Transform command.

Next, sort the data on column H. Select the entire worksheet by clicking the button to the left of column A and above row 1. Choose the Data | Sort menu command, change the Sort First By column to column H, verify that the Labels in first row check box is checked, and click OK. Save the changes with the File | Save menu.
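
The same transform and sort can be sketched in Python for anyone working with the cross validation results outside the worksheet. The rows below are hypothetical, and the field names id, residual, and abs_residual are our own labels rather than Surfer column names.

```python
# Hypothetical cross validation rows; only the residual matters for this step.
rows = [
    {"id": 1, "residual": -0.12},
    {"id": 2, "residual": 0.85},
    {"id": 3, "residual": 0.03},
    {"id": 4, "residual": -1.40},
    {"id": 5, "residual": 0.31},
]

# Equivalent of H = fabs(F): store the absolute value in a new field.
for row in rows:
    row["abs_residual"] = abs(row["residual"])

# Equivalent of sorting the worksheet on column H, in ascending order.
rows_sorted = sorted(rows, key=lambda r: r["abs_residual"])

for row in rows_sorted:
    print(row["id"], row["abs_residual"])
```

Sorting in ascending order places the best-estimated points first and the most poorly estimated points at the end of the list.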

The customer created a graph of the absolute value of the residual vs. the row number to help analyze the data. Here's how the graph looks in Grapher.
 

Line graph of sorted residuals

Graph of the absolute value of the cross validation residual vs. the row number.

The graph shows a sharp change of slope at 0.5, which indicates that most of the residual values are below 0.5. A histogram of the data shows this information in a bar format.
 

Histogram of residuals

Histogram of the absolute value of the residuals.

Based on these graphs, the customer decided to eliminate the data with residuals greater than 0.5. In the Surfer worksheet, select the rows with the absolute value of the cross validation residual greater than 0.5, delete them, and save to a new file name.
 

Delete points with large residuals

Select and delete the residual values greater than 0.5 in the worksheet.
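
As a sketch of the same step in Python, the rows can be split at the 0.5 cutoff into a kept set and a removed set. The data values and the names CUTOFF, kept, and removed are hypothetical; only the 0.5 cutoff comes from the customer's analysis.

```python
CUTOFF = 0.5  # cutoff chosen by the customer from the graphs

# Hypothetical rows of (x, y, z, absolute residual).
rows = [
    (0.0, 0.0, 1.2, 0.12),
    (1.0, 0.0, 6.3, 0.85),
    (0.0, 1.0, 1.1, 0.03),
    (1.0, 1.0, 8.9, 1.40),
    (2.0, 1.0, 1.5, 0.31),
]

# Keep points at or below the cutoff; set aside the rest for inspection.
kept = [r for r in rows if r[3] <= CUTOFF]
removed = [r for r in rows if r[3] > CUTOFF]

print(len(kept), "kept;", len(removed), "removed")
```

Holding on to the removed rows, rather than discarding them outright, makes it possible to plot them later and check where the anomalies fall.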


 
Classed post map of anomalies

A classed post map of the data set displays the anomalous residual values in red, and other
data points are shown in gray. A contour map of the kept data underlies the points.

Conclusion

Surfer provides many helpful tools for analyzing and displaying information about data sets. Data filtering and grid smoothing are useful in many cases. Cross validation is a tool normally used for choosing the best gridding algorithm for a data set, but it may also be used to search a data set for spiky or anomalous data points.

 
