Methods to Remove Noisy Data in Surfer 8
Introduction
Recently, a customer wrote for help filtering a
data set. He took measurements of a sample of corroded steel and gridded the
data using the Inverse Distance to a Power method. He saw that there were points
in the data set with anomalous Z values and wondered what would be the best way
to remove the anomalies.
Filter Data
Originally, we tried a data filter with a
constant Z value, using the Filter Data button in the Grid | Data dialog. This method
worked well for anomalous points that fit the constant Z criterion. Unfortunately, not all of
the anomalous values were conveniently located above a threshold Z value, so we
sought other methods.
1. In the Grid Data dialog box, click on the Filter Data button to display the Filter dialog.
2. Enter the constant data filter in the Data Exclusion Filter edit box.
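To make the idea of a constant Z filter concrete, here is a minimal Python sketch that works outside of Surfer. It is not Surfer's own filter syntax; the function name `exclude_above`, the sample points, and the threshold of 5.0 are all made up for illustration.

```python
# Hypothetical sketch: drop every XYZ point whose Z exceeds a fixed threshold.
# The sample values and the threshold are invented for illustration only.

def exclude_above(points, z_max):
    """Return the (x, y, z) points whose z is at or below z_max."""
    return [p for p in points if p[2] <= z_max]

points = [
    (0.0, 0.0, 1.2),
    (1.0, 0.0, 1.3),
    (0.0, 1.0, 9.7),  # spike above the threshold
    (1.0, 1.0, 1.1),
]

kept = exclude_above(points, z_max=5.0)
print(kept)
```

A filter like this only catches anomalies that sit above the chosen Z value, which is the limitation described above.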
Grid Smoothing
Grid smoothing works on a previously created GRD
file by using nearby grid nodes to recalculate the Z values.
The Grid | Spline Smooth menu command
recalculates the grid when a different number of grid rows and columns is
specified.
The Spline Smooth method
can help eliminate the effect of spiky grid nodes
by recalculating the grid with a number of rows and
columns different from that of the original grid file.
The Grid | Filter | Linear Convolution Filters
| Low Pass Filters (either User Defined or Predefined Filters) are another
method that can be used. These filters calculate a weighted average of the
values from nearby grid nodes to achieve the smoothing.
The Low-pass Filters in the Digital
Filtering dialog box use values from nearby grid nodes to
smooth the grid file.
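The weighted average behind a low-pass filter can be sketched in a few lines of Python. This is only an illustration of the general technique, not Surfer's implementation: the function name `low_pass`, the equal-weight 3 x 3 kernel, the edge handling (nodes outside the grid are skipped), and the sample grid are all assumptions.

```python
# Hypothetical sketch of a low-pass (weighted average) filter on a grid.
# Nodes outside the grid are ignored, and the weights actually used are
# renormalized; that edge handling is an assumption for this example.

def low_pass(grid, kernel):
    rows, cols = len(grid), len(grid[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            weight_sum = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        w = kernel[dr + half][dc + half]
                        total += w * grid[rr][cc]
                        weight_sum += w
            out[r][c] = total / weight_sum
    return out

# Equal-weight 3 x 3 kernel (a simple moving average).
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

# Flat 5 x 5 grid with a single spiky node in the middle.
grid = [[1.0] * 5 for _ in range(5)]
grid[2][2] = 10.0

smoothed = low_pass(grid, kernel)
for row in smoothed:
    print(["%.2f" % z for z in row])
```

In this made-up grid the spike is reduced, but every node next to it is raised as well, so the filter changes values that were not anomalous to begin with.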
Both smoothing methods can be valid ways to
eliminate spiky data, but they also smooth the grid in areas that do not have
anomalous data points. We decided to investigate another method that is more
localized to the spiky data.
Cross Validation
Cross validation calculates the Z value at a data
point by using only Z values from the surrounding data points, omitting the Z
value at the data point in question. This method is normally used to assess the
quality of a particular gridding method. In addition, points that are poorly
estimated by surrounding data may be indicative of anomalous Z values.
The Cross Validation option is accessed by
clicking the Cross Validate button in the Grid Data dialog box.
Click the Cross Validate
button in the Grid Data dialog box.
In the Cross Validation dialog, specify
all the data points to validate, save the results to a DAT file, and click OK.
In the Grid Data dialog box, click Cancel to bypass the gridding
process.
Specify all the data points to
validate,
and save the results to a file.
Open the cross validation data file in the
worksheet.
Format of the cross validation
data file.
Calculate a new column H as the absolute value of
the Residuals by clicking on the Residuals column, choosing the Data |
Transform menu command, and specifying the equation H = fabs(F),
where H is the destination column, F is the original (Residuals) column, and fabs() is the
absolute value function. The values for the First row and Last row
are taken from the selected column automatically.
Calculate the absolute value of
the cross validation residuals
with the worksheet Data | Transform command.
Next, sort the data on column H. Select the
entire worksheet by clicking the button to the left of column A and above row 1.
Choose the Data | Sort menu command, change the Sort First By
column to column H, verify that the Labels in first row check box is
checked, and click OK. Save the changes with the File | Save menu command.
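For readers who prefer to do the transform and sort steps outside the worksheet, the same operations might look like this in Python. The field names, the sample residuals, and the ascending sort order are assumptions made for the example; they do not reflect the actual layout of the cross validation file.

```python
# Hypothetical sketch: add an absolute-residual field and sort on it,
# mirroring the H = fabs(F) transform followed by a sort on column H.

rows = [
    {"x": 0.0, "y": 0.0, "residual": -0.12},
    {"x": 1.0, "y": 0.0, "residual": 0.85},
    {"x": 0.0, "y": 1.0, "residual": 0.03},
    {"x": 1.0, "y": 1.0, "residual": -0.61},
    {"x": 2.0, "y": 1.0, "residual": 0.20},
]

def add_abs_and_sort(rows):
    """Return new rows with an abs_residual field, sorted ascending on it."""
    out = [dict(r, abs_residual=abs(r["residual"])) for r in rows]
    out.sort(key=lambda r: r["abs_residual"])
    return out

sorted_rows = add_abs_and_sort(rows)
for r in sorted_rows:
    print(r["x"], r["y"], r["abs_residual"])
```

With the rows in ascending order, plotting the absolute residual against the row number makes any abrupt jump in the values easy to see.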
The customer created a graph of the absolute
value of the residual vs. the row number to help analyze the data. Here's how
the graph looks in Grapher.
Graph of the absolute value of
the cross validation residual vs. the row number.
The graph shows a sharp change of slope at 0.5,
which indicates that most of the residual values are below 0.5. A histogram of
the data shows this information in a bar format.
Histogram of the absolute value
of the residuals.
Based on these graphs, the customer decided to
eliminate the data with residuals greater than 0.5. In the Surfer worksheet,
select the rows with the absolute value of the cross validation residual greater
than 0.5, delete them, and save to a new file name.
Select and delete the residual
values greater than 0.5 in the worksheet.
A classed post map of the data
set displays the anomalous residual values in red, and other
data points are shown in gray. A contour map of the kept data underlies the
points.
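The removal step can also be sketched in Python. The cutoff of 0.5 comes from the graphs above; the function name `split_by_residual`, the tuple layout, and the sample values are hypothetical. Keeping the removed points in their own list makes it straightforward to post them separately, as in the classed post map.

```python
# Hypothetical sketch: separate points whose absolute cross validation
# residual is greater than a cutoff from the points that are kept.

def split_by_residual(rows, cutoff):
    """rows are (x, y, z, residual) tuples; returns (kept, removed)."""
    kept = [r for r in rows if abs(r[3]) <= cutoff]
    removed = [r for r in rows if abs(r[3]) > cutoff]
    return kept, removed

rows = [
    (0.0, 0.0, 1.2, 0.1),
    (1.0, 0.0, 3.4, -0.7),  # anomalous
    (0.0, 1.0, 1.1, 0.5),   # exactly at the cutoff, so it is kept
    (1.0, 1.0, 4.0, 1.3),   # anomalous
    (2.0, 1.0, 1.3, -0.2),
]

kept, removed = split_by_residual(rows, cutoff=0.5)
print(len(kept), "kept,", len(removed), "removed")
```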
Conclusion
Surfer provides many helpful tools for analyzing
and displaying information about data sets. Data filtering and grid smoothing
are useful in many cases. Cross validation is a tool normally used for choosing
the best gridding algorithm for a data set, but it may also be used to search a
data set for spiky or anomalous data points.