This tutorial will show you how to use arcpy to do basic geoprocessing tasks.
I have a feature class of private properties with attributes such as the property’s name, address, latitude, longitude and completion year, i.e. the year its development was completed.
Unfortunately, the completion year is missing from some of the records. I will show you how to estimate the missing values using geoprocessing with arcpy.
The following illustration shows the process flow I used for this task.

Extract Records with missing completion year
First, I need to extract the records with a missing year from the feature class. The records are stored in the “PRIVATEHOUSES” feature class in “C:\workspace\default.gdb”.
import arcpy
arcpy.env.workspace = 'C:\\workspace\\default.gdb'
input_feature = 'PRIVATEHOUSES'
# First make a layer from the feature class, because we are going to use
# the arcpy.SelectLayerByAttribute_management tool and it doesn’t accept
# a feature class as input
arcpy.MakeFeatureLayer_management(input_feature, 'ph_lyr')
# This will select the records from ph_lyr whose
# Completion_Date value is Null
missing_records = arcpy.SelectLayerByAttribute_management('ph_lyr',
    'NEW_SELECTION',
    'Completion_Date IS Null')
This code snippet shows how to use the SelectLayerByAttribute_management tool to select the records of a layer based on an attribute value. After running it, the records with a missing year are selected and I can refer to them through the “missing_records” variable.
Search for other records within specified range
Next, I need to loop through all missing_records and get an estimate for each of them.
# Create a search cursor to loop through records
rows = arcpy.SearchCursor(missing_records)
for row in rows:
    # Estimation code goes here
    # …
    # …
arcpy.SearchCursor() creates a read-only cursor over the records of a feature class or table. For each record, I look for the surrounding data points within a specified range; in this case, I use 100 meters.
# Get object id from the search cursor
objectid = row.getValue("OBJECTID")
query = "OBJECTID=%d" % objectid
# Selecting the feature with the current OBJECTID
temp_selected = arcpy.SelectLayerByAttribute_management('ph_lyr', "NEW_SELECTION", query)
# Selecting surrounding records within 100 meters
selected_within_100m = arcpy.SelectLayerByLocation_management('ph_lyr',
    'WITHIN_A_DISTANCE',
    temp_selected,
    100,
    "NEW_SELECTION")
I extract the object ID of the current record with the row.getValue() method and use it to build the query that selects the current feature. Then I call arcpy.SelectLayerByAttribute_management(…) to select the current feature and assign the result to the variable temp_selected. This variable is passed to arcpy.SelectLayerByLocation_management(…) to get the records within the specified range.
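To make the "within range" idea concrete without ArcGIS, here is a minimal pure-Python sketch that picks neighbours from the latitude and longitude attributes. It is an illustration only, not what SelectLayerByLocation_management does internally: it assumes point records held as plain dicts with hypothetical keys "lat" and "lon", and it measures great-circle (haversine) distance on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def neighbours_within(records, target, max_distance_m=100):
    """Return the records other than `target` within max_distance_m of it.

    `records` is a list of dicts with hypothetical "lat"/"lon" keys.
    """
    return [
        r for r in records
        if r is not target
        and haversine_m(target["lat"], target["lon"],
                        r["lat"], r["lon"]) <= max_distance_m
    ]
```

This sketch leaves the target record out of its own neighbour list. In the arcpy version the current record can be part of the spatial selection, but its Null year is skipped later when the median is computed.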
Get a median completion year
The next step is to estimate the completion year from the records retrieved within range. For the estimate, I simply calculate and return the median value of the other records in range.
# Make an estimation of year of completion based on surrounding records
def getEstimateFromSurroundings(surrounding_rows):
    surrounding_years = []
    # Ignore records with null values
    for row in surrounding_rows:
        val = row.getValue("Completion_Date")
        if val is not None:
            surrounding_years.append(val)
    # If there are no values in the list, return None
    if len(surrounding_years) == 0:
        return None
    # I use the numpy library to calculate the median
    median_year = numpy.median(surrounding_years)
    return median_year
I wrote a function called getEstimateFromSurroundings() to do the estimation. It first skips the records with a Null value, puts the remaining values into a list, and returns the median of that list. I use numpy to calculate the median; you can implement your own function for it, but I am too lazy to do it myself 🙂
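For readers who would rather not depend on numpy, here is a minimal sketch of a hand-written median that also drops the missing values. The function name median_year and the plain-list input are assumptions for illustration; it takes the year values directly instead of cursor rows.

```python
def median_year(years):
    """Return the median of `years`, ignoring None; None if nothing is left."""
    known = sorted(y for y in years if y is not None)
    if not known:
        return None
    mid = len(known) // 2
    if len(known) % 2 == 1:
        # Odd count: the middle value is the median
        return known[mid]
    # Even count: average the two middle values (this can give a half year)
    return (known[mid - 1] + known[mid]) / 2.0
```

Like numpy.median, this averages the two middle values when the count is even, so the result is not always a whole year. If the field should only hold whole years, the value needs rounding before it is written back.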
Once I have the estimate, I update the completion year of the current record with it. Since I don’t want to modify the input feature class, I make a copy of it with CopyFeatures_management and write the estimated values to the copy.
# Copy the input feature
arcpy.CopyFeatures_management(input_feature, output_feature)
Update the missing completion year with estimated one
Next, I implement a function that writes the estimated value to the current record.
def updateCompletionYear(objectid, year):
    update_cursor = arcpy.UpdateCursor(output_feature, "OBJECTID=%d" % objectid)
    # OBJECTID is unique, so the cursor yields at most one row;
    # looping also copes with the case where no row matches
    for update_row in update_cursor:
        update_row.Completion_Date = year
        update_cursor.updateRow(update_row)
        # Release pointer to update record
        del update_row
    # Release pointer to update cursor
    del update_cursor
The function above updates the record with the given OBJECTID in the output feature class with the given year value. I use arcpy.UpdateCursor(…) to update the Completion_Date field.
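The same update step can be written against the newer arcpy.da.UpdateCursor, whose rows are lists of field values. The sketch below is one possible shape, not the method used in this tutorial: the helper name set_first_field is hypothetical, and it takes any cursor-like object so the logic can be tried without ArcGIS.

```python
def set_first_field(cursor, value):
    """Write `value` into the first field of every row the cursor yields.

    `cursor` only needs to be iterable and to offer updateRow(row), which is
    the interface arcpy.da.UpdateCursor provides; rows must be mutable lists.
    Returns the number of rows updated.
    """
    updated = 0
    for row in cursor:
        row[0] = value
        cursor.updateRow(row)
        updated += 1
    return updated

# Hypothetical use with arcpy (not run here):
# with arcpy.da.UpdateCursor(output_feature, ["Completion_Date"],
#                            "OBJECTID = %d" % objectid) as cursor:
#     set_first_field(cursor, year)
```

The with block releases the cursor when it exits, so the explicit del statements are not needed in this variant.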
Putting it all together
# To estimate completion year from surrounding units
import arcpy
import numpy
from datetime import datetime
arcpy.env.workspace = "C:\\workspace\\default.gdb"
arcpy.env.overwriteOutput = True
input_feature = "PRIVATEHOUSES"
output_feature = "OUTPUT_PRIVATEHOUSES"
# Make an estimation of year of completion based on surrounding houses
def getEstimateFromSurroundings(surrounding_rows):
    surrounding_years = []
    for row in surrounding_rows:
        val = row.getValue("Completion_Date")
        if val is not None:
            surrounding_years.append(val)
    if len(surrounding_years) == 0:
        return None
    median_year = numpy.median(surrounding_years)
    #print surrounding_years
    #print "Estimated year: ", median_year
    return median_year
def updateCompletionYear(objectid, year):
    update_cursor = arcpy.UpdateCursor(output_feature, "OBJECTID=%d" % objectid)
    # OBJECTID is unique, so the cursor yields at most one row
    for update_row in update_cursor:
        update_row.Completion_Date = year
        update_cursor.updateRow(update_row)
        del update_row
    del update_cursor
def main():
    # Timing
    starttime = datetime.now()
    # First make a layer from feature class
    # Because select layer by attribute doesn't work with feature class
    arcpy.MakeFeatureLayer_management(input_feature, 'ph_lyr')
    arcpy.CopyFeatures_management(input_feature, output_feature)
    uncomplete_records = arcpy.SelectLayerByAttribute_management('ph_lyr', "NEW_SELECTION", "Completion_Date IS Null")
    rows = arcpy.SearchCursor(uncomplete_records)
    for row in rows:
        objectid = row.getValue("OBJECTID")
        query = "OBJECTID=%d" % objectid
        temp_selected = arcpy.SelectLayerByAttribute_management('ph_lyr', "NEW_SELECTION", query)
        # The search distance is 1200 here; the walkthrough above uses 100
        selected_in_range = arcpy.SelectLayerByLocation_management('ph_lyr', 'WITHIN_A_DISTANCE', temp_selected, 1200, "NEW_SELECTION")
        surrounding_rows = arcpy.SearchCursor(selected_in_range)
        estimated_year = getEstimateFromSurroundings(surrounding_rows)
        updateCompletionYear(objectid, estimated_year)
        # -1 is only used for printing when no estimate was found
        if estimated_year is None:
            estimated_year = -1
        print("Objectid %d, year %d" % (objectid, estimated_year))
    print(datetime.now() - starttime)

if __name__ == '__main__':
    main()
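This process is repeated in the for loop for every record with a missing year. When it finishes, every such record in my output feature class that has at least one neighbour with a known year within range has an estimated completion year; records without such a neighbour remain Null.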
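One thing to watch in the loop: it changes the selection on the same layer that the outer search cursor is reading from. A cautious variant, assuming you would rather not rely on how an open cursor reacts to selection changes, is to read all the OBJECTIDs into a list first and loop over that list instead. The helper name collect_values below is hypothetical.

```python
def collect_values(rows, field="OBJECTID"):
    """Read one field from every row of an old-style cursor into a list.

    `rows` is anything iterable whose items offer getValue(field), which is
    the interface of the rows returned by arcpy.SearchCursor.
    """
    return [row.getValue(field) for row in rows]

# Hypothetical use inside main() (not run here):
# missing_ids = collect_values(arcpy.SearchCursor(uncomplete_records))
# for objectid in missing_ids:
#     ...  # select, estimate and update as before
```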