Using arcpy for geoprocessing

This tutorial will show you how to use arcpy to do basic geoprocessing tasks.

I have a feature class of private properties with attributes such as property’s name, address, latitude, longitude and completion year, i.e  the year it was completed development.

Unfortunately in some of the records, the completion year value is missing. Now, I will show you how to estimate the missing data by using geoprocessing with arcpy.

Following illustration shows the process flow I used to achieve this task.

estimate_workflow

Extract Records with missing completion year

First I need to extract those records with missing year from the feature class. The records are stored in “C:\\workspace\default.gdb” as “PRIVATEHOUSES” feature class.


import arcpy
arcpy.env.workspace ='C:\\workspace\\default.gdb'
input_feature = 'PRIVATEHOUSES'

# First make a layer from feature class. Because we are going to use
# arcpy.SelectLayerByAttribute_management tool and it doesn’t allow
# feature classas input

arcpy.MakeFeatureLayer_management(input_feature, 'input_lyr')

# This will select the records from input_lyr with
# Comletion_Date value is Null

mising_records = arcpy.SelectLayerByAttribute_management('input_lyr',
                                                          'NEW_SELECTION',
                                                          'Completion_Date IS Null')

This code snapshot shows you how to use SelectLayerByAttribute_management tool to select records based on attribute value of feature class or layer. After running this code, I have all the records with missing year value in “missing_records” variable.

Search for other records within specified range

Next, I need to loop through all missing_records and get estimation for each of them.

# Create a search cursor to loop through records
rows = arcpy.SearchCursor(missing_records)
for row in rows:
    #  Estimation code goes here
    #  …
    #  …

arcpy.SearchCursor() creates a read-only access cursor to records of a feature class or table. For each record, I try to find the surrounding data points within specified range __ in this case, I use 100 meters.

# Get object id from the search cursor
objectid = row.getValue("OBJECTID")
query = "OBJECTID=%d" % objectid

# Selecting the feature with the current OBJECTID
temp_selected = arcpy.SelectLayerByAttribute_management('ph_lyr', "NEW_SELECTION", query )

# Selecting surrounding records within 100 meters
selected_within_100m = arcpy.SelectLayerByLocation_management('ph_lyr',
                                                              'WITHIN_A_DISTANCE',
                                                               temp_selected,
                                                               100,
                                                               "NEW_SELECTION")

I extract the object id of the current record by using row.getValue() method. After that, the query is prepare for selecting current feature. Then I use arcpy.SelectLayerByAttribute_management(…) to select the current feature and assign it into variable temp_selected. This variable is passed to arcpy.SelectLayerByLocation(…) function to get the records in specified range.

Get a median completion year

Next step is to get estimation of completion year from the retrieved records within range. For estimation I just calculate and return median value of the other records in range.

# Make an estimation of year of completion based on surrounding records
def getEstimateFromSurroundings(surrounding_rows):
    surrounding_years =[]
    # Ignore records with null values
    for row in surrounding_rows:
        val = row.getValue("Completion_Date")
        if val != None:
            surrounding_years.append(val)

    # if there is no values in the list, return None
    if len(surrounding_years) == 0:
        return None

    # I use numpy library to calculate median
    median_year = numpy.median(surrounding_years)
    return median_year

I write a function called getEstimateFromSurroundings() function to do the estimation work. In the function, first, I check and remove the records with Null value. The rest are put into a list and median value is extracted. I use numpy to calculate median value from the list, you can implement your own function to get it but I am too lazy do it myself 🙂

After I got the estimation, I update the current record completion year with the estimated value. Since I don’t want to modify the input feature layer, I make a copy of the input feature layer for output using CopyFeatures_management and update the estimated value to it.

# Copy the input feture
arcpy.CopyFeatures_management(input_feature, output_feature)

Update the missing completion year with estimated one

Next I implement a function to update estimated value to current record.

def updateCompletionYear(objectid, year):
    update_cursor = arcpy.UpdateCursor(output_feature, "OBJECTID=%d" % objectid)
    if(update_cursor):
        update_row = update_cursor.next()
        update_row.Completion_Date = year
        update_cursor.updateRow(update_row)

        # Release pointer to update record
        del update_row

    # Release pointer to update cursor
    del update_cursor

The function above do the updating of record to output feature layer for the given OBJECTID and year value. I use arcpy.UpdateCursor(…) method to update the Completion_Year data field.

Putting all together

# To estimate completion year from surrounding units

import arcpy
import numpy
from datetime import datetime

arcpy.env.workspace = "C:\\workspace\\default.gdb"
arcpy.env.overwriteOutput = True

input_feature = "PRIVATEHOUSES"
output_feature = "OUTPUT_PRIVATEHOUSES"

# Make an estimation of year of completion based on surrounding houese
def getEstimateFromSurroundings(surrounding_rows):
	surrounding_years =[]
	for row in surrounding_rows:
		val = row.getValue("Completion_Date")
		if val != None:
			surrounding_years.append(val)

	if len(surrounding_years) == 0:
		return None

	median_year = numpy.median(surrounding_years)
	#print surrounding_years
	#print "Estimated age: ", median_year
	return median_year

def updateCompletionYear(objectid, year):
	update_cursor = arcpy.UpdateCursor(output_feature, "OBJECTID=%d" % objectid)
	if(update_cursor):
		update_row = update_cursor.next()
		update_row.Completion_Date = year
		update_cursor.updateRow(update_row)
		del update_row
	del update_cursor

def main():
	# Timing
	starttime = datetime.now()

	# First make a layer from feature class
	# Because select layer by attribute doesn't work with feature class
	arcpy.MakeFeatureLayer_management(input_feature, 'ph_lyr')
	arcpy.CopyFeatures_management(input_feature, output_feature)
	uncomplete_records = arcpy.SelectLayerByAttribute_management('ph_lyr', "NEW_SELECTION", "Completion_Date IS Null")
	rows = arcpy.SearchCursor(uncomplete_records)
	for row in rows:
		objectid = row.getValue("OBJECTID")
		query = "OBJECTID=%d" % objectid
		temp_selected = arcpy.SelectLayerByAttribute_management('ph_lyr', "NEW_SELECTION", query )
		selected_within_40m = arcpy.SelectLayerByLocation_management('ph_lyr', 'WITHIN_A_DISTANCE', temp_selected, 1200, "NEW_SELECTION")
		surrounding_rows = arcpy.SearchCursor(selected_within_40m)
		estimated_year = getEstimateFromSurroundings(surrounding_rows)

		updateCompletionYear(objectid, estimated_year)

		if estimated_year == None:
			estimated_year = -1
		print "Objectid %d, year %d" % (objectid, estimated_year)
	print (datetime.now() - starttime)

if __name__ == '__main__':
	main()

This process takes place for all the records in the for loop. After finished, all the records in my output feature layer has completion year.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s