Address cleaning

Definition

Address cleaning is a process that normalizes and standardizes address data to reduce errors. It uses data quality practices such as validation, parsing, standardization, geocoding, and suppression to improve the accuracy, completeness, and validity of an address database while reducing the time spent managing the data.

Validation is often the first step. It checks that the data being entered follows the correct formatting rules and contains the required components. Once validation is complete, parsing breaks the address down into its components, such as street address, city, state, and ZIP code, so that each can be standardized.
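
Validation and parsing can be illustrated together with a small sketch. It assumes US-style addresses written as "street, city, state ZIP"; the pattern and the names ADDRESS_RE and parse_address are hypothetical, and real address data usually needs far more permissive rules or a dedicated library.

```python
import re

# Assumed input shape: "<number> <street name>, <city>, <2-letter state> <ZIP or ZIP+4>"
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>\d+\s+[^,]+?)\s*,\s*"   # house number followed by street name
    r"(?P<city>[^,]+?)\s*,\s*"               # city
    r"(?P<state>[A-Za-z]{2})\s+"             # two-letter state code
    r"(?P<zip>\d{5}(?:-\d{4})?)\s*$"         # 5-digit ZIP, optional +4 suffix
)


def parse_address(raw):
    """Validate a raw address string and split it into components.

    Returns a dict with the keys street, city, state, and zip,
    or None when the string does not match the assumed format.
    """
    match = ADDRESS_RE.match(raw)
    if match is None:
        return None  # validation failed: a required component is missing or malformed
    return match.groupdict()


if __name__ == "__main__":
    print(parse_address("123 Main St, Springfield, IL 62704"))
    print(parse_address("Main St"))  # no house number, city, state, or ZIP
```

In this sketch a failed match doubles as the validation result, so only addresses that pass the format check are ever parsed into components.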

During the standardization step, the address is normalized according to formatting rules and conversions to ensure consistency. Geocoding then uses the street, postal code, and city to determine the geographic coordinates of the physical location. During this step, incorrect information is flagged and addressed. The final step in the address cleaning process is suppression, which removes duplicate or unwanted records to maintain data quality and accuracy.
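
Standardization and suppression can be sketched in the same spirit. The abbreviation table and the function names standardize and suppress_duplicates below are illustrative assumptions, not an official postal standard; the point is only that duplicates become detectable once every record is written the same way.

```python
# Hypothetical conversion table; a real one would follow the relevant postal authority.
ABBREVIATIONS = {
    "street": "ST",
    "avenue": "AVE",
    "road": "RD",
    "boulevard": "BLVD",
}


def standardize(address):
    """Upper-case the address, drop commas and periods, and abbreviate known words."""
    words = address.replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(word.lower(), word.upper()) for word in words)


def suppress_duplicates(addresses):
    """Return standardized addresses with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for address in addresses:
        key = standardize(address)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    records = ["123 Main Street", "123 main st.", "9 Oak Avenue"]
    print(suppress_duplicates(records))
```

Here the first two records differ only in case, punctuation, and abbreviation, so they collapse into a single entry after standardization.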

For forward geocoding, the best way to ensure a good API response is to supply the address in the format: <house number>, <street>, <neighbourhood>, <city>, <state>, <country>, <postcode>
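
A query string in this format can be assembled from parsed components. The following is a minimal sketch; the dictionary keys and the function name format_for_geocoding are hypothetical, and skipping missing components is an assumption rather than a requirement of any particular geocoding API.

```python
# Component order taken from the format above; the key names are assumed.
FIELD_ORDER = [
    "house_number",
    "street",
    "neighbourhood",
    "city",
    "state",
    "country",
    "postcode",
]


def format_for_geocoding(components):
    """Join the available address components in the recommended order."""
    parts = []
    for field in FIELD_ORDER:
        value = str(components.get(field) or "").strip()
        if value:  # assumption: leave out components that are missing or empty
            parts.append(value)
    return ", ".join(parts)


if __name__ == "__main__":
    address = {
        "house_number": "10",
        "street": "Downing Street",
        "city": "London",
        "country": "United Kingdom",
        "postcode": "SW1A 2AA",
    }
    print(format_for_geocoding(address))
```

The resulting string would then be passed as the query to whichever forward geocoding service is in use.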