Data classification, also known as data classing or selection of intervals, is the process by which a set of interval or ratio data are divided into a small number of classes or categories. Such classification is necessary for the construction of classed choropleth maps in which a range of different colors or shadings is used to depict the set of data classes. The selection of intervals so strongly influences the apparent information content of a map that knowing how to choose appropriate class intervals is a necessary skill for any GIS user.
Number of Classes
While there is some disagreement as to the precise number, it is generally agreed that human cognition prevents map readers from reliably discriminating more than 10 or 11 different colors or tint shadings in a single map. Most cartographers suggest that no more than seven classes be used. The actual number of classes chosen depends not only on the color used to symbolize the data (far fewer distinguishable tints are available for yellow than for blue, for example) but also on various characteristics of the data and the map context, including the skill of the map reader, the distribution of the data, and the precision with which class discrimination is needed.
Data classification begins by organizing the set of data in order by value and possibly by summarizing the data with a distribution graph. Class breaks are then inserted at values along this ordered set by one of many different methods. Evans has outlined a generic classification of class-interval systems that suggests a very large number of possible methods. However, most commercial GIS include a small number of methods within their mapping functionality. The most common systems are as follows:
Equal interval
Divide the range of data values by the number of classes desired to produce a set of class intervals that are equally spread across the data range. For example, if the data have a range of 1 to 99 and five classes are desired, then each class spans 98/5 = 19.6 units, and class breaks could be created at approximately 20, 40, 60, and 80.
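The equal-interval calculation can be sketched in a few lines of Python. This is a minimal illustration, not the routine used by any particular GIS; the function name `equal_interval_breaks` and the sample values are assumptions made for the example.

```python
def equal_interval_breaks(values, n_classes):
    """Return the n_classes - 1 interior break values for equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes  # every class spans the same width
    return [lo + i * width for i in range(1, n_classes)]

# Hypothetical data spanning 1 to 99, divided into five classes.
breaks = equal_interval_breaks([1, 25, 50, 75, 99], 5)
print([round(b, 1) for b in breaks])
```

The exact breaks fall at 20.6, 40.2, 59.8, and 79.4; in practice they are often rounded to convenient values such as 20, 40, 60, and 80.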
Quantile
Divide the number of data values evenly into the number of classes that have been chosen. Thus, if there are to be five classes, each class will contain 20% of the observations.
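A quantile split can be sketched as follows. The sketch assumes that each break is the largest value in its class and that ties and uneven counts are handled by simple integer division; software packages differ on these conventions. The name `quantile_breaks` is hypothetical.

```python
def quantile_breaks(values, n_classes):
    """Return the upper bound of every class except the last,
    so that each class holds (as nearly as possible) the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    if n < n_classes:
        raise ValueError("need at least one observation per class")
    # The i-th break is the last observation in the i-th share of the ordered data.
    return [ordered[(i * n) // n_classes - 1] for i in range(1, n_classes)]

# Ten hypothetical observations in five classes: two observations (20%) per class.
q_breaks = quantile_breaks([7, 1, 9, 3, 5, 2, 10, 4, 8, 6], 5)
print(q_breaks)
```

Because the breaks depend only on rank order, a heavily skewed data set can place very different values in the same class.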
Standard deviation
Calculate the mean and standard deviation of the data set and then classify each value by the number of standard deviations it lies from the mean. Often, data classed by this method will have five classes (greater than 2, between 1 and 2, between 1 and –1, between –1 and –2, and less than –2 standard deviations) and will be shaded using two different color ranges (e.g., dark blue, light blue, white, light red, and dark red, respectively).
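The five-class scheme just described can be sketched as below. The sketch assumes the population standard deviation (`statistics.pstdev`); a sample standard deviation could equally be used. The function name `std_dev_class`, the class labels, and the sample data are illustrative assumptions, as is the choice to place values lying exactly on a boundary in the class nearer the mean.

```python
import statistics

def std_dev_class(value, mean, sd):
    """Assign a value to one of five classes by its distance from the mean in SDs."""
    z = (value - mean) / sd
    if z > 2:
        return "above +2 SD"
    if z > 1:
        return "+1 to +2 SD"
    if z >= -1:
        return "-1 to +1 SD"
    if z >= -2:
        return "-2 to -1 SD"
    return "below -2 SD"

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical observations
mean = statistics.mean(data)        # 5
sd = statistics.pstdev(data)        # population standard deviation, 2.0
for v in data:
    print(v, std_dev_class(v, mean, sd))
```

The central class straddles the mean, which is why a diverging color scheme with a neutral middle shade suits this method.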
Natural breaks (Jenks' method)
Classes are based on natural groupings inherent in the data. Jenks' method identifies the breaks that minimize the variance within classes and maximize the variance between them.
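The objective behind natural breaks can be illustrated with an exhaustive search over every possible placement of breaks in the ordered data. This is a sketch of the criterion only, not the optimization algorithm that GIS packages implement, and it is practical only for very small data sets. It minimizes the total within-class sum of squared deviations; because the total variation of the data is fixed, this also maximizes the variation between classes. The names `sum_sq_dev` and `natural_breaks` and the sample data are assumptions.

```python
from itertools import combinations

def sum_sq_dev(group):
    """Sum of squared deviations of a group from its own mean."""
    m = sum(group) / len(group)
    return sum((x - m) ** 2 for x in group)

def natural_breaks(values, n_classes):
    """Return the grouping of the sorted values that minimizes
    the total within-class sum of squared deviations."""
    ordered = sorted(values)
    n = len(ordered)
    best_cost, best_groups = None, None
    # Try every way of cutting the ordered list into n_classes contiguous groups.
    for cuts in combinations(range(1, n), n_classes - 1):
        bounds = (0, *cuts, n)
        groups = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sum_sq_dev(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups

# Hypothetical data with three visible clusters.
groups = natural_breaks([30, 1, 11, 2, 31, 10, 3, 12], 3)
print(groups)
```

For this sample the search recovers the three clusters, placing breaks in the gaps between 3 and 10 and between 12 and 30.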
Karen K. Kemp