Data, data, everywhere: What's an analyst to do?
Chapter 4: Mapping Crime and Geographic Information Systems
Mapping outside data
Although most data used in crime analysis are generated and used within one department, the need to integrate information from other agencies is becoming more important. Unfortunately, outside data are often in an incompatible format. What do you do when you get a delimited or fixed-field ASCII file, for example? Table 4.1, "Census data in ASCII," shows data almost exactly as they are presented on the U.S. Census Bureau Web site, with only slight editing to adjust the spacing. The Federal Information Processing Standard (FIPS) codes attached to all census geographic areas make up the first two columns. Starting in the left-hand column, the State of Missouri FIPS code is 29. In the next column, the city of St. Louis FIPS code is 510. Next are the tract numbers, literal tract numbers, tract populations (P00010001), tract median family incomes in 1989 (P107A001), and tract per capita incomes in 1989 (P114A001).
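As a minimal sketch of how such a file might be read once it is in delimited form, the Python below assumes a comma-delimited layout with the seven columns in the order described above. The field names `state_fips`, `county_fips`, `tract`, and `tract_literal`, the helper `read_tracts`, and every tract number and value in the sample rows are invented for illustration; they are not taken from Table 4.1. The combined key follows the usual Census convention of a 2-digit state code, a 3-digit county code, and a 6-digit tract code.

```python
import csv
import io

# Hypothetical sample rows laid out like the columns described above.
# The tract numbers, populations, and incomes are invented for illustration.
SAMPLE = """29,510,101100,1011,2500,30000,12000
29,510,101200,1012,3100,28000,11000
"""

# Assumed field names; only the three P-codes come from the text.
FIELDS = ["state_fips", "county_fips", "tract", "tract_literal",
          "P00010001", "P107A001", "P114A001"]


def read_tracts(text):
    """Read comma-delimited census rows and add a combined geographic key."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    rows = []
    for row in reader:
        # State (2 digits) + county (3 digits) + tract (6 digits).
        row["geo_key"] = (row["state_fips"].zfill(2)
                          + row["county_fips"].zfill(3)
                          + row["tract"].zfill(6))
        rows.append(row)
    return rows


tracts = read_tracts(SAMPLE)
for t in tracts:
    print(t["geo_key"], t["P00010001"])
```

A single combined key of this kind is often what links a row of census attributes to the matching tract polygon on a base map.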
Most GIS software can handle data compiled in a variety of formats, although some variations may generate headaches. MapInfo will open the following formats: dBase, Lotus®, Microsoft Excel®, delimited ASCII, some raster files (.tif, .pcx, and so forth), AutoCAD® (.dxf), and others, using either the Open or Import command. ArcView expects tabular data to be in dBase (.dbf), Info, or delimited text (.txt) format. One solution to the somewhat limited data conversion repertoires of some GIS programs is to launder files through a more versatile spreadsheet or database program and then transfer them to the GIS in a more compatible format.
To do this, the user imports the foreign data into a conversion program such as Lotus, Microsoft Excel, or Microsoft Access®, and then exports them in a GIS-compatible format. This way, a fixed-field ASCII file can be converted into a delimited ASCII or dBase format. This is done by parsing the fixed-field file in the conversion program, and then outputting it in a delimited format. Parsing is the process of instructing the program how to read the fixed-field data by identifying the variable in each field and dragging field delimiters to the appropriate locations. For example, the analyst instructs the program that the case number is in columns 1-10, the address is in columns 11-30, and so forth. Delimited means that each data field is separated from the next by a character such as a comma or a tab. With delimiters, it does not matter if data values have different widths, as in the sequence 3.5, 14.276. When the program recognizes the delimiter as a cue, it moves to the next value.
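The same parsing step can be sketched in a few lines of Python for readers who prefer a script to a spreadsheet import dialog. This is only an illustration under stated assumptions: the layout uses the column positions from the example above (case number in columns 1-10, address in columns 11-30), while the field names, the helper functions, and the two sample records are hypothetical.

```python
import csv
import io

# Assumed layout: (field name, first column, last column), 1-based and inclusive,
# matching the example in the text. The field names are hypothetical.
LAYOUT = [("case_number", 1, 10), ("address", 11, 30)]


def parse_fixed_line(line, layout=LAYOUT):
    """Cut one fixed-field record into values using the column positions."""
    return [line[start - 1:end].strip() for _name, start, end in layout]


def fixed_to_delimited(lines, layout=LAYOUT, delimiter=","):
    """Convert fixed-field records to delimited text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow([name for name, _start, _end in layout])
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_fixed_line(line.rstrip("\n"), layout))
    return out.getvalue()


# Invented sample records: a 10-character case number, then a 20-character address.
records = [
    "2001-00042" + "100 MAIN ST".ljust(20),
    "2001-00043" + "2500 OAK AVE, APT 3".ljust(20),
]
converted = fixed_to_delimited(records)
print(converted)
```

Because the second address contains a comma, the `csv` writer wraps it in quotation marks so that the embedded comma is not mistaken for a delimiter. Passing `delimiter="\t"` would produce a tab-delimited file instead.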
You may receive data that consist of x-y coordinates, without the points themselves. In such situations, the coordinates are used to generate the points in the GIS, using a Create Points command that allows users to select a preferred symbol and an appropriate projection. (For information about map projections, see chapter 1.) After the points are generated, they can be imported as a new layer on the map. Data generated in the field, perhaps from patrol cars using global positioning system technology, can be treated in the same manner.
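Outside a GIS package, the idea of generating points from coordinates can be illustrated with a short Python sketch that turns x-y rows into a GeoJSON point layer, a format many current GIS programs can open. This is not the Create Points command itself. It assumes that x is longitude and y is latitude in decimal degrees; the function name `make_point_layer`, the `incident_id` field, and the sample coordinates are hypothetical.

```python
import json


def make_point_layer(rows):
    """Build a GeoJSON FeatureCollection of points from rows with x and y values."""
    features = []
    for row in rows:
        # Assumption: x is longitude and y is latitude, in decimal degrees.
        x = float(row["x"])
        y = float(row["y"])
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            # Carry every other column along as an attribute of the point.
            "properties": {k: v for k, v in row.items() if k not in ("x", "y")},
        })
    return {"type": "FeatureCollection", "features": features}


# Invented sample rows, as they might arrive from a delimited file.
rows = [
    {"incident_id": "A1", "x": "-90.20", "y": "38.63"},
    {"incident_id": "A2", "x": "-90.25", "y": "38.60"},
]
layer = make_point_layer(rows)
geojson_text = json.dumps(layer, indent=2)
print(geojson_text)
```

If the coordinates are in a projected system such as state plane feet rather than longitude and latitude, they would need to be reprojected first, which this sketch does not attempt.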
Any database with an address or geographic reference included can be mapped, provided the corresponding digital base map is available. For example, you may want to map census tract data. The census data are available, but the map of tracts is not. In this case, the map is readily available to download off the Web, but non-census tract maps must be acquired
Data warehousing and data mining
Police departments generate volumes of information. A single call for service ultimately results in its own pile of paper, and computer files tracking all calls for service grow rapidly. Data warehousing and data mining provide sophisticated ways of storing and accessing information.
A data warehouse is a megadatabase that stores data in a single place instead of storing them in project files or throughout the local government or government agency. Government agencies have been slow to do this because agency politics tend to create an attitude oriented more toward defending departmental turf than toward sharing data. A data warehouse could assist with crime analysis efforts, which often demand data from diverse sources, such as the health, housing, traffic, fire protection, liquor licensing, and planning agencies.
Law enforcement is primarily a local government activity, which often leaves police agencies at the mercy of data managers overseeing city or county information technology functions. Ideally, data warehouses consolidate all jurisdictional databases and permit use of data from any agency according to quality control standards. Data mining, as the label suggests, involves digging nuggets of information out of vast amounts of data with specialized tools. These are typically called exploratory data analysis (EDA) tools, which, in the context of mapping, can become exploratory spatial data analysis (ESDA) tools. An IBM software engineer (Owen, 1998) identified these as the factors that brought data mining to the attention of the business community:
- The value of large databases in providing new insights is recognized.
- Records can be consolidated with a specific audience or objective in mind.
- Cost reductions are achieved with large-scale database operations.
- Analysis is being demassified (futurist Alvin Toffler's term), meaning that the information revolution permits the creation of specialized custom maps for specific audiences.
Chapter 2 discussed hypothesis testing. That discussion now comes full circle because data warehousing and data mining make hypothesis testing even more practical. Queries can be addressed to large arrays of data, increasing the reliability of responses. However, this is truer for historical questions than for current data.