Course - Assuring the Quality of Geospatial Information -
Workshop on Assuring Data Quality

presented by Nicholas CHRISMAN

INEGI 23-27 September 2019, Aguascalientes

Online address: http://www.nchrisman.fastmail.fm/Present/INEGI_DQ_2019.html


Objective:

By the end of the course, participants will know the concepts, fundamentals and best practices of geospatial information quality, so that they can put them into practice in their own work.

Syllabus

  1. International standards for geospatial quality: indicators and metrics
  2. Positional accuracy
  3. Classification accuracy, completeness
  4. Logical consistency (between levels)
  5. Statistical aspects and sampling methods

This workshop will consider all aspects of data quality, particularly the means to test and evaluate each aspect.
It draws on the 40 years (more or less) of experience of your guide.
Each section (day) will include concepts, fundamentals and practical implementation.

Ensuring spatial data quality

  • What can be checked?
  • Have you checked it?
  • Have you fixed it? (And if not, why not…)

  • Program for the week - Rough outline

  • Day 1: Monday 23 September.
  • Introductions :
  • Who is this bearded professor?
  • Attendees from INEGI (around the room) ...
  • Back to the origins - 1982 Working Group on Cartographic Data Quality, National Committee for Digital Cartographic Data Standards (youngest member, chosen as chairperson, the others understood how committees work).
  • "Data Quality will occupy more space than the coordinates" (1983, Ottawa); my colleagues laughed.
  • Key concepts (take care with translations):
  • FITNESS FOR USE (from a legal glossary (.nl): adapted to the needs of the user (?))
  • Internal versus external data quality (a francophone distinction): internal is oriented toward production (assessed once); external is oriented toward user requirements (assessed repeatedly, once per user).
  • Accuracy / precision / resolution (English); exactitud / precisión / resolución (español)
  • Terrain nominal (French): translated (very horribly) as 'abstract universe'!
  • Critical analysis: who does what?
  • Division of labor (specialization); division of knowledge (Not everyone knows the same things).
  • Centralist model : One map for everyone (no alternatives), producer decides. Based on specialized knowledge, skills, equipment.
  • Revolution « fitness for use » - switch to user perspective. Do the users have all the tools?
  • Will the barrier disappear? Consumers become producers « pro-sumer »
  • Practical exercise : Starting point for metadata resources
  • Find a portal / platform (A few starting points):
  • Geoplatform (USA); TIGER-example; Endangered Species (ISO)
  • Utah Data (Utah, USA); Boundaries>Zip Codes (FGDC Metadata)
  • CONABIO - Mexico: Portal de Geoinformación sobre Biodiversidad - SNIB (9,753 themes) - format: FGDC
  • Mapas INEGI biblioteca - example: Carta topográfica. F13D19d (metadata?).
  • Metadatos INEGI Clearinghouse... 455,383 entities...
  • Geoportail (France) IGN...
  • Geocatalogue (France) IGN...
  • Geoportail cadastral (Tunisia) metadata?
  • Let's examine the contents... Is this information useful?
    Why are the standards so dysfunctional?
  • Five elements of data quality (and some extras)
  • Lineage
  • Practical exercise 2: Lineage at the level of features (objects)
  • OpenStreetMap (https://www.openstreetmap.org/)
  • for example - one object: Biblioteca Emilio Alanís Patiño; next to INEGI main building; edited 7 months ago by Carlos Faz (50 edits, all around Aguascalientes).
  • INEGI: Cartografía participativa (Is this operational?)
  • Crowd Lens (shows people contributing to OSM)
  • Logical Consistency (=Conceptual Consistency - ISO) and Completeness
  • Accuracies (of position, attribute (classification) and time) – plus resolutions
  • Yes, there are three ways to assess accuracy, but only two are measurable. Time may be hard to separate from the others: the object may have moved (is the error in time or in position?); the object may have changed (is the error in classification or in time?).

    Day 2 (Tuesday) - Positional Accuracy

  • Positional Accuracy Handbook, 1999, Minnesota Land Management Information Center (PDF).
  • National Standard for Spatial Data Accuracy (NSSDA), FGDC, 1998.
  • Do not confuse (or conflate) : resolution and accuracy!
  • Major issue:

  • Sampling: obtain unbiased estimate of average conditions; must apply to the same kind of object (well-defined points)
  • Exhaustive inventory: Estimate accuracy by feature class (not all 'well-defined')
  • Field work: how to apply the concept of « terrain nominal »
  • Field trip (at least conceptually) (Testing various sources: Google, Bing, OSM, INEGI, and others)
  • Consider how the specifications for a street (carretera) will change what point to test. Center-line? Edge of pavement?
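
    As a minimal sketch of the NSSDA computation in Python (the standard calls for at least 20 well-defined, independently surveyed test points; the coordinates below are invented for illustration):

    import math

    # Invented test points: (map_x, map_y, ref_x, ref_y) in metres.
    points = [
        (500010.2, 2430001.1, 500009.8, 2430000.5),
        (500250.7, 2430150.3, 500251.2, 2430149.9),
        (500333.4, 2430410.0, 500333.0, 2430410.9),
        # ... at least 20 points in a real test ...
    ]

    n = len(points)
    rmse_x = math.sqrt(sum((mx - rx) ** 2 for mx, my, rx, ry in points) / n)
    rmse_y = math.sqrt(sum((my - ry) ** 2 for mx, my, rx, ry in points) / n)
    rmse_r = math.sqrt(rmse_x ** 2 + rmse_y ** 2)

    # NSSDA horizontal accuracy at the 95% confidence level; the 1.7308
    # factor applies when rmse_x and rmse_y are approximately equal.
    print(f"RMSE_r = {rmse_r:.2f} m")
    print(f"NSSDA horizontal accuracy = {1.7308 * rmse_r:.2f} m (95%)")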

  • Day 3 (Wednesday) - Attribute accuracy (Classification)

    This aspect is best developed in the remote sensing field.
    Classification accuracy is assessed by a square matrix comparing the product (rows) to 'ground truth' (columns).
    Correctly classified pixels are in the diagonal.
    Percent correct by row (the product) is 'user's accuracy'; by column (the ground truth), 'producer's accuracy'.
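
    A minimal sketch of these calculations in Python, using an invented three-class error matrix (rows = product, columns = ground truth):

    classes = ["forest", "crop", "urban"]
    # Invented error matrix: rows = classes on the product,
    # columns = the same classes in the ground truth.
    matrix = [
        [120,  10,   3],   # mapped as forest
        [  8, 200,  12],   # mapped as crop
        [  2,   9,  80],   # mapped as urban
    ]

    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(classes)))
    print(f"overall accuracy: {correct / total:.1%}")

    for i, name in enumerate(classes):
        row_total = sum(matrix[i])                                  # mapped as this class
        col_total = sum(matrix[j][i] for j in range(len(classes)))  # truly this class
        print(f"{name}: user's accuracy {matrix[i][i] / row_total:.1%}, "
              f"producer's accuracy {matrix[i][i] / col_total:.1%}")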

    An example

    For objects (not exhaustive rasters), there is an additional row and column: 'not found' / 'nonexistent'. The diagonal cell where these two meet must stay empty (clearly: an object absent from both the product and the ground truth cannot be observed).

    IGN France carried out an exhaustive inventory within a limited area of each map tested: ALL objects were assessed for their classification on the map and in the field (according to the terrain nominal).

    For example, road classes: highway, street, dirt road, footpath, NONEXISTENT, in a square matrix.
    It showed the error rates: some highways were coded as 'street', but very few as 'dirt road'.
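
    A sketch of such a matrix for the road-class example, with invented counts (rows = class on the map, columns = class in the field, per the terrain nominal); the extra 'NONEXISTENT' row and column carry the omissions and commissions:

    classes = ["highway", "street", "dirt road", "footpath", "NONEXISTENT"]
    # Invented counts: rows = class on the map, columns = class in the field.
    # The last row holds omissions (in the field, absent from the map);
    # the last column holds commissions (on the map, absent from the field).
    matrix = [
        [ 95,   4,  0,  0,  1],   # mapped as highway
        [  7, 180,  3,  0,  2],   # mapped as street
        [  0,   0, 60,  2,  1],   # mapped as dirt road
        [  0,   0,  4, 40,  2],   # mapped as footpath
        [  2,   5,  8,  6,  0],   # not on the map; the shared diagonal cell stays empty
    ]

    n = len(classes)
    for i, name in enumerate(classes[:-1]):
        field_total = sum(matrix[j][i] for j in range(n))
        omitted = matrix[n - 1][i]
        print(f"{name}: {omitted} of {field_total} field objects missing from the map")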

    Day 3 continued - Completeness (Exhaustivity)

    Completeness is measured by the 'extra' row and column. Did all objects appear in the product?

    The cartographic scale of a product dictates some issues of completeness. Certain classes of features are not expected (no dirt roads at 1:100,000?).

    A direct test of completeness requires an external database of all features that should be on the map. In some applications such a list exists (e.g. parcels in a tax register, an inventory of roads for maintenance), but the identifiers need to match, or some geocoding is required.
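
    Where such a register exists, the core of the test is a comparison of identifiers, as in this minimal sketch (the parcel identifiers are invented):

    # Invented identifiers from the map product and from the tax register.
    map_parcels = {"P-001", "P-002", "P-004", "P-007"}
    register_parcels = {"P-001", "P-002", "P-003", "P-004"}

    omissions = register_parcels - map_parcels    # in the register, missing from the map
    commissions = map_parcels - register_parcels  # on the map, absent from the register

    completeness = len(map_parcels & register_parcels) / len(register_parcels)
    print(f"Completeness: {completeness:.0%}")
    print(f"Omissions: {sorted(omissions)}; commissions: {sorted(commissions)}")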


    Day 4 (Thursday) - Logical Consistency

    Not a test against external sources, but using internal structure of the data to test itself.

    Simple level: legal codes

    Are all feature codes in the specification? (Remove illegal codes).

    Next level: consistency with other elements

  • Are ‘mountain’ categories found at low elevation? (High mountain vegetation code at 5m elevation? Very unlikely…)
  • Navigational buoys on land? Stop lights in the water?
  • Is the river coded as outside the floodplain?
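
    A minimal sketch of both levels of checking, using invented feature codes, records and thresholds (they illustrate the idea and come from no real specification):

    # Invented feature records: (feature_id, code, elevation_m).
    features = [
        ("f1", "HIGH_MOUNTAIN_VEG", 3850.0),
        ("f2", "HIGH_MOUNTAIN_VEG", 5.0),   # alpine vegetation at 5 m: very unlikely
        ("f3", "NAV_BUOY", 12.0),           # a buoy above sea level: on land?
        ("f4", "XYZ_UNKNOWN", 120.0),       # code absent from the specification
    ]

    legal_codes = {"HIGH_MOUNTAIN_VEG", "NAV_BUOY", "RIVER", "STREET"}

    for fid, code, elev in features:
        # Simple level: is the feature code legal at all?
        if code not in legal_codes:
            print(f"{fid}: illegal code {code}")
            continue
        # Next level: is the code consistent with another element (elevation)?
        if code == "HIGH_MOUNTAIN_VEG" and elev < 2500:
            print(f"{fid}: {code} at {elev} m elevation is very unlikely")
        if code == "NAV_BUOY" and elev > 0:
            print(f"{fid}: {code} at {elev} m elevation suggests a buoy on land")
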
    Geometric test: topological structure

    The most powerful test of logical consistency involves the geometry of a map, using the topological data structure. This test was first implemented by the US Census Bureau prior to the 1970 Census. Back then, geometric databases were crude and incomplete; adopting a topological data structure permitted the Bureau to test for completeness and geometric integrity.

  • Do all lines intersect only at established nodes? (overshoots/undershoots)
  • Do all polygons close properly?
  • (Are any lines missing?)
  • Are all lines labelled correctly for polygons to left and right?
  • Some additional resources on topological structure: lecture materials on logical consistency and more on topological representation
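
    A minimal sketch of one such check, assuming a DIME-style edge list in which each edge records its from-node, to-node, and the polygons to its left and right (the data are invented). A polygon closes properly when, orienting each of its boundary edges so that the polygon lies on the left, every node is departed from exactly as often as it is arrived at:

    from collections import Counter

    # Invented DIME-style edge table: (from_node, to_node, left_poly, right_poly).
    # "OUT" stands for the unbounded outside region.
    edges = [
        ("n1", "n2", "A", "OUT"),
        ("n2", "n3", "A", "OUT"),
        ("n3", "n1", "A", "OUT"),
        ("n3", "n4", "B", "OUT"),   # dangling edge: polygon B cannot close
    ]

    for poly in {"A", "B"}:
        balance = Counter()
        for frm, to, left, right in edges:
            # Orient each boundary edge so that poly lies on its left,
            # then count departures (+1) and arrivals (-1) at each node.
            if left == poly:
                balance[frm] += 1
                balance[to] -= 1
            elif right == poly:
                balance[to] += 1
                balance[frm] -= 1
        bad = {node: v for node, v in balance.items() if v != 0}
        if bad:
            print(f"polygon {poly} does not close; unbalanced nodes: {bad}")
        else:
            print(f"polygon {poly} closes properly")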

    A bit of history of the DIME and TIGER developments at US Census Bureau:

  • James Corbett and the Euler network
  • Don Cooke and Bill Maxfield published a paper about DIME: Census Use Study, New Haven
  • DIME: hand-coded from maps, key-punched (Geographic Data Technology)
  • ARITHMICON software (Corbett and Marvin White)
  • TIGER: used USGS topographic maps (1:100,000); a cooperation between the Census Bureau and the mapping agency
  • Conflation: adding attributes from one source to better geometry from another
  • Relevance of TIGER (and DIME) to INEGI: using maps for a statistical census agency.


    Day 5: Friday - Conclusion, summary, further explorations.

    Gaps in the standards: elements that still need to evolve...

    Entering metadata is very slow and incomplete unless it is directly tied to the data entry procedures.


    Some documents


    Version of 21 September 2019