Course: Quality Assurance of Geospatial Information
Workshop on Assuring Data Quality
INEGI 23-27 September 2019, Aguascalientes
Online address: http://www.nchrisman.fastmail.fm/Present/INEGI_DQ_2019.html
Objective:
By the end of the course, participants will know the concepts, fundamentals, and best practices of geospatial information quality, so that they can apply them in their work.
Syllabus
- International standards for geospatial quality: indicators and metrics
- Positional accuracy
- Classification accuracy, completeness
- Logical consistency (between levels)
- Statistical aspects and sampling methods
(Originally announced in Spanish; the rest of this resource is in English.)
This workshop will consider all aspects of data quality, particularly the means to test and evaluate each aspect of quality.
It draws on your guide's forty years (more or less) of experience.
Each section (day) will include concepts, fundamentals, and practical implementation.
Ensuring spatial data quality
Program for the week - Rough outline
Day 1: Monday 23 September.
Introductions :
Who is this bearded professor?
Attendees from INEGI (around the room) ...
Back to the origins - 1982 Working Group on Cartographic Data Quality, National Committee for Digital Cartographic Data Standards (youngest member, chosen as chairperson, the others understood how committees work).
"Data Quality will occupy more space than the coordinates" (1983, Ottawa); my colleagues laughed.
Key concepts (take care with translations):
FITNESS FOR USE (from a Dutch legal glossary: adapted to the needs of the user (?))
Internal versus external data quality (a Francophone distinction): internal is oriented toward production (evaluated once); external is oriented toward user requirements (evaluated many times, once per user).
Accuracy / precision / resolution (English); exactitud / precisión / resolución (español)
Terrain nominal (French): translated (very badly) as 'abstract universe'!
Critical analysis: who does what?
Division of labor (specialization); division of knowledge (Not everyone knows the same things).
Centralist model : One map for everyone (no alternatives), producer decides. Based on specialized knowledge, skills, equipment.
Revolution « fitness for use » - switch to user perspective. Do the users have all the tools?
Will the barrier disappear? Consumers become producers « pro-sumer »
Practical exercise : Starting point for metadata resources
Find a portal / platform (A few starting points):
Let's examine the contents... Is this information useful?
Why are the standards so dysfunctional?
Five elements of data quality (and some extras)
Lineage
Logical Consistency (=Conceptual Consistency - ISO) and Completeness
Accuracies (of position, attribute (classification) and time) – plus resolutions
Yes, three ways to assess accuracy, but only two are measurable. Time may be hard to separate from the others; the object may have moved - is the error in time or position? The object may have changed, is the error in classification or in time?
Day 2 (Tuesday) - Positional Accuracy
Major issue:
Field Work: How to apply the concept of « terrain nominal »
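Positional accuracy is commonly summarized as the root-mean-square error (RMSE) between product coordinates and independent, higher-accuracy check points (the approach behind accuracy standards such as the NSSDA). A minimal sketch, assuming paired 2D check points in the same coordinate system; the function name and the coordinate values are illustrative, not any specific INEGI procedure:

```python
import math

def rmse_2d(tested, reference):
    """Root-mean-square error between tested coordinates and
    higher-accuracy reference coordinates (same units, same CRS)."""
    if len(tested) != len(reference):
        raise ValueError("point lists must pair up one-to-one")
    sq = [(xt - xr) ** 2 + (yt - yr) ** 2
          for (xt, yt), (xr, yr) in zip(tested, reference)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical check points: map coordinates vs. a GPS field survey.
map_pts = [(100.0, 200.0), (150.0, 250.0), (300.0, 120.0)]
gps_pts = [(100.3, 199.6), (149.8, 250.4), (300.1, 120.2)]
rmse = rmse_2d(map_pts, gps_pts)
```

The check points must be independent of the production process and well distributed over the tested area, or the statistic flatters the product.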
Day 3 (Wednesday) - Attribute accuracy (Classification)
This aspect is best developed in the remote sensing field.
Classification accuracy is assessed by a square matrix comparing the product (rows) to 'ground truth' (columns).
Correctly classified pixels are in the diagonal.
Percent correct by column (reference totals) gives 'producer's accuracy'; by row (map totals), 'user's accuracy'.
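With rows as the product and columns as the ground truth, row percentages measure user's accuracy (commission errors) and column percentages producer's accuracy (omission errors), following the usual remote-sensing convention. A minimal sketch with invented class names and counts:

```python
# Rows = classes as mapped (product); columns = ground truth (reference).
# All counts are hypothetical, for illustration only.
classes = ["forest", "water", "urban"]
matrix = [
    [50,  3,  2],   # mapped as forest
    [ 4, 60,  1],   # mapped as water
    [ 6,  2, 40],   # mapped as urban
]

n = sum(sum(row) for row in matrix)
diagonal = sum(matrix[i][i] for i in range(len(classes)))
overall = diagonal / n  # overall proportion correctly classified

# User's accuracy: correct / row (map) total -- commission errors.
users = {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(classes)}
# Producer's accuracy: correct / column (reference) total -- omission errors.
producers = {c: matrix[i][i] / sum(r[i] for r in matrix)
             for i, c in enumerate(classes)}
```

Reporting both per-class rates matters: a class can look good to the producer (few omissions) while a user consulting the map still finds many false positives.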
An example
For objects (not exhaustive rasters), there is an additional row and column: not found / nonexistent. The diagonal cell of that row and column must be empty, since an object absent from both the map and the terrain cannot be observed.
IGN France did an exhaustive sample in a limited area of the maps tested. ALL objects were assessed for their classification on the map and in the field (according to the terrain nominal).
For example, road classes: highway, street, dirt road, footpath, NONEXISTENT in a square matrix.
It showed the error rate where some highways were coded as 'street', but very few as 'dirt road'.
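Omissions and commissions can be read directly off such a square matrix: the NONEXISTENT row holds field objects missing from the map, the NONEXISTENT column holds map objects with no counterpart in the field. A sketch using the road classes above, with invented counts:

```python
# Rows = map, columns = terrain nominal (field). Counts are hypothetical.
classes = ["highway", "street", "dirt_road", "footpath", "NONEXISTENT"]
matrix = [
    [90,  5,  1,  0, 2],   # mapped as highway
    [ 8, 80,  4,  0, 3],   # mapped as street
    [ 0,  3, 70,  5, 4],   # mapped as dirt_road
    [ 0,  0,  6, 60, 5],   # mapped as footpath
    [ 1,  2,  7,  9, 0],   # not on the map; diagonal cell is empty by definition
]

i = classes.index("NONEXISTENT")
omissions = sum(matrix[i]) - matrix[i][i]            # field objects missing from the map
commissions = sum(r[i] for r in matrix) - matrix[i][i]  # map objects absent in the field
```

The pattern of off-diagonal cells is as informative as the totals: highways confused with streets signal a different production problem than footpaths missed entirely.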
Day 3 continued - Completeness (Exhaustivity)
Completeness is measured by the 'extra' row and column. Did all objects appear in the product?
The cartographic scale of a product dictates some issues of completeness. Certain classes of features are not expected (no dirt roads at 1:100,000?).
A direct test of completeness requires an external database of all features that should be on the map.
In some applications such a list exists (e.g. parcels in a tax register, an inventory of roads for maintenance), but the identifiers need to match, or some geocoding is required.
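Once the identifiers match, the completeness test reduces to set operations between the map and the external register. A minimal sketch with hypothetical parcel identifiers:

```python
# Hypothetical identifiers; in practice these come from the map database
# and an external register (e.g. a tax roll) after reconciling ID schemes.
map_ids = {"P-001", "P-002", "P-004", "P-007"}
register_ids = {"P-001", "P-002", "P-003", "P-004", "P-005", "P-007"}

missing = register_ids - map_ids   # omissions: in the register, not on the map
extras = map_ids - register_ids    # commissions: on the map, not in the register
completeness = len(map_ids & register_ids) / len(register_ids)
```

The arithmetic is trivial; the real work is the identifier matching and geocoding that make the two sets comparable in the first place.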
Day 4 (Thursday) - Logical Consistency
Not a test against external sources, but using internal structure of the data to test itself.
Simple level: legal codes
Are all feature codes in the specification? (Remove illegal codes).
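The legal-code check needs no external data at all: only the product specification's list of permitted codes. A minimal sketch, with a hypothetical code list and feature table:

```python
# Hypothetical feature codes drawn from a product specification.
valid_codes = {"RIV", "LAK", "ROA", "BLD"}
features = [("f1", "RIV"), ("f2", "ROA"), ("f3", "RVR"), ("f4", "BLD")]

# Any feature whose code is not in the specification is illegal.
illegal = [(fid, code) for fid, code in features if code not in valid_codes]
```

This is the weakest consistency test: it catches only codes outside the specification, not valid codes applied to the wrong objects.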
Next level: consistency with other elements
Are ‘mountain’ categories found at low elevation? (High mountain vegetation code at 5m elevation? Very unlikely…)
Navigational buoys on land? Stop lights in the water?
Is the river coded as outside the floodplain?
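Checks like these can be written as rules over pairs of attributes, each flagging combinations that are logically suspect. A sketch assuming a simple attribute dictionary per feature; the rule names, codes, and the 1000 m threshold are all hypothetical:

```python
# Each rule returns True when a feature is suspect. Thresholds are invented.
rules = {
    "alpine vegetation at low elevation":
        lambda f: f["code"] == "ALPINE_VEG" and f["elev_m"] < 1000,
    "navigational buoy on land":
        lambda f: f["code"] == "BUOY" and not f["in_water"],
}

features = [
    {"id": 1, "code": "ALPINE_VEG", "elev_m": 5,    "in_water": False},
    {"id": 2, "code": "ALPINE_VEG", "elev_m": 3200, "in_water": False},
    {"id": 3, "code": "BUOY",       "elev_m": 0,    "in_water": True},
]

flags = [(f["id"], name) for f in features
         for name, test in rules.items() if test(f)]
```

A flag is not proof of error, only an improbable combination worth inspecting; the data tests itself, but a person still judges.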
Geometric test: topological structure
The most powerful test of logical consistency involves the geometry of a map using the topological data structure.
This test was first implemented by the US Census Bureau prior to the 1970 Census. Back then, geometric databases were crude and incomplete. Adopting a topological data structure permitted them to test for completeness and geometric integrity.
Do all lines intersect only at established nodes? (overshoots/undershoots)
Do all polygons close properly?
(Are any lines missing?)
Are all lines labelled correctly for polygons to left and right?
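Two of these checks reduce to simple bookkeeping over the topological structure: a node touched by only one edge end is a dangle (a candidate overshoot or undershoot), and a polygon ring closes only if its first and last vertices coincide. A minimal sketch with invented node names and coordinates:

```python
from collections import Counter

# Edges as (start_node, end_node) pairs; names are hypothetical.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
# A degree-1 node is a dangling end: a possible overshoot or undershoot.
dangles = sorted(n for n, d in degree.items() if d == 1)

# A polygon ring closes when its first and last vertices coincide.
ring = [(0, 0), (4, 0), (4, 3), (0, 0)]
closed = ring[0] == ring[-1]
```

The left/right polygon labels can be verified similarly: walking the edges around each polygon, the same polygon identifier must appear consistently on one side, which is exactly the redundancy the Census Bureau exploited.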
Some additional resources on topological structure: lecture materials on logical consistency and
more on topological representation
A bit of history of the DIME and TIGER developments at US Census Bureau:
Relevance of TIGER (and DIME) to INEGI - using maps for a statistical census agency.
Day 5: Friday - Conclusion, summary, further explorations.
Gaps in the standards: still elements to evolve...
Entering metadata is very slow and incomplete unless it is directly tied to the data entry procedures.
Some documents
Version of 21 September 2019