Lecture 02: Data Structures and Simple Calculations

Basic data structures, simple calculations

Objectives of lecture:
1 Revisit basic distinctions (raster/vector) covered many times before
2 Alternatives for data structures

What Exists?

A data model embodies a theory about the world. Theories suppose a set of entities and relationships, inside a set of assumed constraints (axioms).
Models, like Plato's shadows on the cavern walls, are elusive. Data structures implement models (more and less completely). We must study these representations to infer what was intended.

What is "space"?

Clash of ontologies: absolutists (eg. Aristotle, Newton) as container of objects
relativists (eg. Democritus, Mach, Einstein) as a relationship between objects
Fundamentally irresolvable, yet assumed without proof by each implementation

Notice that Worboys brings in the term "invariance" (p. 98) as the basis for a geometry. Topology was what 19th century mathematics found as the underlying invariances of geometry (not the Phythagorean/ Cartesian analytical geometry) Invariance is what Stevens was trying to add for attributes; for all measurements, not just physical space.

Do objects "have" area, perimeter, etc. or do these properties come from relationships? What rules attach position to objects?

The standard story: Raster/Vector, Field/Object

Vector (object) models assume an empty space to be populated by objects.
Raster models suppose a discrete space (atomist), with slots to be filled in.

Observation: one is discrete in the space, the other has continuous space but only by having discrete objects to place in it... (is the issue just what is discrete?)

What is measured?

Remembering the Sinton taxonomy: time, space, "theme" (attribute)
Role of these Dimensions: One must be fixed, one "controlled", one measured.
Temporal examples:
Control: Time (hour) Measure: Attribute (water level) => strip chart (tides)
Control: Poland@date Measure: location => Historic Poland, etc.

Time typically fixed for maps (otherwise they are diagrams...)
Control: location (grid) Measure: Attribute => Raster
Control: attribute (exhaustive) Measure: location => Categorical coverage
Control: attribute (isolated) Measure: location => "Features" (objects)
Control: location (pre-existing objs) Measure: Attribute => Collection units

Representation:

at the base, vector data is a bunch of coordinates.
Coordinates: Pairs (triples) of measurements against orthogonal axes
Distances along these axes are theoretically infinite (between any two points)

Practical limitation: computers do not permit infinite precision. Some grid of possible points.

Integers: 16 bit -32767 to 32767 OR 32 bit: lots more resolution, roundoff of integer calculations
"Floating Point" - another form of Integers (NOT "Real") "single" has 6 decimal digits
lazy-programmers solution, sacrifice storage (exponent) for flexibility
Double-precision (64 bits usually) lots of digits, resolution of virus particles worldwide...
Rational arithmetic (two numbers, numerator/denominator) really slow

Relationship between resolution ("least count") and accuracy ("closeness to true value")
Vector resolution is different from raster resolution (smallest area)

Relationships

at the base, vector is a bunch of lines connecting the coordinates
Diagonals connect any legal point. Midpositions may not be on legal grid
Logical consistency imposes more relationships: (Topology!)
The basic spatial relationships: connectivity, boundary
points (nodes) bound lines lines (chains) bound areas areas (polygons ) bound volumes

Attributes! (don't forget your original reason for storing this stuff...)

Some basic data structures that don't go away:

Points: easy: tabular fields for coordinates
Images of linework or attribute (directly or indirectly)
Cartographic Spaghetti linework without labels or relationships (raw sources)
Shapefiles isolated objects; relationships have to be computed
Topological data structures (for some other time...)

Coordinate Geometry and Geometric Calculations

(from overheads, no links)

Objectives of this lecture
1 Review linkage of analytical geometry (trigonometry) for GIS
2 Examine simple calculations
3 Establish techniques for treating lines
4 Consider concerns of speed, efficiency, numerical stability

Basic Coordinate Geometry: A Thought-Experiment

1) How do you obtain coordinates if:

a) you are at a known point (such as a horizontal control marker) and observe an angle and a distance to some object.
b) you record the successive angles and distances of a "trip" (traverse) from a known point, out for some number of legs and then back to the point of beginning. [What do you expect about the "closure" of this traverse? What will you have to do if the "circle" is not 360.00000°?]
c) you record a lot of angles from two known points. What else do you need to know? [What would you do with angles recorded from many points (some known and some unknown?]

d) Would it be any better if you had distances instead of angles?

2) How would you determine bearing and distance from coordinates?

3) Take some sample parcel descriptions and figure out their coordinate representations.
(examples: 2209 Mass Ave, Lexington MA and 8687 Oddfellow Road Bainbridge Island WA)

Fundamental Geometric Algorithms

Wise starts off introducing equation for a line as:

the point (x,y) such that y = a + b*x.

This is a dubious choice. See snippet from Programmer's Geometry about ax +by +c =0. Even more useful if "normalized" so that a-squared plus b-squared =1. Then a and b are "direction cosines" of the angles that the normal to the line makes with the axes, c is - distance to the origin. Neat?

Distance:

equation is bone-crushingly simple (Pythagoraean Theorem), but still raises issues of calculation. For example, why take square root? (square your tolerance instead...)

Bearing of a line

sin (bearing) = x-difference / distance
cosine (bearing) = y-difference / distance
use inverse functions to solve for the angle

Distance from a point to a line:

raises issues of how to represent a line; useful for generalization
Handout from Programmer's Geometry shows how simple it can be...

(and the topic of the next lecture...)

Topological algorithms

(for treatment later) Point-in-polygon, segment intersection, polygon overlay

Roundoff and numerical stability:

the view of Prof. Kahan
Paradoxes in concepts of accuracy (numerical representations)
example of area of triangle calculations

Index from here: Next lecture | Schedule | Questions |

Version of 7 January 2003