Representing Unknown Values
- geospatial data very often contains many unknown or uncertain values
- how should these be represented, both in data structures and in user interfaces?
- kinds of uncertainty:
- existence: a feature may or may not exist at this location
- precision: the location may be known only to +/- 20 meters, or its
precision may itself be unknown
- values: a building may have a known location but an unknown height
or color; a road may have an unknown width
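A minimal sketch of how these three kinds of uncertainty could be kept explicit in a data structure, assuming Python's `None` is used to mean "unknown" instead of a sentinel. The `Building` class, its field names, and the `unknown_fields` helper are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class Building:
    # Existence: True/False if known, None if we don't know whether it exists.
    exists: Optional[bool]
    # Location as (x, y); the coordinate system is an assumption here.
    location: Tuple[float, float]
    # Location precision in meters (e.g. 20.0 for +/- 20 m);
    # None means the precision itself is unknown.
    location_precision_m: Optional[float]
    # Attribute values; None means "unknown", not "zero" or "black".
    height_m: Optional[float] = None
    color_rgb: Optional[Tuple[int, int, int]] = None


def unknown_fields(obj) -> List[str]:
    """Return the names of dataclass fields whose value is unknown (None)."""
    return [f.name for f in fields(obj) if getattr(obj, f.name) is None]


b = Building(exists=None, location=(10.0, 20.0), location_precision_m=20.0)
print(unknown_fields(b))  # ['exists', 'height_m', 'color_rgb']
```

Because `None` is not a valid height, precision, or color, "unknown" can never collide with a real value. The trade-off is that a single `None` cannot say *why* a value is missing.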
- one common approach is to use sentinel values
- for example, -32768 can mean "this integer value is unknown", or (-1,
-1, -1) can mark an unknown color
- this can be problematic, e.g. if one person declares "0" a standard
sentinel for some value, but 0 is a meaningful value for someone else
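A small sketch of the collision problem, assuming elevations in meters where 0 (sea level) is a real reading. The `mean_known` helper is hypothetical; -32768 is the sentinel from the example above.

```python
from typing import List, Optional

UNKNOWN_INT16 = -32768  # sentinel: "this integer value is unknown"


def mean_known(values: List[int], sentinel: int) -> Optional[float]:
    """Average only the values that are not equal to the sentinel."""
    known = [v for v in values if v != sentinel]
    return sum(known) / len(known) if known else None


# Elevations in meters; 0 is a genuine sea-level reading.
elevations = [0, 10, 20]

print(mean_known(elevations, UNKNOWN_INT16))  # 10.0
# If someone had declared 0 to be the "unknown" sentinel, the sea-level
# reading would be silently discarded and the result would change:
print(mean_known(elevations, 0))              # 15.0
```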
- SEDRIS is a data representation standard
- it has a "Property_Characteristic" field type which can be used for
this purpose
- the standard includes Characteristics with UNKNOWN as an allowed sentinel
value for many fields
- there are no global sentinels, only field-specific values
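The field-specific approach can be sketched as a per-field table of "unknown" values. This is not SEDRIS's actual representation or API; `FIELD_UNKNOWN`, `is_unknown`, and the field names are hypothetical, and only the idea (no global sentinel, each field declares its own) follows the notes above.

```python
from typing import Any, Dict

# Hypothetical per-field sentinel table: each field declares its own
# "unknown" value, and there is no global sentinel.
FIELD_UNKNOWN: Dict[str, Any] = {
    "height_m": -32768,
    "width_m": -1.0,
    "color_rgb": (-1, -1, -1),
}

_NO_SENTINEL = object()  # marker for fields that declare no "unknown" value


def is_unknown(field: str, value: Any) -> bool:
    """True if value equals the sentinel declared for this specific field."""
    sentinel = FIELD_UNKNOWN.get(field, _NO_SENTINEL)
    return sentinel is not _NO_SENTINEL and value == sentinel


print(is_unknown("width_m", -1.0))   # True
print(is_unknown("height_m", -1.0))  # False: -1 is only "unknown" for width
```

The same number can therefore mean "unknown" in one field and be an ordinary value in another, which avoids the global collision at the cost of every consumer needing the table.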
- NASA/JPL PDS Standards Reference
- Chapter 17 defines global sentinels for N/A, UNK, and NULL:
- "N/A (not applicable) indicates that the values within the domain of
this data element are not applicable in this instance. UNK (unknown)
indicates that the value is permanently not known. NULL indicates
that the value for this data element in this instance is temporarily unknown.
A value is applicable and is forthcoming."
- the values are globally and explicitly declared for all data types
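Because the three PDS sentinels carry different meanings, a program can keep them distinct rather than collapsing them into one "missing" value. A minimal sketch, assuming the literals appear as plain text tokens; the `Missing` enum, `parse_value`, and `needs_followup` are hypothetical names, not part of the PDS standard.

```python
from enum import Enum
from typing import Union


class Missing(Enum):
    NOT_APPLICABLE = "N/A"  # value does not apply in this instance
    UNKNOWN = "UNK"         # value is permanently not known
    NULL = "NULL"           # temporarily unknown; a value is forthcoming


Value = Union[float, Missing]


def parse_value(text: str) -> Value:
    """Map a sentinel literal to its Missing member, else parse a number."""
    text = text.strip()
    try:
        return Missing(text)  # lookup by value, e.g. Missing("UNK")
    except ValueError:
        return float(text)


def needs_followup(v: Value) -> bool:
    """Only NULL promises that a real value will arrive later."""
    return v is Missing.NULL


print(parse_value("UNK"))                  # Missing.UNKNOWN
print(needs_followup(parse_value("NULL"))) # True
```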
User Interfaces
- what should be done when unknown values need to be shown to the user, e.g.
in a GUI dialog?
- one approach is simply to show the raw sentinel value (for numbers, e.g. -1
for "Unknown")
- a better approach is to state "Unknown" explicitly or use a secondary
visual flag, but either could make the interface complicated or cluttered
- what about non-text values, like colors? perhaps a hatched pattern
or the word "Unknown" could take the place of the color shown to the user
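A sketch of the display logic, assuming unknown values arrive as `None` rather than as raw sentinels. `format_number` and `color_swatch` are hypothetical helpers; the returned dict only describes what a real GUI toolkit would draw.

```python
from typing import Dict, Optional, Tuple

RGB = Tuple[int, int, int]


def format_number(value: Optional[float], unit: str = "") -> str:
    """Show "Unknown" instead of a raw sentinel such as -1."""
    if value is None:
        return "Unknown"
    return f"{value:g}{unit}"


def color_swatch(color: Optional[RGB]) -> Dict[str, str]:
    """Describe a color swatch; unknown colors get a hatched fill and a label."""
    if color is None:
        return {"fill": "hatched", "label": "Unknown"}
    r, g, b = color
    return {"fill": f"#{r:02x}{g:02x}{b:02x}", "label": ""}


print(format_number(12.5, " m"))  # 12.5 m
print(color_swatch(None))         # {'fill': 'hatched', 'label': 'Unknown'}
```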
High-Level Classification
- the problem of how to handle unknown values is part of the broader issue
of how to provide high-level classification of a data set in general
- SEDRIS is perhaps the only large standard for this that currently exists
- classification is handled by SEDRIS's EDCS (Environmental Data Coding
Specification), specifically the DRM (Data Representation Model)
- higher-level descriptions exist as an alternative, side-by-side with
lower-level descriptions
- the alternative is implemented using feature data classes,
specifically EDCS Classification Codes (ECCs)
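The idea of a high-level class stored side by side with low-level attributes can be sketched as follows. The class label "BUILDING" and the `Feature`/`summary` names are hypothetical placeholders, not actual EDCS Classification Codes or SEDRIS types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Feature:
    # High-level description: a classification label, or None if unknown.
    classification: Optional[str]
    # Low-level description: individual attribute values.
    attributes: Dict[str, Any] = field(default_factory=dict)


def summary(feature: Feature, high_level: bool) -> str:
    """Describe a feature at either level; fall back to "Unknown"."""
    if high_level:
        return feature.classification or "Unknown"
    parts = [f"{k}={v}" for k, v in sorted(feature.attributes.items())]
    return ", ".join(parts) or "Unknown"


f = Feature("BUILDING", {"height_m": 12.0, "color": "red"})
print(summary(f, high_level=True))   # BUILDING
print(summary(f, high_level=False))  # color=red, height_m=12.0
```

A consumer that only understands classes can read `classification`, while one that needs detail reads `attributes`; either level can be unknown independently.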