Augmented Types in SADL

Last revised 4/27/2021.

Introduction

Most programming languages have some concept of built-in types, e.g., integer, float, string, Boolean, etc. These types may be used, for example, to specify the type of a variable, the signature of a method call, the type of variables appearing in an equation, or the type of the value or values returned by a method call. In object-oriented languages, classes may be defined that are aligned with concepts in the domain and these classes may also be used as types.

However, there are important differences between the expressivity of most object-oriented languages and the expressivity of a graph-based ontology language such as OWL. Not least among these is that in most object-oriented languages, properties are represented as fields in a class and have no independent existence. By contrast, properties in graph ontology languages are first-class citizens and can have restrictions on value type, cardinality, etc., based on the class of the thing described by the property. This means that more than one class can be in the domain of the same property, and that a property may have restrictions on cardinality, type of value, etc., which are different for different subject classes, but the property, in each case, is still identifiable as the same property.

In SADL, the types of variables in equation signatures and return types can be either primitive data types or domain-specific classes. (See Equations and External Equations.) However, if one is to represent in a richer sense the knowledge that is captured in a set of equations, one must do more than simply identify the type of an equation argument or returned value. One must capture how inputs and outputs are related to each other, in domain terms. Graph patterns can be associated with an equation as a means of making these relationships explicit. For any single equation, the extent of the graph pattern needed is the domain sub-graph that connects all inputs and outputs together. It may also be important to capture constraints and assumptions on the equation inputs that provide information on the limits of the equation's applicability. The capture of the relationship between inputs and outputs and the representation of constraints and assumptions is the purpose of augmented types in SADL.

Examples of Augmented Types in Equations

As an example, consider the equation for the speed of sound in a gas (see https://www.grc.nasa.gov/www/k-12/VirtualAero/BottleRocket/airplane/sound.html).

a = sqrt( γ R T)

where

γ (gamma) is the ratio of specific heats (1.4 for air at standard temperature and pressure)
R is the gas constant (286 m²/s²/K for air)
T is the absolute temperature in kelvins (K)
a is the speed of sound in the gas

At the variable level, each of the right-hand-side variables is a floating-point number, as is the left-hand-side output variable. In SADL syntax, the equation can be written as:

Equation SOS (float gam, float R, float T) returns float: sqrt(gam * R * T).

However, the amount of knowledge captured in this statement is woefully inadequate to ensure the equation's proper application. Some additional knowledge that would be useful includes:

  1. The temperature T must be absolute.
  2. The units of gamma (gam), R, and T must be compatible and will dictate the units of the computed value of a.
  3. The ratio gamma (gam), the gas constant R, and the temperature T must all be properties of the same gas, the gas through which the sound is traveling.
  4. The speed of sound a is the speed of sound in the gas described by the input values.

None of this is explicit in the equation as written.
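
To make the gap concrete, here is a minimal Python sketch (not SADL) of the plain-float equation. The constant names and the value R = 286 m²/s²/K for air are assumptions made for illustration. The signature accepts any three floats, so passing a Celsius temperature goes undetected and silently yields a wrong result.

```python
import math

GAMMA_AIR = 1.4   # ratio of specific heats for air
R_AIR = 286.0     # gas constant for air in m^2/s^2/K (assumed value)


def speed_of_sound(gam: float, R: float, T: float) -> float:
    """Plain-float version of SOS: nothing ties the inputs to one gas or to units."""
    return math.sqrt(gam * R * T)


# Intended use: 15 degrees Celsius expressed as an absolute temperature.
a_kelvin = speed_of_sound(GAMMA_AIR, R_AIR, 288.15)

# Misuse the signature cannot catch: the same temperature passed in Celsius.
a_celsius = speed_of_sound(GAMMA_AIR, R_AIR, 15.0)

print(f"T = 288.15 K        -> a = {a_kelvin:.1f} m/s")
print(f"T = 15 (Celsius, wrong) -> a = {a_celsius:.1f} m/s")
```

Both calls are type-correct, which is exactly the problem that augmented types are meant to address.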

Using SADL grammar, we can increase the knowledge content of this equation. However, to do so we need to reference a semantic model of the domain. The simple one below will suffice for this example and the next. (For more information on UnittedQuantity, see UnittedQuantity and The SADL Implicit Model.)

PhysicalThing is a class,
   described by temperature with values of type Temperature.
{Substance, PhysicalObject} are types of PhysicalThing.
Gas is a type of Substance,
   described by gamma with values of type float,
   described by gasConstant with values of type GasConstant,
   described by sos (alias "speed of sound in the gas") with values of type Speed.
Air is a type of Gas.

Movement is a class,
   described by objectMoving with values of type PhysicalObject,
   described by medium with values of type Substance,
   described by speed with values of type Speed.

mach describes PhysicalObject with values of type float.

Temperature is a type of UnittedQuantity.
GasConstant is a type of UnittedQuantity.
Speed is a type of UnittedQuantity.

Now we can express the equation with greater clarity.

Equation SOSAug (float gam (gamma of a Gas),
                 float R (gasConstant of the Gas {"m2/s2/Kelvin"}),
                 float T (temperature of the Gas {Kelvin}))
     returns float (sos of the Gas {"m/s"}):
     sqrt(gam * R * T).

Note the use of indefinite and definite articles, which SADL uses as English grammar does. Hence the first argument's float value must be the gamma property of some Gas. The second argument's float value must be the gasConstant property of the same instance of Gas referred to by the first argument. Furthermore, the gasConstant property value must have the units "m2/s2/Kelvin" (meters squared per second squared per degree Kelvin). Likewise, the third argument's float value must be the temperature property value, in degrees Kelvin, of that same instance of Gas. The returned float is the sos property value, in "m/s", of that same Gas.
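
As a rough Python analogy (not how SADL itself evaluates equations), the "same Gas" binding and the unit requirements can be sketched by passing one object that carries all three inputs. The Quantity and Gas classes and the function sos_aug are hypothetical names introduced here; the unit strings are copied from the SADL equation, and R = 286 is an assumed value for air.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value with a unit string, loosely modeled on UnittedQuantity."""
    value: float
    unit: str


@dataclass(frozen=True)
class Gas:
    """Hypothetical stand-in for one instance of the Gas class in the model."""
    gamma: float
    gas_constant: Quantity
    temperature: Quantity


def sos_aug(gas: Gas) -> Quantity:
    """Speed of sound where all inputs belong to one Gas and units are checked."""
    if gas.gas_constant.unit != "m2/s2/Kelvin":
        raise ValueError(f"gasConstant must be in m2/s2/Kelvin, got {gas.gas_constant.unit}")
    if gas.temperature.unit != "Kelvin":
        raise ValueError(f"temperature must be in Kelvin, got {gas.temperature.unit}")
    a = math.sqrt(gas.gamma * gas.gas_constant.value * gas.temperature.value)
    return Quantity(a, "m/s")


air = Gas(1.4, Quantity(286.0, "m2/s2/Kelvin"), Quantity(288.15, "Kelvin"))
result = sos_aug(air)
print(f"{result.value:.1f} {result.unit}")
```

Because every input is read from the same Gas object, mixing the gamma of one gas with the temperature of another is no longer expressible, which mirrors the effect of "a Gas" followed by "the Gas" in the augmented signature.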

The additional knowledge captured is not only useful in properly applying this equation, but can be used to appropriately combine this equation with other equations to create more complex computational models. For example, below is the equation for Mach number (see https://www.grc.nasa.gov/www/k-12/airplane/isentrop.html).

    M = v/a

where

    v is the object speed
    a is the speed of sound
    M is the Mach number

In SADL syntax, the equation can be written as follows. (Note that the ^ in front of "^a" is necessary to indicate that "a" is a variable name, not the SADL grammar keyword "a".)

Equation MachNumber(float v, float ^a) returns float: v/^a.

As in the case above, this equation does not capture essential knowledge required to properly apply the equation. Adding augmented type information yields one possible form of the augmented equation.

Equation MachNumberAug(float v (speed of a Movement with objectMoving a PhysicalObject,
                                                    with medium some Air {"m/s"}),
                       float ^a (sos of the Air {"m/s"}))
      returns float (mach of the PhysicalObject):
      v/^a.

In the semantic model above, we created the mediating class Movement to bring the moving object and the medium through which it moves into relationship. The speed property of such a Movement is the first argument of this equation and the sos (speed of sound) in the medium is the second argument. We have chosen to represent the Mach number as a property of the PhysicalObject only, but one could reasonably have made mach a property with domain Movement instead of PhysicalObject since it only has that value in the context of the medium of the movement.

The augmented type information for the first SADL equation above for speed of sound and the second SADL equation for Mach number of an object moving through air not only captures information about the conditions of applicability of the equations, it also provides enough information to allow an agent to reason that the output of the first equation can be input as the second argument to the second equation. Since Air is a subclass of Gas, one can reason that the first equation is applicable to Air. In fact, the work on augmented types began with a DARPA project to build models that could be intelligently assembled by an artificial intelligence to create larger, more complex models.
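
The kind of reasoning described above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: an augmented type is reduced to a (property, class) pair, the subclass table is copied by hand from the example model, and SemType, is_subclass_of, and can_feed are hypothetical names, not part of SADL.

```python
from typing import NamedTuple, Optional

# Subclass relationships copied by hand from the example model.
SUPERCLASS = {"Air": "Gas", "Gas": "Substance", "Substance": "PhysicalThing"}


class SemType(NamedTuple):
    """Simplified augmented type: a property of an instance of some class."""
    prop: str
    cls: str


def is_subclass_of(cls: str, ancestor: str) -> bool:
    """Walk the superclass chain to test whether cls is ancestor or below it."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERCLASS.get(current)
    return False


def can_feed(output: SemType, argument: SemType) -> bool:
    """An equation output can supply an argument when the property matches and
    the class the argument needs is covered by the class the equation handles."""
    return output.prop == argument.prop and is_subclass_of(argument.cls, output.cls)


sos_output = SemType("sos", "Gas")    # what SOSAug returns
mach_input = SemType("sos", "Air")    # what MachNumberAug needs as ^a

print(can_feed(sos_output, mach_input))
```

The check succeeds because SOSAug is stated for any Gas and Air is a subclass of Gas; an equation stated only for Air would not satisfy an argument that requires the sos of an arbitrary Gas.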

The equation MachNumberAug above specifies that the units of the speed of the moving object and the speed of sound in air must both be in "m/s". However, they can actually be in any valid unit of speed as long as they are the same. The augmented type information of the equation can be modified to express this more general constraint.

Equation MachNumberAug2(float v (speed of a Movement with objectMoving a PhysicalObject,
                                                     with medium some Air),
                        float ^a (sos of the Air and
                                  unit of sos of the Air = unit of speed of the Movement))
      returns float (mach of the PhysicalObject):
      v/^a.

The class UnittedQuantity is in the domain of the property unit; see UnittedQuantity.
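
A minimal Python sketch of the same-unit constraint follows. The Quantity class and mach_number function are hypothetical names, and comparing unit strings for equality is a simplification; it does not recognize equivalent units written differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value with a unit string, loosely modeled on UnittedQuantity."""
    value: float
    unit: str


def mach_number(speed: Quantity, sos: Quantity) -> float:
    """Mach number, valid for any speed unit as long as both inputs share it."""
    if speed.unit != sos.unit:
        raise ValueError(f"unit mismatch: speed in {speed.unit}, sos in {sos.unit}")
    return speed.value / sos.value


m_si = mach_number(Quantity(680.0, "m/s"), Quantity(340.0, "m/s"))
m_mph = mach_number(Quantity(1521.0, "mph"), Quantity(760.5, "mph"))
print(m_si, m_mph)
```

The ratio is dimensionless, so the check only needs the two units to agree, not to be any particular unit.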


One can also express assumptions and constraints in augmented types. As another example, consider these three equations for the static temperature of air as a function of altitude. (See https://www.grc.nasa.gov/www/k-12/airplane/atmos.html.)

  1. Equation troposphereTemperature(decimal alt (^value of altitude of some Air and alt <= 36152 {ft}))
        returns decimal (^value of temperature of the Air {F}): return 59 - .00356 * alt.

  2. Equation lowerStratosphereTemperature(decimal alt (^value of altitude of some Air and alt > 36152 and alt <= 82345 {ft}))
        returns decimal (^value of temperature of the Air {F}): return -70.

  3. Equation upperStratosphereTemperature(decimal alt (^value of altitude of some Air and alt > 82345 {ft}))
        returns decimal (^value of temperature of the Air {F}): return -205.05 + .00164 * alt.

These equations illustrate functional constraints, namely the range of altitude values for which each equation is valid, as well as the relationship between the input and the output. Note the use of property chains. The class UnittedQuantity is in the domain of the property value (escaped as ^value in the model because value is a keyword in the grammar). Because both altitude and temperature have ranges of type UnittedQuantity, the property chains "^value of altitude of some Air" and "^value of temperature of the Air" tie the input and output together through the air at that altitude and temperature.
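
One way an agent might use these range constraints is to select the applicable equation for a given altitude. The Python sketch below is an illustration only: the table EQUATIONS and the function static_temperature are hypothetical, while the coefficients and bounds are copied from the three SADL equations (altitude in feet, temperature in degrees Fahrenheit).

```python
from typing import Callable, List, Tuple

# Each entry: (equation name, applicability constraint on altitude, computation).
Equation = Tuple[str, Callable[[float], bool], Callable[[float], float]]

EQUATIONS: List[Equation] = [
    ("troposphereTemperature",
     lambda alt: alt <= 36152,
     lambda alt: 59 - 0.00356 * alt),
    ("lowerStratosphereTemperature",
     lambda alt: 36152 < alt <= 82345,
     lambda alt: -70.0),
    ("upperStratosphereTemperature",
     lambda alt: alt > 82345,
     lambda alt: -205.05 + 0.00164 * alt),
]


def static_temperature(alt_ft: float) -> Tuple[str, float]:
    """Return the name of the applicable equation and the temperature it gives."""
    for name, applies, compute in EQUATIONS:
        if applies(alt_ft):
            return name, compute(alt_ft)
    # Only reachable for inputs such as NaN that satisfy no range.
    raise ValueError(f"no equation applies at altitude {alt_ft}")


for altitude in (10000, 50000, 90000):
    name, temp = static_temperature(altitude)
    print(f"{altitude} ft: {name} -> {temp:.2f} F")
```

The three ranges are disjoint and together cover every altitude, so exactly one equation applies to any numeric input.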

Extending Augmented Types to Tabular Data

It is possible to apply the same approach used for equation arguments and returned values to add augmented type information to tabular data. Tabular data might be used, for example, when representing experimental observations. Capturing knowledge about how the data in each column fit into a semantic model of the domain, and how the data in different columns are related to each other in domain terms, gives the tabular data context and makes it more useful in automated reasoning. One use would be to validate computational models against the observations. The additional information is captured in a declaration using the table keyword in the SADL grammar. Here is an example table declaration for data in the hypersonics domain. In this case the actual data is located outside of the semantic model at the location indicated by "located at ...".

Data1 is a table
   [double alt (alias "Alt") (altitude of a PhysicalObject {"ft"}),
    double u0 (velocity of the PhysicalObject and the PhysicalObject movesIn some Air {"mph"}),
    double tt (staticTemperature of the Air {"R"})]
   with data located at "http://datasource/statictemperatureobservations/data1".

It is also possible to include the actual data in the SADL model, as shown in this example.

Data2 is a table
   [double alt (alias "Alt") (altitude of a PhysicalObject {"ft"}),
    double u0 (velocity of the PhysicalObject and the PhysicalObject movesIn some Air {"mph"}),
    double tt (staticTemperature of the Air {"R"})]
   with data
   {[2000, 600, 576],
    [4000, 700, 592],
    [6000, 800, 612]
   }.
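
For readers who want to see how a consumer might pair the column descriptions with the inline rows, here is a small Python sketch. The Column class and to_records function are hypothetical; the column names, units, semantic descriptions, and data values are copied from the Data2 declaration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    """One table column: its name, unit, and augmented type as plain text."""
    name: str
    unit: str
    semantic_type: str


COLUMNS = [
    Column("alt", "ft", "altitude of a PhysicalObject"),
    Column("u0", "mph", "velocity of the PhysicalObject and the PhysicalObject movesIn some Air"),
    Column("tt", "R", "staticTemperature of the Air"),
]

ROWS = [[2000, 600, 576], [4000, 700, 592], [6000, 800, 612]]


def to_records(columns: List[Column], rows: List[List[float]]) -> List[Dict[str, float]]:
    """Turn positional rows into dictionaries keyed by column name."""
    records = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
        records.append({col.name: value for col, value in zip(columns, row)})
    return records


records = to_records(COLUMNS, ROWS)
for record in records:
    print(record)
```

Each record corresponds to one observation: a single PhysicalObject at a given altitude and velocity, moving in Air with the recorded static temperature.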

Representing Augmented Type Information in OWL

In order to represent augmented type information in OWL, a meta-model is needed. The SADL implicit model (SadlImplicitModel.sadl) provides such a meta-model. The common class used in both equation augmented types and data table augmented types is DataDescriptor. For instances of the Equation class, the arguments value is a DataDescriptor List, thus maintaining the order of the arguments. The returnTypes value is likewise a DataDescriptor List. In instances of DataTable, the columnDescriptors value is a DataDescriptor List.

^Equation is a class,
   described by expression with values of type Script.
arguments describes ^Equation with a single value of type DataDescriptor List.
returnTypes describes ^Equation with a single value of type DataDescriptor List.

DataTable is a class,
   described by columnDescriptors with a single value of type DataDescriptor List,
   described by dataContent with a single value of type DataTableRow List,
   described by dataLocation with a single value of type anyURI.

The DataDescriptor class is defined as follows.

DataDescriptor is a class,
   described by localDescriptorName (note "If this DataDescriptor is associated with a named parameter, this is the name") with a single value of type string,
   described by dataType (note "the simple data type, e.g., float") with a single value of type anyURI,
   described by specifiedUnits (note "the array of possible units") with a single value of type string List,
   described by augmentedType (note "ties the DataDescriptor to the semantic domain model") with values of type AugmentedType,
   described by descriptorVariable (note "This identifies the GPVariable, if any, in the AugmentedType which is associated with this DataDescriptor").
dataType of DataDescriptor has at most 1 value.
descriptorVariable of DataDescriptor has at most 1 value.

The descriptorVariable property has a value only when there is a name associated with the DataDescriptor, as will be the case for an Equation argument or a DataTable column. An Equation returnTypes DataDescriptor will not have a value for descriptorVariable. The value of localDescriptorName will be the argument or column name as given in the model, while the descriptorVariable will be a system-generated unique URI identifying the argument or column in the larger model context. (See below for more details on variable naming.)

The semantic meaning of an argument, returned value, or column is captured in the value of the augmentedType property, whose range is AugmentedType. The semantic model for AugmentedType is as follows.

AugmentedType is a class.
SemanticType (note "allows direct specification of the semantic type of an argument") is a type of AugmentedType,
   described by semType with a single value of type class.
GraphPattern is a class.
{TriplePattern, FunctionPattern} are types of GraphPattern.
gpSubject describes TriplePattern.
gpPredicate describes TriplePattern.
gpObject describes TriplePattern.
builtin describes FunctionPattern with a single value of type ^Equation.
GPAtom is a class.
{GPVariable, GPLiteralValue, GPResource} are types of GPAtom.
gpVariableName describes GPVariable with a single value of type string.
gpLiteralValue describes GPLiteralValue with values of type data.
argValues (note "values of arguments to the built-in") describes FunctionPattern with a single value of type GPAtom List.
SemanticConstraint (note "used to identify necessary patterns in semantic domain terms") is a type of AugmentedType,
   described by constraints with a single value of type GraphPattern List.

The TriplePattern class, with its properties gpSubject, gpPredicate, and gpObject, provides a way to represent graph patterns in OWL. The FunctionPattern represents a built-in function. The representation of triple patterns and function patterns in OWL or RDF is not a new concept. Rules and queries both have triple patterns as essential parts with variables used to connect triple patterns together to form complex graph patterns. The Semantic Web Rule Language (SWRL) represents rules in OWL. [1] The SPIN language supported triple patterns as well as functions. [2] The newer Shapes Constraint Language (SHACL), which largely replaces SPIN, also has some capability to capture triple patterns. [3]
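
To give a feel for the shape of such patterns, the Python sketch below builds a plausible pair of triple patterns for the augmented type "gamma of a Gas". It is an assumption-laden illustration: the dataclasses only mimic the meta-model classes of the same names, the variable name v0 and the resource names are placeholders, and the patterns actually produced by the SADL translator may differ.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class GPVariable:
    name: str


@dataclass(frozen=True)
class GPResource:
    uri: str


GPAtom = Union[GPVariable, GPResource]


@dataclass(frozen=True)
class TriplePattern:
    subject: GPAtom
    predicate: GPAtom
    obj: GPAtom


def variables(patterns: List[TriplePattern]) -> List[str]:
    """Names of the distinct variables, in order of first appearance."""
    seen: List[str] = []
    for tp in patterns:
        for atom in (tp.subject, tp.predicate, tp.obj):
            if isinstance(atom, GPVariable) and atom.name not in seen:
                seen.append(atom.name)
    return seen


# "gam (gamma of a Gas)" read as: ?v0 rdf:type Gas . ?v0 gamma ?gam
gas = GPVariable("v0")    # placeholder for a generated variable ("a Gas")
gam = GPVariable("gam")   # the equation argument
constraints = [
    TriplePattern(gas, GPResource("rdf:type"), GPResource("Gas")),
    TriplePattern(gas, GPResource("gamma"), gam),
]

print(variables(constraints))
```

The shared variable v0 is what joins the two triples into one graph pattern; later arguments that refer to "the Gas" would reuse it.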

Variable Naming

The variables that connect triple patterns and function patterns together are a necessary part of the capture of semantic constraints. Sometimes these variables will reference the variable which is an argument of the equation or the column title in a table. Sometimes they will be created from class references with indefinite articles, e.g., “a PhysicalObject”. In some cases they will be created by expanding nested expressions, as is the case when a property of a subject is nested inside an equation call as an argument. In any case, the OWL representation must take care not to create an Individual in the OWL graph with the name of the variable as the localname. The reason is scoping. Scoping in the Xtext implementation of the SADL grammar recognizes that alt in the three static air temperature equations above is not the same alt but is a different alt in each equation. The variable for any equation is scoped only within that equation’s arguments, return values, and constraints. Another equation in the same namespace might use the same argument name but it might have a different type and different semantic constraints.

OWL offers no such equation-level scoping unless each equation is placed in a separate namespace. Therefore, we must create a variable for alt in each equation which is different from the variable for alt in any other equation in the namespace. While some triple pattern representations in OWL use blank nodes for these equation-scoped variables, with a property capturing their name, we take the approach of creating unique variable names for each variable reference within the namespace. This facilitates the use of the variable in multiple triples and/or function patterns. Regardless of whether the variable has a user-defined name, e.g., is an argument to the equation signature, or has a name generated by the translation, the variable is given that name as the value of the property localDescriptorName.

While it would be possible to create sequential unique variable names in a namespace with a counter as a way to obtain uniqueness, this has the disadvantage that the content of the OWL semantic constraints would depend upon other equations in the namespace and upon their order, which means that test cases would be affected by any change in the input SADL. Another approach, which eliminates this problem, is to prepend the equation name to the variable name and start the variable index counter anew for each equation. This eliminates the dependency of the output on the number and order of equations in the input SADL file.
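
The per-equation naming scheme can be sketched in a few lines of Python. The class VariableNamer and the exact name format (equation name, underscore, then the user name or "v" plus an index) are assumptions for illustration; the text above does not specify the format used by the actual translator.

```python
from collections import defaultdict
from typing import DefaultDict


class VariableNamer:
    """Generates variable names that are unique within a namespace by
    prefixing the equation name and counting per equation."""

    def __init__(self) -> None:
        self._counters: DefaultDict[str, int] = defaultdict(int)

    def user_variable(self, equation: str, name: str) -> str:
        """Name for a user-defined variable such as an equation argument."""
        return f"{equation}_{name}"

    def generated_variable(self, equation: str) -> str:
        """Name for a translator-generated variable; the index restarts per equation."""
        index = self._counters[equation]
        self._counters[equation] += 1
        return f"{equation}_v{index}"


namer = VariableNamer()
print(namer.user_variable("troposphereTemperature", "alt"))
print(namer.user_variable("lowerStratosphereTemperature", "alt"))
print(namer.generated_variable("SOSAug"))
print(namer.generated_variable("MachNumberAug"))
print(namer.generated_variable("SOSAug"))
```

Because each equation has its own counter, adding, removing, or reordering other equations in the file does not change the names generated for a given equation.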

References

[1] I. Horrocks et al., "SWRL: A Semantic Web Rule Language Combining OWL and RuleML," 21 May 2004. [Online]. Available: https://www.w3.org/Submission/SWRL/. [Accessed 18 July 2019].

[2] H. Knublauch, "SPIN - SPARQL Syntax," 12 September 2013. [Online]. Available: https://spinrdf.org/sp.html. [Accessed 18 July 2019].

[3] H. Knublauch and D. Kontokostas, "Shapes Constraint Language (SHACL)," 20 July 2017. [Online]. Available: https://www.w3.org/TR/shacl/. [Accessed 18 July 2019].