API

There are two classes that are used to create an analysis application:

Field Class

class okcompute.Field(key, description, validate_func=<function DUMMY_VALIDATE>)[source]

A pointer to data involved with analysis.

Fields are used by metrics to specify their inputs and outputs. They contain a key, a description, and an optional validation function.

The key ['a', 'b', 'c'] would refer to the value in root['a']['b']['c'] and is represented as "a/b/c".
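The path semantics can be sketched with plain dicts (illustrative only; `Field.get_by_path` below performs this walk for you):

```python
from functools import reduce

root = {'a': {'b': {'c': 42}}}
key = ['a', 'b', 'c']

# Walk one dict level per key element: root['a']['b']['c']
value = reduce(lambda node, k: node[k], key, root)
print(value)           # 42
print('/'.join(key))   # a/b/c -- the string form of the key
```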

Attributes:

key (List[str]): The hierarchical location of a data field.

description (str): Description of the field.

validate_func (Callable[[Any], bool]): A function for validating input
data. "def DUMMY_VALIDATE(x): return True" by default.
get_by_path(root)[source]

Access the nested object in root at self.key

Args:
root (dict): The data_map the field refers to.
Returns:
object: The value at self.key
Raises:
KeyError: If the path for self.key isn’t in root
static get_field_set(root, fields)[source]

Access a pandas DataFrame-like object with a mapping to the set of fields

For example, if:

root = {'input': {'a': pandas.DataFrame({'foo': [], 'bar': [], 'bat': []})}}

fields = [Field(key=['input', 'a', 'foo']), Field(key=['input', 'a', 'bat'])]

Then get_field_set would return root['input']['a'][['foo', 'bat']]

Args:

root (dict): The data_map the fields refer to.

fields (List[Field]): A list of fields that share a common path
except for the last string. These different strings refer to columns in the object at the common path.
Returns:
object: The referenced columns of the object at the common path
Raises:

KeyError: If the path for a key in fields isn’t in root

TypeError: The object shared by the fields cannot take a
__getitem__ key that’s a list of strings
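The column selection described above can be sketched without pandas, using a minimal stand-in whose __getitem__ accepts a list of column names (ColumnTable is hypothetical, for illustration only):

```python
from functools import reduce

class ColumnTable:
    # Minimal stand-in for a pandas DataFrame: __getitem__ with a
    # list of column names returns a table with just those columns.
    def __init__(self, columns):
        self.columns = columns

    def __getitem__(self, names):
        return ColumnTable({n: self.columns[n] for n in names})

root = {'input': {'a': ColumnTable({'foo': [1], 'bar': [2], 'bat': [3]})}}
keys = [['input', 'a', 'foo'], ['input', 'a', 'bat']]

common_path = keys[0][:-1]                 # ['input', 'a'], shared by all fields
table = reduce(lambda node, k: node[k], common_path, root)
subset = table[[k[-1] for k in keys]]      # select the 'foo' and 'bat' columns
print(sorted(subset.columns))              # ['bat', 'foo']
```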
key_to_str()[source]

Return the str representation of the key.

Returns:
str: str representation of the key, e.g. "a/b/c"
path_exists(root)[source]

Try to access a nested object in root by item sequence.

Args:
root (dict): The data_map the field refers to.
Returns:
bool: True if value referenced by self.key exists, False otherwise.
set_by_path(root, value)[source]

Set a value in a nested object in root at self.key

Args:
root (dict): The data_map the field refers to.
value (any): The value to set the item to.
Raises:
KeyError: If the path up to self.key isn’t in root
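The behavior of setting a value at a nested path can be sketched with plain dicts (an illustrative reimplementation, not okcompute's code):

```python
from functools import reduce

def set_by_path(root, key, value):
    # Walk to the parent of the final key segment, then assign.
    # Raises KeyError if any intermediate level of the path is missing.
    parent = reduce(lambda node, k: node[k], key[:-1], root)
    parent[key[-1]] = value

root = {'output': {}}
set_by_path(root, ['output', 'result'], 3.5)
print(root)  # {'output': {'result': 3.5}}
```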

App Class

class okcompute.App(name, description, version)[source]

An app for performing a set of analyses

The metrics for analysis are specified by applying the metric decorator of an instance of this class to the analysis functions.

Specifying these metrics builds a dependency graph for the analysis to perform. An image of this graph can be saved with save_graph.

The analysis can be run on a data_map for the input and output with the run command. This command returns a report of what happened during the processing and the data_map is updated with results.

Attributes:
name (str): The name of the analysis set.
description (str): A description for the analysis set.
version (str): A version string for the analysis set.
metric(description, input_fields, output_fields)[source]

Decorator for adding metrics to app

The expected use is like:

@example_app.metric(
    input_fields=[FIELD_INT2],
    output_fields=[FIELD_OUT3],
    description='example node4'
)
def metrics(in_arg, valid_input):
    ...
    return val

The call signature of the function being added is inspected and used to implicitly determine the desired behavior, specifically the __name__ attribute and the parameters. This may make using lambdas or additional decorators more complicated.

Here is what is implicitly checked:

  • The metric name - This is taken from the __name__ attribute. For a
    function this is its name
  • The input parameters - the input_fields specified are matched in order
    to the positional arguments of the function. An assertion is raised if the number of parameters doesn’t match the number of fields.
  • Parameter default values - if a parameter specifies a default value,
    the input field it corresponds to is considered optional for this metric. This means the metric will still run even if the field is missing
  • Special valid_input parameter - if a parameter with the name
    valid_input is specified it is not mapped to the input fields. It instead is set to True if the input fields are valid, or False if they are not. Default parameters are considered valid. If a valid_input parameter exists it is expected that the metric will return some fallback output if valid_input is False.

Similar to the input_fields, the output_fields map to the return values of the function. If multiple outputs are specified they are expected as a tuple with the same length as output_fields.
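The valid_input convention can be illustrated with a plain function (a hypothetical metric body; okcompute passes valid_input=False when inputs are missing or fail validation):

```python
def mean_speed(samples, valid_input):
    # Hypothetical metric: return a fallback result when the inputs
    # were missing or invalid, instead of raising an exception.
    if not valid_input:
        return float('nan')
    return sum(samples) / len(samples)

print(mean_speed([2.0, 4.0], True))   # 3.0
print(mean_speed(None, False))        # nan (the fallback output)
```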

Raises:

AssertionError: If the call signature of the function doesn’t match
the input_fields, or the fields are in some way invalid, the decorator will raise an assertion when the module is loaded.
Args:
input_fields (List[Field]): The input fields that map to the
function parameters. As a special case, if one of the items in this list is a list itself, that set of fields is interpreted as columns for a pandas DataFrame. See get_field_set() for more details.
output_fields (List[Field]): The function return values will be
written to these fields

description (str): A description for metric.

Returns:
callable: The decorated function
run(data_map, desired_output_fields=None, dry_run=False, skip_existing_results=False, save_graph_path=None, meta_args='')[source]

Run the app’s analysis using the data_map

Any exceptions raised during a metric’s function are suppressed. The tracebacks are logged in the report under run_results in the metric’s result string. An assertion will be logged in this result if a metric doesn’t return the number of results corresponding to its output_fields.

Args:
data_map (dict): A dict that holds the inputs and outputs for
metrics. The available inputs should be populated, along with dicts to contain the internal and output fields.
desired_output_fields (List[Field]): A subset of the desired output
fields. This will only run the metrics needed to produce these outputs. If this is None the metrics won’t be skipped based on outputs.
dry_run (bool): If this is true, don’t actually run the analysis.
Only produce a report, checking the input to determine which metrics would be skipped.
skip_existing_results (bool): If all the outputs for a metric are
already present in data_map, don’t rerun the metric.
save_graph_path (str): A path to save an image of the graph of
analysis that runs based on the input. No graph is made if this path is None.

meta_args (dict): Any additional values to add to the report.

Returns:
report (dict): A report of the analysis that was run. It contains
the following top-level fields:

  • meta_data - a description of the analysis and total run time
  • existing_results_skipped - if skip_existing_results is True
    which metrics were skipped
  • unneeded_metrics - if desired_output_fields were specified
    which metrics were skipped
  • metrics_missing_input - which metrics expected input missing
    from data_map
  • run_results - elapsed time for each metric and its result (Success or
    Failure along with the cause)
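A returned report might be consumed like this (the keys follow the list above; the values and metric names are illustrative, not real output):

```python
# Illustrative report dict; real reports come from App.run().
report = {
    'meta_data': {'name': 'example_app', 'version': '1.0', 'elapsed_time': 0.01},
    'existing_results_skipped': [],
    'unneeded_metrics': [],
    'metrics_missing_input': {'metric2': ['input/b']},
    'run_results': {'metric1': {'elapsed_time': 0.005, 'result': 'Success'}},
}

# Collect the metrics whose result string is not 'Success'.
failed = [name for name, res in report['run_results'].items()
          if res['result'] != 'Success']
print(failed)  # []
```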