API

Two classes are used to create an analysis application:

Field Class

class okcompute.Field(key, description, validate_func=<function DUMMY_VALIDATE>)

A pointer to data involved with analysis.

Fields are used by metrics to specify their inputs and outputs. They contain a key, a description, and an optional validation function.

The key [‘a’, ‘b’, ‘c’] would refer to the value in root[‘a’][‘b’][‘c’] and is represented as “a/b/c”.

Attributes:

key (List[str]): The hierarchical location of a data field.

description (str): Description of the field.

validate_func (Callable[[Any], bool]): A function for validating input data. Defaults to DUMMY_VALIDATE, defined as "def DUMMY_VALIDATE(x): return True".

get_by_path(root)

Access the nested object in root at self.key.

Args:

root (dict): The data_map the field refers to.

Returns:

object: The value at self.key.

Raises:

KeyError: If the path for self.key isn’t in root.
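
The lookup described above can be sketched in plain Python. This is a minimal illustration, not okcompute's implementation: the free function get_by_path below is a hypothetical stand-in that takes the key as an argument instead of reading self.key.

```python
from functools import reduce
import operator


def get_by_path(root, key):
    """Hypothetical stand-in: follow each item of key into the nested mapping."""
    return reduce(operator.getitem, key, root)


data_map = {"a": {"b": {"c": 42}}}
key = ["a", "b", "c"]

value = get_by_path(data_map, key)  # same as data_map["a"]["b"]["c"]
label = "/".join(key)               # the "a/b/c" string form of the key
```

With this sketch, a key such as ["a", "x"] raises KeyError because "x" is not present under data_map["a"].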

static get_field_set(root, fields)

Access a pandas DataFrame-like object, selecting the columns that the set of fields maps to.

For example if:

root = {'input': {'a': pandas.DataFrame({'foo': [], 'bar': [], 'bat': []})}}

fields = [Field(key=['input', 'a', 'foo']), Field(key=['input', 'a', 'bat'])]

Then get_field_set would return root['input']['a'][['foo', 'bat']]

Args:

root (dict): The data_map the fields refer to.

fields (List[Field]): A list of fields that share a common path except for the last string. These differing strings refer to columns in the object at the common path.

Returns:

object: The referenced columns of the object at the common path

Raises:

KeyError: If the path for a key in fields isn’t in root

TypeError: If the object shared by the fields cannot take a __getitem__ key that’s a list of strings.
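
The column selection described above can be sketched without pandas. Everything here is an assumption for illustration: ColumnTable is a hypothetical DataFrame-like stand-in, and this get_field_set takes plain key lists rather than Field objects.

```python
class ColumnTable:
    """Hypothetical DataFrame-like stand-in that accepts a list of column names."""

    def __init__(self, columns):
        self.columns = columns

    def __getitem__(self, names):
        if not isinstance(names, list):
            raise TypeError("expected a list of column names")
        return {name: self.columns[name] for name in names}


def get_field_set(root, keys):
    """Hypothetical stand-in: select the last item of each key from the shared parent."""
    parent_path = keys[0][:-1]
    # Every key must share the same path up to its last item.
    assert all(key[:-1] == parent_path for key in keys)
    parent = root
    for item in parent_path:
        parent = parent[item]
    # One lookup with a list of names, as in df[["foo", "bat"]].
    return parent[[key[-1] for key in keys]]


root = {"input": {"a": ColumnTable({"foo": [1], "bar": [2], "bat": [3]})}}
subset = get_field_set(root, [["input", "a", "foo"], ["input", "a", "bat"]])
```

If the object at the common path were a plain dict, the final lookup would raise TypeError, because a list cannot be used as a dict key.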

key_to_str()

Return the str representation of the key.

Returns:

str: The str representation of the key, e.g. “a/b/c”.

path_exists(root)

Try to access a nested object in root by item sequence.

Args:

root (dict): The data_map the field refers to.

Returns:

bool: True if the value referenced by self.key exists, False otherwise.
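
A minimal sketch of this existence check, assuming it amounts to attempting the nested lookup and reporting failure as False. The function below is a hypothetical stand-in that takes the key explicitly, and the set of exceptions it catches is an assumption.

```python
def path_exists(root, key):
    """Hypothetical stand-in: True if every item of key can be followed in root."""
    current = root
    for item in key:
        try:
            current = current[item]
        except (KeyError, IndexError, TypeError):
            # Missing key, or an intermediate value that cannot be indexed.
            return False
    return True


data_map = {"input": {"a": 1}}
found = path_exists(data_map, ["input", "a"])
missing = path_exists(data_map, ["input", "b"])
too_deep = path_exists(data_map, ["input", "a", "x"])
```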

set_by_path(root, value)

Set a value in a nested object in root at self.key.

Args:

root (dict): The data_map the field refers to.

value (any): The value to set the item to.

Raises:

KeyError: If the path up to self.key isn’t in root.
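
The assignment can be sketched as walking to the parent of the last key item and setting the value there. This is a hypothetical stand-in that takes the key explicitly; it assumes, as the Raises entry suggests, that parent containers are not created automatically.

```python
def set_by_path(root, key, value):
    """Hypothetical stand-in: assign value at the nested location named by key."""
    parent = root
    for item in key[:-1]:
        parent = parent[item]  # KeyError here means the parent path is missing
    parent[key[-1]] = value


data_map = {"output": {}}
set_by_path(data_map, ["output", "total"], 3)
```

Under this assumption the data_map must already hold an empty dict for each container that metrics write into, such as "output" here.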

App Class

class okcompute.App(name, description, version)

An app for performing a set of analyses.

The metrics for analysis are specified by applying the metric decorator of an instance of this class to the analysis functions.

Specifying these metrics builds a dependency graph for the analysis to perform. An image of this graph can be saved with save_graph.

The analysis can be run on a data_map for the input and output with the run command. This command returns a report of what happened during the processing and the data_map is updated with results.

Attributes:

name (str): The name of the analysis set.

description (str): A description for the analysis set.

version (str): A version string for the analysis set.

metric(description, input_fields, output_fields)

Decorator for adding metrics to app

The expected use is like:

@example_app.metric(
    input_fields=[FIELD_INT2],
    output_fields=[FIELD_OUT3],
    description='example node4'
)
def metrics(in_arg, valid_input):
    ...
    return val

The call signature of the function being added is inspected and used to implicitly determine the desired behavior, specifically its __name__ attribute and its parameters. This may make using lambdas or additional decorators more complicated.

Here is what is implicitly checked:

The metric name - This is taken from the __name__ attribute. For a function this is its name.
The input parameters - The input_fields specified are matched in order to the positional arguments of the function. An assertion is raised if the number of parameters doesn’t match the number of fields.
Parameter default values - If a parameter specifies a default value, the input field it corresponds to is considered optional for this metric. This means the metric will still run even if the field is missing.
Special valid_input parameter - If a parameter with the name valid_input is specified, it is not mapped to the input fields. It is instead set to True if the input fields are valid, or False if they are not. Default parameters are considered valid. If a valid_input parameter exists, the metric is expected to return some fallback output when valid_input is False.

Similar to the input_fields, the output_fields map to the return values of the function. If multiple outputs are specified they are expected as a tuple with the same length as output_fields.

Raises:

AssertionError: If the call signature of the function doesn’t match the input_fields, or the fields are in some way invalid. The assertion is raised when the module is loaded.

Args:

input_fields (List[Field]): The input fields that map to the function parameters. As a special case, if one of the items in this list is itself a list, that set of fields is interpreted as columns of a pandas DataFrame. See get_field_set() for more details.

output_fields (List[Field]): The function return values will be written to these fields.

description (str): A description for metric.

Returns:

callable: The decorated function
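
The implicit checks described for this decorator can be sketched with the standard inspect module. This is an assumption about the general approach, not okcompute's code: inspect_metric is a hypothetical helper, and fields are represented by plain strings.

```python
import inspect


def inspect_metric(func, input_fields):
    """Hypothetical helper: derive metric settings from a function signature."""
    params = inspect.signature(func).parameters
    # valid_input is a special flag and is not matched to an input field.
    mapped = [p for name, p in params.items() if name != "valid_input"]
    assert len(mapped) == len(input_fields), (
        f"{func.__name__} takes {len(mapped)} inputs, "
        f"but {len(input_fields)} fields were given"
    )
    # A parameter with a default value marks its field as optional.
    optional = [
        field
        for field, p in zip(input_fields, mapped)
        if p.default is not inspect.Parameter.empty
    ]
    return {
        "name": func.__name__,
        "optional_fields": optional,
        "wants_valid_input": "valid_input" in params,
    }


def ratio(numerator, denominator=1.0, valid_input=True):
    # Return a fallback value when the inputs were flagged as invalid.
    return numerator / denominator if valid_input else float("nan")


info = inspect_metric(ratio, ["input/num", "input/den"])
lambda_name = (lambda x: x).__name__
```

The last line shows one reason lambdas are awkward here: every lambda reports the same __name__, "<lambda>", so it cannot serve as a distinct metric name.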

run(data_map, desired_output_fields=None, dry_run=False, skip_existing_results=False, save_graph_path=None, meta_args='')

Run the app’s analysis using the data_map

Any exceptions raised during a metric’s function are suppressed. The tracebacks are logged in the report['run_results': {'result': str}]. An assertion will be logged in this result if a metric doesn’t return the number of results corresponding to its output_fields.

Args:

data_map (dict): A dict that holds the inputs and outputs for metrics. The available inputs should be populated, along with dicts to contain internal and output fields.

desired_output_fields (List[Field]): A subset of the desired output fields. Only the metrics needed to produce these outputs will run. If this is None, metrics won’t be skipped based on outputs.

dry_run (bool): If this is true, don’t actually run the analysis. Only produce a report checking the input for which metrics would be skipped.

skip_existing_results (bool): If all the outputs for a metric are already present in data_map, don’t rerun the metric.

save_graph_path (str): A path to save an image of the graph of analysis that runs based on the input. No graph is made if this path is None.

meta_args (dict): Any additional values to add to the report.

Returns:

report (dict): A report of the analysis that was run. It contains the following top level fields:

meta_data - a description of the analysis and total run time
existing_results_skipped - if skip_existing_results is True, which metrics were skipped
unneeded_metrics - if desired_output_fields were specified, which metrics were skipped
metrics_missing_input - which metrics expected input missing from data_map
run_results - elapsed time for each metric and its result (Success or Failure along with cause)
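
The exception handling described for run can be sketched as follows. This is a hypothetical runner, not okcompute's implementation: the "elapsed" key and the exact result strings are assumptions, and only the idea of recording a traceback instead of raising comes from the description above.

```python
import time
import traceback


def run_metrics(metrics):
    """Hypothetical runner: call each metric, recording the outcome instead of raising."""
    run_results = {}
    for name, func in metrics.items():
        start = time.perf_counter()
        try:
            func()
            result = "Success"
        except Exception:
            # Keep the traceback text so the failure can be inspected later.
            result = "Failure: " + traceback.format_exc()
        run_results[name] = {
            "elapsed": time.perf_counter() - start,
            "result": result,
        }
    return run_results


def broken_metric():
    raise ValueError("bad input")


results = run_metrics({"ok": lambda: 1, "broken": broken_metric})
```

Because failures are captured this way, a caller has to inspect the per-metric result to learn whether anything went wrong; the run itself does not raise.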
