API¶
There are two classes that are used to create an analysis application:
Field Class¶
-
class
okcompute.
Field
(key, description, validate_func=<function DUMMY_VALIDATE>)[source]¶ A pointer to data involved with analysis.
Fields are used by metrics to specify their inputs and outputs. They contain a key, a description, and an optional validation function.
The key ['a', 'b', 'c'] would refer to the value in root['a']['b']['c'] and is represented as “a/b/c”.
- Attributes:
key (List[str]): The hierarchical location of a data field.
description (str): Description of the field.
validate_func (Callable[[Any], bool]): A function for validating input data. “def DUMMY_VALIDATE(x): return True” by default.
-
get_by_path
(root)[source]¶ Access the nested object in root at self.key
- Args:
- root (dict): The data_map that the field refers to.
- Returns:
- object: The value at self.key
- Raises:
- KeyError: If the path for self.key isn’t in root
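The key-path lookup described above can be illustrated with a minimal sketch in plain Python. This is not okcompute's implementation; the helper names get_by_path and key_to_str are reused here only for readability, and the sample root dict is made up for illustration.

```python
from functools import reduce
from operator import getitem


def get_by_path(root, key):
    """Follow each string in `key` one level deeper into the nested dicts."""
    # reduce applies root[key[0]][key[1]]... and raises KeyError on a missing level.
    return reduce(getitem, key, root)


def key_to_str(key):
    """Join the key parts with '/' to get the string form of the path."""
    return "/".join(key)


# Hypothetical data_map used only for this example.
root = {"a": {"b": {"c": 42}}}
key = ["a", "b", "c"]

value = get_by_path(root, key)
path = key_to_str(key)
```

Asking for a path that does not exist, such as ["a", "x"], raises KeyError in this sketch, which mirrors the documented behavior.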
-
static
get_field_set
(root, fields)[source]¶ Access a pandas DataFrame-like object, selecting the columns that the set of fields maps to.
For example, if:
root = {'input': {'a': pandas.DataFrame({'foo': [], 'bar': [], 'bat': []})}}
fields = [Field(key=['input', 'a', 'foo']), Field(key=['input', 'a', 'bat'])]
Then get_field_set would return root['input']['a'][['foo', 'bat']]
- Args:
root (dict): The data_map that the fields refer to.
- fields (List[Field]): A list of fields that share a common path except for the last string. These final strings refer to columns in the object at the common path.
- Returns:
- object: The referenced columns of the object at the common path
- Raises:
KeyError: If the path for a key in fields isn’t in root
- TypeError: If the object shared by the fields cannot take a __getitem__ key that’s a list of strings
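The column selection described for get_field_set can be sketched without pandas. In the sketch below, ColumnTable is a hypothetical stand-in for a DataFrame (it only supports indexing with a list of column names), keys are passed as plain lists of strings rather than Field objects, and the ValueError for mismatched paths is an assumption of this example, not documented okcompute behavior.

```python
from functools import reduce
from operator import getitem


class ColumnTable:
    """Hypothetical stand-in for a DataFrame: a list of names selects columns."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __getitem__(self, names):
        if not isinstance(names, list):
            raise TypeError("expected a list of column names")
        # A missing column name raises KeyError here.
        return {name: self.columns[name] for name in names}


def get_field_set(root, keys):
    """Select the columns named by the last string of each key path."""
    common = keys[0][:-1]
    if any(key[:-1] != common for key in keys):
        # Assumption for this sketch: reject keys without a shared parent path.
        raise ValueError("all keys must share a common path")
    table = reduce(getitem, common, root)
    return table[[key[-1] for key in keys]]


root = {"input": {"a": ColumnTable({"foo": [1], "bar": [2], "bat": [3]})}}
selected = get_field_set(root, [["input", "a", "foo"], ["input", "a", "bat"]])
```

If the object at the common path is a plain dict, indexing it with a list raises TypeError, which corresponds to the TypeError case listed above.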
-
key_to_str
()[source]¶ Return the str representation of the key.
- Returns:
- str: str representation of the key, e.g. “a/b/c”
App Class¶
-
class
okcompute.
App
(name, description, version)[source]¶ An app for performing a set of analyses
The metrics for analysis are specified by applying the metric decorator of an instance of this class to the analysis functions.
Specifying these metrics builds a dependency graph for the analysis to perform. An image of this graph can be saved with save_graph.
The analysis can be run on a data_map for the input and output with the run command. This command returns a report of what happened during the processing and the data_map is updated with results.
- Attributes:
- name (str): The name of the analysis set.
- description (str): A description for the analysis set.
- version (str): A version string for the analysis set.
-
metric
(description, input_fields, output_fields)[source]¶ Decorator for adding metrics to app
The expected use is like:
@example_app.metric(
    input_fields=[FIELD_INT2],
    output_fields=[FIELD_OUT3],
    description='example node4'
)
def metrics(in_arg, valid_input):
    ...
    return val
The call signature of the function being added is inspected and used to implicitly determine the desired behavior, specifically through its __name__ attribute and its parameters. This may make using lambdas or additional decorators more complicated.
Here is what is implicitly checked:
- The metric name - This is taken from the __name__ attribute. For a function this is its name.
- The input parameters - The input_fields specified are matched in order to the positional arguments of the function. An assertion is raised if the number of parameters doesn’t match the number of fields.
- Parameter default values - If a parameter specifies a default value, the input field it corresponds to is considered optional for this metric. This means the metric will still run even if the field is missing.
- Special valid_input parameter - If a parameter with the name valid_input is specified, it is not mapped to the input fields. It is instead set to True if the input fields are valid, or False if they are not. Default parameters are considered valid. If a valid_input parameter exists, the metric is expected to return some fallback output when valid_input is False.
Similar to the input_fields, the output_fields map to the return values of the function. If multiple outputs are specified they are expected as a tuple with the same length as output_fields.
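The implicit checks listed above can be approximated with the standard inspect module. The following is a rough sketch of the idea, not okcompute's actual code: map_parameters is a hypothetical helper, and fields are represented by their path strings instead of Field objects.

```python
import inspect


def map_parameters(func, input_fields):
    """Pair a function's parameters with input fields, in declaration order."""
    # valid_input is special: it is never matched against an input field.
    params = [
        p for p in inspect.signature(func).parameters.values()
        if p.name != "valid_input"
    ]
    assert len(params) == len(input_fields), "parameter/field count mismatch"

    mapping = {}
    for param, field in zip(params, input_fields):
        # A parameter with a default value marks its field as optional.
        optional = param.default is not inspect.Parameter.empty
        mapping[param.name] = {"field": field, "optional": optional}
    # The metric name comes from the function's __name__ attribute.
    return func.__name__, mapping


def speed(distance, duration=1.0, valid_input=True):
    if not valid_input:
        return None  # fallback output when inputs are invalid
    return distance / duration


name, mapping = map_parameters(speed, ["input/distance", "input/duration"])
```

Here name is taken from speed.__name__, the duration field is flagged as optional because of its default, and valid_input is left out of the mapping. A lambda would report "<lambda>" as its __name__, which hints at why lambdas complicate this kind of inspection.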
Raises:
- AssertionError: If the call signature of the function doesn’t match the input_fields, or the fields are in some way invalid, this function will raise an assertion when the module is loaded.
- Args:
- input_fields (List[Field]): The input fields that map to the function parameters. As a special case, if one of the items in this list is itself a list, that set of fields is interpreted as columns of a pandas DataFrame. See get_field_set() for more details.
- output_fields (List[Field]): The function’s return values will be written to these fields.
- description (str): A description for the metric.
- Returns:
- callable: The decorated function
-
run
(data_map, desired_output_fields=None, dry_run=False, skip_existing_results=False, save_graph_path=None, meta_args='')[source]¶ Run the app’s analysis using the data_map
Any exceptions raised during a metric’s function are suppressed. The tracebacks are logged in the report[‘run_results’: {‘result’: str}]. An assertion will be logged in this result if a metric doesn’t return the number of results corresponding to its output_fields.
- Args:
- data_map (dict): A dict that holds the inputs and outputs for metrics. The available inputs should be populated, along with dicts to contain internal and output fields.
- desired_output_fields (List[Field]): A subset of the desired output fields. This will only run the metrics needed to produce these outputs. If this is None the metrics won’t be skipped based on outputs.
- dry_run (bool): If this is true, don’t actually run the analysis. Only produce a report checking the input for which metrics would be skipped.
- skip_existing_results (bool): If all the outputs for a metric are already present in data_map, don’t rerun the metric.
- save_graph_path (str): A path to save an image of the graph of analysis that runs based on the input. No graph is made if this path is None.
- meta_args (dict): Any additional values to add to the report.
- Returns:
- report (dict): A report of the analysis that was run. It contains the following top-level fields:
- meta_data - a description of the analysis and the total run time
- existing_results_skipped - if skip_existing_results is True, which metrics were skipped
- unneeded_metrics - if desired_output_fields were specified, which metrics were skipped
- metrics_missing_input - which metrics expected input that is missing from data_map
- run_results - elapsed time for each metric and its result (Success or Failure along with the cause)
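The way run suppresses exceptions and records them in the report can be pictured with a small standalone sketch. This is an illustration of the pattern only: run_metrics is a hypothetical helper, and the exact layout of each run_results entry ("elapsed" and "result" keys) is an assumption of this example rather than okcompute's documented format.

```python
import time
import traceback


def run_metrics(metrics):
    """Run each metric, recording elapsed time and Success/Failure per metric."""
    report = {"run_results": {}}
    for name, func in metrics.items():
        start = time.perf_counter()
        try:
            func()
            result = "Success"
        except Exception:
            # The exception is swallowed; its traceback is kept as the cause.
            result = "Failure: " + traceback.format_exc()
        report["run_results"][name] = {
            "elapsed": time.perf_counter() - start,
            "result": result,
        }
    return report


def ok_metric():
    return 1


def broken_metric():
    raise ValueError("bad input")


report = run_metrics({"ok_metric": ok_metric, "broken_metric": broken_metric})
```

In this sketch a failing metric does not stop the remaining metrics from running, and the caller inspects the report afterwards to see which metrics failed and why.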