binx package

Submodules

binx.adapter module

The adapter module helps the user mutate one collection into another. This works similarly to a calc class in calc_factory. AbstractAdapter’s __call__ takes a collection’s data attribute as its first argument. The user must override the adapt() method, which must return an instance of AdapterOutputContainer. This is enforced by returning via the render_return helper method.

Once the class is declared, the user registers the adapter using the register static method.
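The flow described above can be sketched with plain Python stand-ins. The names mirror the documented API (adapt, render_return, AdapterOutputContainer), but this is a standalone illustration of the pattern, not the binx classes themselves:

```python
# Standalone sketch of the adapter pattern described above.
# Names mirror the binx docs; bodies are illustrative stand-ins.

class OutputContainer:
    """Stand-in for AdapterOutputContainer: adapted data plus context."""
    def __init__(self, data, **context):
        self.data = data
        self.context = context

class SketchAdapter:
    """Stand-in for an AbstractAdapter subclass."""
    def __call__(self, data, **context):
        result = self.adapt(data, **context)
        # __call__ enforces that adapt() returned a container
        assert isinstance(result, OutputContainer)
        return result

    def adapt(self, data, **context):
        # user-defined cleaning step; must end by calling render_return
        cleaned = [{**rec, "value": rec["value"] * 2} for rec in data]
        return self.render_return(cleaned, source="SketchAdapter")

    def render_return(self, data, **context):
        return OutputContainer(data, **context)

out = SketchAdapter()([{"value": 1}, {"value": 2}])
```

The container carries both the adapted records and any per-call context, which is what lets a chain of adapters pass side effects forward.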

class binx.adapter.AbstractAdapter[source]

Bases: abc.ABC

Concrete Adapters subclass this class and override its adapt method with an implementation. Other methods may be added as helpers.

adapt(collection, **context)[source]

The user must override this method to do the data cleaning. Any additional methods needed for cleaning should be considered private to the Adapter subclass. It must call render_return.

from_collection_class = None
is_registered = False
render_return(data, **context)[source]

A helper method that renders the data response into the target_collection_class and passes along context data. It must return the data in an instance of AdapterOutputContainer.

target_collection_class = None
class binx.adapter.AdapterOutputContainer(collection, **context)[source]

Bases: object

A generic container class for moving data out of adapters. It holds the target output collection along with any context data that might need to be passed on to the caller or another adapter. Essentially ‘side effects’ from the adaptation that might be needed further along in the adapter chain.

NOTE that the context in a container instance only relates to its immediate adapter call. It does not contain any of the surrounding context; that gets accumulated in BaseCollection._resolve_adapter_chain.

This is used internally in Adapter.__call__

context
class binx.adapter.PluggableAdapter[source]

Bases: binx.adapter.AbstractAdapter

Creates a pluggable interface for Adapters. A user should subclass this class and provide a calc object that supplies the adaptation function.

adapt(collection, **context)[source]

Calls the adaptation function set on the calc field.

calc = None
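The pluggable idea can be sketched standalone. Note the (data, context) 2-tuple return shape is an assumption inferred from AdapterFunctionError in binx.exceptions, which fires when a pluggable adapter function does not return a 2-tuple; this is not the binx implementation itself:

```python
# Sketch: a class-level `calc` callable does the work; adapt() just dispatches.
# The (data, context) 2-tuple shape is an assumption inferred from
# AdapterFunctionError in binx.exceptions.

def double_values(data, **context):
    """A pluggable adaptation function returning a (data, context) 2-tuple."""
    doubled = [{**rec, "value": rec["value"] * 2} for rec in data]
    return doubled, {"doubled": True}

class SketchPluggable:
    calc = staticmethod(double_values)

    def adapt(self, data, **context):
        result = self.calc(data, **context)
        # mirror the documented 2-tuple check
        if not (isinstance(result, tuple) and len(result) == 2):
            raise ValueError("pluggable adapter function must return a 2-tuple")
        return result

data, ctx = SketchPluggable().adapt([{"value": 3}])
```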
binx.adapter.check_adapter_call(method)[source]

a helper decorator for the __call__ method that does some type checking

binx.adapter.register_adapter(adapter_class)[source]

Registers the adapter class in the graph chain by setting its to and from classes

binx.collection module

Abstract base classes for the system.

class binx.collection.AbstractCollection[source]

Bases: object

Defines an interface for Collection objects. This includes a valid marshmallow serializer class, an iterable data list object, and a load_data method with validation. Collections are also registered, so AbstractCollection uses AbstractCollectionMeta as a metaclass.

data

returns an object-representation of the metadata using the serializer

get_fully_qualified_class_path()[source]

reaches into the registry and gets the fully qualified class path

internal_class

returns an ma serializer. Used for validation and instantiation NOTE possibly change to class method

load_data(object)[source]

uses a marshmallow serializer to validate and load the data into an object-record representation

serializer_class

returns an ma serializer. Used for validation and instantiation

to_dataframe()[source]

returns a dataframe representation of the object. This wraps the data property in a pd.DataFrame

to_json()[source]

returns a json string representation of the data using the serializer

class binx.collection.AbstractCollectionBuilder[source]

Bases: abc.ABC

An interface for the CollectionBuilder. A build method takes a subclass of BaseSerializer and creates a Collection class dynamically. Its use is optional but is designed to cut down on class declarations if the user is making many generic Collection implementations.

build(serializer)[source]

builds a collection object

class binx.collection.BaseCollection(data=None, **ma_kwargs)[source]

Bases: binx.collection.AbstractCollection

Used to implement many of the default AbstractCollection methods. Subclasses will mostly just need to define a custom Serializer and InternalObject pair.

Parameters:data – the data being passed into the serializer, could be a dataframe or list of records. If None
classmethod adapt(input_collection, accumulate=False, **adapter_context)[source]

Attempts to adapt the input collection instance into a collection of this type by resolving the adapter chain for the input collection. Any kwargs passed in are handed over to the resolver.

    colla = CollectionA()
    colla.load_data(some_data)
    collb, context = CollectionB.adapt(colla, some_var=42, some_other_var=66)

This method returns a new instance of the adapted class (the caller)

collection_id
data

returns an object-representation of the metadata using the serializer

classmethod get_fully_qualified_class_path()[source]

This returns the fully qualified class name for this class. This can be used for collection_registry lookup

classmethod get_registry_entry()[source]

This returns the complete registry entry for this class

internal

returns a class of the internal object

internal_class

alias of InternalObject

load_data(records, raise_on_empty=False)[source]

default implementation. Defaults to handling lists of python-dicts (records). #TODO – create a drop_duplicates option and use pandas to drop the dupes

serializer

returns an ma serializer. Used for validation and instantiation

serializer_class

alias of BaseSerializer

to_dataframe()[source]

returns a dataframe representation of the object. This wraps the data property in a pd.DataFrame converts any columns that can be converted to datetime

to_json()[source]

returns a json string representation of the data using the serializer

class binx.collection.BaseSerializer(*args, **kwargs)[source]

Bases: marshmallow.schema.Schema

The BaseSerializer overrides Schema to include an internal used to dump associated InternalObjects. These are instantiated with the serializer and used for loading and validating data. It also provides a mapping of numpy dtypes to a select set of marshmallow field types, which helps optimize memory in the to_dataframe method.

get_numpy_fields()[source]

returns a dictionary of column names and numpy dtypes based on the ma_np_map dictionary. Collections use this to create more memory-optimized dataframes

load_object(data, **kwargs)[source]

loads and validates an internal class object

numpy_map = {<class 'marshmallow.fields.Integer'>: dtype('int64'), <class 'marshmallow.fields.Float'>: dtype('float64'), <class 'marshmallow.fields.String'>: dtype('<U'), <class 'marshmallow.fields.Date'>: dtype('<M8[ns]'), <class 'marshmallow.fields.DateTime'>: dtype('<M8[ns]'), <class 'marshmallow.fields.List'>: dtype('O'), <class 'marshmallow.fields.Boolean'>: dtype('bool'), <class 'marshmallow.fields.Dict'>: dtype('O'), <class 'marshmallow.fields.Nested'>: dtype('O')}
opts = <marshmallow.schema.SchemaOpts object>
registered_colls = {}
class binx.collection.CollectionBuilder(name=None, unique_fields=None)[source]

Bases: binx.collection.AbstractCollectionBuilder

A factory class that constructs Collection objects dynamically, providing a default namespace for binx.registry and the adapter chain.

build(serializer_class, name=None, internal_only=False)[source]

Dynamically creates and returns a Collection class given a serializer and identifier. If internal_only is set to True, only the internal class is returned. This is useful if you are using a declarative approach to defining the collections and want to add or override some of the base behavior.
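The dynamic construction that build performs can be illustrated with Python's three-argument type() call. This is a generic sketch of the technique with hypothetical names, not CollectionBuilder's actual implementation:

```python
# Sketch: minting a class at runtime from a name and a serializer, the way a
# builder can create Collection subclasses without a `class` statement.

class SketchBaseCollection:
    """Hypothetical stand-in for a collection base class."""
    serializer_class = None

def build_collection(name, serializer_class):
    # type(name, bases, namespace) creates a new class object dynamically
    return type(
        f"{name}Collection",
        (SketchBaseCollection,),
        {"serializer_class": serializer_class},
    )

class PriceSerializer:  # hypothetical serializer stand-in
    pass

PriceCollection = build_collection("Price", PriceSerializer)
```

This is why a builder can cut down on boilerplate: each call yields a distinct, fully formed class without a separate declaration.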

class binx.collection.CollectionMeta[source]

Bases: type

class binx.collection.InternalObject(*args, **kwargs)[source]

Bases: object

A namespace class used for instance-checking an internally used model object. It is otherwise a normal Python object. _Internals are used as a medium for serialization and deserialization, and their declarations are bound to Collections and enforced by Serializers. It can be inherited from or used as a mixin.

is_binx_internal = True
registered_colls = {}

binx.exceptions module

Custom exceptions for binx

exception binx.exceptions.AdapterChainError[source]

Bases: binx.exceptions.BinxError

thrown if an input collection cannot be found on the adapter chain for a Collection

exception binx.exceptions.AdapterCollectionResultError[source]

Bases: binx.exceptions.BinxError

thrown if a collection load fails while attempting to adapt

exception binx.exceptions.AdapterFunctionError[source]

Bases: binx.exceptions.BinxError, ValueError

thrown if a 2-tuple is not returned from a pluggable adapter function.

exception binx.exceptions.BinxError[source]

Bases: Exception

A base exception for the library

exception binx.exceptions.CollectionLoadError[source]

Bases: binx.exceptions.BinxError

thrown if a Collection fails to load its Internal Object Collection; this could be due to a validation error or some other issue

exception binx.exceptions.CollectionValidationError(message: Union[str, List[T], Dict[KT, VT]], field_name: str = '_schema', data: Mapping[str, Any] = None, valid_data: Union[List[Dict[str, Any]], Dict[str, Any]] = None, **kwargs)[source]

Bases: marshmallow.exceptions.ValidationError, binx.exceptions.BinxError

subclass of a marshmallow validation error

exception binx.exceptions.FactoryCreateValidationError[source]

Bases: binx.exceptions.BinxError

wraps a marshmallow validation error in the create method of the factory

exception binx.exceptions.FactoryProcessorFailureError[source]

Bases: binx.exceptions.BinxError

raised if the _process method of a Factory fails to produce any results

exception binx.exceptions.InternalNotDefinedError[source]

Bases: binx.exceptions.BinxError

used for development - thrown if an Internal class is improperly declared on a Collection

exception binx.exceptions.RegistryError[source]

Bases: binx.exceptions.BinxError, KeyError

raised if a classname already exists in the collection registry

binx.registry module

A private registry for the collection objects. It is mainly used to register adaption classes on each collection object for data cleaning/processing. The classes are created by the user and registered at runtime.

binx.registry.adapter_path(from_class, end_class)[source]

traverses the registry and builds a class path of adapters to a target by looking at each node’s ‘adaptable_from’ set. It will traverse the graph until all possibilities are exhausted. If it finds a matching adaptable, it returns the path of adapter objects needed to adapt the schema. If no path is found, it returns an empty list

binx.registry.get_class_from_collection_registry(classname)[source]

returns the full tuple given the fully qualified classname

binx.registry.register_adaptable_collection(classname, coll)[source]

appends an adaptable collection to a class’s list of adaptable collections

binx.registry.register_adapter_to_collection(classname, adapter)[source]

appends an adapter to the klass object

binx.registry.register_collection(cls)[source]

registers a new collection class.

binx.utils module

General-purpose functionality. Classes are loosely grouped by method type.

class binx.utils.DataFrameDtypeConversion[source]

Bases: object

date_to_string(col_mapping, df)[source]

converts columns of pd Timestamps (np.datetime64) to strings

df_nan_to_none(df)[source]

converts a dataframe’s NaN values to None

df_none_to_nan(df)[source]

converts a dataframe’s None values to NaN if needed

class binx.utils.ObjUtils[source]

Bases: object

get_fully_qualified_path(obj)[source]

returns the fully qualified path of the class that defines this instance
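A fully qualified path like this is typically assembled from the class's __module__ and __qualname__ attributes. This is a generic sketch of the idea, not necessarily ObjUtils' exact implementation:

```python
def fully_qualified_path(obj):
    """Return 'module.QualName' for the class that defines this instance."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"

class Widget:
    pass

path = fully_qualified_path(Widget())  # e.g. 'somemodule.Widget'
builtin_path = fully_qualified_path(3)
```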

class binx.utils.RecordUtils[source]

Bases: object

columns_to_records(column_dict)[source]

convert column_dict format to record format

date_to_string(col_mapping, records)[source]

converts datetime or date objects to strings

records_to_columns(records)[source]

convert record format to column format
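The two layouts convert into each other straightforwardly. A generic sketch of the record/column round trip (not RecordUtils' exact code, and assuming every record shares the same keys):

```python
def records_to_columns(records):
    """[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}] -> {'a': [1, 3], 'b': [2, 4]}"""
    return {key: [rec[key] for rec in records] for key in records[0]}

def columns_to_records(column_dict):
    """{'a': [1, 3], 'b': [2, 4]} -> [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]"""
    keys = list(column_dict)
    return [dict(zip(keys, row)) for row in zip(*column_dict.values())]

records = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
cols = records_to_columns(records)
roundtrip = columns_to_records(cols)
```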

replace_nan_with_none(records)[source]

checks a flat list of dicts for np.nan and replaces it with None; used for serializing some result records, because marshmallow cannot serialize or deserialize NaN. #NOTE this is a bit slow, should find a better way to make this conversion
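The replacement itself can be sketched without numpy, since np.nan is an ordinary float NaN, and NaN is the only value unequal to itself. A sketch of the idea, not the library's code:

```python
def replace_nan_with_none(records):
    """Replace float NaN values in a flat list of dicts with None."""
    def fix(value):
        # NaN (including np.nan) is the only value for which x != x
        return None if isinstance(value, float) and value != value else value
    return [{key: fix(val) for key, val in rec.items()} for rec in records]

dirty = [{"a": 1.0, "b": float("nan")}, {"a": float("nan"), "b": 2.0}]
clean = replace_nan_with_none(dirty)
```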

binx.utils.bfs_shortest_path(graph, start, end)[source]

a generic BFS shortest-path search algorithm
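A BFS shortest path over an adjacency mapping can be sketched as follows. This is a generic implementation of the technique; the actual function's graph format may differ:

```python
from collections import deque

def bfs_shortest_path(graph, start, end):
    """Return the shortest path from start to end, or [] if none exists."""
    if start == end:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], []):
            if neighbor in visited:
                continue
            if neighbor == end:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return []  # no path found

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```

Returning an empty list on failure matches the behavior described for adapter_path above, which is a natural client of a search like this.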

Module contents

Top-level package for binx.

class binx.BaseCollection(data=None, **ma_kwargs)[source]

Bases: binx.collection.AbstractCollection

Used to implement many of the default AbstractCollection methods. Subclasses will mostly just need to define a custom Serializer and InternalObject pair.

Parameters:data – the data being passed into the serializer, could be a dataframe or list of records. If None
classmethod adapt(input_collection, accumulate=False, **adapter_context)[source]

Attempts to adapt the input collection instance into a collection of this type by resolving the adapter chain for the input collection. Any kwargs passed in are handed over to the resolver.

    colla = CollectionA()
    colla.load_data(some_data)
    collb, context = CollectionB.adapt(colla, some_var=42, some_other_var=66)

This method returns a new instance of the adapted class (the caller)

collection_id
data

returns an object-representation of the metadata using the serializer

classmethod get_fully_qualified_class_path()[source]

This returns the fully qualified class name for this class. This can be used for collection_registry lookup

classmethod get_registry_entry()[source]

This returns the complete registry entry for this class

internal

returns a class of the internal object

internal_class

alias of InternalObject

load_data(records, raise_on_empty=False)[source]

default implementation. Defaults to handling lists of python-dicts (records). #TODO – create a drop_duplicates option and use pandas to drop the dupes

serializer

returns an ma serializer. Used for validation and instantiation

serializer_class

alias of BaseSerializer

to_dataframe()[source]

returns a dataframe representation of the object. This wraps the data property in a pd.DataFrame converts any columns that can be converted to datetime

to_json()[source]

returns a json string representation of the data using the serializer

class binx.InternalObject(*args, **kwargs)[source]

Bases: object

A namespace class used for instance-checking an internally used model object. It is otherwise a normal Python object. _Internals are used as a medium for serialization and deserialization, and their declarations are bound to Collections and enforced by Serializers. It can be inherited from or used as a mixin.

is_binx_internal = True
registered_colls = {}
class binx.BaseSerializer(*args, **kwargs)[source]

Bases: marshmallow.schema.Schema

The BaseSerializer overrides Schema to include an internal used to dump associated InternalObjects. These are instantiated with the serializer and used for loading and validating data. It also provides a mapping of numpy dtypes to a select set of marshmallow field types, which helps optimize memory in the to_dataframe method.

get_numpy_fields()[source]

returns a dictionary of column names and numpy dtypes based on the ma_np_map dictionary. Collections use this to create more memory-optimized dataframes

load_object(data, **kwargs)[source]

loads and validates an internal class object

numpy_map = {<class 'marshmallow.fields.Integer'>: dtype('int64'), <class 'marshmallow.fields.Float'>: dtype('float64'), <class 'marshmallow.fields.String'>: dtype('<U'), <class 'marshmallow.fields.Date'>: dtype('<M8[ns]'), <class 'marshmallow.fields.DateTime'>: dtype('<M8[ns]'), <class 'marshmallow.fields.List'>: dtype('O'), <class 'marshmallow.fields.Boolean'>: dtype('bool'), <class 'marshmallow.fields.Dict'>: dtype('O'), <class 'marshmallow.fields.Nested'>: dtype('O')}
opts = <marshmallow.schema.SchemaOpts object>
registered_colls = {}