binx package¶
Submodules¶
binx.adapter module¶
The adapter module helps the user mutate one collection into another. This works similarly to a calc class in calc_factory. AbstractAdapter's __call__ takes a collection's data attribute as the first argument. An adapt() method must be overridden by the user and must return an instance of AdapterOutputContainer; this is enforced by returning through the render_return helper method.
Once the class is declared, the user registers the adapter using the register static method.
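The adapt/render_return flow above can be sketched with plain stand-ins rather than binx itself (the class names mirror the documented API, but the temperature adapter and its fields are hypothetical):

```python
from abc import ABC, abstractmethod

# Minimal stand-ins mirroring the documented API (illustrative only,
# not the binx implementation).
class AdapterOutputContainer:
    def __init__(self, collection, **context):
        self.collection = collection
        self.context = context

class AbstractAdapter(ABC):
    target_collection_class = None

    def __call__(self, data, **context):
        return self.adapt(data, **context)

    @abstractmethod
    def adapt(self, collection, **context):
        ...

    def render_return(self, data, **context):
        # wrap the adapted data in an output container, as adapt() must
        return AdapterOutputContainer(data, **context)

class CelsiusToFahrenheitAdapter(AbstractAdapter):
    def adapt(self, collection, **context):
        out = [{"temp_f": r["temp_c"] * 9 / 5 + 32} for r in collection]
        return self.render_return(out, source_unit="celsius", **context)

adapter = CelsiusToFahrenheitAdapter()
result = adapter([{"temp_c": 0}, {"temp_c": 100}])
print(result.collection)  # [{'temp_f': 32.0}, {'temp_f': 212.0}]
print(result.context)     # {'source_unit': 'celsius'}
```

The key contract is that adapt() never returns raw data; everything funnels through render_return so the caller always receives a container holding both the output and its context.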
- class binx.adapter.AbstractAdapter[source]¶ Bases: abc.ABC
Concrete Adapters subclass this class and override its adapt method with an implementation. Other methods may be added as helpers.
- adapt(collection, **context)[source]¶ The user must override this method to do the data cleaning. Any additional methods needed for cleaning should be considered private to the Adapter subclass. Must call render_return.
- from_collection_class = None¶
- is_registered = False¶
- render_return(data, **context)[source]¶ A helper method that renders the data response to the target_collection_class and passes context data along. Must return the data in an instance of AdapterOutputContainer.
- target_collection_class = None¶
- class binx.adapter.AdapterOutputContainer(collection, **context)[source]¶ Bases: object
A generic container class for moving data out of adapters. It holds the target output collection along with any context data that might need to be passed on to the caller or another adapter. Essentially 'side effects' from the adaptation that might be needed further along in the adapter chain.
NOTE that the context in a container instance only relates to its immediate adapter call. It does not contain any of the surrounding context; that gets accumulated in BaseCollection._resolve_adapter_chain.
This is used internally in Adapter.__call__.
- context¶
- class binx.adapter.PluggableAdapter[source]¶ Bases: binx.adapter.AbstractAdapter
Creates a pluggable interface for Adapters. A user should subclass this class and provide a calc object.
- calc = None¶
-
binx.collection module¶
Abstract base classes for the system.
- class binx.collection.AbstractCollection[source]¶ Bases: object
Defines an interface for Collection objects. This includes a valid marshmallow serializer class, an iterable data list object, and a load_data method with validation. Collections are also registered, so AbstractCollection uses AbstractCollectionMeta as a metaclass.
- data¶ returns an object-representation of the metadata using the serializer
- get_fully_qualified_class_path()[source]¶ reaches into the registry and gets the fully qualified class path
- internal_class¶ returns an ma serializer. Used for validation and instantiation. NOTE: possibly change to a class method
- load_data(object)[source]¶ uses a marshmallow serializer to validate and load the data into an object-record representation
- serializer_class¶ returns an ma serializer. Used for validation and instantiation
- class binx.collection.AbstractCollectionBuilder[source]¶ Bases: abc.ABC
An interface for the CollectionBuilder. A build method takes a subclass of BaseSerializer and creates a Collection class dynamically. Its use is optional but is designed to cut down on class declarations if the user is making many generic Collection implementations.
- class binx.collection.BaseCollection(data=None, **ma_kwargs)[source]¶ Bases: binx.collection.AbstractCollection
Used to implement many of the default AbstractCollection methods. Subclasses will mostly just need to define a custom Serializer and InternalObject pair.
Parameters: data – the data being passed into the serializer; could be a dataframe or a list of records. If None …
- classmethod adapt(input_collection, accumulate=False, **adapter_context)[source]¶ Attempts to adapt the input collection instance into a collection of this type by resolving the adapter chain for the input collection. Any kwargs passed in are handed over to the resolver.
colla = CollectionA()
colla.load_data(some_data)
collb, context = CollectionB.adapt(colla, some_var=42, some_other_var=66)
This method returns a new instance of the adapted class (the caller).
- collection_id¶
- data¶ returns an object-representation of the metadata using the serializer
- classmethod get_fully_qualified_class_path()[source]¶ This returns the fully qualified class name for this class. This can be used for collection_registry lookup.
- internal¶ returns a class of the internal object
- internal_class¶ alias of InternalObject
- load_data(records, raise_on_empty=False)[source]¶ default implementation. Defaults to handling lists of python dicts (records). # TODO – create a drop_duplicates option and use pandas to drop the dupes
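The load_data contract (lists of python dicts, with an optional raise on empty input) can be sketched without binx as follows; the required field names and error messages are illustrative, and binx would raise CollectionLoadError rather than ValueError:

```python
# Illustrative sketch of the load_data contract: validate a list of
# record dicts and raise when input is empty and raise_on_empty is set.
REQUIRED_FIELDS = {"id", "name"}  # hypothetical schema fields

def load_data(records, raise_on_empty=False):
    if not records:
        if raise_on_empty:
            raise ValueError("no records to load")
        return []
    loaded = []
    for i, record in enumerate(records):
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"record {i} missing fields: {sorted(missing)}")
        loaded.append(dict(record))
    return loaded

rows = load_data([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
print(len(rows))  # 2
```

In binx the per-record validation is delegated to the marshmallow serializer instead of a hand-rolled field check.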
- serializer¶ returns an ma serializer. Used for validation and instantiation
- serializer_class¶ alias of BaseSerializer
- class binx.collection.BaseSerializer(*args, **kwargs)[source]¶ Bases: marshmallow.schema.Schema
The BaseSerializer overrides Schema to include an internal attribute to dump associated InternalObjects. These are instantiated with the serializer and used for loading and validating data. It also provides a mapping of numpy dtypes for a select set of marshmallow field types, which helps optimize memory in the to_dataframe output.
- get_numpy_fields()[source]¶ returns a dictionary of column names and numpy dtypes based on the ma_np_map dictionary. Collections will use this to create more memory-optimized dataframes.
- numpy_map = {<class 'marshmallow.fields.Integer'>: dtype('int64'), <class 'marshmallow.fields.Float'>: dtype('float64'), <class 'marshmallow.fields.String'>: dtype('<U'), <class 'marshmallow.fields.Date'>: dtype('<M8[ns]'), <class 'marshmallow.fields.DateTime'>: dtype('<M8[ns]'), <class 'marshmallow.fields.List'>: dtype('O'), <class 'marshmallow.fields.Boolean'>: dtype('bool'), <class 'marshmallow.fields.Dict'>: dtype('O'), <class 'marshmallow.fields.Nested'>: dtype('O')}¶
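The shape of the mapping that get_numpy_fields() produces can be illustrated with a self-contained mirror of numpy_map, keyed by marshmallow field names and using numpy dtype strings in place of dtype objects (no numpy or marshmallow import required; the schema_fields dict is hypothetical):

```python
# Illustrative mirror of BaseSerializer.numpy_map using dtype strings.
NUMPY_MAP = {
    "Integer": "int64",
    "Float": "float64",
    "String": "<U",
    "Date": "<M8[ns]",
    "DateTime": "<M8[ns]",
    "List": "O",
    "Boolean": "bool",
    "Dict": "O",
    "Nested": "O",
}

def get_numpy_fields(schema_fields):
    # return {column_name: dtype_string} for the declared fields
    return {name: NUMPY_MAP[ftype] for name, ftype in schema_fields.items()}

fields = {"id": "Integer", "price": "Float", "label": "String"}
print(get_numpy_fields(fields))  # {'id': 'int64', 'price': 'float64', 'label': '<U'}
```

A Collection can hand such a dict to pandas so columns are allocated with narrow dtypes instead of defaulting everything to object.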
- opts = <marshmallow.schema.SchemaOpts object>¶
- registered_colls = {}¶
- class binx.collection.CollectionBuilder(name=None, unique_fields=None)[source]¶ Bases: binx.collection.AbstractCollectionBuilder
A factory class that constructs Collection objects dynamically, providing a default namespace for binx.registry and the adapter chain.
- build(serializer_class, name=None, internal_only=False)[source]¶ dynamically creates and returns a Collection class given a serializer and identifier. If internal_only is True, only the internal class is returned. This is useful if you are using a declarative approach to defining the collections and want to add or override some of the base behavior.
- class binx.collection.InternalObject(*args, **kwargs)[source]¶ Bases: object
A namespace class used for instance checking of an internally used model object. It is otherwise a normal python object. _Internals are used as a medium for serialization and deserialization; their declarations are bound to Collections and enforced by Serializers. It can be inherited from or used as a Mixin.
- is_binx_internal = True¶
- registered_colls = {}¶
-
binx.exceptions module¶
Custom exceptions for binx
- exception binx.exceptions.AdapterChainError[source]¶ Bases: binx.exceptions.BinxError
thrown if an input collection cannot be found on the adapter chain for a Collection
- exception binx.exceptions.AdapterCollectionResultError[source]¶ Bases: binx.exceptions.BinxError
thrown if a collection load fails while attempting to adapt
- exception binx.exceptions.AdapterFunctionError[source]¶ Bases: binx.exceptions.BinxError, ValueError
thrown if a 2-tuple is not returned from a pluggable adapter function.
- exception binx.exceptions.CollectionLoadError[source]¶ Bases: binx.exceptions.BinxError
thrown if a Collection fails to load its Internal Object Collection; this could be due to a validation error or some other issue
- exception binx.exceptions.CollectionValidationError(message: Union[str, List[T], Dict[KT, VT]], field_name: str = '_schema', data: Mapping[str, Any] = None, valid_data: Union[List[Dict[str, Any]], Dict[str, Any]] = None, **kwargs)[source]¶ Bases: marshmallow.exceptions.ValidationError, binx.exceptions.BinxError
subclass of a marshmallow validation error
- exception binx.exceptions.FactoryCreateValidationError[source]¶ Bases: binx.exceptions.BinxError
wraps a marshmallow validation error in the create method of the factory
- exception binx.exceptions.FactoryProcessorFailureError[source]¶ Bases: binx.exceptions.BinxError
raised if the _process method of a Factory fails to produce any results
- exception binx.exceptions.InternalNotDefinedError[source]¶ Bases: binx.exceptions.BinxError
used for development - thrown if an Internal class is improperly declared on a Collection
- exception binx.exceptions.RegistryError[source]¶ Bases: binx.exceptions.BinxError, KeyError
raised if a classname already exists in the collection registry
binx.registry module¶
A private registry for the collection objects. It is mainly used to register adapter classes on each collection object for data cleaning/processing. The classes are created by the user and registered at runtime.
- binx.registry.adapter_path(from_class, end_class)[source]¶ traverses the registry and builds a class path of adapters to a target by looking at each node's 'adaptable_from' set. It will traverse the graph until all possibilities are exhausted. If it finds a matching adaptable, it returns the path of adapter objects needed to adapt the schema. If no path is found it returns an empty list.
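The traversal described above can be sketched as a breadth-first search over 'adaptable_from' links. The registry here is a plain dict and the collection names are hypothetical; binx's actual registry entries carry more than the adjacency set:

```python
from collections import deque

# Illustrative registry: class name -> set of names it is adaptable from.
REGISTRY = {
    "CollectionC": {"CollectionB"},
    "CollectionB": {"CollectionA"},
    "CollectionA": set(),
}

def adapter_path(from_name, end_name):
    # Return the chain of class names from from_name to end_name,
    # or an empty list if no path exists (mirroring adapter_path's contract).
    queue = deque([[end_name]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == from_name:
            return list(reversed(path))
        for parent in REGISTRY.get(node, ()):
            queue.append(path + [parent])
    return []

print(adapter_path("CollectionA", "CollectionC"))  # ['CollectionA', 'CollectionB', 'CollectionC']
print(adapter_path("CollectionC", "CollectionA"))  # []
```

Searching backwards from the target along adaptable_from edges means the first match found is a shortest adapter chain.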
- binx.registry.get_class_from_collection_registry(classname)[source]¶ returns the full tuple given the fully qualified classname
- binx.registry.register_adaptable_collection(classname, coll)[source]¶ appends an adaptable collection to a class's list of adaptable collections
binx.utils module¶
General purpose functionality. Classes are loosely classified by method type.
- class binx.utils.DataFrameDtypeConversion[source]¶ Bases: object
Module contents¶
Top-level package for binx.
- class binx.BaseCollection(data=None, **ma_kwargs)[source]¶ Bases: binx.collection.AbstractCollection
Used to implement many of the default AbstractCollection methods. Subclasses will mostly just need to define a custom Serializer and InternalObject pair.
Parameters: data – the data being passed into the serializer; could be a dataframe or a list of records. If None …
- classmethod adapt(input_collection, accumulate=False, **adapter_context)[source]¶ Attempts to adapt the input collection instance into a collection of this type by resolving the adapter chain for the input collection. Any kwargs passed in are handed over to the resolver.
colla = CollectionA()
colla.load_data(some_data)
collb, context = CollectionB.adapt(colla, some_var=42, some_other_var=66)
This method returns a new instance of the adapted class (the caller).
- collection_id¶
- data¶ returns an object-representation of the metadata using the serializer
- classmethod get_fully_qualified_class_path()[source]¶ This returns the fully qualified class name for this class. This can be used for collection_registry lookup.
- internal¶ returns a class of the internal object
- internal_class¶ alias of InternalObject
- load_data(records, raise_on_empty=False)[source]¶ default implementation. Defaults to handling lists of python dicts (records). # TODO – create a drop_duplicates option and use pandas to drop the dupes
- serializer¶ returns an ma serializer. Used for validation and instantiation
- serializer_class¶ alias of BaseSerializer
- class binx.InternalObject(*args, **kwargs)[source]¶ Bases: object
A namespace class used for instance checking of an internally used model object. It is otherwise a normal python object. _Internals are used as a medium for serialization and deserialization; their declarations are bound to Collections and enforced by Serializers. It can be inherited from or used as a Mixin.
- is_binx_internal = True¶
- registered_colls = {}¶
- class binx.BaseSerializer(*args, **kwargs)[source]¶ Bases: marshmallow.schema.Schema
The BaseSerializer overrides Schema to include an internal attribute to dump associated InternalObjects. These are instantiated with the serializer and used for loading and validating data. It also provides a mapping of numpy dtypes for a select set of marshmallow field types, which helps optimize memory in the to_dataframe output.
- get_numpy_fields()[source]¶ returns a dictionary of column names and numpy dtypes based on the ma_np_map dictionary. Collections will use this to create more memory-optimized dataframes.
- numpy_map = {<class 'marshmallow.fields.Integer'>: dtype('int64'), <class 'marshmallow.fields.Float'>: dtype('float64'), <class 'marshmallow.fields.String'>: dtype('<U'), <class 'marshmallow.fields.Date'>: dtype('<M8[ns]'), <class 'marshmallow.fields.DateTime'>: dtype('<M8[ns]'), <class 'marshmallow.fields.List'>: dtype('O'), <class 'marshmallow.fields.Boolean'>: dtype('bool'), <class 'marshmallow.fields.Dict'>: dtype('O'), <class 'marshmallow.fields.Nested'>: dtype('O')}¶
- opts = <marshmallow.schema.SchemaOpts object>¶
- registered_colls = {}¶