Module to store persistence handler classes.
Persistence handlers take care of all implementation details related to resource storage. They expose a common interface (defined in BasePersistenceHandler) through which the server (and/or filters/crawlers) can load, save and perform other operations over resources independently from where and how the resources are actually stored. At any point in time, the collection status of each resource must be one of those defined in the struct-like class StatusCodes.
A struct-like class to hold constants for resource status codes.
The numeric value of each code can be modified to match the one used in the final location where the resources are persisted. The name of each code (SUCCEEDED, INPROGRESS, AVAILABLE, FAILED, ERROR) must not be modified.
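As an illustration, a struct-like class of this kind might look like the sketch below. The five names come from the description above; the numeric values are assumptions chosen only for the example and are expected to be adapted to the storage backend.

```python
class StatusCodes:
    """Struct-like holder for resource status codes.

    The names are fixed; the numeric values below are illustrative
    assumptions and may be changed to match the persistence backend.
    """
    SUCCEEDED = 2
    INPROGRESS = 1
    AVAILABLE = 0
    FAILED = -1
    ERROR = -2


# Code elsewhere refers to the names, never to the raw numbers.
resource = {"id": "42", "status": StatusCodes.AVAILABLE, "info": None}
```

Because callers compare against the names, changing a numeric value only requires editing this one class.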
Abstract class. All persistence handlers should inherit from it, either directly or through another class that inherits from it.
Constructor.
Each persistence handler receives everything in its corresponding handler section of the XML configuration file as the parameter configurationsDictionary.
Extract and store configurations.
If some configuration needs any kind of pre-processing, it is done here. Extend this method if you need to pre-process custom configuration options.
Execute per client initialization procedures.
This method is called every time a connection to a new client is opened, so that initialization code can be executed on a per-client basis. It differs from __init__(), which is called when the server instantiates the persistence handler, i.e., only once for the whole period of execution of the program.
Retrieve an AVAILABLE resource.
A tuple in the format (resourceKey, resourceID, resourceInfo).
Update the specified resource, setting its status and information data to the ones given.
Insert new resources into the final location where resources are persisted.
Count the number of resources in each status category.
Change to AVAILABLE all resources with the status code given.
Execute per client finalization procedures.
This method is called every time a connection to a client is closed, so that finalization code can be executed on a per-client basis. It is the counterpart of setup().
Execute program finalization procedures (similar to a destructor).
This method is called when the server is shut down, so that finalization code can be executed globally. It is intended to be the counterpart of __init__(), but differs from __del__() in that it is not bound to the lifetime of the persistence handler object itself, but rather to the span of execution time of the server.
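The interface described above can be sketched with a toy handler that keeps resources in a Python list. Only __init__() and setup() are named in this documentation; every other method name (_extractConfig, select, update, insert, count, reset, finish, shutdown), the signatures, the status values, and the choice to mark a selected resource as INPROGRESS are assumptions made for illustration.

```python
class StatusCodes:
    # Illustrative values only.
    SUCCEEDED, INPROGRESS, AVAILABLE, FAILED, ERROR = 2, 1, 0, -1, -2


class BasePersistenceHandler:
    """Stand-in for the abstract base class (method names are assumed)."""

    def __init__(self, configurationsDictionary):
        # Called once, when the server instantiates the handler.
        self._extractConfig(configurationsDictionary)

    def _extractConfig(self, configurationsDictionary):
        # Store the handler section of the configuration unchanged.
        self.config = configurationsDictionary

    def setup(self): pass      # per-client initialization
    def finish(self): pass     # per-client finalization
    def shutdown(self): pass   # server-wide finalization


class ListPersistenceHandler(BasePersistenceHandler):
    """Toy handler that keeps every resource in a list."""

    def __init__(self, configurationsDictionary):
        super().__init__(configurationsDictionary)
        self.resources = []

    def insert(self, resourcesList):
        # New resources start as AVAILABLE.
        for resourceID, resourceInfo in resourcesList:
            self.resources.append({"id": resourceID,
                                   "status": StatusCodes.AVAILABLE,
                                   "info": resourceInfo})

    def select(self):
        # Return the first AVAILABLE resource and mark it INPROGRESS.
        for key, resource in enumerate(self.resources):
            if resource["status"] == StatusCodes.AVAILABLE:
                resource["status"] = StatusCodes.INPROGRESS
                return (key, resource["id"], resource["info"])
        return (None, None, None)

    def update(self, resourceKey, status, resourceInfo):
        self.resources[resourceKey]["status"] = status
        self.resources[resourceKey]["info"] = resourceInfo

    def count(self):
        # Map each status code to the number of resources holding it.
        counts = {}
        for resource in self.resources:
            counts[resource["status"]] = counts.get(resource["status"], 0) + 1
        return counts

    def reset(self, status):
        # Make every resource with the given status AVAILABLE again.
        changed = 0
        for resource in self.resources:
            if resource["status"] == status:
                resource["status"] = StatusCodes.AVAILABLE
                changed += 1
        return changed


handler = ListPersistenceHandler({"echo": False})
handler.setup()
handler.insert([("a", None), ("b", {"x": 1})])
key, resourceID, info = handler.select()
handler.update(key, StatusCodes.FAILED, {"reason": "timeout"})
handler.finish()
handler.shutdown()
```

The server only ever talks to the common interface, so swapping this list for a file or a database would not change the calling code.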
Bases: persistence.MemoryPersistenceHandler
Load and dump resources from/to a file.
All resources in the file are loaded into memory before the server operations begin, so this handler is recommended for small to medium-sized datasets that fit entirely in the machine’s memory. For larger datasets, consider using another persistence handler. Another option for large datasets is to split the resources across more than one file and collect the resources of one file at a time.
The default version of this handler supports CSV and JSON files. It is possible to add support for other file types by subclassing BaseFileColumns and BaseFileHandler. The new file type must also be included in the supportedFileTypes dictionary.
Hold column names of data in the file, allowing fast access to names of ID, status and info columns.
Extract column names from the file.
Must be overridden, as column name extraction depends on the file type.
Handle low level details about persistence in a specific file type.
Each resource loaded from a file is stored in memory in a dictionary in the format {"id": X, "status": X, "info": {...}}, which is the resource internal representation format. This handler is responsible for translating resources in the internal representation format to the format used in a specific file type and vice-versa.
Transform resource from file format to internal representation format.
Transform resource from internal representation format to file format.
Load resources in file format and yield them in internal representation format.
Save resources in internal representation format to file format.
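The translation between file format and internal representation can be illustrated for a flat CSV file. This is a minimal sketch: the function names (parse, unparse, load, dump), their signatures, the default column names, and the rule that every column other than ID and status belongs to "info" are all assumptions made for the example, not the handler's documented behavior.

```python
import csv
import io


def parse(row, idColumn, statusColumn):
    """File format (flat CSV row) -> internal representation."""
    info = {k: v for k, v in row.items() if k not in (idColumn, statusColumn)}
    return {"id": row[idColumn], "status": int(row[statusColumn]), "info": info}


def unparse(resource, idColumn, statusColumn):
    """Internal representation -> file format (flat CSV row)."""
    row = {idColumn: resource["id"], statusColumn: resource["status"]}
    row.update(resource["info"])
    return row


def load(fileObject, idColumn="id", statusColumn="status"):
    """Read rows in file format and yield them in internal format."""
    for row in csv.DictReader(fileObject):
        yield parse(row, idColumn, statusColumn)


def dump(resources, fileObject, fieldnames, idColumn="id", statusColumn="status"):
    """Write resources held in internal format back to file format."""
    writer = csv.DictWriter(fileObject, fieldnames=fieldnames)
    writer.writeheader()
    for resource in resources:
        writer.writerow(unparse(resource, idColumn, statusColumn))


source = io.StringIO("id,status,name\n7,0,alice\n")
resources = list(load(source))

target = io.StringIO()
dump(resources, target, ["id", "status", "name"])
```

Keeping the two directions as separate functions mirrors the split described above: the rest of the handler works only with the internal dictionaries.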
Bases: persistence.BaseFileColumns
Hold column names of data in CSV files, allowing fast access to names of ID, status and info columns.
Bases: persistence.BaseFileHandler
Handle low level details about persistence in CSV files.
Note
This class and the CSVColumns class use Python’s built-in csv module internally.
Bases: persistence.BaseFileColumns
Hold column names of data in JSON files, allowing fast access to names of ID, status and info columns.
Bases: persistence.BaseFileHandler
Handle low level details about persistence in JSON files.
Note
This class and the JSONColumns class use Python’s built-in json module internally.
Associate file types with their columns and handler classes. The type of the current file is provided by the user directly (through the filetype option in the XML configuration file) or indirectly (through the file extension extracted from the file name). When checking whether the type of the current file is in the list of supported file types, the string comparison is case-insensitive.
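The lookup described above might work along the lines of the following sketch. The layout of the dictionary, the handler class names CSVHandler and JSONHandler, and the helper resolveFileType are hypothetical; only supportedFileTypes, CSVColumns, JSONColumns, and the filetype option are named in this documentation.

```python
import os


# Empty placeholder classes so the sketch is self-contained.
class CSVColumns: pass
class CSVHandler: pass      # hypothetical name
class JSONColumns: pass
class JSONHandler: pass     # hypothetical name


# Assumed layout: file type name -> (columns class, handler class).
supportedFileTypes = {
    "CSV": (CSVColumns, CSVHandler),
    "JSON": (JSONColumns, JSONHandler),
}


def resolveFileType(filename, filetype=None):
    """Return the (columns, handler) classes for a file.

    An explicit filetype wins; otherwise the file extension is used.
    The comparison ignores case in both situations.
    """
    if filetype is None:
        filetype = os.path.splitext(filename)[1].lstrip(".")
    wanted = filetype.lower()
    for name, classes in supportedFileTypes.items():
        if name.lower() == wanted:
            return classes
    raise ValueError("Unsupported file type: %r" % filetype)


columnsClass, handlerClass = resolveFileType("resources.Csv")
```

Under these assumptions, supporting a new file type would amount to adding one more entry to the dictionary.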
Bases: persistence.FilePersistenceHandler
Load and dump resources from/to files respecting limits of file size and/or number of resources per file.
This handler uses multiple instances of FilePersistenceHandler to allow insertion of new resources respecting limits specified by the user. It is also capable of reading and updating resources from multiple files.
The rollover handler leaves the low-level details of persistence to the file handlers attached to each file. It takes care of the coordination needed to keep them consistent and of checking the established limits.
When inserting new resources, every time the file size limit and/or the limit on the number of resources per file is reached, the rollover handler opens a new file and assigns a new instance of FilePersistenceHandler to handle it. All resources, however, are kept in memory. So, as with FilePersistenceHandler, this handler is not well suited for large datasets that do not fit entirely in memory.
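The rollover decision on insertion can be sketched as follows, using only the limit on the number of resources per file. The class name, the attribute names, and the ".N" file naming scheme are assumptions for illustration; plain lists stand in for the FilePersistenceHandler instances that would manage each file.

```python
class RolloverSketch:
    """Toy model of insertion with a per-file resource limit."""

    def __init__(self, baseName, maxResourcesPerFile):
        self.baseName = baseName
        self.maxResourcesPerFile = maxResourcesPerFile
        # One (file name, resources) pair per file opened so far.
        self.files = [(baseName, [])]

    def insert(self, resource):
        """Store a resource and return the name of the file holding it."""
        name, resources = self.files[-1]
        if len(resources) >= self.maxResourcesPerFile:
            # Limit reached: open a new file with a numeric suffix.
            name = "%s.%d" % (self.baseName, len(self.files))
            resources = []
            self.files.append((name, resources))
        resources.append(resource)
        return name


rollover = RolloverSketch("resources.csv", maxResourcesPerFile=2)
names = [rollover.insert(n) for n in range(5)]
# names lists, in insertion order, the file each resource went to.
```

A size-based limit would follow the same pattern, with the check comparing the current file size against the configured maximum instead of counting resources.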
Note
This handler was inspired by Python’s logging.handlers.RotatingFileHandler class.
Bases: persistence.BasePersistenceHandler
Store and retrieve resources to/from a MySQL database.
The table must already exist in the database and must contain at least three columns: a primary key column, a resource ID column and a status column.
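As a rough illustration of the minimum table layout, the helper below builds a CREATE TABLE statement with the three required columns. The function name, the default column names, and the column types are assumptions; the handler itself expects the table to exist already, so a statement like this would be run by the user beforehand.

```python
def buildCreateTable(table, primaryKeyColumn="resources_pk",
                     resourceIdColumn="resource_id", statusColumn="status"):
    """Return a MySQL statement creating a minimal resources table.

    All names and types are illustrative assumptions.
    """
    return (
        "CREATE TABLE `{t}` ("
        "`{pk}` INT UNSIGNED NOT NULL AUTO_INCREMENT, "
        "`{rid}` VARCHAR(255) NOT NULL, "
        "`{st}` TINYINT NOT NULL DEFAULT 0, "
        "PRIMARY KEY (`{pk}`)"
        ")"
    ).format(t=table, pk=primaryKeyColumn, rid=resourceIdColumn, st=statusColumn)


statement = buildCreateTable("resources")
```

Any additional columns holding resource information would be added to the same table.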
Note
This handler uses MySQL Connector/Python to interact with MySQL databases.