DROPs

DROPs are at the center of DALiuGE. They are representations of data and applications, which makes both manageable by DALiuGE.

Lifecycle

The lifecycle of a DROP is simple and follows the basic principle of write once, read many times. It also allows for data deletion.

A DROP starts in the INITIALIZED state, meaning that its data is not present yet. From there it jumps into COMPLETED once its data has been written, optionally passing through WRITING if the writing occurs through DALiuGE (see Input/Output). Once in the COMPLETED state the data can be read as many times as needed. Eventually, the DROP will transition to EXPIRED, denying any further reads. Finally the data is deleted and the DROP moves to the final DELETED state. If any I/O error occurs the DROP will be moved to the ERROR state.
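The lifecycle above can be read as a small state machine. The following is a minimal sketch of that reading, not DALiuGE's implementation: `DropState`, `LifecycleSketch`, `move_to` and `can_read` are hypothetical names, and the rule that ERROR is reachable from every non-final state is an assumption drawn from the last sentence of the paragraph.

```python
from enum import Enum


class DropState(Enum):
    INITIALIZED = "INITIALIZED"
    WRITING = "WRITING"
    COMPLETED = "COMPLETED"
    EXPIRED = "EXPIRED"
    DELETED = "DELETED"
    ERROR = "ERROR"


# Regular transitions as described in the text.
_TRANSITIONS = {
    DropState.INITIALIZED: {DropState.WRITING, DropState.COMPLETED},
    DropState.WRITING: {DropState.COMPLETED},
    DropState.COMPLETED: {DropState.EXPIRED},
    DropState.EXPIRED: {DropState.DELETED},
    DropState.DELETED: set(),
    DropState.ERROR: set(),
}


class LifecycleSketch:
    """Hypothetical stand-in for a DROP that only tracks its state."""

    def __init__(self):
        self.state = DropState.INITIALIZED

    def move_to(self, new_state):
        # Assumption: an I/O error can move any non-final state to ERROR.
        final = (DropState.DELETED, DropState.ERROR)
        if new_state is DropState.ERROR and self.state not in final:
            self.state = new_state
            return
        if new_state not in _TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state

    def can_read(self):
        # Reads are only allowed while the DROP is COMPLETED.
        return self.state is DropState.COMPLETED
```

In this sketch a DROP that has reached EXPIRED or DELETED refuses reads, and any attempt to go back to an earlier state raises `ValueError`.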

Events

Changes in a DROP's state, and other actions performed on a DROP, fire named events, which are sent to all interested subscribers. Users can subscribe to particular named events, or to all events.

In particular, the Node DROP Manager subscribes to all events generated by the DROPs it manages. By doing so it can monitor all their activities and perform any appropriate action as required. The Node DROP Manager, or any other entity, can thus become a Graph Event Manager, in the sense that it can subscribe to all events sent by all DROPs and make use of them.
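The subscription model described in this section can be sketched as follows. This is an illustration only: `EventFirerSketch`, `subscribe` and `fire`, as well as the event names `"status"` and `"open"`, are hypothetical and are not taken from the DALiuGE API.

```python
from collections import defaultdict


class EventFirerSketch:
    """Hypothetical event source supporting per-name and catch-all subscribers."""

    def __init__(self):
        self._by_type = defaultdict(list)  # event name -> callbacks
        self._all = []                     # callbacks interested in every event

    def subscribe(self, callback, event_type=None):
        # event_type=None means "subscribe to all events".
        if event_type is None:
            self._all.append(callback)
        else:
            self._by_type[event_type].append(callback)

    def fire(self, event_type, **attrs):
        event = {"type": event_type, **attrs}
        for callback in self._by_type[event_type] + self._all:
            callback(event)


# A manager-like listener subscribes to everything; another only to "status".
drop = EventFirerSketch()
manager_log, status_log = [], []
drop.subscribe(manager_log.append)
drop.subscribe(status_log.append, "status")
drop.fire("status", state="COMPLETED")
drop.fire("open")
```

After the two `fire` calls the catch-all listener has recorded both events, while the `"status"` listener has recorded only the first.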

Relationships

DROPs are connected to each other to form a graph representing an execution plan, where inputs and outputs are connected to applications, establishing the following possible relationships:

  • Zero or more data DROPs can be the input of an application DROP, in which case the application is the consumer of the data DROPs.
  • A data DROP can be a streaming input of an application DROP, in which case the application is seen as a streaming consumer from the data DROP’s point of view.
  • Zero or more data DROPs can be the output of an application DROP, in which case the application is the producer of the data DROPs.
  • An application is never a consumer or producer of another application; conversely a data DROP never produces or consumes another data DROP.
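The relationship rules listed above can be sketched as a pair of node classes that record both ends of each edge and reject application-to-application links. All names here (`DataNode`, `AppNode`, `add_input`, `add_output`) are hypothetical and chosen for illustration; they are not the DALiuGE classes.

```python
class DataNode:
    """Hypothetical data DROP: knows who produces and who consumes it."""

    def __init__(self, name):
        self.name = name
        self.producers = []
        self.consumers = []
        self.streaming_consumers = []


def _require_data(drop):
    # Applications may only be linked to data DROPs, never to other applications.
    if not isinstance(drop, DataNode):
        raise TypeError("applications connect only to data DROPs")


class AppNode:
    """Hypothetical application DROP with zero or more inputs and outputs."""

    def __init__(self, name):
        self.name = name
        self.inputs = []
        self.streaming_inputs = []
        self.outputs = []

    def add_input(self, drop, streaming=False):
        _require_data(drop)
        if streaming:
            self.streaming_inputs.append(drop)
            drop.streaming_consumers.append(self)
        else:
            self.inputs.append(drop)
            drop.consumers.append(self)

    def add_output(self, drop):
        _require_data(drop)
        self.outputs.append(drop)
        drop.producers.append(self)
```

Because only `AppNode` exposes the linking methods and they accept only `DataNode` arguments, a data DROP can never be wired directly to another data DROP in this sketch either.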

The difference between normal inputs/consumers and their streaming counterparts is their granularity. In the normal case, inputs only notify their consumers when they have reached the COMPLETED state, after which the consumers can open the DROP and read its data. Streaming inputs, on the other hand, notify consumers each time data is written into them (along with the data itself), and thus allow for continuous operation of applications as data gets written into their inputs. Once all the data has been written, the normal event notifying that the DROP has moved to the COMPLETED state is also fired.
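The difference in granularity can be illustrated with a toy data DROP that notifies streaming consumers on every write and all consumers on completion. This is a sketch under assumed names (`StreamableDataSketch`, `CountingApp`, `data_written`, `drop_completed`); the actual callback names and signatures in DALiuGE may differ.

```python
class StreamableDataSketch:
    """Hypothetical data DROP holding chunks in memory."""

    def __init__(self):
        self.consumers = []
        self.streaming_consumers = []
        self._chunks = []
        self.completed = False

    def write(self, chunk):
        self._chunks.append(chunk)
        # Streaming consumers receive each chunk as it is written.
        for app in self.streaming_consumers:
            app.data_written(chunk)

    def set_completed(self):
        self.completed = True
        # The completion notification goes to both kinds of consumers.
        for app in self.streaming_consumers + self.consumers:
            app.drop_completed(self)

    def read_all(self):
        if not self.completed:
            raise RuntimeError("data can only be read once COMPLETED")
        return b"".join(self._chunks)


class CountingApp:
    """Hypothetical application recording what it was notified about."""

    def __init__(self):
        self.chunks_seen = 0
        self.total_bytes = None

    def data_written(self, chunk):
        self.chunks_seen += 1

    def drop_completed(self, drop):
        self.total_bytes = len(drop.read_all())
```

A normal consumer attached to such a DROP sees nothing until `set_completed` is called, whereas a streaming consumer is called once per `write` and then once more on completion.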

Input/Output

I/O can be performed on the data represented by a DROP by obtaining a reference to its I/O object and calling the necessary POSIX-like methods. In this case, the data passes through the DROP instance. The application is also free to bypass the DROP interface and perform I/O directly on the data, in which case it uses the data DROP's dataURL to find out the data location. It is then the responsibility of the application to ensure that the I/O occurs in the correct location and uses the format expected for storage or for subsequent downstream processing by other application DROPs.

DALiuGE provides various commonly used data DROPs with their associated I/O storage classes, including in-memory, file-based and S3 storage.
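The two access paths described in this section, through the DROP's I/O object or directly via its data URL, can be contrasted in a short sketch. Everything here is hypothetical: `FileIOSketch`, `FileDropSketch`, `get_io` and `data_url` are illustrative names, and the `file://` URL form is an assumption rather than the format DALiuGE actually produces.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


class FileIOSketch:
    """Hypothetical POSIX-like I/O object backed by a regular file."""

    def __init__(self, path):
        self._path = path
        self._fh = None

    def open(self, mode):
        self._fh = open(self._path, mode)

    def write(self, data):
        return self._fh.write(data)

    def read(self, count=-1):
        return self._fh.read(count)

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None


class FileDropSketch:
    """Hypothetical file-backed data DROP."""

    def __init__(self, path):
        self._path = path

    def get_io(self):
        return FileIOSketch(self._path)

    @property
    def data_url(self):
        # Assumed URL form; requires an absolute path.
        return Path(self._path).as_uri()


with tempfile.TemporaryDirectory() as tmp:
    drop = FileDropSketch(os.path.join(tmp, "data.bin"))

    # Path 1: the data passes through the DROP's I/O object.
    out = drop.get_io()
    out.open("wb")
    out.write(b"hello")
    out.close()

    # Path 2: the application bypasses the DROP and resolves the URL itself.
    direct_path = url2pathname(urlparse(drop.data_url).path)
    with open(direct_path, "rb") as fh:
        payload = fh.read()
```

In the second path nothing checks that the application reads from the right place or in the right format, which is the responsibility the text assigns to the application.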

DROP Channels

DROPs that are connected by an edge in a physical graph but are deployed on separate nodes or islands are automatically given a Pyro stub (a remote method invocation interface) so that they can communicate with each other. It is the job of the Master DROP and Island Managers to generate and exchange stubs between DROP instances before the graph is deployed to the various data islands and to the nodes within islands, respectively. If there is no DROP separation within a physical graph partition, it is implied that the DROPs will be executed within a single address space; as a result, basic method calls are used between DROP instances.
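The choice between a stub and a plain method call can be sketched without any networking. The code below does not use Pyro; `StubSketch` merely forwards calls in-process to stand in for a remote proxy, and `PlacedDropSketch`, `endpoint_for` and `notify` are hypothetical names introduced for illustration.

```python
class PlacedDropSketch:
    """Hypothetical DROP that knows which node it is deployed on."""

    def __init__(self, uid, node):
        self.uid = uid
        self.node = node
        self.received = []

    def notify(self, message):
        self.received.append(message)
        return len(self.received)


class StubSketch:
    """Stand-in for a remote proxy: forwards every method call to its target."""

    def __init__(self, target):
        self._target = target
        self.calls = 0

    def __getattr__(self, name):
        # Only reached for attributes not defined on the stub itself.
        method = getattr(self._target, name)

        def forward(*args, **kwargs):
            self.calls += 1  # a real stub would serialize and send the call here
            return method(*args, **kwargs)

        return forward


def endpoint_for(caller, callee):
    # Same node: same address space, so a direct reference is enough.
    if caller.node == callee.node:
        return callee
    # Different node: hand out a stub instead of the object itself.
    return StubSketch(callee)
```

The caller uses the returned endpoint the same way in both cases, which is what lets the managers decide at deployment time whether an edge crosses a node boundary.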

DROP Component Interface

The DALiuGE framework uses Docker containers as its primary interface to third-party applications. Docker containers have the following benefits over traditional tool management:

  • Portability.
  • Versioning and component reuse.
  • Lightweight footprint.
  • Simple maintenance.

The application programmer can make use of the DockerApp, which is the interface between a Docker container and the DROP framework. Refer to the documentation for details.

Other applications not based on Docker containers can be written as well. Any application must derive at least from AppDROP, but an easier-to-use base class is the BarrierAppDROP, which simply requires a run method to be written by the developer (see dlg.drop for details). DALiuGE ships with a set of pre-existing applications to perform common operations, like a TCP socket listener and a bash command executor, among others. See dlg.apps for more examples.
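The "write only a run method" pattern can be sketched with a stand-in base class. This example does not import `dlg`: `BarrierAppSketch` and `input_completed` are hypothetical, and the barrier behaviour shown (calling `run` once every input has completed) is an assumption suggested by the class name rather than a statement of how `BarrierAppDROP` is implemented.

```python
class BarrierAppSketch:
    """Hypothetical stand-in for a barrier-style application base class."""

    def __init__(self, inputs):
        self.inputs = list(inputs)  # names of the inputs to wait for
        self._done = set()
        self.executed = False

    def input_completed(self, name):
        # Assumed barrier: run() fires once, after all inputs have completed.
        self._done.add(name)
        if not self.executed and self._done >= set(self.inputs):
            self.executed = True
            self.run()

    def run(self):
        raise NotImplementedError("subclasses implement run()")


class SumApp(BarrierAppSketch):
    """Example application: the developer only writes run()."""

    def __init__(self, inputs, values):
        super().__init__(inputs)
        self.values = values  # input name -> number, standing in for DROP data
        self.result = None

    def run(self):
        self.result = sum(self.values[name] for name in self.inputs)
```

With two inputs `"a"` and `"b"`, `SumApp.run` is not called after the first completion notification and is called exactly once after the second.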