Startup and Shutdown Guide

The translator and the engine are separate services and can be installed and run independently.

Depending on how you are intending to run the system startup and shutdown is slightly different.

For the impatient: Single node DALiuGE

As a developer the following two commands will start both the translator and the engine, including a data island manager (DIM) and a node manager (NM):

cd daliuge-translator ; ./run_translator dev ; cd ..
cd daliuge-engine ; ./run_engine dev ; cd ..

This is the quickest way to start deploying workflows. Obviously this is limited to a single computer, but certainly useful for testing out the system and developing new components. You can use EAGLE on the URL: https://eagle.icrar.org and point the EAGLE setting (keyboard shortcut ‘O’) for the translator URL to http://localhost:8084. After submitting your graph to the translator, the translator web interface will be opened in a new tab and you need to adjust the translator settings to point to the URL of the engine you’ve just started. Since the system is running across docker containers, you need to specify the IP address of the docker host machine, i.e. http://<IP-address>:8001 not localhost. These settings are saved in browser storage, that means that you will have the same settings when coming back. Now you have access to a complete DALiuGE system!

The following paragraphs are providing more detailed guidelines to enable people to start the system on multiple nodes to cover the specific local requirements.

Starting the docker containers

We are providing convenience scripts to start the docker containers built according to the Installation Guide. Depending whether you want to run the development (dev) or the deployment (dep) version of the image there exist different startup options. Starting the translator:

cd daliuge-translator
./run_translator.sh dev|dep

Similarly starting the engine:

cd daliuge-engine
./run_engine.sh dev|dep

The main difference between the development and the deployment version is that the development version is automatically strating a data island manager, while the deployment version is not doing that. Both are starting a Node Manager by default (see below). Using the shell scripts is not strictly necessary, but the docker command line is a bit complex.

Starting and stopping using CLI

If DALiuGE had been installed in a virtual environment of the host system it is possible to start the managers from the command line:

dlg dim -H 0.0.0.0 -N localhost -d

and a node manager:

dlg nm -H 0.0.0.0 -d

To stop the managers use:

dlg dim -s

and:

dlg nm -s

respectively. The help for the complete CLI is available by just entering dlg at the prompt:

❯ dlg
Usage: /home/awicenec/.pyenv/versions/dlg/bin/dlg [command] [options]

Commands are:
    daemon                   Starts a DALiuGE Daemon process
    dim                      Starts a Drop Island Manager
    fill                     Fill a Logical Graph with parameters
    include_dir              Print the directory where C header files can be found
    lgweb                    A Web server for the Logical Graph Editor
    map                      Maps a Physical Graph Template to resources and produces a Physical Graph
    mm                       Starts a Master Manager
    monitor                  A proxy to be used in conjunction with the dlg proxy in restricted environments
    nm                       Starts a Node Manager
    partition                Divides a Physical Graph Template into N logical partitions
    proxy                    A reverse proxy to be used in restricted environments to contact the Drop Managers
    replay                   Starts a Replay Manager
    submit                   Submits a Physical Graph to a Drop Manager
    unroll                   Unrolls a Logical Graph into a Physical Graph Template
    unroll-and-partition     unroll + partition
    version                  Reports the DALiuGE version and exits

Try $PATH/bin/dlg [command] --help for more details

More details about the usage of the CLI can be found in the CLI User Guide chapter.

Starting and stopping the managers

DALiuGE is using three different kinds of managers:

  1. Node Manager (NM), one per compute node participating in the DALiuGE cluster. The NMs are running all the component wrappers for a single node.

  2. Data Island Manager (DIM), which is manageing a (sub-)set of nodes in the cluster. There could be minimum one or maximum as many as NMs Data Island Managers in a deployment. The DIM is also the entity receiving the workflow description from the translator and is then distributing the sections to the NMs.

  3. Master Manager (MM), which has the information about all nodes and islands in the deployment. In many deployments the master manager is optional and not really required. If it is necessary, then there is only a single master manager running on the cluster.

Starting a master manager can be done using the dlg command:

dlg daemon

by default this will also start a NM, but not a DIM.

The managers are spawned off (as processes) from the daemon process, which also exposes a REST interface allowing the user to start and stop managers. The start and stop commands follow the URL pattern 1:

curl -X POST http://localhost:9000/managers/<type>/start

and:

curl -X POST http://localhost:9000/managers/<type>/stop

where <type> is on of [node|dataisland|master]. In case of the DIM (island) it is possible to specify the nodes participating in that specific island. For example:

curl -d '{"nodes": ["192.168.1.72","192.168.1.11"]}' -H "Content-Type: application/json" -X POST http://localhost:9000/managers/island/start

If a manager is already running or already stopped error messages are returned. In order to see which managers are running on a particular node you can use the GET method:

curl http://localhost:9000/managers

which returns something like:

{"master": null, "island": null, "node": 18}

In this example there is just a Node Manager running with process ID 18.

For the independent: Build and run EAGLE

It is also possible to start the EAGLE locally in addition as well. This requires to clone and build the EAGLE repo into a directory separate from the DALiuGE repo:

git clone https://github.com/ICRAR/EAGLE
cd EAGLE
./build_eagle dep

To start EAGLE:

./run_eagle dep

This will start the EAGLE docker image built in the previous step and try to open a browser tab.

(NOTE: The usage of the EAGLE visual graph editor is covered in its own documentation).

Zeroconf

The Master Manager also opens a zeroconf service, which allows the Node Managers to register and deregister and thus the MM is always up to date with the node available in the cluster. NOTE: This mechanism is currently not implemented for the DIMs, i.e. a DIM does not register with the MM automatically. Since it is not possible to guess which NM should belong to which DIM, the NMs also do not register with a DIM. For convenience and as an exception to this rule, when starting the development version of the daliuge-engine image, the single NM is automatically assigned to the DIM on localhost.

1

The daemon process is listening on port 9000 by default.