

Introduction

The idea behind Cloudgene is simple: if you can run your program or Hadoop application on the command line, a few minutes spent writing a YAML configuration are enough to connect it to Cloudgene. This turns your Hadoop command-line program (or a set of programs) into a web-based service, gives your collaborators a scalable best-practice workflow, and supports reproducible science.

Project structure

First, we create a project folder for our new application. This folder must contain a file called cloudgene.yaml, also known as the manifest file, which includes all the information needed to install and execute your programs or pipelines. The folder also contains every other file needed by your workflow steps, for example binaries, the jar file of a ready-to-use MapReduce program, a Pig script, or an R Markdown script.

A basic Cloudgene application usually looks something like this:

my-cloudgene-app
├── cloudgene.yaml
├── sample-mapreduce-program.jar
├── sample-rmarkdown-report.Rmd
├── some-folder
│   ├── dataset2.csv
│   └── my-binary
└── README.md

Once the application is finished, we create a zip file of this folder. The zip file contains all required files and can be deployed to any Cloudgene webserver by uploading it.
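The packaging step can be sketched in Python with the standard zipfile module. This is only an illustration: the helper name package_app is hypothetical, and storing the files relative to the project folder (so that cloudgene.yaml sits at the top level of the archive) is an assumption that is not confirmed by this page.

```python
import tempfile
import zipfile
from pathlib import Path


def package_app(project_dir: Path, zip_path: Path) -> list[str]:
    """Zip every file under project_dir and return the archived names.

    Hypothetical helper. zip_path should live outside project_dir,
    otherwise the archive would end up containing itself.
    """
    if not (project_dir / "cloudgene.yaml").is_file():
        raise FileNotFoundError(f"{project_dir} has no cloudgene.yaml")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(project_dir.rglob("*")):
            if path.is_file():
                # Assumption: entries are stored relative to the project folder.
                archive.write(path, path.relative_to(project_dir).as_posix())
        return archive.namelist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my-cloudgene-app"
        (project / "some-folder").mkdir(parents=True)
        (project / "cloudgene.yaml").write_text("id: tool-name\nname: Tool Name\nversion: 1.0\n")
        (project / "some-folder" / "dataset2.csv").write_text("a,b\n1,2\n")
        print(package_app(project, Path(tmp) / "my-cloudgene-app.zip"))
```

Refusing to build an archive without cloudgene.yaml catches the most obvious packaging mistake before anything is uploaded.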

Read more about installing applications.

cloudgene.yaml

The file starts with a simple header containing general information about the application, followed by the description of the input and output parameters and all steps of the workflow.

A simple header looks like this:

id: tool-name
name: Tool Name
description: tool-description
category: tool-category
version: 1.0
website: http://www.my-website.com

The three most important fields are id, name and version; without them your application cannot be installed. Together, the id and version fields form a unique identifier. Use description and website to describe your application.
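A minimal Python sketch of this rule, working on a header that has already been loaded into a dictionary. The function names are hypothetical, and returning the pair (id, version) as the unique key is an assumption: this page does not say how Cloudgene actually combines the two fields.

```python
REQUIRED_FIELDS = ("id", "name", "version")


def missing_fields(manifest: dict) -> list[str]:
    """Return the required header fields that are absent or empty."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = manifest.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def unique_key(manifest: dict) -> tuple[str, str]:
    """Pair id and version (assumed representation of the unique identifier)."""
    missing = missing_fields(manifest)
    if missing:
        raise ValueError(f"manifest is missing: {', '.join(missing)}")
    return (str(manifest["id"]), str(manifest["version"]))


header = {
    "id": "tool-name",
    "name": "Tool Name",
    "description": "tool-description",
    "version": 1.0,
}
print(missing_fields(header))  # []
print(unique_key(header))      # ('tool-name', '1.0')
```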

The next step is to add the workflow section to your configuration file, which defines the steps as well as the input and output parameters.

Defining steps

The simplest way to model a workflow is a list of steps in which each step depends on its predecessor. Steps are defined in the steps section, where each step has a name and type-specific properties.

A simple example with two steps looks like this:

id: hello-cloudgene
name: Hello Cloudgene
version: 1.0
workflow:
  steps:
    - name: Step1
      cmd: /bin/echo hey cloudgene developer! I am step 1.
      stdout: true

    - name: Step2
      cmd: /bin/echo hey cloudgene developer! I am step 2.
      stdout: true

In this example we use the command-line tool echo to print some text. Cloudgene also supports a variety of other step types, which can be combined in one workflow to take advantage of different technologies.
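The "each step depends on its predecessor" model can be pictured with a short Python sketch that runs the two cmd lines from the example one after the other. It is not Cloudgene's executor: run_steps is a hypothetical helper, stopping at the first failing step is an assumption, and the sketch expects a POSIX system where /bin/echo exists.

```python
import shlex
import subprocess

# The two steps from the example above, as plain dictionaries.
steps = [
    {"name": "Step1", "cmd": "/bin/echo hey cloudgene developer! I am step 1."},
    {"name": "Step2", "cmd": "/bin/echo hey cloudgene developer! I am step 2."},
]


def run_steps(steps: list[dict]) -> list[str]:
    """Run each step in order and collect its standard output."""
    outputs = []
    for step in steps:
        result = subprocess.run(
            shlex.split(step["cmd"]), capture_output=True, text=True
        )
        if result.returncode != 0:
            # Assumption: a failed step prevents all later steps from running.
            raise RuntimeError(
                f"{step['name']} failed with exit code {result.returncode}"
            )
        outputs.append(result.stdout.rstrip("\n"))
    return outputs


if __name__ == "__main__":
    for line in run_steps(steps):
        print(line)
```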

To test our workflow, we copy the content into a file named hello-cloudgene.yaml. Next, we can either upload the zip file of our application to a Cloudgene webserver or execute it on our development machine with the cloudgene-cli program:

cloudgene run hello-cloudgene.yaml

Cloudgene 1.30.6
http://www.cloudgene.io
(c) 2009-2018 Lukas Forer and Sebastian Schoenherr
Built by lukas on 2018-11-19T10:14:17Z


hello-cloudgene 1.0

[INFO]  No external Haddop cluster set.
[WARN]  Cluster seems unreachable. Hadoop support disabled.
[INFO]  Submit job job-20181119-112052...
[OUT]   hey cloudgene developer! I am step 1.
[OK]    Execution successful.
[OUT]   hey cloudgene developer! I am step 2.
[OK]    Execution successful.

Done! Executed without errors.
Results can be found in file:///home/lukas/new-cloudgene/job-20181119-112052

We see that Cloudgene executes our workflow and prints the text to the terminal.

Defining input parameters

Input parameters are defined in the inputs section, where each parameter is defined by a unique id, a textual description and a type.

We extend the example above with an input parameter that sets the message:

id: hello-cloudgene
name: Hello Cloudgene
version: 1.0
workflow:
  steps:
    - name: Step1
      cmd: /bin/echo hey cloudgene developer! $message
      stdout: true
  inputs:
    - id: message
      description: Message
      type: text
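The requirement that every input has a unique id, a description and a type can be checked before a manifest is deployed. The following Python sketch works on an inputs list that has already been loaded into dictionaries; check_inputs is a hypothetical helper and does not reflect how Cloudgene itself validates a manifest.

```python
from collections import Counter

REQUIRED_KEYS = ("id", "description", "type")


def check_inputs(inputs: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the inputs look fine."""
    problems = []
    for index, param in enumerate(inputs):
        for key in REQUIRED_KEYS:
            if not param.get(key):
                problems.append(f"input #{index}: missing '{key}'")
    # Count how often each id occurs to detect duplicates.
    counts = Counter(param["id"] for param in inputs if param.get("id"))
    for input_id, count in counts.items():
        if count > 1:
            problems.append(f"id '{input_id}' is used {count} times")
    return problems


inputs = [{"id": "message", "description": "Message", "type": "text"}]
print(check_inputs(inputs))  # []
```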

If the workflow is executed on a Cloudgene webserver, a web interface is created automatically in which the user enters the message. If we execute the workflow with cloudgene-cli instead, the parameter has to be set as a command-line argument:

cloudgene run hello-cloudgene.yaml --message "Using an input parameter is easy!"

Cloudgene 1.30.6
http://www.cloudgene.io
(c) 2009-2018 Lukas Forer and Sebastian Schoenherr
Built by lukas on 2018-11-19T10:14:17Z


hello-cloudgene 1.0

[INFO]  No external Haddop cluster set.
[WARN]  Cluster seems unreachable. Hadoop support disabled.
[INFO]  Submit job job-20181119-112527...
  Input values:
    message: Using an input parameter is easy!
[OUT]   hey cloudgene developer! Using an input parameter is easy!
[OK]    Execution successful.

Done! Executed without errors.
Results can be found in file:///home/lukas/new-cloudgene/job-20181119-112527
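The path from the --message argument to the final command line can be mimicked in a few lines of Python. This is a sketch under assumptions, not Cloudgene's implementation: it derives one required --<id> flag per input with argparse and fills the $message placeholder with string.Template, which happens to use the same $name syntax. The names build_parser and render_command are hypothetical.

```python
import argparse
from string import Template

# The inputs section and the cmd line from the example above.
inputs = [{"id": "message", "description": "Message", "type": "text"}]
cmd = "/bin/echo hey cloudgene developer! $message"


def build_parser(inputs: list[dict]) -> argparse.ArgumentParser:
    """Create one required --<id> flag for every declared input."""
    parser = argparse.ArgumentParser(prog="run-sketch")
    for param in inputs:
        parser.add_argument(
            f"--{param['id']}",
            dest=param["id"],
            required=True,
            help=param["description"],
        )
    return parser


def render_command(cmd: str, inputs: list[dict], argv: list[str]) -> str:
    """Parse argv and substitute each $id placeholder in cmd."""
    values = vars(build_parser(inputs).parse_args(argv))
    return Template(cmd).substitute(values)


print(render_command(cmd, inputs, ["--message", "Using an input parameter is easy!"]))
# /bin/echo hey cloudgene developer! Using an input parameter is easy!
```

Note that string.Template only accepts placeholder names made of letters, digits and underscores, so this sketch would not handle an input id that contains a hyphen.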

Serve the webservice

Before you can start a webservice for your workflow, you have to install it:

cloudgene install hello-cloudgene.yaml

Cloudgene 1.30.6
http://www.cloudgene.io
(c) 2009-2018 Lukas Forer and Sebastian Schoenherr
Built by lukas on 2018-11-19T10:14:17Z

Installing application hello-cloudgene...
Process file hello-cloudgene.yaml....
[OK] 1 Applications installed:

APPLICATION                        VERSION             STATUS              FILENAME
hello-cloudgene                    1.0                 OK                  hello-cloudgene.yaml

Next, we can start the webserver:

cloudgene server

The webservice is available at http://localhost:8082. Log in with the username admin and the password admin1978.

Click on Run and select your application (hello-cloudgene). Cloudgene automatically creates a web interface for your input parameters.

Enter a message and click on Submit Job to run your workflow. You should then see the job output.

You can install several workflows in the same instance and use the Admin Dashboard to define who has access to each of them.