System Requirements

To run the Knowledge Network Build Pipeline with the KN_Builder, you must have three tools installed:

make
Docker
Docker Compose: https://docs.docker.com/compose/install/#install-compose

Your system must also meet the following requirements:

Minimum of 4 CPUs, 16GB RAM, 2TB disk
Must not already be running Mesos, ZooKeeper, Chronos, or Marathon, or any other service on their default ports (5050, 2181, 8888, and 8080, respectively)
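
The checks above can be scripted before starting a build. The following is only a rough sketch: the helper names missing_tools and port_conflicts are made up for illustration, and it assumes Docker Compose is installed as the standalone docker-compose binary (it may instead be available as a docker compose plugin on your system).

```shell
#!/bin/sh
# Hypothetical helpers; they are not part of KnowNet_Pipeline_Tools.

# Print the name of every given tool that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Read port numbers (one per line) on stdin and print those that
# collide with the ports the pipeline needs.
port_conflicts() {
    while read -r port; do
        case "$port" in
            5050|2181|8888|8080) printf '%s\n' "$port" ;;
        esac
    done
}

# Lists any of the three required tools that are missing (assumed binary names).
missing_tools make docker docker-compose

# On Linux with iproute2 installed, listening TCP ports could be fed in like this:
#   ss -ltn | awk 'NR>1 {n=split($4,a,":"); print a[n]}' | port_conflicts
```

If both commands print nothing, the tools are present and the four ports are free. CPU, RAM, and disk capacity still need to be checked separately.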

Quick Start

First, check out the quick run repo:

git clone https://github.com/KnowEnG/KnowNet_Pipeline_Tools
cd KnowNet_Pipeline_Tools

Note

Depending on your setup, some of the following commands may require root privileges, because by default Docker does not allow non-root users to start containers. In addition, the jobs run as root inside Docker, so all output and intermediate files will be owned by root.

Then, running the pipeline is as simple as running make:

make knownet

This will start up our Mesos environment and then run the build pipeline for all officially supported species and sources.

Overview of Build Pipeline

The make command will produce a large amount of output. First, it will show the status of starting up Mesos and Chronos and then show starting up the databases. After it finishes that phase, it will start the build pipeline and periodically print the status of the pipeline. It should return when either an error occurs or the pipeline finishes running.

The build pipeline consists of several stages:

SETUP: Downloads and imports Ensembl and sets up gene mapping information.
CHECK: Downloads and processes the rest of the sources. This consists of several substeps:
    1. fetch: Downloads the source data files.
    2. table: Reformats the source files into our table file format.
    3. map: Maps the identifiers in the source to our internal identifiers.
IMPORT: Imports all of the files into the MySQL and Redis databases.
EXPORT: Exports the Knowledge Network into flat files and dumps the MySQL and Redis databases.

Output Files

Running the pipeline will create several directories:

Directory	Contents
kn-final	Stores the final processed output files.
kn-logs	Stores the log files.
kn-rawdata	Stores the downloaded and processed data.
kn-mysql	Stores the MySQL database.
kn-redis	Stores the Redis database.

Information about the output and intermediate file and database formats can be found here.
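
To see at a glance which of these directories exist after a run, something like the following could be used. It is a minimal sketch: report_dirs is a hypothetical helper, and it assumes the directories are created under the directory where make was run, which the text above does not state explicitly.

```shell
#!/bin/sh
# Hypothetical helper: report which expected output directories exist
# under a base directory (defaults to the current directory).
report_dirs() {
    base=${1:-.}
    for d in kn-final kn-logs kn-rawdata kn-mysql kn-redis; do
        if [ -d "$base/$d" ]; then
            printf '%s\tpresent\n' "$d"
        else
            printf '%s\tmissing\n' "$d"
        fi
    done
}

report_dirs .
```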

Clean Up After Build

To clean up the files (except kn-logs and kn-final), as well as Chronos, Marathon, and Mesos, run:

make clean
make destroy

Primary Parameters

To build the Knowledge Network for only a subset of species or sources, pass them as make variables, separating multiple values with a double comma (,,), like so:

make knownet SPECIES=homo_sapiens,,mus_musculus SOURCES=kegg,,stringdb

SPECIES names must be all lowercase, with spaces replaced by underscores.
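
The naming rule and the double-comma separator can be applied mechanically. The sketch below is illustrative only: to_species_id and join_double_comma are hypothetical helper names, not part of the pipeline tools, and the script only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: lowercase a species name and replace spaces with underscores.
to_species_id() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

# Hypothetical helper: join all arguments with the ",," separator.
join_double_comma() {
    out=""
    for item in "$@"; do
        if [ -z "$out" ]; then
            out=$item
        else
            out="$out,,$item"
        fi
    done
    printf '%s\n' "$out"
}

species=$(join_double_comma "$(to_species_id 'Homo sapiens')" "$(to_species_id 'Mus musculus')")
sources=$(join_double_comma kegg stringdb)

# Prints: make knownet SPECIES=homo_sapiens,,mus_musculus SOURCES=kegg,,stringdb
echo "make knownet SPECIES=$species SOURCES=$sources"
```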

The possible SOURCES names can be found here: SrcClasses

Additional Resources

Summary of Current Knowledge Network Contents.
Details of Current Knowledge Network Contents.
List of Related Knowledge Network Tools.

Basic Troubleshooting

If you run into errors when building the Knowledge Network, you can look at the status of all remaining jobs on Chronos:

curl -L -s -X GET 127.0.0.1:8888/scheduler/graph/csv | grep node, | \
  awk -F, '{print $3"\t"$4"\t"$1"\t"$2}' | sort | uniq | grep -v succ
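
To make the filter easier to read, here is the same grep/awk/sort chain applied to a small made-up input. The sample rows and job names are hypothetical, and the row layout (node, job name, last status, state) is an assumption about the CSV that Chronos returns, inferred from the column reordering in the command above.

```shell
#!/bin/sh
# Hypothetical sample of the graph CSV; real job names will differ.
# Assumed layout of "node" rows: node,<job name>,<last status>,<state>
sample_csv() {
    cat <<'EOF'
node,fetch-kegg,success,idle
node,table-kegg,failure,idle
node,map-kegg,fresh,queued
link,fetch-kegg,table-kegg
EOF
}

# Same filter as the curl pipeline: keep node rows, reorder the columns to
# status, state, row type, job name, then drop rows mentioning success.
unfinished_jobs() {
    grep node, | awk -F, '{print $3"\t"$4"\t"$1"\t"$2}' | sort | uniq | grep -v succ
}

sample_csv | unfinished_jobs
```

With this sample, only the table-kegg and map-kegg rows remain, each starting with its status so that failed jobs sort together.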

For any failed job (e.g. JOBNAME), you can look at:

the original Chronos command at: kn-logs/chronos_jobs/JOBNAME.json or
the captured output log at: kn-logs/JOBNAME.json.
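
A small helper can report whether both files exist for a given job. This is a sketch under assumptions: job_files is a hypothetical name, and the kn-logs directory is assumed to be relative to the current working directory.

```shell
#!/bin/sh
# Hypothetical helper: show whether the two files described above exist
# for a job. The second argument overrides the log directory (default: kn-logs).
job_files() {
    job=$1
    logdir=${2:-kn-logs}
    for f in "$logdir/chronos_jobs/$job.json" "$logdir/$job.json"; do
        if [ -f "$f" ]; then
            printf 'found\t%s\n' "$f"
        else
            printf 'absent\t%s\n' "$f"
        fi
    done
}

# JOBNAME is a placeholder; substitute the name of the failed job.
job_files JOBNAME
```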

These may reveal why the job is failing. If the original source has changed its format, you can rerun the build using the SOURCES parameter, specifying all sources except the problematic ones.
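
Building the reduced SOURCES value by hand is error-prone with many sources, so it can be scripted. The sketch below assumes you already have the full list of source names; exclude_source is a hypothetical helper, and other_source is a placeholder rather than a real source name.

```shell
#!/bin/sh
# Hypothetical helper: the first argument is the source to leave out, the
# remaining arguments are the candidate sources. Prints the kept sources
# joined with ",,".
exclude_source() {
    bad=$1
    shift
    out=""
    for src in "$@"; do
        [ "$src" = "$bad" ] && continue
        if [ -z "$out" ]; then
            out=$src
        else
            out="$out,,$src"
        fi
    done
    printf '%s\n' "$out"
}

# Prints: make knownet SOURCES=kegg,,other_source
echo "make knownet SOURCES=$(exclude_source stringdb kegg stringdb other_source)"
```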
