A web app for environmental DNA metabarcoding analysis
SLIM is a Node.js web app providing an easy-to-use Graphical User Interface (GUI) that wraps bioinformatics tools for amplicon sequencing analysis (from Illumina paired-end FASTQ files to an annotated OTU matrix). The whole pipeline is embedded in a Docker container.
See below for full instructions.
Running the start_slim_v1.1.sh script deploys and starts the web server.
By default, the web server is accessible on port 8080.
To access it on a remote server from your machine, type the server IP address followed by “:8080” (for example 156.241.0.12:8080) in a web browser (preferably Firefox or Google Chrome).
If SLIM is deployed on your own machine, type localhost:8080/ instead.
If the server is correctly set up, you should see this:
The “file uploader” section allows you to upload all the required files. These usually consist of:
Example of tag-to-sample file: This file must contain at least the four fields run, sample, forward and reverse. “Run” corresponds to your Illumina library identification; “sample” corresponds to the names of your samples in the library; “forward” and “reverse” correspond to the names of your tagged primers. Sample names MUST be unique, even for replicates sequenced in multiple libraries.
run,sample,forward,reverse
library_1,sample_1,forwardPrimer-A,reversePrimer-B
library_1,sample_2,forwardPrimer-B,reversePrimer-C
library_2,sample_3,forwardPrimer-A,reversePrimer-B
library_2,sample_4,forwardPrimer-B,reversePrimer-C
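The rules above (four required fields, unique sample names) can be checked before uploading. The following is a minimal sketch, not part of SLIM: the function name checkTagToSample is hypothetical, and it assumes a plain comma-separated file with a header line and no quoted fields.

```javascript
// Hypothetical helper, not part of SLIM: sanity-check a tag-to-sample file.
// Assumes plain comma-separated text with a header line and no quoted fields.
const REQUIRED_FIELDS = ['run', 'sample', 'forward', 'reverse'];

function checkTagToSample(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) {
    return ['file is empty'];
  }

  // Locate the required columns in the header line.
  const header = lines[0].split(',').map((field) => field.trim());
  const missing = REQUIRED_FIELDS.filter((field) => !header.includes(field));
  if (missing.length > 0) {
    return [`missing field(s): ${missing.join(', ')}`];
  }
  const sampleColumn = header.indexOf('sample');

  // Each row needs a value in every required column,
  // and a sample name may appear only once in the whole file.
  const errors = [];
  const seenSamples = new Set();
  lines.slice(1).forEach((line, index) => {
    const rowNumber = index + 2; // 1-based, counting the header line
    const cells = line.split(',').map((cell) => cell.trim());
    for (const field of REQUIRED_FIELDS) {
      if (!cells[header.indexOf(field)]) {
        errors.push(`line ${rowNumber}: empty "${field}" value`);
      }
    }
    const sample = cells[sampleColumn];
    if (sample) {
      if (seenSamples.has(sample)) {
        errors.push(`line ${rowNumber}: duplicate sample name "${sample}"`);
      }
      seenSamples.add(sample);
    }
  });
  return errors;
}

// Example: the same sample name reused in a second library is reported.
const example = [
  'run,sample,forward,reverse',
  'library_1,sample_1,forwardPrimer-A,reversePrimer-B',
  'library_2,sample_1,forwardPrimer-B,reversePrimer-C',
].join('\n');
console.log(checkTagToSample(example));
```

An empty array means no problem was found. The check is deliberately strict about duplicates across libraries, since the text above requires unique sample names even for replicates.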
Example of primers FASTA file: It contains the names of your tagged primers and their sequences, in conventional FASTA format. Each primer tag consists of 4 variable nucleotides at the 5’ end, before the template-specific part. Each primer must have a specific identifier (letters in this example). The primer sequences can include IUPAC nucleotide codes, which are taken into account.
>forwardPrimer-A
ACCTGCCTAGCGTYG
>forwardPrimer-B
GAATGCCTAGCGTYG
>reversePrimer-B
GAATCTYCAAATCGG
>reversePrimer-C
ACTACTYCAAATCGG
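To illustrate what taking IUPAC codes into account means, the sketch below turns a primer into a regular expression in which each ambiguity code matches any of the bases it stands for (Y matches C or T, for instance). It is an illustration only: the function name primerToRegExp is hypothetical, and this is not necessarily how the demultiplexing tool wrapped by SLIM performs the matching.

```javascript
// Illustration only: matching a primer that contains IUPAC ambiguity codes.
// This is not the code used by the tools that SLIM wraps.
const IUPAC = {
  A: 'A', C: 'C', G: 'G', T: 'T', U: 'T',
  R: '[AG]', Y: '[CT]', S: '[CG]', W: '[AT]', K: '[GT]', M: '[AC]',
  B: '[CGT]', D: '[AGT]', H: '[ACT]', V: '[ACG]', N: '[ACGT]',
};

// Build a regular expression that matches the primer at the start of a read.
function primerToRegExp(primer) {
  const pattern = primer
    .toUpperCase()
    .split('')
    .map((base) => {
      if (!(base in IUPAC)) {
        throw new Error(`not an IUPAC nucleotide code: ${base}`);
      }
      return IUPAC[base];
    })
    .join('');
  return new RegExp(`^${pattern}`);
}

// forwardPrimer-A from the example above: 4-nucleotide tag, then the primer.
const forwardA = primerToRegExp('ACCTGCCTAGCGTYG');
console.log(forwardA.test('ACCTGCCTAGCGTCGTTAGC')); // C at the Y position
console.log(forwardA.test('ACCTGCCTAGCGTTGTTAGC')); // T at the Y position
console.log(forwardA.test('ACCTGCCTAGCGTAGTTAGC')); // A at the Y position
```

The first two reads match because C and T are both allowed where the primer has Y; the third does not, because A is not covered by Y.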
Example of reference sequence database file
This FASTA file contains reference sequences with a unique identifier and a taxonomic path in each header.
Such databases can be downloaded, for instance, from SILVA for both prokaryotes and eukaryotes (16S and 18S), EUKREF for eukaryotes (18S), UNITE for fungi (ITS) or MIDORI for metazoans (COI).
Each header includes a unique identifier (usually the accession), a space ‘ ‘, and the taxonomic path, with ranks separated by semicolons (without any space; use an underscore “_” instead).
Preferably, every reference sequence should have the same number of taxonomic ranks.
>AB353770 Eukaryota;Alveolata;Dinophyta;Dinophyceae;Dinophyceae_X;Dinophyceae_XX;Peridiniopsis;Peridiniopsis_kevei
ATGCTTGTCTCAAAGATTAAGCCATGCATGTCTCAGTATAAGCTTTTACATGGCGAAACTGCGAATGGCTCATTAAAACAGTTACAGTTTATTTGAA
GGTCATTTTCTACATGGATAACTGTGGTAATTCTAGAGCTAATACATGCGCCCAAACCCGACTCCGTGGAAGGGTTGTATTTATTAGTTACAGAACC
AACCCAGGTTCGCCTGGCCATTTGGTGATTCATAATAAACGAGCGAATTGCACAGCCTCAGCTGGCGATGTATCATTCAAGTTTCTGACCTATCAGC
TTCCGACGGTAGGGTATTGGCCTACCGTGGCAATGACGGGTAACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCCTGA
>KC672520 Eukaryota;Opisthokonta;Fungi;Ascomycota;Pezizomycotina;Leotiomycetes;Leotiomycetes_X;Leotiomycetes_X_sp.
TACCTGGTTGATTCTGCCCCTATTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAAGCAATATATACCGTGAAACTGCGAATGGC
TCATTATATCAGTTATAGTTTATTTGATAGTACCTTACTACT
>AB284159 Eukaryota;Alveolata;Dinophyta;Dinophyceae;Dinophyceae_X;Dinophyceae_XX;Protoperidinium;Protoperidinium_bipes
TGATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTCAGTATAAGCTTCAACATGGCAAGACTGTGAATGGCTCATTAAAA
CAGTTGTAGTTTATTTGGTGGCCTCTTTACATGGATAGCCGTGGTAATTCTAGAACTAATACATGCGCTCAAGCCCGACTTCGCAGAAGGGCTGTGT
TTATTTGTTACAGAACCATTTCAGGCTCTGCCTGGTTTTTGGTGAATCAAAATACCTTATGGATTGTGTGGCATCAGCTGGTGATGACTCATTCAAG
CTT
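The header conventions described above lend themselves to an automated check before a large database is uploaded. The following is a minimal sketch under the stated format (identifier, one space, semicolon-separated path with no spaces); the function name checkReferenceHeaders is hypothetical and the code is not part of SLIM.

```javascript
// Hypothetical helper, not part of SLIM: check reference FASTA headers.
// Expected header shape: ">identifier rank1;rank2;...;rankN", no space in the path.
function checkReferenceHeaders(fastaText) {
  const problems = [];
  const seenIds = new Set();
  const rankCounts = new Set();

  fastaText.split(/\r?\n/).forEach((line, index) => {
    if (!line.startsWith('>')) {
      return; // sequence line, nothing to check here
    }
    const lineNumber = index + 1;
    const match = /^>(\S+) (\S+)$/.exec(line.trimEnd());
    if (!match) {
      problems.push(`line ${lineNumber}: expected ">identifier taxonomic;path" with a single space`);
      return;
    }
    const [, id, path] = match;
    if (seenIds.has(id)) {
      problems.push(`line ${lineNumber}: duplicate identifier "${id}"`);
    }
    seenIds.add(id);

    const ranks = path.split(';');
    if (ranks.some((rank) => rank === '')) {
      problems.push(`line ${lineNumber}: empty taxonomic rank`);
    }
    rankCounts.add(ranks.length);
  });

  // Differing rank counts are only discouraged, so this is reported as a warning.
  if (rankCounts.size > 1) {
    const counts = [...rankCounts].sort((a, b) => a - b).join(', ');
    problems.push(`warning: headers differ in number of taxonomic ranks (${counts})`);
  }
  return problems;
}

const sample = [
  '>AB353770 Eukaryota;Alveolata;Dinophyta',
  'ATGCTTGTCTCAAAGATTAAGCC',
  '>KC672520 Eukaryota;Opisthokonta;Fungi;Ascomycota',
  'TACCTGGTTGATTCTGCCCC',
].join('\n');
console.log(checkReferenceHeaders(sample));
```

In this shortened sample the two headers have three and four ranks, so the only message returned is the rank-count warning.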
Usually, a typical workflow would include:
The “Add a new module” section has a drop-down list containing various modules to pick, set and chain. Pick one and hit the “+” button. This adds the module at the bottom of the first section and prompts you to fill in the required fields. For more information on the modules, you can refer to their manuals on the wiki or click the (i) button on the module interface.
The use of wildcard ‘*’ for file pointing
Modules are chained through the file names used as input and output. To avoid having to manually select all the samples to be included in an analysis, wildcards ‘*’ (meaning ‘all’) are generated and used by the application. These wildcards are generated from the compressed library FASTQ files (tar.gz) and from the tag-to-sample file. Users cannot type their own wildcards in the file names. Instead, the application has an autocompletion feature and suggests wildcards for the user to select within the GUI.
To point to a set of samples (all samples from the tag-to-sample file, or all the samples from library_1 for instance), there will be a ‘*’, and the application incrementally adds each processing step as a suffix:
The same principle applies to OTU matrices: the previous processing step is added as a suffix to the file name.
See below for the demultiplexing
and below for the OTU clustering and taxonomic assignment
Once your workflow is set, please fill in the email field and click on the start button. Your job will automatically be scheduled on the server. You will receive an email when your job starts, if your job is aborted and when your job is over. This email contains a direct link to your job, so you can close the browser tab once you have started the execution.
When your job is over, small download icons appear on the right of each output field. All the uploaded, intermediate and result files are available to download. Your files remain available on the server for 24 hours, after which they are removed to free disk space.
For more details on the app, you can refer to the wiki pages.
First of all, Docker needs to be installed on the machine. You can find instructions here:
To install SLIM, get the latest stable release here or, from a terminal:
sudo apt-get update && sudo apt-get install git curl
curl -OL https://github.com/trtcrd/SLIM/archive/v1.1.tar.gz
tar -xzvf v1.1.tar.gz
cd SLIM-1.1
Before deploying SLIM, you need to configure the mail account that will be used by the mailing service. We advise using Gmail, as it is already set in the ‘server/config.js’ file. This file needs to be updated on the server with your ‘user’ and ‘pass’ credentials:
exports.mailer = {
  host: 'smtp.gmail.com',
  port: 465,
  secure: true, // true for 465, false for other ports
  auth: {
    user: 'username',
    pass: 'password'
  }
};
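The inline comment suggests that secure should be true only for port 465. As a sketch of what a non-Gmail setup might look like, the fragment below uses port 587 with secure set to false. The host name smtp.example.org and the credentials are placeholders, and whether your provider expects this port is an assumption to check against its documentation.

```javascript
// Sketch of an alternative 'server/config.js' mailer section.
// smtp.example.org, the user and the password are placeholders, not real values.
exports.mailer = {
  host: 'smtp.example.org',
  port: 587,
  secure: false, // per the comment in the original file: false for ports other than 465
  auth: {
    user: 'slim-notifications@example.org',
    pass: 'password'
  }
};
```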
As soon as Docker is installed and running, the SLIM archive is downloaded and the mailing account is set, SLIM can be deployed by running the two scripts get_dependencies_slim_v1.1.sh and start_slim_v1.1.sh as super user.
get_dependencies_slim_v1.1.sh fetches all the required bioinformatics tools from their respective repositories.
start_slim_v1.1.sh destroys the currently running web server and replaces it with a new one.
/!\ All previously uploaded files and analysis results will be destroyed in the process.
sudo bash get_dependencies_slim_v1.1.sh
sudo bash start_slim_v1.1.sh
To contribute by adding new software, you will need to know a little HTML and JavaScript. Please refer to the wiki pages to learn how to create a module.
v1.1
Updated the get_dependencies script.
v1.0
First stable release, with third-party versions handled within the get_dependencies_slim.sh script.