Requirements
PodiumASM is designed to run on an HPC distributed cluster.
Install the PodiumASM Python package
First, install the PodiumASM Python package with pip.
git clone https://github.com/thdurand4/PodiumASM.git
cd PodiumASM
python3 -m pip install -e .
podiumASM --help
Then follow this documentation according to the mode you want: local or HPC.
Steps for LOCAL installation
Install PodiumASM in local (single machine) mode using the podiumASM install_local command line.
install_local
Run the installation for a local computer.
The process downloads the Singularity images automatically.
podiumASM install_local [OPTIONS]
Options
- --bash_completion, --no-bash_completion
Allow bash completion of podiumASM commands in the bashrc file
- Default:
True
Steps for HPC distributed cluster installation
PodiumASM uses available snakemake profiles to ease cluster installation and resource management. Run the command podiumASM install_cluster to install on an HPC cluster. We tried to make cluster installation as easy as possible, but you will still need to adapt a few files to your cluster environment.
install_cluster
Run the installation for an HPC cluster
podiumASM install_cluster [OPTIONS]
Options
- -s, --scheduler <scheduler>
Type the HPC scheduler (for the moment, only SLURM is available)
- Default:
slurm
- Options:
slurm
- -e, --env <env>
Mode for tool dependencies
- Default:
modules
- Options:
modules | singularity
- --bash_completion, --no-bash_completion
Allow bash completion of podiumASM commands in the bashrc file
- Default:
True
1. Adapt profile and cluster_config.yaml
Now that PodiumASM is installed, it provides default configuration files, but they can be modified. Please check and adapt these files to your own system architecture.
1. Adapt the pre-formatted snakemake profile to configure your cluster options. See the section 1. Snakemake profiles for details.
2. Adapt the cluster_config.yaml file to manage cluster resources such as partition, memory and threads available for each job. See the section 2. Adapting cluster_config.yaml for further details.
2. Adapt tools_path.yaml
As PodiumASM uses many tools, you must make them available through environment modules.
Using the module load mode:
podiumASM install_cluster --help
podiumASM install_cluster --scheduler slurm --env modules
Adapt the tools_path.yaml file - in YAML (YAML Ain't Markup Language) format - to tell podiumASM where the different tools are installed on your cluster.
See the section 3. How to configure tools_path.yaml for details.
Advanced installation
1. Snakemake profiles
The Snakemake-profiles project is an open effort to create configuration profiles for executing Snakemake in various computing environments (job scheduling systems such as Slurm or SGE, Grid middleware, or cloud computing); it is available at https://github.com/Snakemake-Profiles/doc.
To run PodiumASM on an HPC cluster, we take advantage of these profiles.
As a quick start, see the example of the Snakemake SLURM profile we used for the French national bioinformatics infrastructure at IFB.
More info about profiles can be found here https://github.com/Snakemake-Profiles/slurm#quickstart.
Preparing the profile’s config.yaml file
Once your basic profile is created, finalize it by modifying podiumASM/podiumASM/default_profile/config.yaml as necessary to customize the Snakemake parameters that will be used internally by PodiumASM:
restart-times: 0
jobscript: "slurm-jobscript.sh"
cluster: "slurm-submit.py"
cluster-status: "slurm-status.py"
max-jobs-per-second: 1
max-status-checks-per-second: 10
local-cores: 1
jobs: 200 # edit to limit the number of jobs submitted in parallel
latency-wait: 60000000
use-envmodules: true # set to true for env modules or false for singularity, but activate only one of the two
use-singularity: false # if false, please install all R packages listed in tools_config.yaml ENVMODULE/R
rerun-incomplete: true
printshellcmds: true
2. Adapting cluster_config.yaml
In the cluster_config.yaml file, you can manage HPC resources, choosing the partition, memory and threads to be used either by default or specifically for each rule/tool, depending on your HPC job scheduler. This file generally belongs to a Snakemake profile (see above).
Warning
If more memory or threads are requested, please adapt the content of this file before running on your cluster.
A list of PodiumASM rule names can be found in the section Threading rules inside PodiumASM.
Warning
For some rules in the cluster_config.yaml, such as rule_graph or run_get_versions, we use wildcards by default; please do not remove them.
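As an illustration, a cluster_config.yaml for a SLURM scheduler often follows the sketch below. The partition name and resource values are placeholders to adapt to your cluster, not PodiumASM defaults; only the __default__ convention comes from Snakemake's cluster-config format.

```yaml
__default__:               # applied to every rule unless overridden
    cpus-per-task: 4
    mem-per-cpu: "2G"
    partition: "normal"    # placeholder: use a partition that exists on your cluster

# more demanding rules can override the defaults individually
bwa_mem_sort_bam:
    cpus-per-task: 8
    mem-per-cpu: "4G"
```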
3. How to configure tools_path.yaml
Note
Regarding tool versions, the user can choose which version of each tool to use with modules.
In the tools_path.yaml file, there is a single section, ENVMODULES. Fill it with the modules available on your cluster (here is an example):
MODULES:
"BUSCO": "busco/5.1.2"
"PYTHON3": "python/3.7" # Python3 >=3.6 with require libraries
"BWA": "bwa/0.7.17" # for make the mapping
"SAMTOOLS": "samtools/1.15.1"
"QUAST": "quast/5.0.2"
"MINIMAP2": "minimap2"
"SNIFFLES": "python/3.7"
"REPEATMASKER": "repeatmasker/4.1.2.p1"
And more …
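To sanity-check the file before launching a run, a short script can verify that every tool has a module entry. This is only a sketch: the required-tool list and the flat "KEY": "value" line parsing are assumptions based on the example above; in practice you would load the file with a YAML parser such as PyYAML.

```python
import re

def parse_modules(text):
    """Parse flat '"TOOL": "module/version"' lines into a dict."""
    modules = {}
    for line in text.splitlines():
        match = re.match(r'\s*"(\w+)"\s*:\s*"([^"]*)"', line)
        if match:
            modules[match.group(1)] = match.group(2)
    return modules

def missing_tools(modules, required):
    """Return the required tools that have no (non-empty) module entry."""
    return [tool for tool in required if not modules.get(tool)]

example = '''
MODULES:
    "BUSCO": "busco/5.1.2"
    "SAMTOOLS": "samtools/1.15.1"
    "MINIMAP2": "minimap2"
'''

# hypothetical required list; adapt it to the tools your run actually uses
required = ["BUSCO", "SAMTOOLS", "MINIMAP2", "QUAST"]
print(missing_tools(parse_modules(example), required))  # -> ['QUAST']
```

Running this before submitting jobs turns a mid-pipeline "module not found" failure into an immediate, readable report.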
Threading rules inside PodiumASM
Please find below the rule names used in the PodiumASM code. It is recommended to set threads using the snakemake command when running on a single machine, or in a cluster configuration file to manage cluster resources through the job scheduler. This saves users a painful exploration of the PodiumASM snakefiles.
rename_contigs
busco
busco_figure
bwa_index
bwa_mem_sort_bam
samtools_index_illumina
samtools_idxstats
merge_idxstats
samtools_depth
samtools_depth_to_csv
merge_samtools_depth_stats
quast_full_contigs
minimap2
samtools_index
sniffles
variant_per_contig
align_assembly
coverage
repeatmodeler
repeatmasker
remove_contigs
mummer
assemblytics
tapestry
genome_stats
report_stats_contig
finale
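For example, the rule names above can be used as keys in the cluster configuration to give specific steps more threads than the default. The thread counts below are placeholders, not PodiumASM defaults:

```yaml
__default__:
    cpus-per-task: 2

minimap2:
    cpus-per-task: 12

repeatmasker:
    cpus-per-task: 16
```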