Requirements

PodiumASM requires Python 3.7+ and R 4.0+.

PodiumASM is designed to run either on a single machine or on an HPC distributed cluster.


Install PodiumASM PyPI package

First, install the PodiumASM Python package from its GitHub repository with pip:

git clone https://github.com/thdurand4/PodiumASM.git
cd PodiumASM
python3 -m pip install -e .
podiumASM --help

Now, follow the section below that matches your setup: local mode or HPC mode.


Steps for LOCAL installation

Install PodiumASM in local (single machine) mode using the podiumASM install_local command.

install_local

Run the installation for use on a local computer.

The process downloads the Singularity images automatically.

podiumASM install_local [OPTIONS]

Options

--bash_completion, --no-bash_completion

Add bash completion of podiumASM commands to the bashrc file

Default:

True
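As an illustration, the two ways to invoke this command (using only the flags documented above):

```shell
# Install PodiumASM for a single machine; Singularity images are downloaded automatically
podiumASM install_local

# The same, but without adding bash completion to the bashrc file
podiumASM install_local --no-bash_completion
```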


Steps for HPC distributed cluster installation

PodiumASM uses any available Snakemake profile to ease cluster installation and resource management. Run the command podiumASM install_cluster to install on an HPC cluster. We tried to make cluster installation as easy as possible, but you will still need to adapt a few files to your cluster environment.

install_cluster

Run the installation for an HPC cluster

podiumASM install_cluster [OPTIONS]

Options

-s, --scheduler <scheduler>

The HPC scheduler (for the moment, only Slurm is supported)

Default:

slurm

Options:

slurm

-e, --env <env>

Mode for managing tool dependencies

Default:

modules

Options:

modules | singularity

--bash_completion, --no-bash_completion

Add bash completion of podiumASM commands to the bashrc file

Default:

True

1. Adapt profile and cluster_config.yaml

Now that PodiumASM is installed, it provides default configuration files, but they can be modified. Please check and adapt these files to your own system architecture.

1. Adapt the pre-formatted snakemake profile to configure your cluster options. See the section 1. Snakemake profiles for details.

2. Adapt the cluster_config.yaml file to manage cluster resources such as partition, memory and threads available for each job. See the section 2. Adapting cluster_config.yaml for further details.

2. Adapt tools_path.yaml

As PodiumASM relies on many external tools, you must make them available, currently through environment modules:

  1. Using the module load mode,

podiumASM install_cluster --help
podiumASM install_cluster --scheduler slurm --env modules

Adapt the tools_path.yaml file, in YAML format, to tell PodiumASM where the different tools are installed on your cluster. See the section 3. How to configure tools_path.yaml for details.


Advanced installation

1. Snakemake profiles

The Snakemake-profiles project is an open effort to create configuration profiles for executing Snakemake in various computing environments (job scheduling systems such as Slurm, SGE, Grid middleware, or cloud computing). It is available at https://github.com/Snakemake-Profiles/doc.

In order to run PodiumASM on an HPC cluster, we take advantage of these profiles.

As a quick start, see the example of the Snakemake SLURM profile we used for the French national bioinformatics infrastructure at IFB.

More info about profiles can be found here https://github.com/Snakemake-Profiles/slurm#quickstart.
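As a sketch of the quickstart linked above, a SLURM profile can be bootstrapped with cookiecutter; the target directory under ~/.config/snakemake is the conventional profile location, and the prompts let you name the profile:

```shell
# Install cookiecutter, then generate a Snakemake SLURM profile interactively
pip install cookiecutter
mkdir -p ~/.config/snakemake
cd ~/.config/snakemake
cookiecutter https://github.com/Snakemake-Profiles/slurm.git
```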

Preparing the profile’s config.yaml file

Once your basic profile is created, finalize it by modifying podiumASM/podiumASM/default_profile/config.yaml as needed, to customize the Snakemake parameters used internally by PodiumASM:

restart-times: 0
jobscript: "slurm-jobscript.sh"
cluster: "slurm-submit.py"
cluster-status: "slurm-status.py"
max-jobs-per-second: 1
max-status-checks-per-second: 10
local-cores: 1
jobs: 200                   # edit to limit the number of jobs submitted in parallel
latency-wait: 60000000
use-envmodules: true        # set true/false for envmodules or singularity, but only activate one of the two!
use-singularity: false      # if false, please install all R packages listed in tools_config.yaml ENVMODULE/R
rerun-incomplete: true
printshellcmds: true

2. Adapting cluster_config.yaml

In the cluster_config.yaml file, you can manage HPC resources, choosing the partition, memory and threads to be used, either by default or per rule/tool, depending on your HPC job scheduler. This file generally belongs to a Snakemake profile (see above).
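As an illustration, a minimal cluster_config.yaml might look like the following. The partition name, rule resources and exact keys depend on your scheduler and profile, so treat this as a sketch, not a template to copy verbatim:

```yaml
__default__:                # applied to every rule unless overridden below
    cpus-per-task: 4
    mem-per-cpu: "4G"
    partition: "normal"

bwa_mem_sort_bam:           # give a heavy rule more resources
    cpus-per-task: 8
    mem-per-cpu: "8G"
```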

Warning

If a rule needs more memory or threads, please adapt the content of this file before running on your cluster.

A list of PodiumASM rule names can be found in the section Threading rules inside PodiumASM.

Warning

For some rules in cluster_config.yaml, such as rule_graph or run_get_versions, we use wildcards by default; please do not remove them.

3. How to configure tools_path.yaml

Note

Regarding tool versions, users can choose which version of each tool to use with modules.

In the tools_path file, there is one section: ENVMODULES. In order to fill it correctly, you have one option:

  1. Use only ENVMODULES: in this case, fill this section with the modules available on your cluster (here is an example):

MODULES:
    "BUSCO": "busco/5.1.2"
    "PYTHON3": "python/3.7"                    # Python3 >=3.6 with required libraries
    "BWA": "bwa/0.7.17"                         # used for mapping
    "SAMTOOLS": "samtools/1.15.1"
    "QUAST": "quast/5.0.2"
    "MINIMAP2": "minimap2"
    "SNIFFLES": "python/3.7"
    "REPEATMASKER": "repeatmasker/4.1.2.p1"

And more …
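To check that a module name you put in tools_path.yaml actually resolves on your cluster, you can load it by hand first; the module name below is taken from the example above and may differ on your system:

```shell
# Verify a module exists and loads before writing it into tools_path.yaml
module avail busco
module load busco/5.1.2
busco --version
```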

Threading rules inside PodiumASM

Please find below the rule names used in the PodiumASM code. It is recommended to set threads using the snakemake command when running on a single machine, or in a cluster configuration file to manage cluster resources through the job scheduler. This saves users a painful exploration of the PodiumASM snakefiles.

rename_contigs
busco
busco_figure
bwa_index
bwa_mem_sort_bam
samtools_index_illumina
samtools_idxstats
merge_idxstats
samtools_depth
samtools_depth_to_csv
merge_samtools_depth_stats
quast_full_contigs
minimap2
samtools_index
sniffles
variant_per_contig
align_assembly
coverage
repeatmodeler
repeatmasker
remove_contigs
mummer
assemblytics
tapestry
genome_stats
report_stats_contig
finale
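On a single machine, the rule names above can be passed to Snakemake's --set-threads option (available in Snakemake >= 5.31). The thread counts below are examples; whether you invoke Snakemake directly or through a PodiumASM wrapper depends on your setup:

```shell
# Example: cap the mapping and alignment rules when running on one machine
snakemake --cores 16 --set-threads bwa_mem_sort_bam=8 minimap2=8
```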