Standard Operating Procedures
Essential Reading
All users must read and comply with these procedures before using the system.
1. Purpose
This SOP defines the procedures and responsibilities for authorized users of the Sapitwa HPC Cluster at the Malawi Liverpool Wellcome (MLW) Programme. It ensures users operate within acceptable norms to maintain data security, system performance, and fairness in resource utilization.
2. Scope
This SOP applies to all users accessing the Sapitwa HPC Cluster, including MLW staff, affiliated researchers, students, and collaborators granted access by MLW.
3. Roles and Responsibilities
Key Stakeholders
Users
- Submit jobs efficiently using the SLURM job scheduler
- Manage data responsibly
- Follow security best practices
HPC Administrators
- Maintain and monitor the system
- Create and manage user accounts
- Provide technical support and training
Principal Investigators (PIs)
- Endorse user account requests
- Oversee usage within research groups
4. Access and Accounts
Account Management
- Access is granted through institutional SSO and MFA
- Users must submit a formal request form approved by a PI
- Accounts remain active for the duration of a project and are reviewed annually
- Accounts inactive for 12 months may be disabled after formal notification. To reactivate, initiate the process through your PI or designated supervisor; once the PI submits a valid reactivation request, access is restored within 48 hours.
- User home directories are located at /head/NFS/$USER, where $USER is the system username
5. Code of Conduct & Acceptable Use
Usage Policies
- Use resources only for MLW-approved academic/research purposes
- Do not run cryptocurrency mining, unauthorized commercial work, or system stress tests
- Users must not share accounts
- Respect fair use and shared resource policies
6. System Overview
Infrastructure Details
- Hardware: Dell PowerEdge R940xa (virtualized nodes)
- Nodes: 2 CPU nodes, 2 GPU nodes (NVIDIA A100 in 7x 10GB MIG mode)
- OS: Rocky Linux 8
- Job Scheduler: SLURM
- Environment Management: Lmod and EasyBuild
- Storage: NFS-mounted scratch and home directories on Dell PowerVault ME5024
- Access Methods: SSH and Open OnDemand (https://sapitwa.mlw.mw)
- Monitoring: Metrics and performance dashboards at https://metrics.mlw.mw
7. Using the System
7.1 Login Instructions
Access Methods
ssh username@sapitwa.mlw.mw
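You can also log in through the browser via Open OnDemand at https://sapitwa.mlw.mw. For SSH logins, key-based authentication is recommended (see Section 9); a minimal sketch for setting it up, assuming OpenSSH on your local machine, with an illustrative key filename:

# Generate an Ed25519 key pair on your local machine (filename illustrative)
ssh-keygen -t ed25519 -f ~/.ssh/sapitwa_ed25519
# Install the public key on the cluster (prompts for your cluster credentials once)
ssh-copy-id -i ~/.ssh/sapitwa_ed25519.pub username@sapitwa.mlw.mw
# Subsequent logins can use the key
ssh -i ~/.ssh/sapitwa_ed25519 username@sapitwa.mlw.mw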
7.2 Environment Modules
Common Module Commands
List all available modules:
module avail
Show detailed module information:
module show module_name
List all modules in the hierarchy, including versions not shown by module avail:
module spider
Search for a specific module:
module spider module_name
Load a module:
module load module_name
Unload a module:
module unload module_name
List currently loaded modules:
module list
Unload all modules:
module purge
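A typical session combines these commands; a minimal sketch, where the module name and version are illustrative (use module spider to see what is actually installed):

module purge                  # start from a clean environment
module spider Python          # find available versions (name illustrative)
module load Python/3.11.3     # load a specific version (version illustrative)
module list                   # confirm what is loaded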
7.3 Job Submission
SLURM Commands
Submit a batch job (a sample job script follows this list):
sbatch jobscript.sh
View job queue:
squeue
View your jobs only:
squeue -u $USER
View detailed job history:
sacct
Cancel a job:
scancel job_id
Show cluster status:
sinfo
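A minimal sketch of a jobscript.sh for sbatch; the job name, resource requests, module, and script name are illustrative, and partition or account options depend on the local SLURM configuration:

#!/bin/bash
#SBATCH --job-name=example        # job name (illustrative)
#SBATCH --output=example_%j.log   # log file; %j expands to the job ID
#SBATCH --cpus-per-task=4         # CPU cores for the job
#SBATCH --mem=8G                  # memory for the job
#SBATCH --time=01:00:00           # walltime limit (HH:MM:SS)

module purge
module load Python/3.11.3         # module and version illustrative (see 7.2)
python my_analysis.py             # analysis script illustrative

Submit with sbatch jobscript.sh, then monitor with squeue -u $USER.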
7.4 Interactive Jobs
Interactive Session Commands
Request an interactive terminal session:
srun --pty bash
Request a session with specific resources:
srun --pty --cpus-per-task=4 --mem=8G bash
Request a GPU session:
srun --pty --gres=gpu:1 bash
Allocate resources without starting a session:
salloc
Allocate specific resources:
salloc --cpus-per-task=4 --mem=8G
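Where srun --pty opens a shell directly, salloc grants the allocation first and commands are then launched inside it. A minimal sketch:

salloc --cpus-per-task=4 --mem=8G   # request and wait for an allocation
srun hostname                       # run a command on the allocated node
exit                                # release the allocation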
7.5 Software
- Pre-installed packages are provided via EasyBuild and modules
- Users may use Conda, pip, or Spack in /head/NFS/$USER, where $USER is your cluster username
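For example, a user-managed Conda environment kept under the home directory might look like this minimal sketch (the environment path and package are illustrative, and it assumes Conda is installed or provided as a module):

# Create an environment under your home directory (path illustrative)
conda create --prefix /head/NFS/$USER/envs/myenv python=3.11
# Activate it by path and install packages into it
conda activate /head/NFS/$USER/envs/myenv
pip install numpy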
8. Data Management
Storage Guidelines
- /head/NFS/$USER: user home directory, quota-limited to 50GB ($USER is a variable for the username); a special request is required to increase the quota
- /scratch: high-speed storage, periodically purged
- Use rsync, scp, or sftp for data transfers
- No sensitive data storage is allowed on the HPC
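Typical transfers from a local machine, as a minimal sketch (local and remote paths are illustrative):

# Sync a local directory to scratch with rsync (resumable, shows progress)
rsync -avP ./results/ username@sapitwa.mlw.mw:/scratch/username/results/
# Copy a single file to your home directory with scp
scp data.csv username@sapitwa.mlw.mw:/head/NFS/username/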
Data Policies
- Keep source data in project space
- Use scratch space for temporary files
- Clean up completed job files
- Compress unused data
File Management Commands
Check your storage quotas:
quota -s
Check directory size:
du -sh directory_name
Find files larger than 100MB:
find . -type f -size +100M -exec ls -lh {} \;
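Compress unused data into a single archive (archive and directory names are placeholders):
tar -czf archive_name.tar.gz directory_name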
9. Security and Compliance
Security Requirements
- Follow institutional IT security policies
- Use SSH keys, strong passwords, and MFA
- No plaintext password storage or credential sharing
- Report suspicious activity immediately
MFA Setup
Configure your authenticator app:
1. Install one of the supported authenticator apps:
   - Microsoft Authenticator (recommended)
   - Google Authenticator
   - FreeOTP
2. Visit https://sapitwa.mlw.mw
3. Follow the MFA setup instructions
10. Acknowledgement and Publications
Standardized Acknowledgement Format
"Computational resources were provided by the Sapitwa High Performance Computing (HPC) Cluster at the Malawi Liverpool Wellcome Research Programme (MLW), funded through the Wellcome core award to MLW (206545/Z/17/Z) and institutional funding support from the Liverpool School of Tropical Medicine."
Notify HPC administrators of publications for impact reporting.
11. Support and Training
Getting Help
Primary Support
- Email: hpc-support@mlw.mw
- Onboarding training provided for new users
- Microsoft Teams: HPC Community Channel
- Hours: Monday-Friday, 8:00 AM - 5:00 PM
Emergency Support
For after-hours emergencies affecting critical research workflows, use the HPC Community Channel on Microsoft Teams.
12. Incident Reporting
Reporting Requirements
- Report outages and failures to the HPC support team
- Security incidents must be reported to MLW IT security
13. Account Termination and Data Retention
End of Service
- Accounts are terminated upon project end or staff departure
- Data in /head/NFS/$USER is retained for 30 days post-termination
14. System Maintenance
Maintenance Schedule
- Regular Updates: Last Monday of each month
- Emergency Maintenance: 24-hour notice when possible
- System Status: Check the Teams channel