Imaging (fMRI) data processing with High Performance Computing (HPC) cluster

at the University of Arizona

Created and maintained by: Diheng Zhang (dihengzhang@arizona.edu)

Docs created at: Oct 11th, 2022

Last modified: [2025-04-14 Mon]

Contributors:

  • Andrea Coppola, Ph.D.
  • Dianne Patterson, Ph.D.
  • Teodora Stoica, Ph.D.

This documentation is intended as a guide to getting started with fMRI data processing on the HPC at UA. It mainly serves two purposes:

  1. Document how I implement data (pre)processing for my doctoral dissertation and, at the same time,

  2. Provide an example of one way to do that.

This documentation is not intended to be exhaustive. However, links to external documentation are provided at the beginning of each step for a deeper dive if needed.

Starter resources

If you have no previous experience with MRI data (pre-)processing, here are some external resources that can help you get started:

Join the Neuroimaging Workshops group (on D2L) for more imaging data processing workshops and tutorials. The group usually meets on Mondays. Email Dianne Patterson (dkp@arizona.edu) to be added to the D2L site.

D2L_site

Converting your DICOM files to BIDS format

fMRIPrep is a BIDS app, meaning that it takes BIDS format data folders as input. BIDS stands for Brain Imaging Data Structure. It is recommended as a uniform structure to facilitate consistent collaboration between labs. See BIDS starter kit for more information.

Option 1: For data security purposes, it is recommended that you convert your dataset on a secure local lab machine, deface (de-identify) it, and then transfer it to HPC. For documentation on how to do this locally, see Saren's note on BIDS here. For instructions on defacing your dataset locally, see here.

Option 2 (highly recommended): Alternatively, you can convert your DICOM files to BIDS format and deface them with ezbids, an online platform by Brainlife. ezbids is a HIPAA-compliant platform that requires you to upload your raw DICOM folder. For the ezbids documentation, see here.

Note: After you convert your DICOM files to BIDS format, we suggest that you use the BIDS validator to check whether the conversion was successful before you transfer your BIDS data to HPC.
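Before running the full BIDS validator, a quick shell check can catch an obviously incomplete conversion. The sketch below is not a replacement for the validator. It only tests for a top-level dataset_description.json file and for at least one sub-* folder. The function name check_bids_layout is made up for this example.

```shell
#!/usr/bin/env bash
# Quick pre-validator sanity check for a BIDS folder (illustrative sketch only).
# check_bids_layout is a hypothetical helper, not part of any BIDS tool.
check_bids_layout() {
    local bids_dir="$1"
    local status=0

    # BIDS expects dataset_description.json at the top level of the dataset.
    if [ ! -f "${bids_dir}/dataset_description.json" ]; then
        echo "missing: dataset_description.json"
        status=1
    fi

    # Count subject folders (sub-01, sub-02, ...).
    local n_subjects
    n_subjects=$(find "${bids_dir}" -mindepth 1 -maxdepth 1 -type d -name 'sub-*' | wc -l)
    n_subjects=$((n_subjects))   # normalise whitespace from wc
    echo "subject folders found: ${n_subjects}"
    if [ "${n_subjects}" -eq 0 ]; then
        status=1
    fi

    return "${status}"
}
```

For example, check_bids_layout [your BIDS folder] returns a non-zero exit status if either check fails, so it can be combined with && before starting a transfer.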

DICOM raw data folder structure

To save processing time and prevent file detection errors on ezbids.io, it is recommended that you upload only the scans that you need (T1w, resting fMRI, etc.).

A typical DICOM dataset copied from a scanner may look like this (especially for a Siemens scanner):

Folder PATH listing for volume RAID Imaging
Volume serial number is 88E4-9172
P:MORENO_PSILOCYBIN_20190617_112411_379000\
+---10VOLS_FMRI_3MM_2_2_0011
+---10VOLS_FMRI_3MM_SINGLEBAND_0013
+---AAHEAD_SCOUT_0001
+---AAHEAD_SCOUT_MPR_COR_0003
+---AAHEAD_SCOUT_MPR_SAG_0002
+---AAHEAD_SCOUT_MPR_TRA_0004
+---ASL_3D_TRA_FAST_0019
+---AX_WMN_MPRAGE_0021
+---BZERO_VERIFY_P-A_0018
+---DTI_64_DIRS_GRAPPA2_0015
+---DTI_64_DIRS_GRAPPA2_TENSOR_0017
+---GRE_FIELD_MAPPING_0009
+---GRE_FIELD_MAPPING_0010
+---MOCOSERIES_0006
+---MOCOSERIES_0012
+---MOCOSERIES_0014
+---PERFUSION_WEIGHTED_0020
+---RSFMRI_3MM_2_2_0005
+---SAG_3D_FLAIR_0008
+---T1_MPRAGE_SAG_ISO_0007
+---T1_MPRAGE_SAG_ISO_S7_ND_0016

For the exact meaning of each folder name, please contact the technician at the scan center. Most likely, the RSFMRI folder is your resting fMRI data and the T1_MPRAGE folder is your T1w structural scan. If you have more than one T1w scan folder, be sure to check the number at the end of each folder name. In the example above, the T1_MPRAGE_..._ND_0016 folder refers to the T1w scan without distortion correction, which is different from the standard T1w scan.
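To upload only the scans you need, you can copy the relevant folders into a separate upload folder first. This is a minimal sketch: select_scans is a hypothetical helper, and the three patterns simply follow the example listing above, so they will need adjusting for other protocols. Note that the T1_MPRAGE* pattern also matches the ND folder, which you may want to delete from the upload folder afterwards.

```shell
#!/usr/bin/env bash
# Copy only the scan folders needed for upload (illustrative sketch).
# select_scans is a hypothetical helper; the glob patterns follow the
# example listing above and will differ for other protocols.
select_scans() {
    local dicom_dir="$1"    # folder copied from the scanner
    local upload_dir="$2"   # new folder to upload
    mkdir -p "${upload_dir}"

    local pattern folder
    for pattern in 'RSFMRI*' 'T1_MPRAGE*' 'GRE_FIELD_MAPPING*'; do
        # ${pattern} is left unquoted on purpose so the shell expands the glob.
        for folder in "${dicom_dir}"/${pattern}; do
            # Skip patterns that matched nothing.
            [ -d "${folder}" ] || continue
            cp -R "${folder}" "${upload_dir}/"
            echo "copied: $(basename "${folder}")"
        done
    done
}
```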

There are two GRE_FIELD_MAPPING folders. The one with 84 image files (the first one) will become two images: magnitude1 and magnitude2. The one with 42 image files (the second one) will become the phasediff map. ezbids can't recognize these correctly and requires you to assign the correct labels. You should get 3 images in the fmap directory from these two sets of DICOMs. ezbids will allow you to indicate that these are intended for correcting the fMRI images, but you have to tell it to do that. The latest fMRIPrep will use them for distortion correction even without the IntendedFor field (in other words, if you get the fieldmaps into the fmap directory, they should be used for correction). See this lesson on distortion correction: here
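To tell the two fieldmap folders apart before labeling them, you can count the files in each one. The sketch below uses a hypothetical helper named count_fieldmap_files. The counts of 84 and 42 come from the example dataset described above and may differ for your protocol.

```shell
#!/usr/bin/env bash
# Print the number of files in each GRE_FIELD_MAPPING folder (illustrative sketch).
# count_fieldmap_files is a hypothetical helper name.
count_fieldmap_files() {
    local dicom_dir="$1"
    local folder n
    for folder in "${dicom_dir}"/GRE_FIELD_MAPPING*; do
        # Skip the unexpanded pattern if no folder matched.
        [ -d "${folder}" ] || continue
        n=$(find "${folder}" -type f | wc -l)
        echo "$(basename "${folder}"): $((n)) files"
    done
}
```

In the example above, the folder reporting the larger count would be the magnitude images and the one with the smaller count the phase difference image.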

Starting with HPC@UA

Resources

Setting up your account on HPC

Here is the page for requesting an account for HPC@UA: HPC Account Creation

If you are a PI, you will be registering a PI account, which will give you the authority to sponsor individual or group access to HPC. If you are a graduate student, you will most likely be requesting a sponsored HPC account, which will be created upon approval from your sponsor (most likely your PI).

Transferring data to HPC

Note on data security: HPC storage in general is not HIPAA compliant. It is recommended that you convert your raw DICOM files to BIDS format and then deface all imaging data before you transfer the data to HPC for further pre-processing.

  • Globus: Globus is the preferred way to transfer data from your local machine to HPC. Here is a 7 min video showing how to do that: HPC 1b: Data Transfer

You might also find this section of the Neuroimaging-core webpage helpful:  Neuroimaging-core/Transferring Files

Step 1 - Register/Log in to Globus: You can log in with your UA SSO credentials. Just click "Log in" in the upper right corner and follow the instructions.

Step 2 - Use Google Drive for Globus: If you want to skip the Globus Connect Personal route, Globus also works with Google Drive. As a UA student, your Google Drive account (same as your UA email address) comes with unlimited storage (but this will soon be limited to 15 GB). If your data is under 15 GB in total and has been deidentified, I recommend using Google Drive for Globus file transfer. I also recommend packing your deidentified BIDS data into a single archive file (for example, a tar or zip file) before you move it to Google Drive for Globus, which saves time and makes the transfer more reliable. For documentation, see here. To connect your Google Drive to your Globus account, go to the Collection tab, search for "UA Google Drive", and go from there.

Note on decompressing your tar file: If you compressed your dataset to tar before you transferred it to HPC (which is recommended), you can click on "Open in Terminal" in OOD and use this command to decompress your dataset:

$tar -xvf [your dataset.tar]

See below for terminal access to the HPC.
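The matching packing step on your local machine could look like the sketch below. pack_dataset is a hypothetical helper name, and the paths in the comments are placeholders. It creates an uncompressed tar file that the tar -xvf command above can unpack.

```shell
#!/usr/bin/env bash
# Pack a de-identified BIDS folder into one tar file before transfer
# (illustrative sketch; pack_dataset is a hypothetical helper name).
pack_dataset() {
    local bids_dir="$1"      # e.g. ~/data/my_bids (placeholder)
    local archive="$2"       # e.g. ~/data/my_bids.tar (placeholder)

    # -C changes into the parent folder so the archive stores relative paths.
    tar -cf "${archive}" -C "$(dirname "${bids_dir}")" "$(basename "${bids_dir}")"

    # List the archive to confirm it is readable, and report the entry count.
    local n
    n=$(tar -tf "${archive}" | wc -l)
    echo "archived entries: $((n))"
}
```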

A note on storage options

  • Google Drive: This is recommended if your dataset has already been deidentified. Google Drive connects well with your Globus account with just a little configuration. See above for details.

fMRI data preprocessing with fMRIPrep on HPC

Accessing files and running tasks on HPC

Option 1: You can access HPC with Open OnDemand (OOD), but you need to request and gain access first. See details here. OOD provides a web-based GUI for HPC file management.

You can also click the 'Open in Terminal' button on the OOD page to open up a web-based command line window.

Option 2: Use ssh.

After you are on the UA VPN (see here if you have not set it up), open a terminal and then type:

$ssh [your netid]@hpc.arizona.edu

Then you will be on the bastion host gateway.

Batch script for preprocessing with fMRIPrep with Singularity at HPC

In your home directory (type $cd ~ if you are not sure), you should have a bin folder with several scripts (if you don’t, let me know and I’ll show you how to copy it).

One of the scripts runs fmriprep with singularity [runfmriprep.sh]. You will have to edit the .sh file with proper paths and account names.

Note: In your xdisk BIDS folder, make 2 empty directories called “derivatives” and “scratch”. In addition, outside of your BIDS folder, make a file called subjects.txt with just the numbers of the subjects you want processed. At the end of the list, press Enter so that the file ends with an empty line (otherwise it won’t process your last subject).
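The setup described in this note can be sketched in shell as follows. prepare_project and ends_with_newline are hypothetical helper names. The sketch also assumes that subjects.txt holds one subject number per line and lives in the parent folder of the BIDS folder, since the note only says "outside".

```shell
#!/usr/bin/env bash
# Prepare the folders and subject list described above (illustrative sketch).
# prepare_project is a hypothetical helper; subject numbers are placeholders.
prepare_project() {
    local bids_dir="$1"
    shift                     # remaining arguments are subject numbers

    # Empty output and working folders inside the BIDS folder.
    mkdir -p "${bids_dir}/derivatives" "${bids_dir}/scratch"

    # Assumption: subjects.txt goes in the parent of the BIDS folder.
    # printf '%s\n' ends every line, including the last, with a newline.
    printf '%s\n' "$@" > "$(dirname "${bids_dir}")/subjects.txt"
}

# Return success if the file is non-empty and its last byte is a newline.
ends_with_newline() {
    [ -s "$1" ] && [ "$(tail -c 1 "$1" | wc -l)" -eq 1 ]
}
```

Running ends_with_newline subjects.txt before submitting jobs is a cheap way to confirm that the last subject will not be skipped.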

From the OOD command line or your terminal, navigate to your data folder:

$cd ~/xdisk/[group name]/[your folder]/[BIDS folder]

Note 1: I recommend visiting /groups/dkp/BIDS/ for an example of a BIDS-formatted folder structure. runfmriprep.sh expects a folder layout that is slightly different from the official BIDS structure, so there is a chance that your BIDS folder will pass the BIDS validator but still fail with runfmriprep.sh. Make sure that your BIDS folder structure is the same as in /groups/dkp/BIDS/

Note 2: Make sure that you have a license.txt file under your ~/ folder. You can copy it from ~/bin/

Then, run one subject with:

$sbatch --export sub=[subject id] ~/bin/runfmriprep.sh

or run all your subjects in the subjects.txt file with:

$sbatchr runfmriprep.sh subjects.txt
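To preview what a per-subject submission loop does without submitting anything, a dry run along these lines can help. This is not the sbatchr script, whose internals are not shown here. submit_all is a hypothetical helper that only prints the sbatch commands. The loop also illustrates one common reason a final line without a trailing newline gets skipped: a plain "while read" loop stops before processing it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one sbatch command per subject in subjects.txt.
# This is NOT the sbatchr script; submit_all is a hypothetical helper.
submit_all() {
    local script="$1"
    local subject_list="$2"
    local sub

    # "|| [ -n ... ]" keeps a final line that lacks a trailing newline,
    # which a plain "while read" loop would silently skip.
    while IFS= read -r sub || [ -n "${sub}" ]; do
        [ -n "${sub}" ] || continue          # ignore blank lines
        echo sbatch --export "sub=${sub}" "${script}"
    done < "${subject_list}"
}
```

Removing the echo would turn the dry run into real submissions, which is only sensible after a single subject has been tested successfully.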

Go to the Jobs tab in OOD, open Active Jobs, and keep refreshing the page until you see that your job has started running.

YOU’RE DONE!

Inspecting your results

Option 1: Use the Interactive Desktop on OOD. You can check the results on HPC via an interactive desktop.

Option 2: Compress and move the derivative folder back to your local computer via Globus.

Array (parallel) jobs

You can set up an array script, which allows you to run multiple jobs with one command. This is ideal once you have tested running one subject and found the configuration that you need. You can see Dianne's documentation here.

The basic idea is to copy the code in your edited runfmriprep.sh, starting from ###run your code here####, to the end of the arrary.sh file and save it as a new, arrayed version of your original runfmriprep.sh (see /groups/jallen/dihengzhang/bin/arrary_runfmriprep.sh as an example).

Then run all your subjects in the subjects.txt file with:

$sbatchr ~/bin/arrary_runfmriprep.sh subjects.txt
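As background, a Slurm array job typically selects its own subject from the list by using the SLURM_ARRAY_TASK_ID environment variable, which Slurm sets to a different number in each array task. The sketch below shows that general technique only. It is not the array script referred to above, and subject_for_task is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of how an array task can pick its own subject (illustrative only;
# this is not the array script referred to above).
# Slurm sets SLURM_ARRAY_TASK_ID to a different number in each array task.
subject_for_task() {
    local subject_list="$1"
    # Use the second argument if given, else the Slurm variable, else 1.
    local task_id="${2:-${SLURM_ARRAY_TASK_ID:-1}}"

    # Print line number task_id of the subject list.
    sed -n "${task_id}p" "${subject_list}"
}
```

Inside a batch script submitted with sbatch --array=1-N (N being the number of subjects), a line such as sub=$(subject_for_task subjects.txt) would then give each task a different subject.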

Setting up and running CONN with Matlab on HPC

Launching Matlab on HPC

UA HPC has several common data analysis software packages installed and allows you to access them via a GUI, including Matlab, Mathematica, Stata, VSCode, Jupyter Notebook and RStudio.

Go to Interactive Apps\Matlab\ and click Launch after you configure your node.

For further instructions on Matlab on HPC, see Dianne's Matlab documentation here and UA HPC's official documentation here

CONN with Matlab on HPC

Once you fire up an interactive Matlab session, add the CONN path to your working path (it should be /groups/dkp/neuroimaging/matlab).

Then, just type conn at the Matlab command prompt and hit Enter.

You can now use CONN just like you are on a local machine!

Note: It is not recommended that you add your derivatives folder to your Matlab path. I have tried it a couple of times and it always froze the process. Instead, fire up CONN first and then select your derivatives folder within CONN.

Running your connectivity analysis with parallel processing

Set up CONN correctly to use parallel processing with Slurm

  • Step 1: Open CONN, go to Tools > HPC Options > Configuration
  • Step 2: See below for an example of the setup

CONN+Slurm-Config

Note: The complete command in the "Command used to submit a job" field is: sbatch --job-name=JOBLABEL --account=jallen --partition=standard --error=STDERR --output=STDOUT OPTS SCRIPT (replace jallen with your own group's account name).

Now you can easily parallelize every processing step in CONN by selecting distributed processing (run on Slurm computer cluster) when you run your analysis.

CONN+Slurm-Run

Setting up FSL on HPC

It is always recommended to use FSL from an interactive desktop (see the OOD documentation).

You can choose to install FSL on your HPC account. See FSL documentation for installation instructions

Or, you can use someone else's FSL installation. You just need to make sure that you have access to their folders and that your ~/.bashrc file includes:

export FSLDIR=[path to installed FSL folder]
source ${FSLDIR}/etc/fslconf/fsl.sh
export PATH=$PATH:${FSLDIR}/bin

For example, you can use Dianne's version of FSL by putting these lines into your ~/.bashrc

export FSLDIR=/groups/dkp/neuroimaging/fsl
source ${FSLDIR}/etc/fslconf/fsl.sh
export PATH=$PATH:${FSLDIR}/bin
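Before adding these lines, it can be worth confirming that you can actually read the shared installation. The sketch below uses a hypothetical helper, check_fsl_dir, and only checks for the two paths that the lines above rely on.

```shell
#!/usr/bin/env bash
# Check that a folder looks like a usable FSL installation before editing
# ~/.bashrc (illustrative sketch; check_fsl_dir is a hypothetical helper name).
check_fsl_dir() {
    local fsl_dir="$1"

    # The setup script that ~/.bashrc will source must be readable.
    if [ ! -r "${fsl_dir}/etc/fslconf/fsl.sh" ]; then
        echo "cannot read ${fsl_dir}/etc/fslconf/fsl.sh"
        return 1
    fi
    # The bin folder that gets appended to PATH must exist.
    if [ ! -d "${fsl_dir}/bin" ]; then
        echo "no bin folder in ${fsl_dir}"
        return 1
    fi
    echo "ok: ${fsl_dir}"
}
```

For example, check_fsl_dir /groups/dkp/neuroimaging/fsl should print an "ok" line if you have access to that folder.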

Quality assurance analysis with MRIQC and QMTools

First, make sure that all your subjects' images ran correctly through fMRIPrep and that a subject-ID.html file was generated for each subject. Go to the end of each *.html file to see whether any errors were reported.
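With many subjects, checking for missing reports by hand is tedious. A sketch like the one below lists the subjects that have no report. missing_reports is a hypothetical helper, and it assumes the reports are named sub-<ID>.html and sit directly inside the folder you pass in, so adjust it to your own output layout.

```shell
#!/usr/bin/env bash
# List subjects whose fMRIPrep HTML report is missing (illustrative sketch).
# missing_reports is a hypothetical helper. It assumes reports are named
# sub-<ID>.html and sit directly inside the folder given as first argument.
missing_reports() {
    local report_dir="$1"
    local subject_list="$2"
    local sub

    while IFS= read -r sub || [ -n "${sub}" ]; do
        [ -n "${sub}" ] || continue          # ignore blank lines
        [ -f "${report_dir}/sub-${sub}.html" ] || echo "${sub}"
    done < "${subject_list}"
}
```

An empty output means every subject in the list has a report. It does not check the reports for errors, so the manual check described above is still needed.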

Individual-level QC with MRIQC and fMRIPrep

  • Check out Dianne's documentation about MRIQC and QMTools here

  • Check out MRIQC's official documentation here

If you were able to run fMRIPrep with runfmriprep.sh or array_runfmriprep.sh, running MRIQC on HPC with Singularity is very similar. See the "Batch script for preprocessing with fMRIPrep with Singularity at HPC" section above. In /home/u21/dihengzhang/bin you can find examples of runmriqc.sh and array_runmriqc.sh for individual-level QC analysis.

Command to run:

sbatchr array_runmriqc_control.sh subjects.txt

Group-level QC with MRIQC and QMTools

  • See runmriqc_group.sh inside /home/u21/dihengzhang/bin for an example of the batch file to run MRIQC on the group level.

  • After you have successfully run MRIQC on the subject level, run:

sbatch runmriqc_group.sh

Comparison to an aggregated sample with QMTools

QMTools is a handy package that helps you "visualize, compare, and review the image quality metrics (IQMs) produced by the MRIQC program." It also provides functions to fetch an online sample to compare your own dataset to. The easiest way to use QMTools is through QMTools Support.

Step 1: Clone the QMTools Support to your HPC work folder.

git clone https://github.com/hickst/qmtools-support.git qmtools

Step 2: Copy your group level MRIQC data into the qmtools/inputs folder

cp ~/Project/derivatives/mriqc/group_*.tsv ~/Project/qmtools/inputs
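After copying, a quick row count can confirm that the group files contain the data you expect. This sketch assumes each group TSV has one header line followed by one line per image. count_tsv_rows is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count the data rows (everything after the header line) in a group TSV file.
# Illustrative sketch; count_tsv_rows is a hypothetical helper name.
count_tsv_rows() {
    # tail -n +2 drops the header; grep -c . counts the non-empty lines left.
    tail -n +2 "$1" | grep -c .
}
```

For example, count_tsv_rows inputs/group_T1w.tsv should print the number of T1w images that MRIQC processed.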

Step 3: Use the qmtraffic, qmfetcher and qmviolin functions to visualize and compare your results.

Here is an example of generating the group-level visualization and comparing it to 50 records fetched online, for a project called FED (assuming you have done Step 1 and Step 2).

To generate the traffic light figure with qmtraffic

For BOLD:

(base) [dihengzhang@r7u13n2 qmtools]$ ./qmtraffic_hpc -v bold inputs/group_bold.tsv -r FED
(qmtraffic): Processing MRIQC group file 'inputs/group_bold.tsv' with modality 'bold'.
(qmtraffic): Produced reports in reports directory 'reports/FED'.

For T1w:

(base) [dihengzhang@r7u13n2 qmtools]$ ./qmtraffic_hpc -v T1w inputs/group_T1w.tsv -r FED
(qmtraffic): Processing MRIQC group file 'inputs/group_T1w.tsv' with modality 'T1w'.
(qmtraffic): Produced reports in reports directory 'reports/FED'.
(qmtraffic): To see the report: open 'reports/FED/T1w.html' in a browser.

To use qmfetcher to fetch 50 records from online:

(base) [dihengzhang@r7u13n2 qmtools]$ ./qmfetcher_hpc -v bold
(qmfetcher): Querying MRIQC server with modality 'bold', for 50 records.
(qmfetcher): Fetched 50 records out of None.
(qmfetcher): Saved query results to 'fetched/bold_20250415_132123-448060.tsv'.
(base) [dihengzhang@r7u13n2 qmtools]$ ./qmfetcher_hpc -v T1w
(qmfetcher): Querying MRIQC server with modality 'T1w', for 50 records.
(qmfetcher): Fetched 50 records out of None.
(qmfetcher): Saved query results to 'fetched/T1w_20250415_132154-748048.tsv'.

To use qmviolin to compare your results to the 50 records fetched online:

(base) [dihengzhang@r7u13n2 qmtools]$ ./qmviolin_hpc -v T1w fetched/T1w_20250415_132154-748048.tsv inputs/group_T1w.tsv -r FED_T1w_violin
(qmviolin): Comparing MRIQC records with modality 'T1w'.
(qmviolin): Compared group records against fetched records.
(qmviolin): Produced violin report to 'reports/FED_T1w_violin'.
(qmviolin): To see the report: open 'reports/FED_T1w_violin/violin.html' in a browser.
(base) [dihengzhang@r7u13n2 qmtools]$ ./qmviolin_hpc -v bold fetched/bold_20250415_132123-448060.tsv inputs/group_bold.tsv -r FED_bold_violin
(qmviolin): Comparing MRIQC records with modality 'bold'.
(qmviolin): Compared group records against fetched records.
(qmviolin): Produced violin report to 'reports/FED_bold_violin'.
(qmviolin): To see the report: open 'reports/FED_bold_violin/violin.html' in a browser.

Here are examples of outcome visualizations:

The Traffic light figure: The Traffic light

The violin comparison figure: The Violin comparison