Chapter 5 MS2-Annotation

This chapter consists of:

  1. Submit job of CMN and jobs based on CMN on GNPS (involved uploading files to the server of GNPS)

  2. Submit job of FBMN and jobs based on FBMN on GNPS (involved uploading files to the server of GNPS)

  3. Download all results from all submitted jobs and merge into a large table

  4. Compare MS2-experimental data with our in-house library

  5. Merge the results obtained from each GNPS-job and library-match into one table

5.1 MS2-inhouse-library v.s. experimental data

The script 1c.Pipeline_exp_vs_lib.R is available under the folder of J:\CBMR\SUN-CBMR-Metabolomics\Workflow\Script\modules\MS2_Annotation

5.1.1 Source paths on the server

source("H:/From_SUND/Scripts/utils/utils_MS2_set_up/set_up_paths.R")

5.1.2 Provide polarity (upper case) πŸ•΅

polarity <- "POS" # "POS" "NEG"

5.1.3 Provide path of preprocessed MS2-spectra file πŸ•΅

It is the absolute path by including the file name as the format of *.mgf

path_mgf_expMS2 <- "H:/From_SUND/Scripts/outputs/exp_MS2_POS.mgf"

5.1.4 provide the thresholds πŸ•΅

They are for m/z and similarity score which is used for cutting off the matches with the scores lower than the threshold

mz_threshold <- 0.01
cor_threshold <- 0.7

5.1.5 Provide the path of storing output πŸ•΅

path_outputs <- "H:/From_SUND/Scripts/outputs"

5.1.6 (Optional) Provide the method for comparison ✈️

The default method is cp_method <- "cosine" you could also try other methods for comparison β€œnspecanglescore”, β€œnavdistscore”, or β€œneuclideanscore”
cp_method <- "cosine"

5.1.7 Conduct

if (polarity == "POS") {
  
  path_lib_MS2 <- paste0(path_MS2_lib, "/RP_lib_MS2_60_POS.mgf")
  
} else if (polarity == "NEG") {
    
  path_lib_MS2 <- paste0(path_MS2_lib, "/RP_lib_MS2_60_NEG.mgf")
}
 
source(paste0(path_func, "/lib_vs_exp_MS2.R"))

MS2_vs_exp(path_lib_MS2, path_mgf_expMS2, mz_threshold, cor_threshold, polarity, path_outputs, cp_method)

5.2 GNPS-Submit jobs based on CMN

The script 1b.Pipeline_GNPS_upload_submitjobs_based on CMN.R is available under the folder of J:\CBMR\SUN-CBMR-Metabolomics\Workflow\Script\modules\MS2_Annotation

The pipeline is for:

1.Uploading files to the server of GNPS.

2.Generating parameters for each job you would like to run.

3.Running CMN (classic molecular network) and a couple of other jobs based on CMN on GNPS.

4.Changing the parameters related to the jobs you would like to run.

5.Please pay attention to the filenames no space within each filename

5.2.1 Upload files

5.2.1.1 Source paths on the server and relevant function

source("H:/From_SUND/Scripts/utils/utils_MS2_set_up/set_up_paths.R")
source(paste0(path_func, "/files_by_FTP.R"))

5.2.1.2 Provide polarity (upper case) πŸ•΅

polarity <- "POS"

5.2.1.3 Provide project information (only provide the value you want)πŸ•΅

project_name <- ""
study <- ""
sample_type <- ""
omics <- ""
method <- ""

5.2.1.5 Provide the path of storing files without filenames πŸ•΅

To be sure only files from unique polarity for running CMN within the provided folder

masterpath <- "H:/From_SUND/Scripts/test/mzML_openms_MS2_POS"

5.2.1.6 Upload files into server (FTP) (consistent with the parameters provided above)πŸ•΅

upload_file_FTP(  masterpath, 
                  list.files(masterpath), 
                  username, 
                  password, 
                  host, 
                  project_name = project_name, 
                  polarity     = polarity, 
                  study        = study, 
                  sample_type  = sample_type, 
                  omics        = omics, 
                  method       = method
               )

5.2.2 Prerequisites before generating parameters and submitting jobs πŸ•΅

!!! Run the chunk below whenever start running any job

#Provide the path for storing the lists of both parameters and job ids  
path_outputs_params_jobids <- "H:/From_SUND/Scripts/outputs"


#The two lines below are for storing all parameters related to the workflows you will run
list_param_GNPS <- list()
list_jobid_GNPS <- list()

5.2.3 CMN

5.2.3.1 Generate parameters πŸ•΅

parameters_CMN <- list()
parameters_CMN["precursor_ion_mass_tolerance"] <- "0.02"
parameters_CMN["fragment_ion_mass_tolerance"] <- "0.02"
parameters_CMN["min_pairs_cos"] <- "0.7"
parameters_CMN["minimum_matched_fragment_ions"] <- "6"
parameters_CMN["maximum_shift"] <- "500"
parameters_CMN["network_topk"] <- "10"
parameters_CMN["minimum_cluster_size"] <- "2"
parameters_CMN["run_mscluster"] <- "on" 
parameters_CMN["maximum_connected_component_size"] <- "100"
parameters_CMN["library_search_min_matched_peaks"] <- "6"
parameters_CMN["score_threshold"] <- "0.7"
parameters_CMN["search_analogs"] <- "0" 
parameters_CMN["maximum_analog_search_mass_difference"] <- "100" 
parameters_CMN["filter_below_std_dev"] <- "0.0"
parameters_CMN["minimum_peak_intensity"] <- "0.0"
parameters_CMN["filter_precursor_window"] <- "1"
parameters_CMN["filter_library"] <- "1" 
parameters_CMN["filter_peaks_10_50Da_window"] <- "1" 
parameters_CMN["filter_spectra_from_G6_as_blanks_before_networking"] <- "0" 
parameters_CMN["find_related_datasets"] <- "0" 
parameters_CMN["create_cluster_buckets_and_qiime2_biom_pcoa_plots_output"] <- "1" 
parameters_CMN["create_ili_mapping_output"] <- "0"

5.2.3.2 Submit CMN

source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.2.4 Dereplicator

5.2.4.1 Generate parameters πŸ•΅

parameters_CMN_Dereplicator <- list()
parameters_CMN_Dereplicator["precursor_ion_mass_tolerance"] <- "0.02"
parameters_CMN_Dereplicator["search_analogs"] <- "on" # if you do not select, just leave it empty like ""
parameters_CMN_Dereplicator["fragment_ion_mass_tolerance"] <- "0.02"
parameters_CMN_Dereplicator["pnp_database"] <- "pnpdatabase" # "dmisam" for "Extended", "combined" for "Regular"
parameters_CMN_Dereplicator["max_charge"] <- "2"
parameters_CMN_Dereplicator["accurate_p_values"] <- "on" # leave it empty, like "", if you do not select it
parameters_CMN_Dereplicator["min_number_of_AA"] <- "5"
parameters_CMN_Dereplicator["max_isotopic_shift"] <- "0" # other options are "1", "2"
parameters_CMN_Dereplicator["adducts_Na"] <- "" # if select, give the value like "on"
parameters_CMN_Dereplicator["adducts_K"] <- ""  # if select, give the value like "on"
parameters_CMN_Dereplicator["max_allowed_modification_mass"] <- "150" 
parameters_CMN_Dereplicator["min_matched_peaks_with_known_compounds"] <- "5"

5.2.4.2 Submit Dereplicator

source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.2.5 Dereplicator plus

5.2.5.1 Generate parameters πŸ•΅

parameters_CMN_Dereplicator_plus <- list()
parameters_CMN_Dereplicator_plus["precursor_ion_mass_tolerance"] <- "0.02"
parameters_CMN_Dereplicator_plus["fragment_ion_mass_tolerance"] <- "0.02"
parameters_CMN_Dereplicator_plus["max_charge"] <- "2" # other options are 1 and 3
parameters_CMN_Dereplicator_plus["min_score"] <- "12"

5.2.5.2 Submit Dereplicator plus

source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.2.6 NAP

5.2.6.1 Generate parameters πŸ•΅

*parameters are different from different polarities

parameters_CMN_NAP <- list()
parameters_CMN_NAP["number_of_a_cluster_index"] <- "0"
parameters_CMN_NAP["cosine_value_to_subselect_inside_a_cluster"] <- "0.7" 
parameters_CMN_NAP["n_first_candidates_for_consensus_score"] <- "10"
parameters_CMN_NAP["use_fusion_result_for_consensus"] <- "on" # if deselect, ""
parameters_CMN_NAP["accuracy_for_exact_mass_candidate_search_ppm"] <- "15"
parameters_CMN_NAP["structure_database"] <- "GNPS,HMDB,SUPNAT,NPAtlas,CHEBI,DRUGBANK,FooDB" 
parameters_CMN_NAP["maximum_number_of_candidate_structures_in_the_graph"] <- "10"

Be careful the inputs of parameters depending on the polarity, so only change the inputs related to the polarity you are working on

if (polarity == "POS") {
  
  parameters_CMN_NAP["acquisition_mode"] <- "Positive"
  parameters_CMN_NAP["adduct_ion_type"] <- "[M+H]" # other options are "[M]","[M+NH4]","[M+Na]","[M+K]", "[M+ACN+H]" 
  parameters_CMN_NAP["multiple_adduct_types"] <- "[M+Na]" 

  } else if (polarity == "NEG") {
    parameters_CMN_NAP["acquisition_mode"] <- "Negative" 
    parameters_CMN_NAP["adduct_ion_type"] <- "[M-H]" # other options are "[M+Cl]", "[M+FA-H]"
    parameters_CMN_NAP["multiple_adduct_types"] <- "[M+Cl]"
                
  }   

5.2.6.2 Submit NAP

source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.2.7 MS2LDA

5.2.7.1 Generate parameters πŸ•΅

If the polarity is negative mode, recommend to exclude all motif by changing values from β€œyes” to β€œno”

parameters_CMN_MS2LDA <- list()
parameters_CMN_MS2LDA["bin_width"] <- "0.01" #other options are 0.005, 0.05, 0.1, 0.5
parameters_CMN_MS2LDA["number_of_lda_iterations"] <- "1000"
parameters_CMN_MS2LDA["minimum_ms2_intensity"] <- "100"
parameters_CMN_MS2LDA["lda_free_motifs"] <- "200"
parameters_CMN_MS2LDA["gnps_motif_inclusion"] <- "yes" #if deselect, "no"
parameters_CMN_MS2LDA["massbank_motif_inclusion"] <- "yes"
parameters_CMN_MS2LDA["urine_motif_inclusion"] <- "yes" 
parameters_CMN_MS2LDA["euphorbia_motif_inclusion"] <- "no"
parameters_CMN_MS2LDA["rhamnaceae_plant_motif_inclusion"] <- "no"
parameters_CMN_MS2LDA["streptomyces_and_salinisporus_motif_inclusion"] <- "no"
parameters_CMN_MS2LDA["photorhabdus_and_xenorhabdus_motif_inclusion"] <- "no" 
parameters_CMN_MS2LDA["user_motif_inclusion"] <- "None" 
parameters_CMN_MS2LDA["overlap_score_threshold"] <- "0.3"
parameters_CMN_MS2LDA["probability_value_threshold"] <- "0.1"
parameters_CMN_MS2LDA["topx_in_node"] <- "5" 

5.2.7.2 Submit MS2LDA

source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.2.8 MNE (MolNetEnhancer)

5.2.8.1 Generate parameters πŸ•΅

source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

parameters_CMN_MNE <- list()
parameters_CMN_MNE["enter_varquest_id"] <- "None"    # leave it as "None" if this id is not available 

parameters_CMN_MNE["enter_nap_id"] <- list_jobid_GNPS$CMN_NAP_task_id  
parameters_CMN_MNE["enter_gnps_task_id"] <- list_jobid_GNPS$CMN_task_id
parameters_CMN_MNE["enter_Dereplicator_id"] <- list_jobid_GNPS$CMN_Dereplicator_task_id
parameters_CMN_MNE["enter_ms2lda_job_id"] <- list_jobid_GNPS$CMN_MS2LDA_task_id  #Leave it as "None" if this id is not available

5.2.8.2 Submit MNE

source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.2.9 Merge network polarity

5.2.9.1 Generate parameters πŸ•΅

If it is CMN-based network, the unit of tolerance of rt is second

list_param_jobid_MNP <- list()
source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))


parameters_CMN_MNP <- list()
parameters_CMN_MNP["enter_gnps_positive_network_task_id"] <- list_jobid_MN_both$CMN_task_id_POS    
parameters_CMN_MNP["enter_gnps_negative_network_task_id"] <- list_jobid_MN_both$CMN_task_id_NEG    
parameters_CMN_MNP["enter_a_rt_tolerance_for_aligning_masses_between_two_runs"] <- 10
parameters_CMN_MNP["enter_a_ppm_tolerance_for_aligning_masses_between_two_runs"] <- 30 

5.2.9.2 Merge polarities

source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.3 GNPS-Submit jobs based on FBMN

You only need to prepare a reprocessed MS2-spectra as the format of .mgf and the corresponding quantification table as the format of .txt or .csv

1. Not recommend to preprocess MS2 data by metaboscape. Because 1) Regarding .mgf file, the output is average MS2 spectra across multiple CEs; 2) Regarding .csv file, m/z is not correct

2. Change the parameters in the sections below related to the workflows you would like to run

5.3.1 Form quantification table

The script 0b.Pipeline_quan_table.R is available under the folder of J:\CBMR\SUN-CBMR-Metabolomics\Workflow\Script\modules\MS2_Annotation

5.3.1.1 Source paths on the server

source("H:/From_SUND/Scripts/utils/utils_MS2_set_up/set_up_paths.R")

5.3.1.2 Provide the polarity πŸ•΅

polarity <- "POS"

5.3.1.3 Path of storing outputs πŸ•΅

path_output <- "H:/From_SUND/Scripts/outputs"

5.3.1.4 Path of extracted MS2 spectra πŸ•΅

path_rds_expMS2 <- "H:/From_SUND/Scripts/outputs/exp_MS2_POS.rds" 

5.3.1.5 Path of object after conducting XCMS πŸ•΅

path_rds_xcms_XCMSnExp <- "H:/From_SUND/Scripts/outputs/xcms_XCMSnExp_POS.rds" 

5.3.1.6 (Optional) Path of object CAMERA (if conduct CAMERA) ✈️

If you have run CAMERA, please provide the path, otherwise, skip running the line below

path_rds_cam_xsAnnotate <- "H:/From_SUND/Scripts/outputs/cam_xsAnnotate_POS.rds" 

5.3.1.7 Conduct to get quantification table for FBMN on GNPS

if (exists("path_rds_cam_xsAnnotate")) {
  
  #Type of intensity value to extract from object of xcms_XCMSnExp
  feature_int <- "into"
     
}
   
   
source(paste0(path_utils_annotation, "/quan_table.R"))

5.3.2 Upload files, generate parameters, and submit jobs

The script 1b.Pipeline_GNPS_upload_submitjobs_based on FBMN.R is available under the folder of J:\CBMR\SUN-CBMR-Metabolomics\Workflow\Script\modules\MS2_Annotation

The pipeline is for:

1.Uploading files to the server of GNPS.

2.Generating parameters for each job you would like to run.

3.Running FBMN (feature based molecular network) and a couple of other jobs based on FBMN on GNPS.

4.Changing the parameters related to the jobs you would like to run.

5.3.2.1 Upload files

5.3.2.1.1 Source paths on the server and relevant function
source("H:/From_SUND/Scripts/utils/utils_MS2_set_up/set_up_paths.R")
source(paste0(path_func, "/files_by_FTP.R"))
5.3.2.1.2 Provide polarity (upper case) πŸ•΅
polarity <- "POS"
5.3.2.1.3 Provide project information πŸ•΅
project_name <- "workflow_test" 
study <- "1"            #what ordinal number comes current samples? e.g. "1", or "2",...
sample_type <- "serum"
omics <- "metabolomics"
method <- "RP"
5.3.2.1.5 Provide the path of storing files without filenames πŸ•΅
masterpath <- "H:/From_SUND/Scripts/outputs"
5.3.2.1.6 Provide the filenames πŸ•΅

.txt or .csvof quantification table and .mgf file of MS2 spectra

filenames <- c("quantable_POS.txt", "exp_MS2_POS.mgf")
5.3.2.1.7 Provide method you used for preprocessing the MS2-spectra data πŸ•΅
prep_method <- "XCMS3" #other options are illustrated within the function in the file of "submit_jobs_to_GNPS.R"
5.3.2.1.8 Upload files into server (FTP)
upload_file_FTP(  masterpath, 
                  filenames, 
                  username, 
                  password, 
                  host, 
                  project_name,
                  polarity, 
                  study       = study, 
                  sample_type = sample_type, 
                  omics       = omics, 
                  method      = method
               )

5.3.2.2 Prerequisites before generating parameters and submitting jobs πŸ•΅

#Provide the path for storing the lists of both parameters and job ids  
path_outputs_params_jobids <- "H:/From_SUND/Scripts/outputs"


#The two lines below are for storing all parameters related to the workflows you will run
list_param_GNPS <- list()
list_jobid_GNPS <- list()

5.3.2.3 FBMN

5.3.2.3.1 Generate parameters πŸ•΅
parameters_FBMN <- list()
parameters_FBMN["precursor_ion_mass_tolerance"] <- "0.02"
parameters_FBMN["fragment_ion_mass_tolerance"] <- "0.02"
parameters_FBMN["min_pairs_cos"] <- "0.7"
parameters_FBMN["minimum_matched_fragment_ions"] <- "6"
parameters_FBMN["maximum_shift"] <- "500"
parameters_FBMN["network_topk"] <- "10"
parameters_FBMN["maximum_connected_component_size"] <- "100"
parameters_FBMN["library_search_min_matched_peaks"] <- "6"
parameters_FBMN["score_threshold"] <- "0.7"
parameters_FBMN["search_analogs"] <- "0" # don't search/ "1" search
parameters_FBMN["maximum_analog_search_mass_difference"] <- "100" # it only works once select the search analog
parameters_FBMN["top_results_to_report_per_query"] <- "1"
parameters_FBMN["minimum_peak_intensity"] <- "0.0"
parameters_FBMN["filter_precursor_window"] <- "1" # filter / "0" don't filter
parameters_FBMN["filter_library"] <- "1" # filter / "0" don't filter
parameters_FBMN["filter_peaks_10_50Da_window"] <- "1" # filter / "0" don't filter
parameters_FBMN["normalization_per_file"] <- "None" # represent no normalization, the other option is "RowSum"
parameters_FBMN["aggregation_method_for_peak_abundances_per_group"] <- "Mean" # another option is "Sum"
parameters_FBMN["pcoa_distance_metric"] <- "cosine" # other options are "braycurtis", "euclidean", "jaccard"
parameters_FBMN["metadata_column_to_compare"] <- "None"
parameters_FBMN["run_stats_and_plots"] <- "No" # other option is "Yes"
parameters_FBMN["metadata_field_to_compare_1"] <- "None"
parameters_FBMN["metadata_field_to_compare_2"] <- "None"
parameters_FBMN["metadata_column_to_facet"] <- "None"
parameters_FBMN["run_Dereplicator"] <- "1" # "0" represent don't run and "1" represents run
5.3.2.3.2 Submit FBMN
source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.3.2.4 Dereplicator plus

5.3.2.4.1 Generate parameters πŸ•΅
parameters_FBMN_Dereplicator_plus <- list()
parameters_FBMN_Dereplicator_plus["precursor_ion_mass_tolerance"] <- "0.02"
parameters_FBMN_Dereplicator_plus["fragment_ion_mass_tolerance"] <- "0.02"
parameters_FBMN_Dereplicator_plus["max_charge"] <- "2" # other options are 1 and 3
parameters_FBMN_Dereplicator_plus["min_score"] <- "12"
5.3.2.4.2 Submit Dereplicator plus
source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.3.2.5 NAP

*parameters are different from different polarities

5.3.2.5.1 Generate parameters πŸ•΅
parameters_FBMN_NAP <- list()
parameters_FBMN_NAP["number_of_a_cluster_index"] <- "0"
parameters_FBMN_NAP["cosine_value_to_subselect_inside_a_cluster"] <- "0.7" 
parameters_FBMN_NAP["n_first_candidates_for_consensus_score"] <- "10"
parameters_FBMN_NAP["use_fusion_result_for_consensus"] <- "on" # if deselect, ""
parameters_FBMN_NAP["accuracy_for_exact_mass_candidate_search_ppm"] <- "15"
parameters_FBMN_NAP["structure_database"] <- "GNPS,HMDB,SUPNAT,NPAtlas,CHEBI,DRUGBANK,FooDB" 
parameters_FBMN_NAP["maximum_number_of_candidate_structures_in_the_graph"] <- "10"

Be careful the inputs of parameters depending on the polarity, so only change the inputs related to the polarity you are working on

if (polarity == "POS") {
  
  parameters_FBMN_NAP["acquisition_mode"] <- "Positive"
  parameters_FBMN_NAP["adduct_ion_type"] <- "[M+H]" # other options are "[M]", "[M+NH4]", "[M+Na]", "[M+K]", "[M+ACN+H]" 
  parameters_FBMN_NAP["multiple_adduct_types"] <- "[M+Na]" 

} else if (polarity == "NEG") {
  
  parameters_FBMN_NAP["acquisition_mode"] <- "Negative" 
  parameters_FBMN_NAP["adduct_ion_type"] <- "[M-H]" # other options are "[M+Cl]", "[M+FA-H]"
  parameters_FBMN_NAP["multiple_adduct_types"] <- "[M+Cl]"
                
}  
5.3.2.5.2 Submit NAP
source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.3.2.6 (Optional) MS2LDA ✈️

5.3.2.6.1 Generate parameters πŸ•΅
parameters_FBMN_MS2LDA <- list()
parameters_FBMN_MS2LDA["bin_width"] <- "0.01" #other options are 0.005, 0.05, 0.1, 0.5
parameters_FBMN_MS2LDA["number_of_lda_iterations"] <- "1000"
parameters_FBMN_MS2LDA["minimum_ms2_intensity"] <- "100"
parameters_FBMN_MS2LDA["lda_free_motifs"] <- "200"
parameters_FBMN_MS2LDA["gnps_motif_inclusion"] <- "yes" #if deselect, "no"
parameters_FBMN_MS2LDA["massbank_motif_inclusion"] <- "yes"
parameters_FBMN_MS2LDA["urine_motif_inclusion"] <- "yes" 
parameters_FBMN_MS2LDA["euphorbia_motif_inclusion"] <- "no"
parameters_FBMN_MS2LDA["rhamnaceae_plant_motif_inclusion"] <- "no"
parameters_FBMN_MS2LDA["streptomyces_and_salinisporus_motif_inclusion"] <- "no"
parameters_FBMN_MS2LDA["photorhabdus_and_xenorhabdus_motif_inclusion"] <- "no" 
parameters_FBMN_MS2LDA["user_motif_inclusion"] <- "None" 
parameters_FBMN_MS2LDA["overlap_score_threshold"] <- "0.3"
parameters_FBMN_MS2LDA["probability_value_threshold"] <- "0.1"
parameters_FBMN_MS2LDA["topx_in_node"] <- "5" 
5.3.2.6.2 Submit MS2LDA
source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.3.2.7 MNE (MolNetEnhancer)

5.3.2.7.1 Generate parameters πŸ•΅
source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

parameters_FBMN_MNE <- list()
parameters_FBMN_MNE["enter_varquest_id"] <- "None"    # leave it as "None" if this id is not available 


parameters_FBMN_MNE["enter_nap_id"] <- list_jobid_GNPS$FBMN_NAP_task_id
parameters_FBMN_MNE["enter_gnps_task_id"] <- list_jobid_GNPS$FBMN_task_id
parameters_FBMN_MNE["enter_ms2lda_job_id"] <- list_jobid_GNPS$FBMN_MS2LDA_task_id # or "None"

``

5.3.2.7.2 Submit MNE
source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.3.3 Merge network polarity

5.3.3.1 Generate parameters πŸ•΅

If it is FBMN-based network, the unit of tolerance of rt is minute

list_param_jobid_MNP <- list()
source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))


parameters_FBMN_MNP <- list()
parameters_FBMN_MNP["enter_gnps_positive_network_task_id"] <- list_jobid_MN_both$FBMN_task_id_POS    
parameters_FBMN_MNP["enter_gnps_negative_network_task_id"] <- list_jobid_MN_both$FBMN_task_id_NEG    
parameters_FBMN_MNP["enter_a_rt_tolerance_for_aligning_masses_between_two_runs"] <- 0.1
parameters_FBMN_MNP["enter_a_ppm_tolerance_for_aligning_masses_between_two_runs"] <- 20 

5.3.3.2 Merge polarities

source(paste0(path_utils_annotation, "/GNPS_upload_submitjobs.R"))

5.4 GNPS-Download jobs

The script 2.Pipeline_GNPS_download_jobs.R is available under the folder of J:\CBMR\SUN-CBMR-Metabolomics\Workflow\Script\modules\MS2_Annotation

5.4.1 Source paths on the server πŸ•΅

source("H:/From_SUND/Scripts/utils/utils_MS2_set_up/set_up_paths.R")

5.4.2 Provide polarity (upper case) πŸ•΅

polarity <- "POS"

5.4.3 Provide the path of storing downloaded files πŸ•΅

download_path <- "H:/From_SUND/Scripts/outputs"

5.4.4 Provide project information (only run the line with value you provide)πŸ•΅

project_name <- ""
study <- ""
sample_type <- ""
omics <- ""
method <- ""

5.4.5 Provide the absolute path of β€œlist of job ids” by including the name of that list πŸ•΅

name should start with list_jobid_GNPS

path_list <- "H:/From_SUND/Scripts/outputs/jobids_CMN_based_workflow_test_1_serum_metabolomics_RF_POS.rds"

5.4.6 Download results from the list of job ids

list_jobid_GNPS <- readRDS(path_list)
source(paste0(path_utils_annotation, "/GNPS_downloadjobs.R"))

5.4.7 Provide the absolute path of β€œlist_param_jobid_MNP” by including the name of that list πŸ•΅

name should start with list_param_jobid_MNP

path_jobid_MNP <- "H:/From_SUND/Scripts/outputs/list_param_jobid_MNP_xx_both.rds"

5.4.8 Download result of merging networks from both polarities

list_jobid_GNPS <- readRDS(path_jobid_MNP)
source(paste0(path_utils_annotation, "/GNPS_downloadjobs.R"))

5.5 Merge results from different source

5.5.1 Merging outputs from jobs have run on GNPS into a table

The script 3.Pipeline_merge_outputs_from_GNPS.R is available under the folder of J:\CBMR\SUN-CBMR-Metabolomics\Workflow\Script\modules\MS2_Annotation

5.5.1.1 Source paths on the server and relevant function

source("H:/From_SUND/Scripts/utils/utils_MS2_set_up/set_up_paths.R")
source(paste0(path_func, "/Merge_outputs_from_GNPS.R"))

5.5.1.2 Provide the path of storing files downloaded from GNPS πŸ•΅

The path should be end with gnps_outputs

path_downloaded_files <- "H:/From_SUND/Scripts/outputs/workflow_test_1_serum_metabolomics_RF/gnps_outputs"

5.5.1.3 Provide the absolute path of β€œlist of job ids” by including the name of that list πŸ•΅

name should start with list_jobid_GNPS

path_list <- "H:/From_SUND/Scripts/outputs/jobids_CMN_workflow_test_1_serum_metabolomics_RF_POS.rds"

5.5.1.4 Provide path for storing the merged result πŸ•΅

path_outputs <- "H:/From_SUND/Scripts/outputs"

5.5.1.5 Provide which method did you use for preprocessing MS2 data πŸ•΅

If you run jobs based on CMN, skip running the line below

prep_method <- "XCMS3"

5.5.1.6 Provide polarity (upper case) πŸ•΅

polarity <- "POS"

5.5.1.7 Provide what workflow based πŸ•΅

workflow_based <- "CMN" # or "FBMN"

5.5.1.8 Conduct

library(dplyr)

list_jobid <- readRDS(path_list)
workflow_all <- names(list_jobid) %>% gsub("_task_id", "",.) 
workflow <- workflow_all[grepl(workflow_based, workflow_all)] %>% strsplit("[A-Z]+_") %>% unlist() %>% unique() %>% na_if("") %>% na.omit() 

if (workflow_based == "FBMN") {
  
  Merge_outputs_from_GNPS(path_downloaded_files, prep_method, workflow, polarity, path_outputs)
  
} else if (workflow_based == "CMN") {
  
  Merge_outputs_from_GNPS(path_downloaded_files, NULL, workflow, polarity, path_outputs)
  
}

5.5.1.9 Merge library-match into GNPS-match

The script 4.Pipeline_merge_lib-match_GNPS-match.R is available under the folder of J:\CBMR\SUN-CBMR-Metabolomics\Workflow\Script\modules\MS2_Annotation

It is more precise to merge the result from library-match with the results from FBMN-based jobs on GNPS

5.5.1.9.1 Source paths on the server
source("H:/From_SUND/Scripts/utils/utils_MS2_set_up/set_up_paths.R")
5.5.1.9.2 Provide path of excel file of storing matched records and sheet names πŸ•΅

The file name should be like MS2-match-outputs.xlsx and please provide the absolute path

path_MS2_match <- "H:/From_SUND/Scripts/outputs/MS2-match-outputs.xlsx"
sheetName_GNPS <- "GNPS_summary_based_FBMN_POS"
sheetName_lib <- "MS2-lib-vs-exp_cosine_POS"
5.5.1.9.3 Provide the polarity (upper case) πŸ•΅
polarity <- "POS"
5.5.1.9.4 Provide the path for storing output πŸ•΅
path_output <- "H:/From_SUND/Scripts/outputs"
5.5.1.9.5 Conduct
source(paste0(path_utils_annotation, "/merge_lib-match_GNPS-match.R"))