Reading mzTab-M files

You can read an mzTab-M file in tab separated format into a data frame structure as follows:

library(rmzTabM)
library(knitr)
library(kableExtra)
testfile <- system.file("testdata", "lipidomics-example.mzTab", package = "rmzTabM")
mzTabTable <- readMzTab(testfile)
kable(head(mzTabTable[, 1:3])) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "200px")
V1   V2                  V3
COM  Meta data section
MTD  mzTab-version       2.0.0-M
MTD  mzTab-ID            ISAS-2018-1234
MTD  description         Minimal proposed sample file for identification and quantification of lipids
MTD  publication[1]      pubmed:29039908 | doi:10.1021/acs.analchem.7b03576
MTD  cv[1]-label         MS
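The data frame returned by readMzTab() uses generic column names (V1, V2, ...), with rows that mirror the lines of the file. To illustrate that layout, here is a minimal base R sketch that splits tab-separated lines into such a data frame. The helper readMzTabSketch() and the abbreviated example lines are hypothetical. This is not how rmzTabM itself is implemented.

```r
# Hypothetical helper (not part of rmzTabM): turn tab-separated mzTab-M lines
# into a character data frame with generic column names V1, V2, ...
readMzTabSketch <- function(lines) {
  lines <- lines[nzchar(lines)]                  # skip blank separator lines
  fields <- strsplit(lines, "\t", fixed = TRUE)  # one character vector per line
  width <- max(lengths(fields))
  # pad shorter lines with NA so that every row has the same number of columns
  padded <- lapply(fields, function(f) c(f, rep(NA_character_, width - length(f))))
  result <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(result) <- paste0("V", seq_len(width))
  result
}

# Abbreviated lines in the style of lipidomics-example.mzTab
exampleLines <- c(
  "COM\tMeta data section",
  "MTD\tmzTab-version\t2.0.0-M",
  "MTD\tmzTab-ID\tISAS-2018-1234",
  "",
  "SMH\tSML_ID\tchemical_name",
  "SML\t1\tCer(d18:1/24:0)"
)
sketchTable <- readMzTabSketch(exampleLines)
print(sketchTable)
```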

You can extract the individual sections (metadata, small molecule summary, features and evidence) from this data frame as follows:

mtdTable <- extractMetadata(mzTabTable)
smlTable <- extractSmallMoleculeSummary(mzTabTable)
smfTable <- extractSmallMoleculeFeatures(mzTabTable)
smeTable <- extractSmallMoleculeEvidence(mzTabTable)
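Line prefixes mark the sections. SMH, SFH and SEH lines hold the column headers for the SML, SMF and SME rows, and MTD lines form the metadata. The base R sketch below shows the idea behind such an extraction, assuming the V1, V2, ... layout shown above. extractSectionSketch() and toyTable are hypothetical and do not reproduce rmzTabM's extract functions.

```r
# Hypothetical helper (not part of rmzTabM): use the first header row
# (e.g. "SMH") as column names for all data rows with the matching prefix
# (e.g. "SML").
extractSectionSketch <- function(mzTabTable, headerPrefix, rowPrefix) {
  headerRow <- mzTabTable[which(mzTabTable$V1 == headerPrefix)[1], , drop = FALSE]
  header <- unlist(headerRow, use.names = FALSE)
  keep <- !is.na(header) & nzchar(header)  # ignore padding columns
  section <- mzTabTable[which(mzTabTable$V1 == rowPrefix), keep, drop = FALSE]
  names(section) <- header[keep]
  rownames(section) <- NULL
  section
}

# Toy table in the V1, V2, ... layout, with values taken from the example file
toyTable <- data.frame(
  V1 = c("MTD", "SMH", "SML", "SFH", "SMF"),
  V2 = c("mzTab-version", "SML_ID", "1", "SMF_ID", "1"),
  V3 = c("2.0.0-M", "chemical_name", "Cer(d18:1/24:0)", "exp_mass_to_charge", "650.6432"),
  V4 = NA_character_,
  stringsAsFactors = FALSE
)
smlSketch <- extractSectionSketch(toyTable, "SMH", "SML")
smfSketch <- extractSectionSketch(toyTable, "SFH", "SMF")
```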

knitr::kable(head(smlTable)) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "200px")
SMH SML_ID SMF_ID_REFS chemical_name database_identifier chemical_formula smiles inchi uri theoretical_neutral_mass adduct_ions reliability best_id_confidence_measure best_id_confidence_value abundance_assay[1] abundance_study_variable[1] abundance_variation_study_variable[1] opt_global_lipid_category opt_global_lipid_species opt_global_lipid_best_id_level
SML 1 1 | 2 | 3 | 4 Cer(d18:1/24:0) LM:LMSP02010012 C42H83NO3 CCCCCCCCCCCCCCCCCCCCCCCC(=O)N[C@@H](CO)[C@H](O)/C=C/CCCCCCCCCCCCC InChI=1S/C42H83NO3/c1-3-5-7-9-11-13-15-17-18-19-20-21-22-23-24-26-28-30-32-34-36-38-42(46)43-40(39-44)41(45)37-35-33-31-29-27-25-16-14-12-10-8-6-4-2/h35,37,40-41,44-45H,3-34,36,38-39H2,1-2H3,(H,43,46)/b37-35+/t40-,41+/m0/s1 http://www.lipidmaps.org/data/LMSDRecord.php?LM_ID=LMSP02010012 649.6373 [M+H]+ 2 [,, qualifier ions exact mass,] 0.958 4.448784E-05 4.448784E-05 0 Sphingolipids Cer 42:1 Cer d18:1/24:0

knitr::kable(head(smfTable)) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "200px")
SFH SMF_ID SME_ID_REFS SME_ID_REF_ambiguity_code adduct_ion isotopomer exp_mass_to_charge charge retention_time_in_seconds retention_time_in_seconds_start retention_time_in_seconds_end abundance_assay[1] opt_global_quantifiers_SMF_ID_REFS
SMF 1 1 NA [M+H]1+ NA 650.6432 1 821.2341 756.0000 954.0000 4.448784E-05 3
SMF 2 2 NA NA NA 252.2677 1 821.2341 756.0000 954.0000 6.673176E-06 NA
SMF 3 3 NA NA NA 264.2689 1 821.2341 756.0000 954.0000 1.3346352E-05 NA
SMF 4 4 NA NA NA 282.2788 1 821.2341 756.0000 954.0000 9.831813E-06 NA
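In the SML row shown above, SMF_ID_REFS lists the supporting features as a bar-separated list ("1 | 2 | 3 | 4"). Here is a minimal base R sketch of resolving such references against SMF_ID, assuming the IDs are stored as text. The helper splitRefs() is hypothetical, and the toy data frame holds only a subset of the SMF columns shown above.

```r
# Hypothetical helper: split a bar-separated ID list such as "1 | 2 | 3 | 4"
# into an integer vector
splitRefs <- function(refs) {
  as.integer(trimws(strsplit(refs, "|", fixed = TRUE)[[1]]))
}

# Subset of the SMF columns shown above, stored as text
smfToy <- data.frame(
  SMF_ID = c("1", "2", "3", "4"),
  exp_mass_to_charge = c("650.6432", "252.2677", "264.2689", "282.2788"),
  stringsAsFactors = FALSE
)

smfIds <- splitRefs("1 | 2 | 3 | 4")
# look up the referenced features by ID, in the order given in SMF_ID_REFS
linkedFeatures <- smfToy[match(smfIds, as.integer(smfToy$SMF_ID)), , drop = FALSE]
linkedFeatures
```

The same splitting applies to other bar-separated reference columns, such as SME_ID_REFS in the SMF table.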

knitr::kable(head(smeTable)) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "200px")
SEH SME_ID evidence_input_id database_identifier chemical_formula smiles inchi chemical_name uri derivatized_form adduct_ion exp_mass_to_charge charge theoretical_mass_to_charge opt_global_mass_error spectra_ref identification_method ms_level id_confidence_measure[1] rank opt_global_qualifiers_evidence_grouping_ID_REFS
SME 1 1 LM:LMSP0501AB02 C42H83NO3 CCCCCCCCCCCCCCCCCCCCCCCC(=O)N[C@@H](CO)[C@H](O)/C=C/CCCCCCCCCCCCC InChI=1S/C42H83NO3/c1-3-5-7-9-11-13-15-17-18-19-20-21-22-23-24-26-28-30-32-34-36-38-42(46)43-40(39-44)41(45)37-35-33-31-29-27-25-16-14-12-10-8-6-4-2/h35,37,40-41,44-45H,3-34,36,38-39H2,1-2H3,(H,43,46)/b37-35+/t40-,41+/m0/s1 LacCer d18:1/12:0 http://www.lipidmaps.org/data/LMSDRecord.php?LM_ID=LMSP02010012 NA [M+H]1+ 650.6432 1 650.6446 -2.1517 ms_run[1]:controllerType=0 controllerNumber=1 scan=731 [,, qualifier ions exact mass,] [MS,MS:1000511, ms level, 1] 0.958 1 2
SME 2 2 LCTR:LCTR0809812 C17H33N NA NA Cer d18:1/24:0 W’ - CHO NA NA NA 252.2677 1 252.2686 -3.5676 ms_run[1]:controllerType=0 controllerNumber=1 scan=732 [,, exact mass, ] [MS,MS:1000511, ms level, 2] 0.9780 1 NA
SME 3 2 LCTR:LCTR0871245 C18H33N NA NA Cer d18:1/24:0 W’’ NA NA NA 264.2689 1 264.2686 -1.1352 ms_run[1]:controllerType=0 controllerNumber=1 scan=732 [,, exact mass, ] [MS,MS:1000511, ms level, 2] 0.7500 1 NA
SME 4 2 LCTR:LCTR0809711 C18H35NO NA NA Cer d18:1/24:0 W’ NA NA NA 282.2788 1 282.2791 -1.0628 ms_run[1]:controllerType=0 controllerNumber=1 scan=732 [,, exact mass, ] [MS,MS:1000511, ms level, 2] 0.8760 1 NA
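The opt_global_mass_error column in the SME rows appears to hold a relative mass error in parts per million. The sketch below recomputes it from the values above. It assumes the definition (exp_mass_to_charge - theoretical_mass_to_charge) / theoretical_mass_to_charge × 10^6; this is an assumption, since the column is optional and this example does not define it. With this definition, the first, second and fourth rows reproduce the stored values to four decimals. The third row comes out with the opposite sign.

```r
# Values from the SME rows shown above
expMz <- c(650.6432, 252.2677, 264.2689, 282.2788)
theoMz <- c(650.6446, 252.2686, 264.2686, 282.2791)
storedError <- c(-2.1517, -3.5676, -1.1352, -1.0628)

# Assumed definition: relative mass error in parts per million (ppm)
ppmError <- function(experimental, theoretical) {
  (experimental - theoretical) / theoretical * 1e6
}

computedError <- round(ppmError(expMz, theoMz), 4)
data.frame(expMz, theoMz, storedError, computedError)
```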

To turn the data frame into an object, create an empty MzTab object with the R6 constructor new() and populate it with fromDataFrame():

mzTabObject <- MzTab$new()$fromDataFrame(mzTabTable)

You can then access the individual sections, such as the metadata, as object members:

# this is an R6 object
metadata <- mzTabObject$metadata
# these are lists
smallMoleculeSummaryEntries <- mzTabObject$smallMoleculeSummary
smallMoleculeFeatureEntries <- mzTabObject$smallMoleculeFeature
smallMoleculeEvidenceEntries <- mzTabObject$smallMoleculeEvidence

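You can extract values from either representation, depending on whether you prefer a tabular or an object-oriented style. However, the value types may differ: the data frame may hold values as character strings, so numeric values such as SML_ID need an explicit conversion with as.numeric():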

# SML_ID of the first entry in the SmallMoleculeSummary list
smallMoleculeSummaryEntries[[1]]$sml_id
#> [1] 1

# the same ID from the data frame, converted to a number
as.numeric(smlTable$SML_ID)
#> [1] 1
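If the data frame columns hold text, as the as.numeric() conversion above suggests, you can convert a whole extracted table at once with base R's type.convert() instead of converting column by column. Here is a sketch on a hypothetical character-only stand-in for an SMF table, with values taken from the table above:

```r
# Toy stand-in for an extracted SMF table in which every column is character
smfChar <- data.frame(
  SMF_ID = c("1", "2"),
  adduct_ion = c("[M+H]1+", NA),
  exp_mass_to_charge = c("650.6432", "252.2677"),
  stringsAsFactors = FALSE
)

# convert each column to the simplest fitting type; as.is = TRUE keeps
# non-numeric columns as character instead of turning them into factors
smfTyped <- smfChar
smfTyped[] <- lapply(smfChar, utils::type.convert, as.is = TRUE)
str(smfTyped)
```

Columns such as SMF_ID_REFS that contain bar-separated lists stay character with this approach.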

Writing mzTab-M files

If you have an mzTab-M data frame, you can write it as follows:

utils::write.table(
    mzTabTable,
    file = file.path(tempdir(check=TRUE), "mzTabWrite1.mztab"),
    row.names = FALSE,
    col.names = FALSE,
    quote = FALSE,
    sep = "\t",
    na = "",
    fileEncoding = "UTF-8"
  )
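With quote = FALSE and na = "", strings are written without quotes and missing values become empty fields, so each row becomes one plain tab-separated line. Here is a quick check on a hypothetical two-row data frame in the V1, V2, ... layout. The NA in the second row leaves a trailing tab.

```r
toyTable <- data.frame(
  V1 = c("MTD", "COM"),
  V2 = c("mzTab-version", "Meta data section"),
  V3 = c("2.0.0-M", NA),
  stringsAsFactors = FALSE
)
outFile <- tempfile(fileext = ".mztab")
# same options as the write.table() call above
utils::write.table(toyTable, file = outFile, row.names = FALSE, col.names = FALSE,
                   quote = FALSE, sep = "\t", na = "", fileEncoding = "UTF-8")
writtenLines <- readLines(outFile, encoding = "UTF-8")
writtenLines
```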

To write an MzTab object in the tab-separated format, use writeMzTab():

writeMzTab(mzTabObject, file.path(tempdir(check=TRUE), "mzTabWrite2.mztab"))

To write it in JSON format instead, use writeMzTabJSON():

writeMzTabJSON(mzTabObject, file.path(tempdir(check=TRUE), "mzTabWrite3.mztab.json"))

Validating mzTab-M files

To validate an mzTab-M file, you can use the mzTab Validator web application at https://apps.lifs-tools.org/mztabvalidator, or call validateMzTab() from R as shown below.

You can set validationLevel to one of info, warning or error. If you enable semanticValidation, the controlled vocabulary (CV) parameters in your file are also checked against the default recommended mapping file.

To validate an mzTab-M file without parsing it locally, read its content into a single string and pass it with validationMode = "plain". For the example file, the following call should return an empty list:

  validatePlainFile <- system.file("testdata", "lipidomics-example.mzTab", package = "rmzTabM")
  # read the whole file as one string; the file size is given in bytes
  mzTabString <- readChar(validatePlainFile, file.info(validatePlainFile)$size, useBytes = TRUE)
  validationMessages2 <- validateMzTab(
    mzTabString,
    validationMode = "plain",
    validationLevel = "info",
    maxErrors = 100,
    semanticValidation = FALSE
  )
  if(length(validationMessages2)==0) {
    print("No validation messages")
  } else {
    validationMessages2
  }
#> [1] "No validation messages"
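The validation calls pass the whole file as one string. readChar() with the file size from file.info() as nchars returns that string with the line breaks intact. With useBytes = TRUE, nchars is counted in bytes, which matches the size that file.info() reports. Here is a self-contained check on a temporary file holding two metadata lines from the example file:

```r
# Write a small temporary file and read it back as one string
tmpFile <- tempfile(fileext = ".mzTab")
writeLines(c("MTD\tmzTab-version\t2.0.0-M", "MTD\tmzTab-ID\tISAS-2018-1234"), tmpFile)

# nchars is the file size in bytes; useBytes = TRUE counts it as bytes
wholeFile <- readChar(tmpFile, file.info(tmpFile)$size, useBytes = TRUE)

# the string keeps the line breaks; splitting on them recovers the lines
strsplit(wholeFile, "\r?\n")[[1]]
```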

Alternatively, set semanticValidation = TRUE to also check the CV parameters used in the file against the default mapping file. The resulting info messages give hints on how to improve your file:

  validationMessages2 <- validateMzTab(
    mzTabString,
    validationMode = "plain",
    validationLevel = "info",
    maxErrors = 100,
    semanticValidation = TRUE
  )
#> [[Info-3010] line -1: The object "/metadata/msRun/@fragmentationMethod" accessed by msrun_fragmentation_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'msrun_fragmentation_method_may' for path '/metadata/msRun/@fragmentationMethod' and with scope '/metadata/msRun'  matching  ANY of the terms [ 'MS:1000044' with name 'dissociation method' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/msRun/@fragmentationMethod" accessed by msrun_fragmentation_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'msrun_fragmentation_method_may' for path '/metadata/msRun/@fragmentationMethod' and with scope '/metadata/msRun'  matching  ANY of the terms [ 'MS:1000044' with name 'dissociation method' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/msRun/@hashMethod" accessed by msrun_hash_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'msrun_hash_method_may' for path '/metadata/msRun/@hashMethod' and with scope '/metadata/msRun'  matching  ALL of the terms [ 'MS:1000561' with name 'data file checksum type' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/msRun/@hashMethod" accessed by msrun_hash_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'msrun_hash_method_may' for path '/metadata/msRun/@hashMethod' and with scope '/metadata/msRun'  matching  ALL of the terms [ 'MS:1000561' with name 'data file checksum type' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/instrument/@detector" accessed by instrument_detector_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'instrument_detector_may' for path '/metadata/instrument/@detector' and with scope '/metadata/instrument'  matching  ALL of the terms [ 'MS:1000026' with name 'detector type' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/instrument/@detector" accessed by instrument_detector_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'instrument_detector_may' for path '/metadata/instrument/@detector' and with scope '/metadata/instrument'  matching  ALL of the terms [ 'MS:1000026' with name 'detector type' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/sample/@species" accessed by sample_species_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_species_may' for path '/metadata/sample/@species' and with scope '/metadata/sample'  matching  ANY of the terms [ 'PRIDE:0000033' with name 'NEWT' excluding itself but including any children.'| 'NCBITaxon:1' with name 'root' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/sample/@species" accessed by sample_species_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_species_may' for path '/metadata/sample/@species' and with scope '/metadata/sample'  matching  ANY of the terms [ 'PRIDE:0000033' with name 'NEWT' excluding itself but including any children.'| 'NCBITaxon:1' with name 'root' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/sample/@tissue" accessed by sample_tissue_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_tissue_may' for path '/metadata/sample/@tissue' and with scope '/metadata/sample'  matching  ANY of the terms [ 'BTO:0000000' with name 'tissues, cell types and enzyme sources' excluding itself but including any children.'| 'PRIDE:0000442' with name 'Tissue not applicable to dataset' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/sample/@tissue" accessed by sample_tissue_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_tissue_may' for path '/metadata/sample/@tissue' and with scope '/metadata/sample'  matching  ANY of the terms [ 'BTO:0000000' with name 'tissues, cell types and enzyme sources' excluding itself but including any children.'| 'PRIDE:0000442' with name 'Tissue not applicable to dataset' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/sample/@cellType" accessed by sample_cell_type_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_cell_type_may' for path '/metadata/sample/@cellType' and with scope '/metadata/sample'  matching  ALL of the terms [ 'CL:0000000' with name 'cell' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/sample/@cellType" accessed by sample_cell_type_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_cell_type_may' for path '/metadata/sample/@cellType' and with scope '/metadata/sample'  matching  ALL of the terms [ 'CL:0000000' with name 'cell' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/sample/@disease" accessed by sample_disease_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_disease_may' for path '/metadata/sample/@disease' and with scope '/metadata/sample'  matching  ANY of the terms [ 'DOID:4' with name 'disease' excluding itself but including any children.'| 'PRIDE:0000018' with name 'Disease free' including itself or any of its children.'].
#> , type=info, category=cross_check, message=The object "/metadata/sample/@disease" accessed by sample_disease_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_disease_may' for path '/metadata/sample/@disease' and with scope '/metadata/sample'  matching  ANY of the terms [ 'DOID:4' with name 'disease' excluding itself but including any children.'| 'PRIDE:0000018' with name 'Disease free' including itself or any of its children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/@derivatizationAgent" accessed by derivatization_agent_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'derivatization_agent_may' for path '/metadata/@derivatizationAgent' and with scope '/metadata'  matching  ALL of the terms [ 'CHEBI:23367' with name 'molecular entity' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/@derivatizationAgent" accessed by derivatization_agent_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'derivatization_agent_may' for path '/metadata/@derivatizationAgent' and with scope '/metadata'  matching  ALL of the terms [ 'CHEBI:23367' with name 'molecular entity' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/smallMoleculeEvidence/@derivatizedForm" accessed by derivatized_form_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'derivatized_form_may' for path '/smallMoleculeEvidence/@derivatizedForm' and with scope '/smallMoleculeEvidence'  matching  ALL of the terms [ 'CHEBI:24433' with name 'group' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/smallMoleculeEvidence/@derivatizedForm" accessed by derivatized_form_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'derivatized_form_may' for path '/smallMoleculeEvidence/@derivatizedForm' and with scope '/smallMoleculeEvidence'  matching  ALL of the terms [ 'CHEBI:24433' with name 'group' excluding itself but including any children.']., line=-1]
  dfList<-lapply(validationMessages2, function(x) { data.frame("Category"=x$category, "Message"=x$code) })
  vmdf <- do.call("rbind", dfList)
  knitr::kable(vmdf) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "1000px")
Category Message
cross_check [Info-3010] line -1: The object “/metadata/msRun/@fragmentationMethod” accessed by msrun_fragmentation_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘msrun_fragmentation_method_may’ for path ‘/metadata/msRun/@fragmentationMethod’ and with scope ‘/metadata/msRun’ matching ANY of the terms [ ‘MS:1000044’ with name ‘dissociation method’ excluding itself but including any children.’].
cross_check [Info-3010] line -1: The object “/metadata/msRun/@hashMethod” accessed by msrun_hash_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘msrun_hash_method_may’ for path ‘/metadata/msRun/@hashMethod’ and with scope ‘/metadata/msRun’ matching ALL of the terms [ ‘MS:1000561’ with name ‘data file checksum type’ excluding itself but including any children.’].
cross_check [Info-3010] line -1: The object “/metadata/instrument/@detector” accessed by instrument_detector_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘instrument_detector_may’ for path ‘/metadata/instrument/@detector’ and with scope ‘/metadata/instrument’ matching ALL of the terms [ ‘MS:1000026’ with name ‘detector type’ excluding itself but including any children.’].
cross_check [Info-3010] line -1: The object “/metadata/sample/@species” accessed by sample_species_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘sample_species_may’ for path ‘/metadata/sample/@species’ and with scope ‘/metadata/sample’ matching ANY of the terms [ ‘PRIDE:0000033’ with name ‘NEWT’ excluding itself but including any children.’| ‘NCBITaxon:1’ with name ‘root’ excluding itself but including any children.’].
cross_check [Info-3010] line -1: The object “/metadata/sample/@tissue” accessed by sample_tissue_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘sample_tissue_may’ for path ‘/metadata/sample/@tissue’ and with scope ‘/metadata/sample’ matching ANY of the terms [ ‘BTO:0000000’ with name ‘tissues, cell types and enzyme sources’ excluding itself but including any children.’| ‘PRIDE:0000442’ with name ‘Tissue not applicable to dataset’ excluding itself but including any children.’].
cross_check [Info-3010] line -1: The object “/metadata/sample/@cellType” accessed by sample_cell_type_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘sample_cell_type_may’ for path ‘/metadata/sample/@cellType’ and with scope ‘/metadata/sample’ matching ALL of the terms [ ‘CL:0000000’ with name ‘cell’ excluding itself but including any children.’].
cross_check [Info-3010] line -1: The object “/metadata/sample/@disease” accessed by sample_disease_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘sample_disease_may’ for path ‘/metadata/sample/@disease’ and with scope ‘/metadata/sample’ matching ANY of the terms [ ‘DOID:4’ with name ‘disease’ excluding itself but including any children.’| ‘PRIDE:0000018’ with name ‘Disease free’ including itself or any of its children.’].
cross_check [Info-3010] line -1: The object “/metadata/@derivatizationAgent” accessed by derivatization_agent_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘derivatization_agent_may’ for path ‘/metadata/@derivatizationAgent’ and with scope ‘/metadata’ matching ALL of the terms [ ‘CHEBI:23367’ with name ‘molecular entity’ excluding itself but including any children.’].
cross_check [Info-3010] line -1: The object “/smallMoleculeEvidence/@derivatizedForm” accessed by derivatized_form_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘derivatized_form_may’ for path ‘/smallMoleculeEvidence/@derivatizedForm’ and with scope ‘/smallMoleculeEvidence’ matching ALL of the terms [ ‘CHEBI:24433’ with name ‘group’ excluding itself but including any children.’].
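When a validation run returns many messages, a compact count can be easier to scan than the full table. Here is a minimal base R sketch, assuming message entries with the category and code fields used in the conversion above. toyMessages is a hypothetical, abbreviated stand-in for validationMessages2. The Id column comes from a regular expression that assumes each code starts with a bracketed identifier such as [Info-3010].

```r
# Hypothetical stand-in for a validation message list: each entry carries only
# the category and code fields used in the conversion above (codes abbreviated)
toyMessages <- list(
  list(category = "cross_check", code = "[Info-3010] line -1: ... msrun_hash_method_may ..."),
  list(category = "cross_check", code = "[Info-3010] line -1: ... sample_species_may ..."),
  list(category = "cross_check", code = "[Info-3010] line -1: ... sample_disease_may ...")
)

msgDf <- do.call(rbind, lapply(toyMessages, function(x) {
  data.frame(Category = x$category, Message = x$code, stringsAsFactors = FALSE)
}))

# pull the bracketed identifier (e.g. "Info-3010") from the start of each message
msgDf$Id <- sub("^\\[([^\\]]+)\\].*$", "\\1", msgDf$Message, perl = TRUE)

# count messages per category and identifier
table(msgDf$Category, msgDf$Id)
```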