You can read an mzTab-M file in tab-separated format into a data frame as follows:
library(rmzTabM)
library(kableExtra)
testfile <- system.file("testdata", c("lipidomics-example.mzTab"), package="rmzTabM")
mzTabTable <- readMzTab(testfile)
# font_size is an argument of kable_styling(), not a bootstrap option
knitr::kable(head(mzTabTable[,1:3])) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "200px")
V1 | V2 | V3 |
---|---|---|
COM | Meta data section | |
MTD | mzTab-version | 2.0.0-M |
MTD | mzTab-ID | ISAS-2018-1234 |
MTD | description | Minimal proposed sample file for identification and quantification of lipids |
MTD | publication[1] | pubmed:29039908\|doi:10.1021/acs.analchem.7b03576 |
MTD | cv[1]-label | MS |
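To see why the raw table has generic column names (V1, V2, V3) and rows of unequal length, the following minimal sketch reads a tiny hand-written mzTab-M fragment with base R only. It is an illustration under stated assumptions and not necessarily how readMzTab works internally; the fragment and the names lines, tmp and rawTable are made up for this example.

```r
# Hypothetical three-line mzTab-M fragment (not the packaged example file).
lines <- c(
  "COM\tMeta data section",
  "MTD\tmzTab-version\t2.0.0-M",
  "MTD\tmzTab-ID\tISAS-2018-1234"
)
tmp <- tempfile(fileext = ".mzTab")
writeLines(lines, tmp)

# Rows differ in length, so pad short rows (fill = TRUE) and keep every
# field as a plain character string without quote or comment handling.
rawTable <- utils::read.table(
  tmp,
  header = FALSE,
  sep = "\t",
  fill = TRUE,
  quote = "",
  comment.char = "",
  stringsAsFactors = FALSE,
  colClasses = "character"
)
print(dim(rawTable))
print(rawTable$V1)
```

Reading every column as character avoids accidental type guessing on mixed content such as version strings and identifiers.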
You can extract the individual section tables from this one as follows:
mtdTable <- extractMetadata(mzTabTable)
smlTable <- extractSmallMoleculeSummary(mzTabTable)
smfTable <- extractSmallMoleculeFeatures(mzTabTable)
smeTable <- extractSmallMoleculeEvidence(mzTabTable)
knitr::kable(head(smlTable)) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "200px")
SMH | SML_ID | SMF_ID_REFS | chemical_name | database_identifier | chemical_formula | smiles | inchi | uri | theoretical_neutral_mass | adduct_ions | reliability | best_id_confidence_measure | best_id_confidence_value | abundance_assay[1] | abundance_study_variable[1] | abundance_variation_study_variable[1] | opt_global_lipid_category | opt_global_lipid_species | opt_global_lipid_best_id_level |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SML | 1 | 1 \| 2 \| 3 \| 4 | Cer(d18:1/24:0) | LM:LMSP02010012 | C42H83NO3 | CCCCCCCCCCCCCCCCCCCCCCCC(=O)N[C@@H](CO)[C@H](O)/C=C/CCCCCCCCCCCCC | InChI=1S/C42H83NO3/c1-3-5-7-9-11-13-15-17-18-19-20-21-22-23-24-26-28-30-32-34-36-38-42(46)43-40(39-44)41(45)37-35-33-31-29-27-25-16-14-12-10-8-6-4-2/h35,37,40-41,44-45H,3-34,36,38-39H2,1-2H3,(H,43,46)/b37-35+/t40-,41+/m0/s1 | http://www.lipidmaps.org/data/LMSDRecord.php?LM_ID=LMSP02010012 | 649.6373 | [M+H]+ | 2 | [,, qualifier ions exact mass,] | 0.958 | 4.448784E-05 | 4.448784E-05 | 0 | Sphingolipids | Cer 42:1 | Cer d18:1/24:0 |
knitr::kable(head(smfTable)) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "200px")
SFH | SMF_ID | SME_ID_REFS | SME_ID_REF_ambiguity_code | adduct_ion | isotopomer | exp_mass_to_charge | charge | retention_time_in_seconds | retention_time_in_seconds_start | retention_time_in_seconds_end | abundance_assay[1] | opt_global_quantifiers_SMF_ID_REFS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SMF | 1 | 1 | NA | [M+H]1+ | NA | 650.6432 | 1 | 821.2341 | 756.0000 | 954.0000 | 4.448784E-05 | 3 |
SMF | 2 | 2 | NA | NA | NA | 252.2677 | 1 | 821.2341 | 756.0000 | 954.0000 | 6.673176E-06 | NA |
SMF | 3 | 3 | NA | NA | NA | 264.2689 | 1 | 821.2341 | 756.0000 | 954.0000 | 1.3346352E-05 | NA |
SMF | 4 | 4 | NA | NA | NA | 282.2788 | 1 | 821.2341 | 756.0000 | 954.0000 | 9.831813E-06 | NA |
knitr::kable(head(smeTable)) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "200px")
SEH | SME_ID | evidence_input_id | database_identifier | chemical_formula | smiles | inchi | chemical_name | uri | derivatized_form | adduct_ion | exp_mass_to_charge | charge | theoretical_mass_to_charge | opt_global_mass_error | spectra_ref | identification_method | ms_level | id_confidence_measure[1] | rank | opt_global_qualifiers_evidence_grouping_ID_REFS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SME | 1 | 1 | LM:LMSP0501AB02 | C42H83NO3 | CCCCCCCCCCCCCCCCCCCCCCCC(=O)N[C@@H](CO)[C@H](O)/C=C/CCCCCCCCCCCCC | InChI=1S/C42H83NO3/c1-3-5-7-9-11-13-15-17-18-19-20-21-22-23-24-26-28-30-32-34-36-38-42(46)43-40(39-44)41(45)37-35-33-31-29-27-25-16-14-12-10-8-6-4-2/h35,37,40-41,44-45H,3-34,36,38-39H2,1-2H3,(H,43,46)/b37-35+/t40-,41+/m0/s1 | LacCer d18:1/12:0 | http://www.lipidmaps.org/data/LMSDRecord.php?LM_ID=LMSP02010012 | NA | [M+H]1+ | 650.6432 | 1 | 650.6446 | -2.1517 | ms_run[1]:controllerType=0 controllerNumber=1 scan=731 | [,, qualifier ions exact mass,] | [MS,MS:1000511, ms level, 1] | 0.958 | 1 | 2 |
SME | 2 | 2 | LCTR:LCTR0809812 | C17H33N | NA | NA | Cer d18:1/24:0 W’ - CHO | NA | NA | NA | 252.2677 | 1 | 252.2686 | -3.5676 | ms_run[1]:controllerType=0 controllerNumber=1 scan=732 | [,, exact mass, ] | [MS,MS:1000511, ms level, 2] | 0.9780 | 1 | NA |
SME | 3 | 2 | LCTR:LCTR0871245 | C18H33N | NA | NA | Cer d18:1/24:0 W’’ | NA | NA | NA | 264.2689 | 1 | 264.2686 | -1.1352 | ms_run[1]:controllerType=0 controllerNumber=1 scan=732 | [,, exact mass, ] | [MS,MS:1000511, ms level, 2] | 0.7500 | 1 | NA |
SME | 4 | 2 | LCTR:LCTR0809711 | C18H35NO | NA | NA | Cer d18:1/24:0 W’ | NA | NA | NA | 282.2788 | 1 | 282.2791 | -1.0628 | ms_run[1]:controllerType=0 controllerNumber=1 scan=732 | [,, exact mass, ] | [MS,MS:1000511, ms level, 2] | 0.8760 | 1 | NA |
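The section tables above are distinguished by the prefix in the first column: SMH/SML for the summary, SFH/SMF for features and SEH/SME for evidence. As a rough sketch of that idea, and not a description of how the extract functions are implemented, the base R code below splits a toy raw table by prefix. The helper extractSection and the toy values are hypothetical.

```r
# Toy raw table standing in for the output of a reader (hypothetical values).
rawTable <- data.frame(
  V1 = c("MTD", "SMH", "SML", "SML", "SFH", "SMF"),
  V2 = c("mzTab-version", "SML_ID", "1", "2", "SMF_ID", "1"),
  V3 = c("2.0.0-M", "chemical_name", "Cer 42:1", "Cer 40:1", "charge", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: use the header row (e.g. "SMH") as column names
# and the matching data rows (e.g. "SML") as the table body.
extractSection <- function(tbl, headerPrefix, rowPrefix) {
  headerRow <- tbl[tbl$V1 == headerPrefix, , drop = FALSE][1, ]
  header <- unlist(headerRow, use.names = FALSE)
  body <- tbl[tbl$V1 == rowPrefix, , drop = FALSE]
  colnames(body) <- header
  rownames(body) <- NULL
  body
}

smlSketch <- extractSection(rawTable, "SMH", "SML")
print(smlSketch)
```

The same helper could be called with "SFH" and "SMF", or "SEH" and "SME", to obtain the other two sections from a raw table of this shape.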
To turn these tables into objects, use the R6 class constructor method new(), followed by fromDataFrame():
mzTabObject <- MzTab$new()$fromDataFrame(mzTabTable)
You can then access sections like Metadata as object members:
# this is an R6 object
metadata <- mzTabObject$metadata
# these are lists
smallMoleculeSummaryEntries <- mzTabObject$smallMoleculeSummary
smallMoleculeFeatureEntries <- mzTabObject$smallMoleculeFeature
smallMoleculeEvidenceEntries <- mzTabObject$smallMoleculeEvidence
You can extract values from either representation, depending on whether you prefer a tabular or an object-oriented style. Note, however, that the value types may differ between the two:
# this is the SmallMoleculeSummary list first entry id
smallMoleculeSummaryEntries[[1]]$sml_id
#> [1] 1
# and this is the same with the data frame
as.numeric(smlTable$SML_ID)
#> [1] 1
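The as.numeric() call is needed because values in the data frame representation may be held as text. If you want to convert a whole table at once, one option is utils::type.convert() from base R, sketched below on a hypothetical one-row slice. The data frame smlSketch is made up for illustration and only mirrors a few column names from the summary table.

```r
# Hypothetical slice of a small molecule summary table read as text.
smlSketch <- data.frame(
  SML_ID = "1",
  theoretical_neutral_mass = "649.6373",
  chemical_name = "Cer(d18:1/24:0)",
  stringsAsFactors = FALSE
)

# Guess column types from the text; as.is = TRUE keeps strings as
# character instead of turning them into factors.
typed <- utils::type.convert(smlSketch, as.is = TRUE)
print(vapply(typed, class, character(1)))
```

Automatic conversion is convenient but should be checked per column, since identifiers that look numeric will also be converted.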
If you have an mzTab-M data frame, you can write it as follows:
utils::write.table(
mzTabTable,
file = file.path(tempdir(check=TRUE), "mzTabWrite1.mztab"),
row.names = FALSE,
col.names = FALSE,
quote = FALSE,
sep = "\t",
na = "",
fileEncoding = "UTF-8"
)
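The arguments above matter for producing valid tab-separated output: no row or column names, no quoting, and missing values written as empty fields. The following self-contained sketch applies the same settings to a made-up two-row table and reads the raw lines back, so you can inspect what ends up on disk. The table toy and the file name mzTabSketch.mztab are hypothetical.

```r
# Made-up two-row table with one missing value.
toy <- data.frame(
  V1 = c("MTD", "MTD"),
  V2 = c("mzTab-version", "mzTab-ID"),
  V3 = c("2.0.0-M", NA),
  stringsAsFactors = FALSE
)
outFile <- file.path(tempdir(check = TRUE), "mzTabSketch.mztab")

# Same settings as in the call above: no names, no quotes, NA as empty field.
utils::write.table(
  toy,
  file = outFile,
  row.names = FALSE,
  col.names = FALSE,
  quote = FALSE,
  sep = "\t",
  na = ""
)

# Inspect the raw lines that were written.
written <- readLines(outFile)
print(written)
```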
For an MzTab object, you can write to the tab-separated format:
writeMzTab(mzTabObject, file.path(tempdir(check=TRUE), "mzTabWrite2.mztab"))
Or to JSON format:
writeMzTabJSON(mzTabObject, file.path(tempdir(check=TRUE), "mzTabWrite3.mztab.json"))
To validate an mzTab-M file, you can access the mzTab Validator web application at https://apps.lifs-tools.org/mztabvalidator
You can set the validationLevel to one of info, warning or error. If you enable semanticValidation, CV parameters present in your file will be checked against the default recommended mapping file.
In order to validate an mzTab-M file without needing to parse it locally, use the following call, which should return an empty list:
validatePlainFile <- system.file("testdata", c("lipidomics-example.mzTab"), package="rmzTabM")
# read the whole file into a single string, using the size of the same file
mzTabString <- readChar(validatePlainFile, file.info(validatePlainFile)$size)
validationMessages2 <- validateMzTab(
mzTabString,
validationMode = "plain",
validationLevel = "info",
maxErrors = 100,
semanticValidation = FALSE
)
if(length(validationMessages2)==0) {
print("No validation messages")
} else {
validationMessages2
}
#> [1] "No validation messages"
Alternatively, you can run the validation with semantic checks of the CV parameters against the default mapping file, which will give you hints on how to improve your file:
validationMessages2 <- validateMzTab(
mzTabString,
validationMode = "plain",
validationLevel = "info",
maxErrors = 100,
semanticValidation = TRUE
)
#> [[Info-3010] line -1: The object "/metadata/msRun/@fragmentationMethod" accessed by msrun_fragmentation_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'msrun_fragmentation_method_may' for path '/metadata/msRun/@fragmentationMethod' and with scope '/metadata/msRun' matching ANY of the terms [ 'MS:1000044' with name 'dissociation method' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/msRun/@fragmentationMethod" accessed by msrun_fragmentation_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'msrun_fragmentation_method_may' for path '/metadata/msRun/@fragmentationMethod' and with scope '/metadata/msRun' matching ANY of the terms [ 'MS:1000044' with name 'dissociation method' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/msRun/@hashMethod" accessed by msrun_hash_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'msrun_hash_method_may' for path '/metadata/msRun/@hashMethod' and with scope '/metadata/msRun' matching ALL of the terms [ 'MS:1000561' with name 'data file checksum type' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/msRun/@hashMethod" accessed by msrun_hash_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'msrun_hash_method_may' for path '/metadata/msRun/@hashMethod' and with scope '/metadata/msRun' matching ALL of the terms [ 'MS:1000561' with name 'data file checksum type' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/instrument/@detector" accessed by instrument_detector_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'instrument_detector_may' for path '/metadata/instrument/@detector' and with scope '/metadata/instrument' matching ALL of the terms [ 'MS:1000026' with name 'detector type' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/instrument/@detector" accessed by instrument_detector_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'instrument_detector_may' for path '/metadata/instrument/@detector' and with scope '/metadata/instrument' matching ALL of the terms [ 'MS:1000026' with name 'detector type' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/sample/@species" accessed by sample_species_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_species_may' for path '/metadata/sample/@species' and with scope '/metadata/sample' matching ANY of the terms [ 'PRIDE:0000033' with name 'NEWT' excluding itself but including any children.'| 'NCBITaxon:1' with name 'root' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/sample/@species" accessed by sample_species_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_species_may' for path '/metadata/sample/@species' and with scope '/metadata/sample' matching ANY of the terms [ 'PRIDE:0000033' with name 'NEWT' excluding itself but including any children.'| 'NCBITaxon:1' with name 'root' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/sample/@tissue" accessed by sample_tissue_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_tissue_may' for path '/metadata/sample/@tissue' and with scope '/metadata/sample' matching ANY of the terms [ 'BTO:0000000' with name 'tissues, cell types and enzyme sources' excluding itself but including any children.'| 'PRIDE:0000442' with name 'Tissue not applicable to dataset' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/sample/@tissue" accessed by sample_tissue_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_tissue_may' for path '/metadata/sample/@tissue' and with scope '/metadata/sample' matching ANY of the terms [ 'BTO:0000000' with name 'tissues, cell types and enzyme sources' excluding itself but including any children.'| 'PRIDE:0000442' with name 'Tissue not applicable to dataset' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/sample/@cellType" accessed by sample_cell_type_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_cell_type_may' for path '/metadata/sample/@cellType' and with scope '/metadata/sample' matching ALL of the terms [ 'CL:0000000' with name 'cell' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/sample/@cellType" accessed by sample_cell_type_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_cell_type_may' for path '/metadata/sample/@cellType' and with scope '/metadata/sample' matching ALL of the terms [ 'CL:0000000' with name 'cell' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/sample/@disease" accessed by sample_disease_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_disease_may' for path '/metadata/sample/@disease' and with scope '/metadata/sample' matching ANY of the terms [ 'DOID:4' with name 'disease' excluding itself but including any children.'| 'PRIDE:0000018' with name 'Disease free' including itself or any of its children.'].
#> , type=info, category=cross_check, message=The object "/metadata/sample/@disease" accessed by sample_disease_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'sample_disease_may' for path '/metadata/sample/@disease' and with scope '/metadata/sample' matching ANY of the terms [ 'DOID:4' with name 'disease' excluding itself but including any children.'| 'PRIDE:0000018' with name 'Disease free' including itself or any of its children.']., line=-1]
#> [[Info-3010] line -1: The object "/metadata/@derivatizationAgent" accessed by derivatization_agent_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'derivatization_agent_may' for path '/metadata/@derivatizationAgent' and with scope '/metadata' matching ALL of the terms [ 'CHEBI:23367' with name 'molecular entity' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/metadata/@derivatizationAgent" accessed by derivatization_agent_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'derivatization_agent_may' for path '/metadata/@derivatizationAgent' and with scope '/metadata' matching ALL of the terms [ 'CHEBI:23367' with name 'molecular entity' excluding itself but including any children.']., line=-1]
#> [[Info-3010] line -1: The object "/smallMoleculeEvidence/@derivatizedForm" accessed by derivatized_form_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'derivatized_form_may' for path '/smallMoleculeEvidence/@derivatizedForm' and with scope '/smallMoleculeEvidence' matching ALL of the terms [ 'CHEBI:24433' with name 'group' excluding itself but including any children.'].
#> , type=info, category=cross_check, message=The object "/smallMoleculeEvidence/@derivatizedForm" accessed by derivatized_form_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule 'derivatized_form_may' for path '/smallMoleculeEvidence/@derivatizedForm' and with scope '/smallMoleculeEvidence' matching ALL of the terms [ 'CHEBI:24433' with name 'group' excluding itself but including any children.']., line=-1]
# collect category and message text of each validation message in one data frame
dfList <- lapply(validationMessages2, function(x) { data.frame("Category" = x$category, "Message" = x$code) })
vmdf <- do.call("rbind", dfList)
knitr::kable(vmdf) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 7) %>% scroll_box(width = "800px", height = "1000px")
Category | Message |
---|---|
cross_check | [Info-3010] line -1: The object “/metadata/msRun/@fragmentationMethod” accessed by msrun_fragmentation_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘msrun_fragmentation_method_may’ for path ‘/metadata/msRun/@fragmentationMethod’ and with scope ‘/metadata/msRun’ matching ANY of the terms [ ‘MS:1000044’ with name ‘dissociation method’ excluding itself but including any children.’]. |
cross_check | [Info-3010] line -1: The object “/metadata/msRun/@hashMethod” accessed by msrun_hash_method_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘msrun_hash_method_may’ for path ‘/metadata/msRun/@hashMethod’ and with scope ‘/metadata/msRun’ matching ALL of the terms [ ‘MS:1000561’ with name ‘data file checksum type’ excluding itself but including any children.’]. |
cross_check | [Info-3010] line -1: The object “/metadata/instrument/@detector” accessed by instrument_detector_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘instrument_detector_may’ for path ‘/metadata/instrument/@detector’ and with scope ‘/metadata/instrument’ matching ALL of the terms [ ‘MS:1000026’ with name ‘detector type’ excluding itself but including any children.’]. |
cross_check | [Info-3010] line -1: The object “/metadata/sample/@species” accessed by sample_species_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘sample_species_may’ for path ‘/metadata/sample/@species’ and with scope ‘/metadata/sample’ matching ANY of the terms [ ‘PRIDE:0000033’ with name ‘NEWT’ excluding itself but including any children.’| ‘NCBITaxon:1’ with name ‘root’ excluding itself but including any children.’]. |
cross_check | [Info-3010] line -1: The object “/metadata/sample/@tissue” accessed by sample_tissue_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘sample_tissue_may’ for path ‘/metadata/sample/@tissue’ and with scope ‘/metadata/sample’ matching ANY of the terms [ ‘BTO:0000000’ with name ‘tissues, cell types and enzyme sources’ excluding itself but including any children.’| ‘PRIDE:0000442’ with name ‘Tissue not applicable to dataset’ excluding itself but including any children.’]. |
cross_check | [Info-3010] line -1: The object “/metadata/sample/@cellType” accessed by sample_cell_type_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘sample_cell_type_may’ for path ‘/metadata/sample/@cellType’ and with scope ‘/metadata/sample’ matching ALL of the terms [ ‘CL:0000000’ with name ‘cell’ excluding itself but including any children.’]. |
cross_check | [Info-3010] line -1: The object “/metadata/sample/@disease” accessed by sample_disease_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘sample_disease_may’ for path ‘/metadata/sample/@disease’ and with scope ‘/metadata/sample’ matching ANY of the terms [ ‘DOID:4’ with name ‘disease’ excluding itself but including any children.’| ‘PRIDE:0000018’ with name ‘Disease free’ including itself or any of its children.’]. |
cross_check | [Info-3010] line -1: The object “/metadata/@derivatizationAgent” accessed by derivatization_agent_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘derivatization_agent_may’ for path ‘/metadata/@derivatizationAgent’ and with scope ‘/metadata’ matching ALL of the terms [ ‘CHEBI:23367’ with name ‘molecular entity’ excluding itself but including any children.’]. |
cross_check | [Info-3010] line -1: The object “/smallMoleculeEvidence/@derivatizedForm” accessed by derivatized_form_may is optional, but was null or empty. Allowed terms are defined in OPTIONAL rule ‘derivatized_form_may’ for path ‘/smallMoleculeEvidence/@derivatizedForm’ and with scope ‘/smallMoleculeEvidence’ matching ALL of the terms [ ‘CHEBI:24433’ with name ‘group’ excluding itself but including any children.’]. |
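Since each message names the object path that was empty, a short list of those paths can be easier to act on than the full text. The sketch below pulls the first quoted path out of each message with base R regular expressions. It uses plain lists as stand-ins for the validation message objects, with the field names category and code taken from the call above. The message texts are shortened, and the helper extractPath is hypothetical.

```r
# Plain lists standing in for validation message objects (hypothetical);
# the message texts are shortened versions of the output above.
messagesSketch <- list(
  list(
    category = "cross_check",
    code = "[Info-3010] line -1: The object \"/metadata/sample/@species\" accessed by sample_species_may is optional, but was null or empty."
  ),
  list(
    category = "cross_check",
    code = "[Info-3010] line -1: The object \"/metadata/sample/@tissue\" accessed by sample_tissue_may is optional, but was null or empty."
  )
)

# Hypothetical helper: return the first double-quoted path in a message,
# or NA if the message contains none.
extractPath <- function(text) {
  hit <- regmatches(text, regexpr("\"/[^\"]+\"", text))
  if (length(hit) == 0) NA_character_ else gsub("\"", "", hit)
}

emptyFields <- vapply(messagesSketch, function(x) extractPath(x$code), character(1))
print(emptyFields)
```

The resulting vector lists the optional fields you may want to fill in before the next validation run.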