Class MZTabDataLineParser<T>

  • Type Parameters:
    T - the type of domain object the parser creates.
    Direct Known Subclasses:
    SMELineParser, SMFLineParser, SMLLineParser

    public abstract class MZTabDataLineParser<T>
    extends MZTabLineParser
    This class validates data lines and loads them into mzTab domain objects. NOTICE: MZTabColumnFactory maintains a set of IMZTabColumn objects, each with an internal logical position and order. A physical mzTab file does not have to follow this logical order; users may provide their data columns in their own order. To distinguish the two, the physical position (a positive integer) records the column location in the mzTab file, and the PositionMapping structure maintains the mapping between physical and logical positions.
    Since:
    14/02/13
    Author:
    qingwei
    See Also:
    SMLLineParser, SMFLineParser, SMELineParser
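The relationship between physical and logical positions described above can be illustrated with a minimal sketch. The class PositionMappingSketch, its methods, and the column names and logical position keys used here are hypothetical and are not the library's PositionMapping API; the sketch only assumes that each header name in the file can be looked up in a table of logical positions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a physical-to-logical mapping; not the library's PositionMapping.
class PositionMappingSketch {
    // Physical column index in the file (1-based) -> logical position key.
    private final Map<Integer, String> physicalToLogical = new LinkedHashMap<>();

    void put(int physicalPosition, String logicalPosition) {
        physicalToLogical.put(physicalPosition, logicalPosition);
    }

    // Returns the logical position for a physical column, or null if the column is unknown.
    String logicalFor(int physicalPosition) {
        return physicalToLogical.get(physicalPosition);
    }

    // Builds the mapping by looking up each header name of the file in the logical table.
    static PositionMappingSketch fromHeader(String[] fileHeader, Map<String, String> logicalByName) {
        PositionMappingSketch mapping = new PositionMappingSketch();
        for (int physical = 1; physical <= fileHeader.length; physical++) {
            String logical = logicalByName.get(fileHeader[physical - 1]);
            if (logical != null) {
                mapping.put(physical, logical);
            }
        }
        return mapping;
    }
}
```

With a logical table of {"id" -> "01", "charge" -> "02"} and a file header of ["charge", "id"], physical column 1 maps to logical position "02" and physical column 2 maps to "01".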
    • Constructor Detail

      • MZTabDataLineParser

        protected MZTabDataLineParser​(MZTabParserContext context,
                                      MZTabColumnFactory factory,
                                      PositionMapping positionMapping,
                                      de.isas.mztab2.model.Metadata metadata,
                                      uk.ac.ebi.pride.jmztab2.utils.errors.MZTabErrorList errorList)
        Generates an mzTab data line parser. NOTICE: MZTabColumnFactory maintains a set of IMZTabColumn objects, each with an internal logical position and order. A physical mzTab file does not have to follow this logical order; users may provide their data columns in their own order. To distinguish the two, the physical position (a positive integer) records the column location in the mzTab file, and the PositionMapping structure maintains the mapping between physical and logical positions.
        Parameters:
        context - the parser context, keeping dynamic state and lookup associations.
        factory - SHOULD NOT be set to null
        positionMapping - SHOULD NOT be set to null
        metadata - SHOULD NOT be set to null
        errorList - an MZTabErrorList object.
    • Method Detail

      • parse

        public void parse​(int lineNumber,
                          String line,
                          uk.ac.ebi.pride.jmztab2.utils.errors.MZTabErrorList errorList)
                   throws uk.ac.ebi.pride.jmztab2.utils.errors.MZTabException
        Validates and parses the data line. Callers are expected to have already checked that the raw line is not empty and starts with the section prefix. Any errors found are added to the MZTabErrorList.
        Overrides:
        parse in class MZTabLineParser
        Parameters:
        lineNumber - an int.
        line - a String object.
        errorList - an MZTabErrorList object.
        Throws:
        uk.ac.ebi.pride.jmztab2.utils.errors.MZTabException - if any.
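The preconditions for parse can be sketched as follows. This is an illustration only: the class DataLineSketch, the plain list of error strings, and the "SML" prefix in the usage note are assumptions made for the example, and the real parser reports errors through MZTabErrorList instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it mirrors the stated preconditions, not the library's code.
class DataLineSketch {
    // Splits a raw, tab-separated data line into cells after checking that
    // the line is not empty and starts with the expected section prefix.
    static String[] split(int lineNumber, String line, String prefix, List<String> errors) {
        if (line == null || line.trim().isEmpty()) {
            errors.add("line " + lineNumber + ": empty line");
            return new String[0];
        }
        if (!line.startsWith(prefix + "\t")) {
            errors.add("line " + lineNumber + ": missing section prefix " + prefix);
            return new String[0];
        }
        // The limit -1 keeps trailing empty cells so that they can be reported later.
        return line.split("\t", -1);
    }
}
```

Calling DataLineSketch.split(3, "SML\t1\tnull\t", "SML", errors) yields four cells, the last of which is empty and would later be flagged as an empty cell.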
      • getRecord

        public abstract T getRecord()
        Retrieves the parsed data line as a typed mzTab domain object.
        Returns:
        a typed mzTab domain object.
      • checkData

        protected abstract int checkData()
        Checks the columns and translates them into mzTab elements.
        Returns:
        an int.
      • checkData

        protected String checkData​(IMZTabColumn column,
                                   String target,
                                   boolean allowNull)
        In the table-based sections (protein, peptide, and small molecule) there MUST NOT be any empty cells. Some fields do not allow the "null" value, for example unit_id and accession. In a "Complete" file, "null" values SHOULD NOT be given in general.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        allowNull - a boolean.
        Returns:
        a String object.
      • checkString

        protected String checkString​(IMZTabColumn column,
                                     String target)
        In the table-based sections (protein, peptide, and small molecule) there MUST NOT be any empty cells. Some fields do not allow the "null" value, for example unit_id and accession. In a "Complete" file, "null" values SHOULD NOT be given in general.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        Returns:
        a String object.
      • checkString

        protected String checkString​(IMZTabColumn column,
                                     String target,
                                     boolean allowNull)
        In the table-based sections (protein, peptide, and small molecule) there MUST NOT be any empty cells. Some fields do not allow the "null" value, for example unit_id and accession. In a "Complete" file, "null" values SHOULD NOT be given in general.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        allowNull - if true, null target values will pass the check, if false, the check will raise an error in the error list.
        Returns:
        a String object.
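A minimal sketch of the empty-cell and "null" rules, assuming that errors are collected as plain strings and that "null" is matched exactly. The class CellCheckSketch and its method are hypothetical; the real method takes an IMZTabColumn and records errors in the parser's error list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the allowNull flag.
class CellCheckSketch {
    // Returns the trimmed cell value, or null when the cell is empty or holds "null".
    static String checkCell(String columnName, String target, boolean allowNull, List<String> errors) {
        String value = target == null ? "" : target.trim();
        if (value.isEmpty()) {
            // Empty cells are never allowed in table-based sections.
            errors.add(columnName + ": empty cell");
            return null;
        }
        if (value.equals("null")) {
            if (!allowNull) {
                errors.add(columnName + ": \"null\" is not allowed");
            }
            return null;
        }
        return value;
    }
}
```

In this sketch a "null" cell produces no error when allowNull is true, while an empty cell is always reported.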
      • checkInteger

        protected Integer checkInteger​(IMZTabColumn column,
                                       String target)
        Checks the target string and translates it into an Integer. If parsing fails, raises a FormatErrorType.Integer error.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        Returns:
        an Integer object.
      • checkInteger

        protected Integer checkInteger​(IMZTabColumn column,
                                       String target,
                                       boolean allowNull)
        Checks the target string and translates it into an Integer. If parsing fails, raises a FormatErrorType.Integer error.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        allowNull - if true, null target values pass the check; if false, the check raises an error in the error list.
        Returns:
        an Integer object.
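The integer check can be sketched as below, assuming that a failed parse is recorded as a message and reported to the caller as a null result. IntegerCheckSketch and the string-based error list are hypothetical stand-ins for the parser's own error handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; errors are plain strings instead of MZTabError objects.
class IntegerCheckSketch {
    static Integer checkInteger(String columnName, String target, boolean allowNull, List<String> errors) {
        String value = target.trim();
        if (value.equals("null")) {
            if (!allowNull) {
                errors.add(columnName + ": \"null\" is not allowed");
            }
            return null;
        }
        try {
            return Integer.valueOf(value);
        } catch (NumberFormatException e) {
            // Corresponds in spirit to a FormatErrorType.Integer error.
            errors.add(columnName + ": not an integer: " + value);
            return null;
        }
    }
}
```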
      • checkDouble

        protected Double checkDouble​(IMZTabColumn column,
                                     String target)
        Checks the target string and translates it into a Double. If parsing fails, raises a FormatErrorType.Double error. NOTICE: If ratios are included and the denominator is zero, the "INF" value MUST be used. If the result leads to calculation errors (for example 0/0), this MUST be reported as "not a number" ("NaN").
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        Returns:
        a Double object.
      • checkDouble

        protected Double checkDouble​(IMZTabColumn column,
                                     String target,
                                     boolean allowNull)
        Checks the target string and translates it into a Double. If parsing fails, raises a FormatErrorType.Double error. NOTICE: If ratios are included and the denominator is zero, the "INF" value MUST be used. If the result leads to calculation errors (for example 0/0), this MUST be reported as "not a number" ("NaN").
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        allowNull - if true, null target values pass the check; if false, the check raises an error in the error list.
        Returns:
        a Double object.
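The special values "INF" and "NaN" can be handled as in the following sketch. It assumes that "INF" maps to positive infinity and that "null" simply yields a null result; DoubleCheckSketch and its string-based error list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how "INF" and "NaN" cells could be translated.
class DoubleCheckSketch {
    static Double checkDouble(String columnName, String target, List<String> errors) {
        String value = target.trim();
        if (value.equals("null")) {
            return null;
        }
        if (value.equals("INF")) {
            // A ratio with a zero denominator is reported as "INF".
            return Double.POSITIVE_INFINITY;
        }
        if (value.equals("NaN")) {
            // A calculation error such as 0/0 is reported as "NaN".
            return Double.NaN;
        }
        try {
            return Double.valueOf(value);
        } catch (NumberFormatException e) {
            errors.add(columnName + ": not a double: " + value);
            return null;
        }
    }
}
```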
      • checkParamList

        protected List<de.isas.mztab2.model.Parameter> checkParamList​(IMZTabColumn column,
                                                                      String target)
        Checks the target string and translates it into a parameter list split by the '|' character. If parsing fails, raises a FormatErrorType.ParamList error.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        Returns:
        a List object.
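Splitting a parameter list on '|' can be sketched as follows. The sketch assumes the mzTab parameter notation [cvLabel, accession, name, value] and assumes that no field contains a comma or a '|' character, so quoted values are not handled. ParamListSketch is hypothetical and returns plain string arrays instead of Parameter objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; each parameter becomes a String[4] of trimmed fields.
class ParamListSketch {
    static List<String[]> parse(String target, List<String> errors) {
        List<String[]> params = new ArrayList<>();
        for (String item : target.split("\\|")) {
            String p = item.trim();
            // Strip the surrounding brackets and split the four fields; -1 keeps an empty value.
            String[] fields = p.length() >= 2 && p.startsWith("[") && p.endsWith("]")
                    ? p.substring(1, p.length() - 1).split(",", -1)
                    : new String[0];
            if (fields.length != 4) {
                // Corresponds in spirit to a FormatErrorType.ParamList error.
                errors.add("not a parameter: " + item);
                return new ArrayList<>();
            }
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim();
            }
            params.add(fields);
        }
        return params;
    }
}
```

For the input "[MS, MS:1001207, Mascot, ]|[MS, MS:1001171, Mascot:score, 30]" the sketch returns two parameters, the first with an empty value and the second with the value "30".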
      • checkParameter

        protected de.isas.mztab2.model.Parameter checkParameter​(IMZTabColumn column,
                                                                String target,
                                                                boolean allowNull)

        Checks the target string and translates it into a Parameter.

        Parameters:
        column - an IMZTabColumn object.
        target - a String object.
        allowNull - a boolean.
        Returns:
        a Parameter object.
      • checkStringList

        protected List<String> checkStringList(IMZTabColumn column,
                                               String target,
                                               char splitChar)
        Checks the target string and translates it into a string list split by the splitChar character. If parsing fails, raises a FormatErrorType.StringList error.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        splitChar - a char.
        Returns:
        a List object.
      • checkIntegerList

        protected List<Integer> checkIntegerList(IMZTabColumn column,
                                                 String target,
                                                 char splitChar)
        Checks the target string and translates it into an integer list split by the splitChar character. If parsing fails, raises a FormatErrorType.StringList error.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        splitChar - a char.
        Returns:
        a List object.
      • checkIntegerList

        protected List<Integer> checkIntegerList(IMZTabColumn column,
                                                 String target,
                                                 char splitChar,
                                                 boolean allowNull)
        Checks the target string and translates it into an integer list split by the splitChar character. If parsing fails, raises a FormatErrorType.StringList error.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        splitChar - a char.
        allowNull - if true, null will be treated as a valid element of the list. Otherwise, an error will be added to the error list.
        Returns:
        a List object.
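A minimal sketch of splitting on an arbitrary splitChar while optionally accepting "null" elements. IntegerListSketch is hypothetical, and skipping invalid elements after recording an error is a choice made for this example, not necessarily the library's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; errors are collected as plain strings.
class IntegerListSketch {
    static List<Integer> parse(String target, char splitChar, boolean allowNull, List<String> errors) {
        List<Integer> values = new ArrayList<>();
        // Pattern.quote keeps characters such as '|' from being read as regex syntax.
        String delimiter = Pattern.quote(String.valueOf(splitChar));
        for (String item : target.split(delimiter, -1)) {
            String value = item.trim();
            if (value.equals("null")) {
                if (allowNull) {
                    values.add(null);
                } else {
                    errors.add("\"null\" is not allowed in the list");
                }
                continue;
            }
            try {
                values.add(Integer.valueOf(value));
            } catch (NumberFormatException e) {
                errors.add("not an integer: " + value);
            }
        }
        return values;
    }
}
```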
      • checkDoubleList

        protected List<Double> checkDoubleList(IMZTabColumn column,
                                               String target)
        Checks the target string and translates it into a Double list. If parsing fails, raises a FormatErrorType.StringList error.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        Returns:
        a List object.
      • checkMZBoolean

        protected MZBoolean checkMZBoolean​(IMZTabColumn column,
                                           String target)
        Checks the target string and translates it into an MZBoolean. Only "0" and "1" are allowed to express Boolean values. If parsing fails, raises a FormatErrorType.MZBoolean error.
        Parameters:
        column - SHOULD NOT be set to null
        target - SHOULD NOT be empty.
        Returns:
        an MZBoolean object.
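The 0/1 rule is small enough to show directly. The sketch below uses java.lang.Boolean in place of MZBoolean and returns null for any other input, which is an assumption of the example; MZBooleanSketch is hypothetical.

```java
// Hypothetical helper; the real method returns an MZBoolean and records an error on failure.
class MZBooleanSketch {
    // Returns TRUE for "1", FALSE for "0", and null for anything else.
    static Boolean parse(String target) {
        String value = target.trim();
        if (value.equals("1")) {
            return Boolean.TRUE;
        }
        if (value.equals("0")) {
            return Boolean.FALSE;
        }
        return null;
    }
}
```

Inputs such as "true" or "yes" are rejected in this sketch, because only "0" and "1" are accepted.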
      • checkTaxid

        protected Integer checkTaxid​(IMZTabColumn column,
                                     String taxid)
        Checks the taxid string and translates it into an Integer. If an error occurs during parsing, raises a FormatErrorType.Integer error. Normally, taxid may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        taxid - SHOULD NOT be empty.
        Returns:
        an Integer object.
      • checkSearchEngine

        protected List<de.isas.mztab2.model.Parameter> checkSearchEngine​(IMZTabColumn column,
                                                                         String searchEngine)
        Checks the searchEngine string and translates it into a parameter list split by the '|' character. If parsing fails, raises a FormatErrorType.ParamList error. Normally, searchEngine may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        searchEngine - SHOULD NOT be empty.
        Returns:
        a List object.
      • checkBestSearchEngineScore

        protected Double checkBestSearchEngineScore​(IMZTabColumn column,
                                                    String bestSearchEngineScore)
        The best search engine score (for this type of score) for the given peptide across all replicates reported. The type of score MUST be defined in the metadata section. If the peptide was not identified by the specified search engine, “null” MUST be reported.
        Parameters:
        column - SHOULD NOT be set to null
        bestSearchEngineScore - SHOULD NOT be empty.
        Returns:
        a Double object.
      • checkSearchEngineScore

        protected Double checkSearchEngineScore​(IMZTabColumn column,
                                                String searchEngineScore)
        The search engine score for the given peptide in the defined ms run. The type of score MUST be defined in the metadata section. If the peptide was not identified by the specified search engine, “null” MUST be reported.
        Parameters:
        column - SHOULD NOT be set to null
        searchEngineScore - SHOULD NOT be empty.
        Returns:
        a Double object.
      • checkNumPSMs

        protected Integer checkNumPSMs​(IMZTabColumn column,
                                       String numPSMs)
        Checks the numPSMs string and translates it into an Integer. If an error occurs during parsing, raises a FormatErrorType.Integer error. Normally, numPSMs may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        numPSMs - SHOULD NOT be empty.
        Returns:
        an Integer object.
      • checkNumPeptidesDistinct

        protected Integer checkNumPeptidesDistinct​(IMZTabColumn column,
                                                   String numPeptidesDistinct)
        Checks the numPeptidesDistinct string and translates it into an Integer. If an error occurs during parsing, raises a FormatErrorType.Integer error. Normally, numPeptidesDistinct can be set to "null", but in a "Complete" file "null" values SHOULD NOT be given in general.
        Parameters:
        column - SHOULD NOT be set to null
        numPeptidesDistinct - SHOULD NOT be empty.
        Returns:
        an Integer object.
      • checkNumPeptidesUnique

        protected Integer checkNumPeptidesUnique​(IMZTabColumn column,
                                                 String numPeptidesUnique)
        Checks the numPeptidesUnique string and translates it into an Integer. If an error occurs during parsing, raises a FormatErrorType.Integer error. Normally, numPeptidesUnique can be set to "null", but in a "Complete" file "null" values SHOULD NOT be given in general.
        Parameters:
        column - SHOULD NOT be set to null
        numPeptidesUnique - SHOULD NOT be empty.
        Returns:
        an Integer object.
      • checkAmbiguityMembers

        protected List<String> checkAmbiguityMembers(IMZTabColumn column,
                                                     String ambiguityMembers)
        Checks the ambiguityMembers string and translates it into a string list split by the ',' character. If parsing fails, raises a FormatErrorType.StringList error. Normally, ambiguityMembers may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        ambiguityMembers - SHOULD NOT be empty.
        Returns:
        a List object.
      • checkURI

        protected String checkURI​(IMZTabColumn column,
                                  String uri)
        Checks the provided URI string.
        Parameters:
        column - SHOULD NOT be set to null
        uri - a String object, conforming to URI format.
        Returns:
        the uri as an ASCII encoded string.
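A URI check that returns an ASCII-encoded string can be sketched with java.net.URI. Returning null on a syntax error is an assumption of this example, and UriCheckSketch is hypothetical; the real method receives the column and records errors through the parser.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper built on the standard java.net.URI class.
class UriCheckSketch {
    // Returns the URI as an ASCII-only string, or null if it is not a valid URI.
    static String toAscii(String uri) {
        try {
            // toASCIIString percent-encodes any non-ASCII characters.
            return new URI(uri.trim()).toASCIIString();
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

A string containing an unescaped space, such as "http://example.org/a b", is rejected by the URI constructor and yields null in this sketch.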
      • checkSpectraRef

        protected List<de.isas.mztab2.model.SpectraRef> checkSpectraRef​(MZTabParserContext context,
                                                                        IMZTabColumn column,
                                                                        String spectraRef)
        Checks the spectraRef string and translates it into a SpectraRef list. If parsing fails, or the referenced ms_run is not defined in the metadata, raises a FormatErrorType.SpectraRef error. Normally, spectraRef may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        spectraRef - SHOULD NOT be empty.
        context - an MZTabParserContext object.
        Returns:
        a List object.
      • checkSpectraRef

        protected List<de.isas.mztab2.model.SpectraRef> checkSpectraRef​(MZTabParserContext context,
                                                                        IMZTabColumn column,
                                                                        String spectraRef,
                                                                        boolean allowNull)
        Checks the spectraRef string and translates it into a SpectraRef list. If parsing fails, or the referenced ms_run is not defined in the metadata, raises a FormatErrorType.SpectraRef error. Normally, spectraRef may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        spectraRef - SHOULD NOT be empty.
        context - an MZTabParserContext object.
        allowNull - if true, allow null for value. Otherwise, an error will be added to the error list.
        Returns:
        a List object.
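The two failure conditions (malformed reference, undefined ms_run) can be illustrated with a sketch. It assumes that references have the form ms_run[n]:reference and are separated by '|', and it represents the metadata as a set of defined ms_run indices. SpectraRefSketch is hypothetical and returns string pairs instead of SpectraRef objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; definedMsRuns stands in for the ms_run entries of the metadata.
class SpectraRefSketch {
    private static final Pattern REF = Pattern.compile("ms_run\\[(\\d{1,9})\\]:(.+)");

    // Returns {msRunIndex, reference} pairs, or an empty list after recording an error.
    static List<String[]> parse(String target, Set<Integer> definedMsRuns, List<String> errors) {
        List<String[]> refs = new ArrayList<>();
        for (String item : target.split("\\|")) {
            Matcher m = REF.matcher(item.trim());
            if (!m.matches() || !definedMsRuns.contains(Integer.valueOf(m.group(1)))) {
                // Either the syntax is wrong or the ms_run is not defined in the metadata.
                errors.add("invalid spectra_ref: " + item);
                return new ArrayList<>();
            }
            refs.add(new String[] {m.group(1), m.group(2)});
        }
        return refs;
    }
}
```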
      • checkPre

        protected String checkPre​(IMZTabColumn column,
                                  String pre)
        Checks the target string. Normally, pre can be set to "null". "null" values should only be given if no value is available and where the specification explicitly allows "null".
        Parameters:
        column - SHOULD NOT be set to null
        pre - SHOULD NOT be empty.
        Returns:
        a String object.
        See Also:
        checkData(IMZTabColumn, String, boolean)
      • checkGOTerms

        protected List<String> checkGOTerms(IMZTabColumn column,
                                            String go_terms)
        Checks the go_terms string and translates it into a string list split by the ',' character. If parsing fails, raises a FormatErrorType.StringList error. In addition, each item in the list should start with "GO:"; otherwise a FormatErrorType.GOTermList error is raised. Normally, go_terms may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        go_terms - SHOULD NOT be empty.
        Returns:
        a List object.
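The "GO:" prefix rule can be sketched as follows. GoTermsSketch is hypothetical, errors are plain strings, and returning an empty list on the first invalid item is an assumption of the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it checks only the "GO:" prefix, not the accession format.
class GoTermsSketch {
    static List<String> parse(String target, List<String> errors) {
        List<String> terms = new ArrayList<>();
        for (String item : target.split(",")) {
            String term = item.trim();
            if (!term.startsWith("GO:")) {
                // Corresponds in spirit to a FormatErrorType.GOTermList error.
                errors.add("not a GO term: " + term);
                return new ArrayList<>();
            }
            terms.add(term);
        }
        return terms;
    }
}
```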
      • checkProteinCoverage

        protected Double checkProteinCoverage​(IMZTabColumn column,
                                              String protein_coverage)
        Checks the protein_coverage string and translates it into a Double. If parsing fails, raises a FormatErrorType.Double error. The protein_coverage value should lie in the range [0, 1); otherwise a LogicalErrorType.ProteinCoverage error is raised. NOTICE: If ratios are included and the denominator is zero, the "INF" value MUST be used. If the result leads to calculation errors (for example 0/0), this MUST be reported as "not a number" ("NaN").
        Parameters:
        column - SHOULD NOT be set to null
        protein_coverage - SHOULD NOT be empty.
        Returns:
        a Double object.
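The distinction between a format error and a logical (range) error can be sketched as below, using the half-open range [0, 1) stated in the description. CoverageSketch and the error strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper separating the parse step from the range check.
class CoverageSketch {
    static Double check(String target, List<String> errors) {
        double value;
        try {
            value = Double.parseDouble(target.trim());
        } catch (NumberFormatException e) {
            errors.add("format: not a double: " + target);
            return null;
        }
        // The negated form also rejects NaN, for which both comparisons are false.
        if (!(value >= 0.0 && value < 1.0)) {
            errors.add("logical: protein_coverage outside [0, 1): " + value);
            return null;
        }
        return value;
    }
}
```

With this range, a value of 0.25 passes while 1.0 and -0.1 are both reported as logical errors.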
      • checkSequence

        protected String checkSequence​(IMZTabColumn column,
                                       String sequence)
        Checks the peptide sequence. 'O' and 'U' are encoded by codons that are usually interpreted as stop codons, so they cannot be displayed in the sequence. If either is found, a FormatErrorType.Sequence error is raised.
        Parameters:
        column - SHOULD NOT be set to null
        sequence - SHOULD NOT be empty.
        Returns:
        a String object.
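A minimal sketch of the residue check, assuming that the comparison ignores letter case. SequenceSketch is hypothetical and only reports whether a disallowed residue is present.

```java
import java.util.Locale;

// Hypothetical helper; the real method raises a FormatErrorType.Sequence error instead.
class SequenceSketch {
    // Returns true if the sequence contains 'O' or 'U'.
    static boolean hasDisallowedResidue(String sequence) {
        String s = sequence.toUpperCase(Locale.ROOT);
        return s.indexOf('O') >= 0 || s.indexOf('U') >= 0;
    }
}
```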
      • checkPSMID

        protected Integer checkPSMID​(IMZTabColumn column,
                                     String psm_id)
        Checks the psm_id string and translates it into an Integer. If an error occurs during parsing, raises a FormatErrorType.Integer error. Normally, psm_id may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        psm_id - SHOULD NOT be empty.
        Returns:
        an Integer object.
      • checkUnique

        protected MZBoolean checkUnique​(IMZTabColumn column,
                                        String unique)
        Checks the unique string and translates it into an MZBoolean. Only "0" and "1" are allowed to express Boolean values. If parsing fails, raises a FormatErrorType.MZBoolean error.
        Parameters:
        column - SHOULD NOT be set to null
        unique - SHOULD NOT be empty.
        Returns:
        an MZBoolean object.
      • checkCharge

        protected Integer checkCharge​(IMZTabColumn column,
                                      String charge)
        Checks the charge string and translates it into an Integer. If an error occurs during parsing, raises a FormatErrorType.Integer error. Normally, charge may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        charge - SHOULD NOT be empty.
        Returns:
        an Integer object.
      • checkMassToCharge

        protected Double checkMassToCharge​(IMZTabColumn column,
                                           String mass_to_charge)
        Checks the mass_to_charge string and translates it into a Double. If parsing fails, raises a FormatErrorType.Double error. NOTICE: If ratios are included and the denominator is zero, the "INF" value MUST be used. If the result leads to calculation errors (for example 0/0), this MUST be reported as "not a number" ("NaN").
        Parameters:
        column - SHOULD NOT be set to null
        mass_to_charge - SHOULD NOT be empty.
        Returns:
        a Double object.
      • checkExpMassToCharge

        protected Double checkExpMassToCharge​(IMZTabColumn column,
                                              String exp_mass_to_charge)
        Checks the exp_mass_to_charge string and translates it into a Double. If parsing fails, raises a FormatErrorType.Double error. NOTICE: If ratios are included and the denominator is zero, the "INF" value MUST be used. If the result leads to calculation errors (for example 0/0), this MUST be reported as "not a number" ("NaN").
        Parameters:
        column - SHOULD NOT be set to null
        exp_mass_to_charge - SHOULD NOT be empty.
        Returns:
        a Double object.
      • checkCalcMassToCharge

        protected Double checkCalcMassToCharge​(IMZTabColumn column,
                                               String calc_mass_to_charge)
        Checks the calc_mass_to_charge string and translates it into a Double. If parsing fails, raises a FormatErrorType.Double error. NOTICE: If ratios are included and the denominator is zero, the "INF" value MUST be used. If the result leads to calculation errors (for example 0/0), this MUST be reported as "not a number" ("NaN").
        Parameters:
        column - SHOULD NOT be set to null
        calc_mass_to_charge - SHOULD NOT be empty.
        Returns:
        a Double object.
      • checkIdentifier

        protected List<String> checkIdentifier(IMZTabColumn column,
                                               String identifier)
        Checks the identifier string and translates it into a string list split by the '|' character. If parsing fails, raises a FormatErrorType.StringList error. Normally, identifier may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        identifier - SHOULD NOT be empty.
        Returns:
        a List object.
      • checkSmiles

        protected List<String> checkSmiles(IMZTabColumn column,
                                           String smiles)
        Checks the smiles string and translates it into a string list split by the '|' character. If parsing fails, raises a FormatErrorType.StringList error. Normally, smiles may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        smiles - SHOULD NOT be empty.
        Returns:
        a List object.
      • checkInchiKey

        protected List<String> checkInchiKey(IMZTabColumn column,
                                             String inchi_key)
        Checks the inchi_key string and translates it into a string list split by the '|' character. If parsing fails, raises a FormatErrorType.StringList error. Normally, inchi_key may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        inchi_key - SHOULD NOT be empty.
        Returns:
        a List object.
      • checkRetentionTime

        protected List<Double> checkRetentionTime(IMZTabColumn column,
                                                  String retention_time)
        Checks the retention_time string and translates it into a Double list split by the '|' character. If parsing fails, raises a FormatErrorType.DoubleList error. Normally, retention_time may be set to "null", although in general "null" values SHOULD NOT be given.
        Parameters:
        column - SHOULD NOT be set to null
        retention_time - SHOULD NOT be empty.
        Returns:
        a List object.
      • checkRetentionTimeWindow

        protected List<Double> checkRetentionTimeWindow(IMZTabColumn column,
                                                        String retention_time_window)
        Checks the retention_time_window string and translates it into a Double list split by the '|' character. If parsing fails, raises a FormatErrorType.DoubleList error. Normally, retention_time_window can be set to "null", but in a "Complete" file "null" values SHOULD NOT be given in general.
        Parameters:
        column - SHOULD NOT be set to null
        retention_time_window - SHOULD NOT be empty.
        Returns:
        a List object.