Interface IScanner


  • public interface IScanner

    Caveat: With the introduction of "restricted keywords" in Java 9 it is impossible to classify a token without the help of a parser. For that reason, this interface is not suitable for scanning a modular compilation unit ("module-info.java"). It is the client's responsibility to pass only source from ordinary compilation units. For lack of a file name the scanner cannot check this.

    Definition of a Java scanner, as returned by the ToolFactory. The scanner is responsible for tokenizing a given source, providing information about the nature of the token read, its positions and source equivalent.

    When the scanner has finished tokenizing, it answers an EOF token (ITerminalSymbols.TokenNameEOF).

    When encountering lexical errors, an InvalidInputException is thrown.

    Since:
    2.0
    See Also:
    ToolFactory, ITerminalSymbols
    Restriction:
    This interface is not intended to be implemented by clients.
    • Method Summary

      Modifier and Type Method Description
      int getCurrentTokenEndPosition()
      Answers the ending position of the current token inside the original source.
      char[] getCurrentTokenSource()
      Answers the current identifier source, after unicode escape sequences have been translated into unicode characters.
      int getCurrentTokenStartPosition()
      Answers the starting position of the current token inside the original source.
      int getLineEnd(int lineNumber)
      Answers the ending position of a given line number.
      int[] getLineEnds()
      Answers an array of the ending positions of the lines encountered so far.
      int getLineNumber(int charPosition)
      Answers a 1-based line number using the lines which have been encountered so far.
      int getLineStart(int lineNumber)
      Answers the starting position of a given line number.
      int getNextToken()
      Reads the next token in the source and answers its ID as specified by ITerminalSymbols.
      char[] getRawTokenSource()
      Answers the current identifier source, before unicode escape sequences have been translated into unicode characters.
      char[] getSource()
      Answers the original source being processed (not a copy of it).
      void resetTo(int startPosition, int endPosition)
      Repositions the scanner on some portion of the original source.
      void setSource(char[] source)
      Sets the scanner source to process.
    • Method Detail

      • getCurrentTokenSource

        char[] getCurrentTokenSource()
        Answers the current identifier source, after unicode escape sequences have been translated into unicode characters. For example, if the original source was \u0061bc, it will answer abc.
        Returns:
        the current identifier source, after unicode escape sequences have been translated into unicode characters
      • getRawTokenSource

        char[] getRawTokenSource()
        Answers the current identifier source, before unicode escape sequences have been translated into unicode characters. For example, if the original source was \u0061bc, it will answer \u0061bc.
        Returns:
        the current identifier source, before unicode escape sequences have been translated into unicode characters
        Since:
        2.1
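The difference between the raw and the translated token source can be illustrated with a small, self-contained sketch. It does not call the scanner; `UnicodeEscapeSketch` and `translate` are hypothetical names, and the translation is a simplified model (it ignores the rule about preceding backslashes) rather than the scanner's actual implementation.

```java
final class UnicodeEscapeSketch {
    /**
     * Simplified model of the translation step: replaces each backslash-u escape
     * with four hex digits by the character it denotes. Hypothetical helper,
     * not part of IScanner.
     */
    static char[] translate(char[] raw) {
        StringBuilder out = new StringBuilder(raw.length);
        int i = 0;
        while (i < raw.length) {
            int code = -1;
            int next = i + 1;
            if (raw[i] == '\\' && next < raw.length && raw[next] == 'u') {
                while (next < raw.length && raw[next] == 'u') next++; // several 'u' characters are allowed
                if (next + 4 <= raw.length) {
                    code = 0;
                    for (int k = 0; k < 4 && code >= 0; k++) {
                        int digit = Character.digit(raw[next + k], 16);
                        code = digit < 0 ? -1 : code * 16 + digit;
                    }
                }
            }
            if (code >= 0) {
                out.append((char) code); // well-formed escape: emit the decoded character
                i = next + 4;
            } else {
                out.append(raw[i]);      // anything else is copied unchanged
                i++;
            }
        }
        return out.toString().toCharArray();
    }

    public static void main(String[] args) {
        char[] raw = "\\u0061bc".toCharArray();         // what getRawTokenSource() would answer
        System.out.println(new String(translate(raw))); // prints "abc"
    }
}
```

In this model the raw form is eight characters long while the translated form is three, which is why the two methods can answer arrays of different lengths for the same token.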
      • getCurrentTokenStartPosition

        int getCurrentTokenStartPosition()
        Answers the starting position of the current token inside the original source. This position is zero-based and inclusive. It corresponds to the position of the first character which is part of this token. If this character was a unicode escape sequence, it points at the first character of this sequence.
        Returns:
        the starting position of the current token inside the original source
      • getCurrentTokenEndPosition

        int getCurrentTokenEndPosition()
        Answers the ending position of the current token inside the original source. This position is zero-based and inclusive. It corresponds to the position of the last character which is part of this token. If this character was a unicode escape sequence, it points at the last character of this sequence.
        Returns:
        the ending position of the current token inside the original source
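Because both positions are zero-based and inclusive, the raw length of a token is end - start + 1, and slicing the source needs an exclusive upper bound of end + 1. The sketch below shows this arithmetic only; `TokenSpanSketch` and its methods are hypothetical, and the positions in `main` are computed by hand from the documented conventions rather than obtained from a scanner.

```java
import java.util.Arrays;

final class TokenSpanSketch {
    /** Number of source characters covered by a token whose bounds are both inclusive. */
    static int rawLength(int start, int end) {
        return end - start + 1;
    }

    /** Copies the raw token text out of the source; 'end' is inclusive, hence end + 1. */
    static char[] rawSlice(char[] source, int start, int end) {
        return Arrays.copyOfRange(source, start, end + 1);
    }

    public static void main(String[] args) {
        // The identifier starts with a unicode escape, so by the rule above its
        // start position is the backslash at index 4; its last character is at index 11.
        char[] source = "int \\u0061bc;".toCharArray();
        System.out.println(rawLength(4, 11));                    // prints 8
        System.out.println(new String(rawSlice(source, 4, 11))); // prints the raw identifier text
    }
}
```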
      • getLineStart

        int getLineStart(int lineNumber)
        Answers the starting position of a given line number. This line has to have been encountered already in the tokenization process (in other words, it cannot be used to compute positions of lines beyond the current token). Once the entire source has been processed, it can be used without any limit. Line starting positions are zero-based, and start immediately after the previous line separator (if any).
        Parameters:
        lineNumber - the given line number
        Returns:
        the starting position of a given line number
      • getLineEnd

        int getLineEnd(int lineNumber)
        Answers the ending position of a given line number. This line has to have been encountered already in the tokenization process (in other words, it cannot be used to compute positions of lines beyond the current token). Once the entire source has been processed, it can be used without any limit. Line ending positions are zero-based, and correspond to the last character of the line separator (in the case of multi-character line separators).
        Parameters:
        lineNumber - the given line number
        Returns:
        the ending position of a given line number
      • getLineEnds

        int[] getLineEnds()
        Answers an array of the ending positions of the lines encountered so far. Line ending positions are zero-based, and correspond to the last character of the line separator (in the case of multi-character line separators).
        Returns:
        an array of the ending positions of the lines encountered so far
      • getLineNumber

        int getLineNumber(int charPosition)
        Answers a 1-based line number using the lines which have been encountered so far. If the position is located beyond the current scanned line, then the last line number will be answered.
        Parameters:
        charPosition - the given character position
        Returns:
        a 1-based line number using the lines which have been encountered so far
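The line conventions used by getLineStart, getLineEnd, getLineEnds and getLineNumber (zero-based positions, 1-based line numbers, line end at the last character of the separator) can be modelled in a few lines. This is a sketch of the documented conventions over a fully processed source, not the scanner's implementation; `LineTableSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class LineTableSketch {
    /** Zero-based position of the last character of each line separator in the source. */
    static int[] lineEnds(char[] source) {
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i < source.length; i++) {
            if (source[i] == '\r') {
                if (i + 1 < source.length && source[i + 1] == '\n') {
                    i++; // CRLF: the line ends at the '\n', the separator's last character
                }
                ends.add(i);
            } else if (source[i] == '\n') {
                ends.add(i);
            }
        }
        int[] result = new int[ends.size()];
        for (int k = 0; k < result.length; k++) {
            result[k] = ends.get(k);
        }
        return result;
    }

    /** Start of a 1-based line: position 0, or one past the previous line's end. */
    static int lineStart(int[] lineEnds, int lineNumber) {
        return lineNumber == 1 ? 0 : lineEnds[lineNumber - 2] + 1;
    }

    /** 1-based number of the line that contains charPosition. */
    static int lineNumber(int[] lineEnds, int charPosition) {
        int line = 1;
        for (int end : lineEnds) {
            if (charPosition <= end) {
                break;
            }
            line++;
        }
        return line;
    }

    public static void main(String[] args) {
        int[] ends = lineEnds("a\r\nbc\nd".toCharArray());
        System.out.println(ends.length);         // prints 2
        System.out.println(lineStart(ends, 2));  // prints 3
        System.out.println(lineNumber(ends, 6)); // prints 3
    }
}
```

With a real scanner the table only covers the lines encountered so far, so the same queries are reliable for the whole source only after tokenization has reached the end.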
      • getNextToken

        int getNextToken()
                  throws InvalidInputException
        Reads the next token in the source and answers its ID as specified by ITerminalSymbols. Note that the actual token ID values are subject to change when new keywords are added to the language (for instance, 'assert' became a keyword in Java 1.4).
        Returns:
        the next token
        Throws:
        InvalidInputException - in case a lexical error was detected while reading the current token
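The usual way to drive a scanner is a loop that calls getNextToken until the EOF token is answered, reading the token positions after each call. The sketch below shows that loop shape against a toy stand-in so that it compiles without the JDT libraries. `TokenSource`, `WordScanner`, and the constants `EOF` and `WORD` are hypothetical; a real client would obtain an IScanner from the ToolFactory, compare against ITerminalSymbols.TokenNameEOF, and handle InvalidInputException.

```java
import java.util.ArrayList;
import java.util.List;

final class ScanLoopSketch {
    /** Hypothetical stand-in for the few IScanner methods used below. */
    interface TokenSource {
        int EOF = -1;  // placeholder; not the real value of ITerminalSymbols.TokenNameEOF
        int WORD = 1;  // placeholder token ID
        void setSource(char[] source);
        int getNextToken();
        int getCurrentTokenStartPosition();
        int getCurrentTokenEndPosition();
    }

    /** Toy tokenizer: a token is a maximal run of non-whitespace characters. */
    static final class WordScanner implements TokenSource {
        private char[] source = new char[0];
        private int position;
        private int start;
        private int end;

        public void setSource(char[] source) {
            this.source = source == null ? new char[0] : source;
            this.position = 0;
        }

        public int getNextToken() {
            while (position < source.length && Character.isWhitespace(source[position])) {
                position++;
            }
            if (position >= source.length) {
                return EOF; // keeps answering EOF once the source is exhausted
            }
            start = position;
            while (position < source.length && !Character.isWhitespace(source[position])) {
                position++;
            }
            end = position - 1; // inclusive, like the documented end position
            return WORD;
        }

        public int getCurrentTokenStartPosition() { return start; }
        public int getCurrentTokenEndPosition() { return end; }
    }

    /** Drains the scanner and returns the raw text of every token. */
    static List<String> tokens(TokenSource scanner, char[] source) {
        scanner.setSource(source);
        List<String> result = new ArrayList<>();
        while (scanner.getNextToken() != TokenSource.EOF) {
            int s = scanner.getCurrentTokenStartPosition();
            int e = scanner.getCurrentTokenEndPosition();
            result.add(new String(source, s, e - s + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens(new WordScanner(), "int x = 1;".toCharArray()));
    }
}
```

The toy tokenizer splits on whitespace only, so its token boundaries differ from those of a Java scanner; only the loop structure and the inclusive position arithmetic carry over.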
      • getSource

        char[] getSource()
        Answers the original source being processed (not a copy of it).
        Returns:
        the original source being processed
      • resetTo

        void resetTo(int startPosition,
                     int endPosition)
        Repositions the scanner on some portion of the original source. The given endPosition is the last valid position. Beyond this position, the scanner will answer EOF tokens (ITerminalSymbols.TokenNameEOF).
        Parameters:
        startPosition - the given start position
        endPosition - the given end position
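The inclusive endPosition can be pictured as a window over the source: everything up to and including endPosition is readable, and every read past it answers EOF. The sketch below models this with a character cursor rather than a token scanner; `ResetToSketch`, `next`, `read`, and the `EOF` constant are hypothetical and stand in for the scanner's behaviour only by analogy.

```java
final class ResetToSketch {
    static final int EOF = -1; // placeholder; not the real value of TokenNameEOF

    private final char[] source;
    private int position;
    private int endPosition; // inclusive, as documented for resetTo

    ResetToSketch(char[] source) {
        this.source = source;
        this.position = 0;
        this.endPosition = source.length - 1;
    }

    /** Restricts reading to the inclusive window [startPosition, endPosition]. */
    void resetTo(int startPosition, int endPosition) {
        this.position = startPosition;
        this.endPosition = endPosition;
    }

    /** Answers the next character in the window, or EOF once past endPosition. */
    int next() {
        if (position > endPosition || position >= source.length) {
            return EOF;
        }
        return source[position++];
    }

    /** Reads the whole window as a string. */
    static String read(char[] source, int start, int end) {
        ResetToSketch cursor = new ResetToSketch(source);
        cursor.resetTo(start, end);
        StringBuilder text = new StringBuilder();
        for (int ch = cursor.next(); ch != EOF; ch = cursor.next()) {
            text.append((char) ch);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(read("foo bar baz".toCharArray(), 4, 6)); // prints "bar"
    }
}
```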
      • setSource

        void setSource(char[] source)
        Sets the scanner source to process. By default, the scanner processes the source from its beginning to its end. If the given source is null, this clears the source.
        Parameters:
        source - the given source