token_get_all

(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)

token_get_all - Splits source code into PHP tokens

Description

function token_get_all(string $code, int $flags = 0): array

token_get_all() parses the code string into PHP language tokens using the Zend engine's lexical scanner.

The "List of Parser Tokens" section lists the parser's tokens. To get the symbolic name of an individual token, use token_name(), which converts an integer token ID into its string representation.

Parameters

code

The PHP source code to parse.

flags

This parameter accepts the following flags:

  • TOKEN_PARSE - Enables syntax checking inside PHP tags. The source is also run through the parser, so reserved words are recognised in context (for example, as a constant name).

Return Values

Returns an array of token identifiers. Each individual token is represented either as a single-character string such as ;, ., >, or !, or as a three-element array containing the integer token ID in element 0, the string content of the original token in element 1, and the line number in element 2.
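
Handling both shapes in every loop can be awkward. As a minimal sketch, each token can be given the same shape up front. The helper name normalize_tokens() is hypothetical (it is not part of PHP), and the line tracking for single-character tokens assumes "\n" line endings.

```php
<?php

// Hypothetical helper: gives every token the same id/text/line shape.
function normalize_tokens(string $code): array
{
    $normalized = [];
    $line = 1;

    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Array tokens already carry their own starting line
            [$id, $text, $line] = $token;
        } else {
            // Single-character tokens have no ID and no line number
            [$id, $text] = [null, $token];
        }

        $normalized[] = ['id' => $id, 'text' => $text, 'line' => $line];

        // Advance past any newlines contained in this token
        $line += substr_count($text, "\n");
    }

    return $normalized;
}

foreach (normalize_tokens('<?php echo; ?>') as $token) {
    $name = $token['id'] === null ? 'string' : token_name($token['id']);
    echo "Line {$token['line']}: {$name} ('{$token['text']}')", PHP_EOL;
}
```

With this shape the loop no longer needs an is_array() check, and single-character tokens such as ; get an inferred line number.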

Examples

Example #1 Splitting PHP source code into tokens with token_get_all()

<?php

$tokens = token_get_all('<?php echo; ?>');

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo "Line {$token[2]}: ", token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    }
}

?>

The above example will output something similar to:

Line 1: T_OPEN_TAG ('<?php ')
Line 1: T_ECHO ('echo')
Line 1: T_WHITESPACE (' ')
Line 1: T_CLOSE_TAG ('?>')

Example #2 Passing an incorrect string to token_get_all()

<?php

$tokens = token_get_all('/* comment */');

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo "Line {$token[2]}: ", token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    }
}

?>

The above example will output something similar to:

Line 1: T_INLINE_HTML ('/* comment */')

Note that in this example the function parsed the string as a T_INLINE_HTML token rather than the expected T_COMMENT. This is because the opening PHP tag is missing from the code string: the function treats text outside PHP tags as HTML markup, not as code.
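
For contrast, here is a short sketch of the same string with an opening tag prepended. The variable name $tokens is only illustrative.

```php
<?php

// Same comment as before, but now preceded by an opening PHP tag
$tokens = token_get_all('<?php /* comment */');

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo "Line {$token[2]}: ", token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    }
}
```

Here the scanner is in PHP mode when it reaches the comment, so the second token is reported as T_COMMENT.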

Example #3 Using token_get_all() on class code that contains reserved words

<?php

$source = <<<'code'
<?php

class A
{
    const PUBLIC = 1;
}
code;

$tokens = token_get_all($source, TOKEN_PARSE);

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo token_name($token[0]), PHP_EOL;
    }
}

?>

The above example will output something similar to:

T_OPEN_TAG
T_WHITESPACE
T_CLASS
T_WHITESPACE
T_STRING
T_CONST
T_WHITESPACE
T_STRING
T_LNUMBER

Without the TOKEN_PARSE flag, the function would have returned T_PUBLIC instead of the penultimate token, T_STRING.

See Also

  • PhpToken::tokenize() - Splits the given string containing a PHP program into an array of PhpToken objects
  • token_name() - Gets the symbolic name of a PHP token
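
As a rough sketch of the object-based alternative listed above, the first example can be written with PhpToken::tokenize(). This assumes PHP 8.0 or later, where the PhpToken class is available.

```php
<?php

// Assumes PHP 8.0+: every token, including ';', is a PhpToken object
$tokens = PhpToken::tokenize('<?php echo; ?>');

foreach ($tokens as $token) {
    echo "Line {$token->line}: {$token->getTokenName()} ('{$token->text}')", PHP_EOL;
}
```

Because every element is an object with id, text, and line properties, no is_array() check is needed.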

User Contributed Notes 6 notes

Dennis Robinson from basnetworks dot net
16 years ago
I wanted to use the tokenizer functions to count source lines of code, including counting comments.  Attempting to do this with regular expressions does not work well because of situations where /* appears in a string, or other situations.  The token_get_all() function makes this task easy by detecting all the comments properly.  However, it does not tokenize newline characters.  I wrote the below set of functions to also tokenize newline characters as T_NEW_LINE.

<?php

define('T_NEW_LINE', -1);

function token_get_all_nl($source)
{
    $new_tokens = array();

    // Get the tokens
    $tokens = token_get_all($source);

    // Split newlines into their own tokens
    foreach ($tokens as $token)
    {
        $token_name = is_array($token) ? $token[0] : null;
        $token_data = is_array($token) ? $token[1] : $token;

        // Do not split encapsed strings or multiline comments
        if ($token_name == T_CONSTANT_ENCAPSED_STRING || substr($token_data, 0, 2) == '/*')
        {
            $new_tokens[] = array($token_name, $token_data);
            continue;
        }

        // Split the data up by newlines
        $split_data = preg_split('#(\r\n|\n)#', $token_data, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

        foreach ($split_data as $data)
        {
            if ($data == "\r\n" || $data == "\n")
            {
                // This is a new line token
                $new_tokens[] = array(T_NEW_LINE, $data);
            }
            else
            {
                // Add the token under the original token name
                $new_tokens[] = is_array($token) ? array($token_name, $data) : $data;
            }
        }
    }

    return $new_tokens;
}

function token_name_nl($token)
{
    if ($token === T_NEW_LINE)
    {
        return 'T_NEW_LINE';
    }

    return token_name($token);
}

?>

Example usage:

<?php

$tokens = token_get_all_nl(file_get_contents('somecode.php'));

foreach ($tokens as $token)
{
    if (is_array($token))
    {
        echo (token_name_nl($token[0]) . ': "' . $token[1] . '"<br />');
    }
    else
    {
        echo ('"' . $token . '"<br />');
    }
}

?>

I'm sure you can figure out how to count the lines of code, and lines of comments with these functions.  This was a huge improvement on my previous attempt at counting lines of code with regular expressions.  I hope this helps someone, as many of the user contributed examples on this website have helped me in the past.
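
One way to do the counting, sketched here without the newline tokens and using only the line number that token_get_all() puts in element 2. The function name count_comment_lines() is made up for illustration, and the sketch assumes "\n" or "\r\n" line endings.

```php
<?php

// Hypothetical helper: counts the distinct source lines touched by a comment.
function count_comment_lines(string $source): int
{
    $lines = [];

    foreach (token_get_all($source) as $token) {
        if (!is_array($token) || !in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
            continue;
        }

        // Older PHP versions keep the trailing newline of a one-line comment
        // inside the token, so trim it before counting the lines it spans.
        $extraLines = substr_count(rtrim($token[1], "\r\n"), "\n");

        for ($i = 0; $i <= $extraLines; $i++) {
            $lines[$token[2] + $i] = true;
        }
    }

    return count($lines);
}

$source = "<?php\n// one\n/* two\nthree */\necho 1; # four\n";
echo count_comment_lines($source), PHP_EOL;
```

Keying the array by line number means a line holding two comments is counted once.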

gomodo at free dot fr
16 years ago
Yes, there are some problems with token_get_all() (on WAMP, PHP 5.3.0).

1: Line number bug
Since PHP 5.2.2, token_get_all() should return line numbers in element 2. In 5.3.0 on WAMP this works perfectly only with pure PHP code (no HTML mixed in). If token_get_all() detects some T_INLINE_HTML, you sometimes get wrong line numbers (it returns the next line). :(

2: Warning messages can affect loops
This happens with incomplete PHP code (e.g. PHP code fed in line by line). For example, if a comment is not closed, token_get_all() can block loops on this warning:
Warning: Unterminated comment starting line

This problem does not seem to occur in CLI mode (PHP command line), only in web mode.

Until this is more stable, I use token_get_all() only on pure PHP code (no HTML mixed in):
First, extract the PHP code in its entirety (with the opening and closing PHP tags).
Second, run token_get_all() on the pure PHP code.

3: Why is there no function to extract PHP code? (To extract HTML, we have Tidy.)

In the meantime, I use a function. The code is at the end of this post:
http://www.developpez.net/forums/d786381/php/langage/fonctions/analyser-fichier-php-token_get_all/

This function does not support:
- The old notation: "<?  ?>" and "<% %>"
- heredoc syntax
- nowdoc syntax (since PHP 5.3.0)

Ivan Ustanin
7 years ago
As a caution: when using TOKEN_PARSE with an invalid PHP file, you can get an error like this:
Parse error: syntax error, unexpected '__construct' (T_STRING), expecting function (T_FUNCTION) or const (T_CONST) in  on line 15
Notice the missing filename: this function accepts a string, not a filename, and thus has no idea of the latter.
However, an exception would be more appreciated.
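
On PHP 7 and later, the error raised under TOKEN_PARSE is a ParseError, which can be caught like any other throwable. A minimal sketch follows; the invalid class body is made up to provoke a similar error, and the exact message text varies between PHP versions.

```php
<?php

// Deliberately invalid: a method declared without the "function" keyword
$code = '<?php class A { __construct() {} }';

$error = null;

try {
    token_get_all($code, TOKEN_PARSE);
} catch (ParseError $e) {
    // getLine() refers to a line within $code; there is no file name to report
    $error = $e->getMessage() . ' on line ' . $e->getLine();
}

echo $error ?? 'no syntax error', PHP_EOL;
```

Catching the error this way lets the caller attach its own file name to the message.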

Theriault
10 years ago
The T_OPEN_TAG token will include the first trailing newline (\r, \n, or \r\n), tab (\t), or space. Any additional space after this token will be in a T_WHITESPACE token.

The T_CLOSE_TAG token will include the first trailing newline (\r, \n, or \r\n; as described here http://php.net/manual/en/language.basic-syntax.instruction-separation.php). Any additional space after this token will be in a T_INLINE_HTML token.
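
A small sketch that makes both rules visible. The sample source string is arbitrary, and json_encode() is used only so that the newlines show up as \n in the output.

```php
<?php

$tokens = token_get_all("<?php\n\n echo 1; ?>\n\nrest");

$openTag  = $tokens[0];
$afterTag = $tokens[1];
$closeTag = $tokens[count($tokens) - 2];
$trailing = $tokens[count($tokens) - 1];

// Print each token name with its text
foreach ([$openTag, $afterTag, $closeTag, $trailing] as $token) {
    echo token_name($token[0]), ' ', json_encode($token[1]), PHP_EOL;
}
```

The first newline after each tag stays inside the tag token. The remaining whitespace lands in T_WHITESPACE after the opening tag and in T_INLINE_HTML after the closing tag.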

bart
9 years ago
Not all tokens are returned as an array. The rule appears to be that if a token is not variable, but is instead one particular single-character string, it is returned as a string. You don't get a line number. This is the case for braces ("{", "}"), parentheses ("(", ")"), brackets ("[", "]"), comma (","), semi-colon (";"), and a whole slew of single-character operator signs ("!", "=", "+", "*", "/", ".", ...). Multi-character operators such as "+=" have their own token ID (T_PLUS_EQUAL) and come back as arrays.
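
A quick way to check which tokens come back as plain strings is to separate the two kinds. This is only a sketch, and the variable names are illustrative.

```php
<?php

$tokens = token_get_all('<?php $a += 1;');

$stringTokens = [];
$arrayTokenNames = [];

foreach ($tokens as $token) {
    if (is_array($token)) {
        $arrayTokenNames[] = token_name($token[0]);
    } else {
        $stringTokens[] = $token;
    }
}

echo 'Strings: ', implode(' ', $stringTokens), PHP_EOL;
echo 'Arrays:  ', implode(' ', $arrayTokenNames), PHP_EOL;
```

In this sample only ";" is returned as a bare string. The two-character "+=" arrives as a T_PLUS_EQUAL array with a line number.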