Pattern Modifiers for Perl-Compatible Regular Expressions (PCRE)

The available PCRE modifiers are listed below. The names in parentheses are the internal PCRE names of these modifiers. Spaces and newlines in the modifier list are ignored; any other character causes an error.

i (PCRE_CASELESS)
With this modifier, letters in the pattern match both uppercase and lowercase letters.
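A minimal sketch of the difference, assuming a plain ASCII subject (the variable names are illustrative only):

```php
<?php
$subject = 'Hello World';

// Without "i" the comparison is case-sensitive, so "hello" is not found.
$sensitive = preg_match('/hello/', $subject);

// With "i" the letters in the pattern match either case.
$insensitive = preg_match('/hello/i', $subject);

var_dump($sensitive, $insensitive);
```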
m (PCRE_MULTILINE)
By default, PCRE treats the subject string as a single "line" of characters, even if it contains newlines. The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string or before a newline that terminates the text (unless the D modifier is set). Perl behaves the same way. When this modifier is set, the "start of line" construct matches immediately after every newline in the subject string as well as at its very start, and the "end of line" construct matches immediately before every newline as well as at its very end. This is equivalent to Perl's /m modifier. Setting this modifier has no effect if the subject string contains no "\n" characters or the pattern contains no ^ or $.
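The following sketch, under the assumption that lines are separated by "\n", shows how the same pattern behaves with and without m:

```php
<?php
$subject = "first\nsecond\nthird";

// Without "m", ^ and $ refer to the whole subject, and \w+ cannot
// cross a newline, so nothing matches.
$withoutM = preg_match_all('/^\w+$/', $subject, $single);

// With "m", ^ and $ also match after and before each embedded newline.
$withM = preg_match_all('/^\w+$/m', $subject, $multi);

var_dump($withoutM, $withM, $multi[0]);
```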
s (PCRE_DOTALL)
With this modifier, the dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negated character class such as [^a] always matches a newline character, regardless of this modifier.
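As a rough illustration (the subject string is made up for the example), the three cases described above look like this:

```php
<?php
$subject = "a\nb";

// The dot does not match "\n" by default.
$plainDot = preg_match('/a.b/', $subject);

// With "s" the dot matches any character, including "\n".
$dotAll = preg_match('/a.b/s', $subject);

// A negated class matches "\n" whether or not "s" is set.
$negatedClass = preg_match('/a[^x]b/', $subject);

var_dump($plainDot, $dotAll, $negatedClass);
```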
x (PCRE_EXTENDED)
If this modifier is set, whitespace data characters in the pattern are ignored unless they are escaped or appear inside a character class. Characters between an unescaped # outside a character class and the next newline, including the # and the "\n" themselves, are also ignored. This is equivalent to Perl's /x modifier and makes it possible to place comments inside complicated patterns. Note that this applies only to data characters. Whitespace may never appear inside special character sequences, for example inside the sequence (?( that introduces a conditional subpattern.
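A short sketch of a commented pattern, assuming a year-month string as input (the pattern and variable names are illustrative). It also shows that a literal space has to be escaped once x is set:

```php
<?php
// Whitespace and "# ..." comments are ignored under "x".
$pattern = '/
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
/x';
preg_match($pattern, 'Released 2024-05', $m);

// An unescaped space is dropped, so the pattern is effectively "ab".
$unescaped = preg_match('/a b/x', 'a b');

// An escaped space is kept as a literal space.
$escaped = preg_match('/a\ b/x', 'a b');

var_dump($m, $unescaped, $escaped);
```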
A (PCRE_ANCHORED)
This modifier "anchors" the pattern, that is, a match is found only if the pattern matches at the start of the subject string. The same effect can be achieved with the ^ construct in the pattern itself, which is the only way to do it in Perl.
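For instance, a sketch with made-up subjects:

```php
<?php
// Unanchored: the digits are found anywhere in the subject.
$anywhere = preg_match('/\d+/', 'abc123');

// Anchored: the match must begin at the start of the subject.
$anchoredMiss = preg_match('/\d+/A', 'abc123');
$anchoredHit  = preg_match('/\d+/A', '123abc');

var_dump($anywhere, $anchoredMiss, $anchoredHit);
```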
D (PCRE_DOLLAR_ENDONLY)
With this modifier, the dollar metacharacter ($) in the pattern matches only at the very end of the subject string. Without it, a dollar also matches immediately before the final character if that character is a newline (but not before any other newline). This modifier is ignored if the m modifier is set. Perl has no equivalent modifier.
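This matters when validating input that may carry a trailing newline. A small sketch, with an illustrative subject:

```php
<?php
$subject = "abc\n";

// By default $ also matches just before a final newline.
$default = preg_match('/abc$/', $subject);

// With "D", $ matches only at the very end, so the trailing "\n" is rejected.
$endOnly = preg_match('/abc$/D', $subject);

// "D" is ignored when "m" is set.
$withMultiline = preg_match('/abc$/Dm', $subject);

var_dump($default, $endOnly, $withMultiline);
```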
S
When a pattern is going to be matched many times, it is worth spending more time analyzing it in order to speed up matching. If this modifier is set, that extra analysis is performed. Studying a pattern is useful only for non-anchored patterns that do not begin with a single fixed starting character. As of PHP 7.3.0 this flag has no effect.
U (PCRE_UNGREEDY)
This modifier inverts the "greediness" of the quantifiers, so that they are not greedy by default but become greedy if followed by ?. It is not compatible with Perl. Ungreedy mode can also be set with a (?U) modifier inside the pattern, or for a single quantifier by placing a question mark after it (for example, .*?).

Note:

In ungreedy mode it is usually not possible to match more characters than the value of the pcre.backtrack_limit directive.
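A sketch of how U flips quantifier greediness, using a made-up HTML-like subject:

```php
<?php
$subject = '<b>one</b><b>two</b>';

// Default: .* is greedy and runs to the last closing tag.
preg_match('/<b>.*<\/b>/', $subject, $greedy);

// With "U": .* becomes lazy and stops at the first closing tag.
preg_match('/<b>.*<\/b>/U', $subject, $lazy);

// With "U", a trailing ? makes the quantifier greedy again.
preg_match('/<b>.*?<\/b>/U', $subject, $flipped);

var_dump($greedy[0], $lazy[0], $flipped[0]);
```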

X (PCRE_EXTRA)
This modifier turns on additional PCRE functionality that is incompatible with Perl. With it, any backslash in the pattern that is followed by a letter with no special meaning causes an error, which reserves these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. At present this is the only feature controlled by this modifier.
J (PCRE_INFO_JCHANGED)
The internal option setting (?J) changes the local PCRE_DUPNAMES option. The modifier allows subpatterns to share the same name. Support for J as a modifier was added in PHP 7.2.0.
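Duplicate names are mostly useful across alternatives, where only one of the same-named groups takes part in a match. A sketch, assuming PHP 7.2.0 or later and two made-up date layouts:

```php
<?php
// Both alternatives expose the year under the same name.
$pattern = '/(?<year>\d{4})-\d{2}|\d{2}\/(?<year>\d{4})/J';

preg_match($pattern, '2024-05', $iso);
preg_match($pattern, '05/2024', $slashed);

var_dump($iso['year'], $slashed['year']);
```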
u (PCRE_UTF8)
This modifier turns on additional PCRE functionality that is incompatible with Perl. With it, the pattern and the subject string are treated as UTF-8. An invalid subject causes the preg_* functions to match nothing; an invalid pattern triggers an error of level E_WARNING. Five- and six-octet UTF-8 sequences are regarded as invalid.
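A sketch of both effects, assuming PHP 7.0 or later for the "\u{...}" string escapes (the subject is illustrative):

```php
<?php
// "\u{F1}" and "\u{FA}" are two-byte UTF-8 characters.
$subject = "\u{F1}and\u{FA}";

// Without "u" the dot matches single bytes: 7 matches.
$bytes = preg_match_all('/./', $subject);

// With "u" the dot matches whole UTF-8 characters: 5 matches.
$chars = preg_match_all('/./u', $subject);

// An invalid UTF-8 subject matches nothing; the call returns false
// and preg_last_error() reports the reason.
$invalid = preg_match('/./u', "\xC3\x28");
$error = preg_last_error();

var_dump($bytes, $chars, $invalid, $error === PREG_BAD_UTF8_ERROR);
```

Checking preg_last_error() after a false return is one way to tell an invalid subject apart from an ordinary non-match, which returns 0.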
n (PCRE_NO_AUTO_CAPTURE)
This modifier makes simple groups such as (xyz) non-capturing. Only named groups such as (?<name>xyz) capture. This affects only which groups capture: numbered subpattern references can still be used, and the matches array still contains numbered results. Available as of PHP 8.2.0.
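A sketch of the effect on the matches array. The version guard is an assumption added so that the snippet also runs on older PHP versions, where the n modifier is not recognised:

```php
<?php
$m = null;
if (PHP_VERSION_ID >= 80200) {
    // (\d{4}) no longer captures, so the named group becomes group 1.
    preg_match('/(\d{4})-(?<month>\d{2})/n', '2024-05', $m);
}
var_dump($m);
```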
r (PCRE2_EXTRA_CASELESS_RESTRICT)
When u (PCRE_UTF8) and i (PCRE_CASELESS) are in effect, this modifier prevents matching between ASCII and non-ASCII characters. For example, preg_match('/\x{212A}/iu', "K") matches, because the pattern is the Kelvin sign K (U+212A), which is case-insensitively equivalent to the ASCII letter K. With r (preg_match('/\x{212A}/iur', "K")) there is no match. Available as of PHP 8.4.0.

User Contributed Notes (11 notes)

hfuecks at nospam dot org
20 years ago
Regarding the validity of a UTF-8 string when using the /u pattern modifier, some things to be aware of;

1. If the pattern itself contains an invalid UTF-8 character, you get an error (as mentioned in the docs above - "UTF-8 validity of the pattern is checked since PHP 4.3.5").

2. When the subject string contains invalid UTF-8 sequences / codepoints, it basically results in a "quiet death" for the preg_* functions, where nothing is matched but without any indication that the string is invalid UTF-8.

3. PCRE regards five and six octet UTF-8 character sequences as valid (both in patterns and the subject string) but these are not supported in Unicode ( see section 5.9 "Character Encoding" of the "Secure Programming for Linux and Unix HOWTO" - can be found at http://www.tldp.org/ and other places )

4. For an example algorithm in PHP which tests the validity of a UTF-8 string (and discards five / six octet sequences) head to: http://hsivonen.iki.fi/php-utf8/

The following script should give you an idea of what works and what doesn't;

<?php
$examples = array(
    'Valid ASCII' => "a",
    'Valid 2 Octet Sequence' => "\xc3\xb1",
    'Invalid 2 Octet Sequence' => "\xc3\x28",
    'Invalid Sequence Identifier' => "\xa0\xa1",
    'Valid 3 Octet Sequence' => "\xe2\x82\xa1",
    'Invalid 3 Octet Sequence (in 2nd Octet)' => "\xe2\x28\xa1",
    'Invalid 3 Octet Sequence (in 3rd Octet)' => "\xe2\x82\x28",

    'Valid 4 Octet Sequence' => "\xf0\x90\x8c\xbc",
    'Invalid 4 Octet Sequence (in 2nd Octet)' => "\xf0\x28\x8c\xbc",
    'Invalid 4 Octet Sequence (in 3rd Octet)' => "\xf0\x90\x28\xbc",
    'Invalid 4 Octet Sequence (in 4th Octet)' => "\xf0\x90\x8c\x28",
    'Valid 5 Octet Sequence (but not Unicode!)' => "\xf8\xa1\xa1\xa1\xa1",
    'Valid 6 Octet Sequence (but not Unicode!)' => "\xfc\xa1\xa1\xa1\xa1\xa1",
);

echo "++Invalid UTF-8 in pattern\n";
foreach ( $examples as $name => $str ) {
    echo "$name\n";
    preg_match("/".$str."/u",'Testing');
}

echo "++ preg_match() examples\n";
foreach ( $examples as $name => $str ) {
    
    $ar = array(); // reset so count() below is safe if the pattern fails to compile
    preg_match("/\xf8\xa1\xa1\xa1\xa1/u", $str, $ar);
    echo "$name: ";

    if ( count($ar) == 0 ) {
        echo "Matched nothing!\n";
    } else {
        echo "Matched {$ar[0]}\n";
    }
    
}

echo "++ preg_match_all() examples\n";
foreach ( $examples as $name => $str ) {
    preg_match_all('/./u', $str, $ar);
    echo "$name: ";
    
    $num_utf8_chars = count($ar[0]);
    if ( $num_utf8_chars == 0 ) {
        echo "Matched nothing!\n";
    } else {
        echo "Matched $num_utf8_chars character\n";
    }
    
}
?>

varrah NO_GARBAGE_OR_SPAM AT mail DOT ru
20 years ago
Spent a few days, trying to understand how to create a pattern for Unicode chars, using the hex codes. Finally made it, after reading several manuals, that weren't giving any practical PHP-valid examples. So here's one of them:

For example, suppose we want to search for the Japanese-standard circled numbers 1-9 (Unicode code points 0x2460-0x2468). To match them by their hex codes, use the following call:
preg_match('/[\x{2460}-\x{2468}]/u', $str);

Here $str is the haystack string,
\x{hex} is the character's Unicode code point in hexadecimal,
and /u makes PCRE treat the pattern and the subject as UTF-8, so the class is a class of Unicode characters.

Hope, it'll be useful.
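
A runnable version of the call above might look like this. The haystack is made up for illustration, and the "\u{...}" string escapes assume PHP 7.0 or later:

```php
<?php
// U+2462 is circled three, U+2468 is circled nine.
$str = "Step \u{2462} of \u{2468}";

// Match any circled number from U+2460 to U+2468.
$count = preg_match_all('/[\x{2460}-\x{2468}]/u', $str, $m);

var_dump($count, $m[0]);
```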

phpman at crustynet dot org dot uk
15 years ago
The description of the "u" flag is a bit misleading. It suggests that it is only required if the pattern contains UTF-8 characters, when in fact it is required if either the pattern or the subject contain UTF-8. Without it, I was having problems with preg_match_all returning invalid multibyte characters when given a UTF-8 subject string.

It's fairly clear if you read the documentation for libpcre:

       In order process UTF-8 strings, you must build PCRE to include UTF-8
       support in the code, and, in addition, you must call pcre_compile()
       with the PCRE_UTF8 option flag, or the pattern must start with the
       sequence (*UTF8). When either of these is the case, both the pattern
       and any subject strings that are matched against it are treated as
       UTF-8 strings instead of strings of 1-byte characters.

[from http://www.pcre.org/pcre.txt]

Daniel Klein
14 years ago
If the _subject_ contains utf-8 sequences the 'u' modifier should be set, otherwise a pattern such as /./ could match a utf-8 sequence as two to four individual bytes rather than as one character. It is not a requirement, however, as you may have a need to break apart utf-8 sequences into single bytes. Most of the time, though, if you're working with utf-8 strings you should use the 'u' modifier.

If the subject doesn't contain any utf-8 sequences (i.e. characters in the range 0x00-0x7F only) but the pattern does, as far as I can work out, setting the 'u' modifier would have no effect on the result.

arash dot dalir at gmail dot com
8 years ago
The PCRE_INFO_JCHANGED modifier (J) is apparently not accepted as a global option (after the closing delimiter) in PHP versions <= 5.4 (not checked in PHP 5.5), but it is allowed in PHP 5.6 (also not checked in PHP 7.X).

The following pattern doesn't work in PHP 5.4, but it works in PHP 5.6:

<?php
//test.php
preg_match_all('/(?<dup_name>\d{1,4})\-(?<dup_name>\d{1,2})/J', '1234-23', $matches);
var_dump($matches);

/*
output in PHP 5.4:
Warning: preg_match_all(): Unknown modifier 'J' in test.php on line 3
NULL
--------------
output PHP 5.6:
array(4) { 
    [0]=> array(1)  { [0]=> string(7) "1234-23" } 
    ["dup_name"]=> array(1) { [0]=> string(2) "23" } 
    [1]=> array(1) { [0]=> string(4) "1234" } 
    [2]=> array(1) { [0]=> string(2) "23" } 
}
*/
?>

To work around this in PHP 5.4, one can use the (?J) pattern modifier, which indicates that the pattern (from that point forward) allows duplicate names for subpatterns.

Code which works in PHP 5.4:
<?php

preg_match_all('/(?J)(?<dup_name>\d{1,4})\-(?<dup_name>\d{1,2})/', '1234-23', $matches);
var_dump($matches);

/*
output in PHP 5.4:
array(4) { 
    [0]=> array(1) { [0]=> string(7) "1234-23" } 
    ["dup_name"]=> array(1) { [0]=> string(2) "23" } 
    [1]=> array(1) { [0]=> string(4) "1234" } 
    [2]=> array(1) { [0]=> string(2) "23" } 
}
--------------
output in PHP 5.6 (the same as with /J):
array(4) { 
    [0]=> array(1)  { [0]=> string(7) "1234-23" } 
    ["dup_name"]=> array(1) { [0]=> string(2) "23" } 
    [1]=> array(1) { [0]=> string(4) "1234" } 
    [2]=> array(1) { [0]=> string(2) "23" } 
}
*/
?>

Hayley Watson
5 years ago
Starting from 7.3.0, the 'S' modifier has no effect; this analysis is now always done by the PCRE engine.

Anonymous
6 years ago
A warning about the /i modifier and POSIX character classes:
If you're using POSIX character classes in your regex that indicate case such as [:upper:] or [:lower:] in combination with the /i modifier, then in PHP < 7.3 the /i modifier will take precedence and effectively make both those character classes work as [:alpha:], but in PHP >= 7.3 the character classes overrule the /i modifier.

Wirek
8 years ago
An important addendum (with new $pat3_2 utilising \R properly, its results and comments):
Note that escape sequences such as \r, \R and \v have nuances of meaning and application that can be hard to grasp at first glance. None of them is perfect in all situations, but they are quite useful nevertheless. Some official PCRE newline options come in handy too. Unfortunately, none of (*ANYCRLF), (*ANY) or (*CRLF) is documented here on php.net at the moment (although they seem to have been available for over 10 years and 5 months now), but they are described pretty well on Wikipedia ("Newline/linebreak options" at https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions) and on the official PCRE library site ("Newline convention" at http://www.pcre.org/original/doc/html/pcresyntax.html#SEC17). According to php.net as well as the official description ("Newline sequences" at https://www.pcre.org/original/doc/html/pcrepattern.html#newlineseq), \R behaves somewhat disappointingly when used improperly (with the default compile-time configuration).

A hint for those of you who are trying to fight off (or work around at least) the problem of matching a pattern correctly at the end (or at the beginning) of any line even without the multiple lines mode (/m) or meta-character assertions ($ or ^).
<?php 
// Various OS-es have various end line (a.k.a line break) chars:
// - Windows uses CR+LF (\r\n);
// - Linux LF (\n);
// - classic Mac OS (before OS X) CR (\r).
// And that's why single dollar meta assertion ($) sometimes fails with multiline modifier (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?) of default configuration option for meta-character assertions (^ and $) at compile time of PCRE.
$str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
//          C          3                   p          0                   _
$pat3='/\w\R?$/mi';    // Somehow disappointing according to php.net and pcre.org when used improperly
$pat3_2='/\w(?=\R)/i';    // Much better with allowed lookahead assertion (just to detect without capture) without multiline (/m) mode; note that with alternative for end of string ((?=\R|$)) it would grab all 7 elements as expected, but '/(*ANYCRLF)\w$/mi' is more straightforward in use anyway
$p=preg_match_all($pat3, $str, $m3);
$r=preg_match_all($pat3_2, $str, $m4);
echo $str."\n3 !!! $pat3 ($p): ".print_r($m3[0], true)
    ."\n3_2 !!! $pat3_2 ($r): ".print_r($m4[0], true);
// Note the difference between the two very helpful escape sequences in $pat3 and $pat3_2 (\R) - for some applications at least.

/* The code above results in the following output:
ABC ABC

123 123
def def
nop nop
890 890
QRS QRS

~-_ ~-_
3 !!! /\w\R?$/mi (5): Array
(
    [0] => C

    [1] => 3
    [2] => p
    [3] => 0
    [4] => _
)

3_2 !!! /\w(?=\R)/i (6): Array
(
    [0] => C
    [1] => 3
    [2] => f
    [3] => p
    [4] => 0
    [5] => S
)
 */
?>
Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.

michal dot kocarek at brainbox dot cz
17 years ago
In case you're wondering what the "S" modifier means, this paragraph might be useful:

When "S" modifier is set, PHP calls the pcre_study() function from the PCRE API before executing the regexp. Result from the function is passed directly to pcre_exec().

For more information about pcre_study() and "Studying the pattern" check the PCRE manual on http://www.pcre.org/pcre.txt

PS: Note that function names "pcre_study" and "pcre_exec" used here refer to PCRE library functions written in C language and not to any PHP functions.

Wirek
8 years ago
A hint for those of you who are trying to fight off (or work around at least) the problem of matching a pattern correctly at the end ($) of any line in multiple lines mode (/m).
<?php 
// Various OS-es have various end line (a.k.a line break) chars:
// - Windows uses CR+LF (\r\n);
// - Linux LF (\n);
// - classic Mac OS (before OS X) CR (\r).
// And that's why single dollar meta assertion ($) sometimes fails with multiline modifier (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?).
$str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
//          C          3                   p          0                   _
$pat1='/\w$/mi';    // This works excellent in JavaScript (Firefox 7.0.1+)
$pat2='/\w\r?$/mi';
$pat3='/\w\R?$/mi';    // Somehow disappointing according to php.net and pcre.org
$pat4='/\w\v?$/mi';
$pat5='/(*ANYCRLF)\w$/mi';    // Excellent but undocumented on php.net at the moment
$n=preg_match_all($pat1, $str, $m1);
$o=preg_match_all($pat2, $str, $m2);
$p=preg_match_all($pat3, $str, $m3);
$r=preg_match_all($pat4, $str, $m4);
$s=preg_match_all($pat5, $str, $m5);
echo $str."\n1 !!! $pat1 ($n): ".print_r($m1[0], true)
    ."\n2 !!! $pat2 ($o): ".print_r($m2[0], true)
    ."\n3 !!! $pat3 ($p): ".print_r($m3[0], true)
    ."\n4 !!! $pat4 ($r): ".print_r($m4[0], true)
    ."\n5 !!! $pat5 ($s): ".print_r($m5[0], true);
// Note the difference among the three very helpful escape sequences in $pat2 (\r), $pat3 (\R), $pat4 (\v) and altered newline option in $pat5 ((*ANYCRLF)) - for some applications at least.

/* The code above results in the following output:
ABC ABC

123 123
def def
nop nop
890 890
QRS QRS

~-_ ~-_
1 !!! /\w$/mi (3): Array
(
    [0] => C
    [1] => 0
    [2] => _
)

2 !!! /\w\r?$/mi (5): Array
(
    [0] => C
    [1] => 3
    [2] => p
    [3] => 0
    [4] => _
)

3 !!! /\w\R?$/mi (5): Array
(
    [0] => C

    [1] => 3
    [2] => p
    [3] => 0
    [4] => _
) 

4 !!! /\w\v?$/mi (5): Array
(
    [0] => C

    [1] => 3
    [2] => p
    [3] => 0
    [4] => _
)

5 !!! /(*ANYCRLF)\w$/mi (7): Array
(
    [0] => C
    [1] => 3
    [2] => f
    [3] => p
    [4] => 0
    [5] => S
    [6] => _
)
 */
?>
Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.

ebarnard at marathonmultimedia dot com
19 years ago
When adding comments with the /x modifier, don't use the pattern delimiter in the comments. It may not be ignored in the comments area. Example:

<?php
$target = 'some text';
if(preg_match('/
                e # Comments here
               /x',$target)) {
    print "Target 1 hit.\n";
}
if(preg_match('/
                e # /Comments here with slash
               /x',$target)) {
    print "Target 2 hit.\n";
}
?>

prints "Target 1 hit." but then generates a PHP warning message for the second preg_match():

Warning:  preg_match() [function.preg-match]: Unknown modifier 'C' in /ebarnard/x-modifier.php on line 11
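
One possible workaround, sketched here as an assumption rather than something stated in the note above, is to pick a delimiter that does not occur in the comments. With ~ as the delimiter, the slash inside the comment is ordinary comment text:

```php
<?php
$target = 'some text';

// "~" delimits the pattern, so the "/" in the comment is ignored
// along with the rest of the comment.
$hit = preg_match('~
                e # /Comments here with slash
               ~x', $target);

var_dump($hit);
```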