You can also use this function to repair XML, for example when stray ampersands or similar characters are breaking it:
<?php
$xml = tidy_repair_string($xml, array(
'output-xml' => true,
'input-xml' => true
));
?>
(PHP 5, PHP 7, PHP 8, PECL tidy >= 0.7.0)
tidy::repairString -- tidy_repair_string -- Repair a string using an optionally provided configuration file
Object-oriented style
public static tidy::repairString(string $string, array|string|null $config = null, ?string $encoding = null): string|false
Procedural style
tidy_repair_string(string $string, array|string|null $config = null, ?string $encoding = null): string|false
Repairs the given string.
string
The data to be repaired.
config
The config can be passed either as an array or as a string. If a string is passed, it is interpreted as the name of the configuration file; otherwise, it is interpreted as the options themselves.
For an explanation of each option, see http://api.html-tidy.org/#quick-reference.
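The two forms of config described above can be compared with a minimal sketch. It assumes the tidy extension is loaded; the sample markup, the chosen options, and the temporary file are illustrative only and do not come from the manual.

```php
<?php
// Made-up input with an unclosed paragraph.
$html = '<p>unclosed paragraph';

// 1. Options passed directly as an array.
$fromArray = tidy_repair_string($html, ['show-body-only' => true, 'wrap' => 0]);

// 2. The same options written to a temporary configuration file
//    (hypothetical file, using tidy's "option: value" syntax).
$configFile = tempnam(sys_get_temp_dir(), 'tidy');
file_put_contents($configFile, "show-body-only: yes\nwrap: 0\n");
$fromFile = tidy_repair_string($html, $configFile);
unlink($configFile);

// Both calls are expected to yield the same repaired fragment.
var_dump($fromArray === $fromFile);
?>
```

An array is usually more convenient for a handful of options; a file is useful when the same settings are shared with the tidy command-line tool.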
encoding
The encoding parameter sets the encoding for input/output documents. The possible values are ascii, latin0, latin1, raw, utf8, iso2022, mac, win1252, ibm858, utf16, utf16le, utf16be, big5, and shiftjis.
Returns the repaired string, or false on failure.
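As a rough illustration of the encoding parameter and the false return value, the sketch below repairs a UTF-8 fragment and checks the result before using it. It assumes the tidy extension is loaded; the input string and the error message are made up for the example.

```php
<?php
// Made-up UTF-8 input with an unclosed paragraph.
$html = "<p>na\u{00EF}ve caf\u{00E9}";

// Tell tidy that both input and output are UTF-8.
$clean = tidy_repair_string($html, ['show-body-only' => true], 'utf8');

// Compare strictly against false before using the result.
if ($clean === false) {
    fwrite(STDERR, "tidy could not repair the input\n");
    exit(1);
}

echo $clean;
?>
```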
| Version | Description |
|---|---|
| 8.0.0 | tidy::repairString() is now a static method. |
| 8.0.0 | config and encoding are now nullable. |
| 8.0.0 | This function no longer accepts the useIncludePath parameter. |
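Since the changelog states that the method is static as of 8.0.0, a call without an instance might look like the sketch below. It assumes PHP 8.0 or later with the tidy extension loaded; the input fragment is made up.

```php
<?php
// Made-up fragment with a mismatched closing tag.
$html = '<p>error</i>';

// Static call, no tidy object needed (PHP 8.0+).
$viaMethod = tidy::repairString($html, ['show-body-only' => true]);

// Procedural equivalent.
$viaFunction = tidy_repair_string($html, ['show-body-only' => true]);

var_dump($viaMethod === $viaFunction);
?>
```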
Example #1 tidy::repairString() example
<?php
ob_start();
?>
<html>
<head>
<title>test</title>
</head>
<body>
<p>error</i>
</body>
</html>
<?php
$buffer = ob_get_clean();
$tidy = new tidy();
$clean = $tidy->repairString($buffer);
echo $clean;
?>
The above example will output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>test</title>
</head>
<body>
<p>error</p>
</body>
</html>
The docs referenced at http://tidy.sourceforge.net/docs/quickref.html above state that the configuration option 'sort-attributes' is an enumeration of 'none' and 'alpha', which implies that either string is an acceptable value. This may not be the case, however: on my system, the option was not honored until I set it to true. The same may apply to other options, so experiment a bit. The output of tidy::getConfig() may be useful in this regard.
Using tidy, it is very simple to fix a broken ods/odt document.
I wrote the following script to be run from the command line:
<?php
$zip = new ZipArchive();
// ZipArchive::open() returns true on success or an integer error code,
// so compare strictly against true.
if ($zip->open($argv[1]) === true) {
    $fp = $zip->getStream('content.xml'); // file inside the archive
    if (!$fp) {
        die("Error: can't get stream to document file");
    }
    $buf = ""; // file buffer
    ob_start(); // capture any CRC error message
    while (!feof($fp)) {
        $buf .= fread($fp, 2048);
    }
    ob_end_clean();
    fclose($fp);
    $zip->close();

    $config = array(
        'indent' => true,
        'clean' => true,
        'input-xml' => true,
        'output-xml' => true,
        'wrap' => false
    );
    $tidy = new tidy();
    $xml = $tidy->repairString($buf, $config);

    // split() was removed in PHP 7; explode() does the same job here.
    $lines = explode("\n", $xml);
    $file = tempnam("/tmp", "xml");
    $fp = fopen($file, "w");
    foreach ($lines as $key => $value) {
        // Drop tidy's indentation; keep a newline only after the first line.
        fwrite($fp, trim($value));
        if ($key == 0) {
            fwrite($fp, "\n");
        }
    }
    fclose($fp);

    // Replace content.xml in the archive with the repaired copy.
    if ($zip->open($argv[1]) === true) {
        $zip->deleteName('content.xml');
        $zip->addFile($file, 'content.xml');
        $zip->close();
        echo 'recovery complete';
    } else {
        echo 'recovery failed';
    }
    unlink($file);
}
?>
Save it to a file called fixdoc and invoke it as:
php fixdoc yourbrokendoc
For your safety, please work on a copy of your document.
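One of the notes above suggests consulting tidy::getConfig() when an option such as 'sort-attributes' appears to be ignored. A minimal sketch of that check follows. It assumes the tidy extension is loaded and that the installed libtidy knows the 'sort-attributes' option; the markup is a made-up sample, and the printed value depends on the libtidy version.

```php
<?php
// Made-up markup with attributes in non-alphabetical order.
$tidy = new tidy();
$tidy->parseString('<p id="b" class="a">x</p>', ['sort-attributes' => 'alpha']);

// getConfig() returns the options as tidy actually stored them,
// which shows whether the requested value was accepted.
$effective = $tidy->getConfig();
var_dump($effective['sort-attributes']);
?>
```

If the dumped value does not reflect the requested setting, trying another form of the value (as the note did with true) is a reasonable next experiment.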