Introduction
Sitemap is an important part of your site, and large sites with more than 50,000 pages require a multi-page sitemaps. It might become a challenge to create such a sitemap, especially when trying to run it on the godaddy hosting, which is quite limiting.
Background
This article is intended for all types of audience and whoever could find it useful. SQL knowledge is required in order to create two SQL queries.
Using the code
The code is quite simple. The main idea is to calculate total amount of sitemap pages, and to create one page at the time using database offset (limiting amount of rows for each SQL request.) Two files are included: the generateSitemap.php
and dbc.php
. The generateSitemap is the logic, and the dbc is a database work file. Two queries are needed: first is to get total amount of products designated for the sitemap, and the second is to generate sitemap links for each product. Both queries are hardcoded to the dbc.php
file
dbc.php:
<?php
class dbc {
public $dbserver = 'SERVER';
public $dbusername = 'USERNAME';
public $dbpassword = 'PASSWORD';
public $dbname = 'DATABASE NAME';
function openDb() {
try {
$db = new PDO('mysql:host=' . $this->dbserver . ';dbname=' .
$this->dbname . ';charset=utf8', '' . $this->dbusername . '', '' . $this->dbpassword . '');
} catch (PDOException $e) {
die("error, please try again");
}
return $db;
}
function getTotalProductsInDatabase($recordsPerSiteMapFile) {
$query = "SELECT count(*) as cnt FROM products";
$dba = $this->openDb();
$stmt = $dba->prepare($query);
$stmt->execute();
$row = $stmt->fetch();
$dba = null;
unset($dba);
unset($stmt);
return (((int) ($row['cnt'] / $recordsPerSiteMapFile)) + 1);
}
function getProductsForSitemapFileNumber($recordsPerSiteMapFile, $offset) {
$query = "(select product_links as description from products limit ? OFFSET ?)";
$dba = $this->openDb();
$stmt = $dba->prepare($query);
$stmt->bindValue(1, $recordsPerSiteMapFile, PDO::PARAM_INT);
$stmt->bindValue(2, $offset * $recordsPerSiteMapFile, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
$dba = null;
unset($dba);
unset($stmt);
return $rows;
}
}
?>
The code below is the core logic. Brief explanation: get current page, if no page specified, just start with the 0. Get n-amount of links for current page from the database(using offset). Loop over all links and add them to the sitemap. Save sitemap to the xml file using gzip compression. Redirect to next sitemap page or complete the page generation by creating the sitemap index file.
generateSitemap.php:
<?php
ini_set('display_errors', true);
error_reporting(E_ALL);
ini_set('memory_limit', '-1');
set_time_limit(0);
require 'dbc.php';
$db = new dbc();
$recordsPerSiteMapFile = 30000;
$SERVER_NAME = "http://www.sitename.com/";
$rootPath = "/home/content/sitename.com/sitemaps/";
$currentPage = getanyValue('page');
$amountOfPages = ($db->getTotalProductsInDatabase($recordsPerSiteMapFile));
header("Content-type: text/html; charset=utf-8");
$xmlDoc = new DOMDocument();
$root = $xmlDoc->appendChild(
$xmlDoc->createElement("urlset"));
$tutTag = $root->appendChild(
$xmlDoc->createAttribute("xmlns"))->appendChild(
$xmlDoc->createTextNode("http://www.google.com/schemas/sitemap/0.9"));
$currentSitemapPageRows = ($db->getProductsForSitemapFileNumber($recordsPerSiteMapFile, $currentPage));
foreach ($currentSitemapPageRows as $key => $row) {
$final_url = $SERVER_NAME . fixSymbols(getUrlFriendlyString($row{'description'}));
$tutTag = $root->appendChild(
$xmlDoc->createElement("url"));
$tutTag->appendChild(
$xmlDoc->createElement("loc", htmlentities($final_url)));
$tutTag->appendChild(
$xmlDoc->createElement("priority", "0.5"));
}
$fname = "sitemap_" . $currentPage . ".xml.gz";
$xmlDoc->formatOutput = true;
$theOutput = gzencode($xmlDoc->saveXML(), 9);
file_put_contents($rootPath . $fname, $theOutput);
unset($xmlDoc);
unset($currentSitemapPageRows);
unset($theOutput);
unset($tutTag);
if ($amountOfPages == $currentPage) {
createSiteMapIndexFile($amountOfPages, $SERVER_NAME, $rootPath);
} else {
?>
<script type="text/javascript">
<!--
window.location = "generateSitemap.php?page=<?php echo ($currentPage + 1); ?>"
</script>
<?php
}
function createSiteMapIndexFile($totalPages, $SERVER_NAME, $rootPath) {
$xmlDocIndex = new DOMDocument();
$rootIndex = $xmlDocIndex->appendChild(
$xmlDocIndex->createElement("sitemapindex"));
$tutTag2 = $rootIndex->appendChild(
$xmlDocIndex->createAttribute("xmlns"))->appendChild(
$xmlDocIndex->createTextNode("http://www.google.com/schemas/sitemap/0.84"));
for ($i = 0; $i <= $totalPages; $i++) {
$fname = "sitemap_" . $i . ".xml.gz";
$tutTag2 = $rootIndex->appendChild(
$xmlDocIndex->createElement("sitemap"));
$tutTag2->appendChild(
$xmlDocIndex->createElement("loc", $SERVER_NAME . "sitemaps/" . $fname));
$tutTag2->appendChild(
$xmlDocIndex->createElement("lastmod", date('Y-m-d')));
}
$xmlDocIndex->formatOutput = true;
$xmlDocIndex->save($rootPath . "sitemapindex.xml");
echo "<a href=sitemaps/sitemapindex.xml>View Sitemap Index</a>";
}
function getUrlFriendlyString($str) {
$str = preg_replace("/[-]+/", "-", preg_replace("/[^a-z0-9-]/",
"", strtolower(str_replace(" ", "-", $str))));
return $str;
}
function getanyValue($param) {
if (isset($_GET[$param])) {
return $_GET[$param];
} else {
return 0;
}
}
function fixSymbols($str) {
return $str;
}
?>
Points of Interest
Script is useful for memory and resource-limited environments.
Each page is generated independently, so the time-consuming script runs in batches.
History
11/28/2012 - First release.