Mastering Publii Sitemaps: Absolute SEO with Relative URLs

Site Migration, SEO and sitemaps
Company: Personal Project
Publii + Relative URLs = Broken Sitemaps. Let's fix that. Using a native Python script and Systemd path monitoring on Debian, I’ve automated the generation of a professional, image-rich sitemap that bridges the gap between internal networking needs and external SEO requirements.
When building a static site with Publii, choosing "Relative URLs" is often a necessity for internal networking or split-DNS environments. However, this breaks sitemap.xml: the sitemap protocol requires fully qualified, absolute URLs, so search engines can't follow relative entries.
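To illustrate the problem, here is a hypothetical entry from a relative-URL build next to what the sitemap protocol actually expects (the path is made up):

<!-- Broken: a relative location, which search engines reject -->
<loc>/blog/my-post/</loc>

<!-- Valid: a fully qualified, absolute URL -->
<loc>https://yourdomain.com/blog/my-post/</loc>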
This guide provides a professional, secure, and fully automated solution using Python and Systemd on a Debian/lighttpd stack.
🛠 The Solution: A Native Python Post-Processor
We bypass the internal Publii generator in favor of a custom Python script. This script "crawls" your local directory, extracts every image path, and prepends your public domain to create a Google-compliant sitemap.
1. Prerequisites
Install the only required dependency for secure HTML parsing:
sudo apt update && sudo apt install python3-bs4
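To confirm the parser is available to Python, a quick import check is enough (the version number will vary):

python3 -c "import bs4; print(bs4.__version__)"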
2. The Python Generator (generate_sitemap.py)
Save this script as /usr/local/bin/generate_sitemap.py. Be sure to update the PUBLIC_URL and SITE_DIR variables.
Note: Change https://yourdomain.com to your real domain before running the script.
Python
import os
import pwd
import grp
from xml.sax.saxutils import escape  # escapes &, < and > so URLs stay valid XML
from bs4 import BeautifulSoup

# --- CONFIGURATION ---
SITE_DIR = "/var/www/html"
PUBLIC_URL = "https://yourdomain.com"
OUTPUT_FILE = os.path.join(SITE_DIR, "sitemap.xml")
EXCLUDE_FOLDERS = {'assets', 'cgi-bin', 'tmp', '404', 'tags', 'authors'}
# ---------------------

def generate():
    items = []
    for root, dirs, files in os.walk(SITE_DIR):
        # Skip excluded folders
        dirs[:] = [d for d in dirs if d not in EXCLUDE_FOLDERS]
        for file in files:
            if not file.endswith(".html"):
                continue
            full_path = os.path.join(root, file)
            rel_path = os.path.relpath(full_path, SITE_DIR).replace("\\", "/")
            # Clean URL formatting: drop a trailing index.html
            clean_path = rel_path[:-len("index.html")] if rel_path.endswith("index.html") else rel_path
            page_url = f"{PUBLIC_URL}/{clean_path}".rstrip("/")
            if not clean_path:
                page_url = f"{PUBLIC_URL}/"
            images = []
            try:
                with open(full_path, 'r', encoding='utf-8') as f:
                    soup = BeautifulSoup(f, 'html.parser')
                for img in soup.find_all('img'):
                    src = img.get('src')
                    if src:
                        # Prepend the public domain to relative image paths
                        img_url = src if src.startswith('http') else f"{PUBLIC_URL}{src if src.startswith('/') else '/' + src}"
                        images.append(img_url)
            except Exception as e:
                print(f"Error reading {full_path}: {e}")
            items.append({'loc': page_url, 'images': list(set(images))})

    # Build the XML document (runs once, after the walk completes)
    xml = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<?xml-stylesheet type="text/xsl" href="/sitemap.xsl"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">'
    ]
    for item in items:
        xml.append('  <url>')
        xml.append(f'    <loc>{escape(item["loc"])}</loc>')
        for img in item['images']:
            xml.append('    <image:image>')
            xml.append(f'      <image:loc>{escape(img)}</image:loc>')
            xml.append('    </image:image>')
        xml.append('  </url>')
    xml.append('</urlset>')

    with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
        f.write('\n'.join(xml))

    # Force ownership of the generated file to www-data
    uid = pwd.getpwnam("www-data").pw_uid
    gid = grp.getgrnam("www-data").gr_gid
    os.chown(OUTPUT_FILE, uid, gid)
    os.chmod(OUTPUT_FILE, 0o644)

if __name__ == "__main__":
    generate()
    print(f"Sitemap successfully generated at {OUTPUT_FILE}")
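Before automating anything, a manual dry run confirms the output (run it as root so the chown to www-data succeeds):

sudo python3 /usr/local/bin/generate_sitemap.py
head -n 5 /var/www/html/sitemap.xml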
3. Professional Styling (sitemap.xsl)
To prevent the "No style information" browser error, save this file to /var/www/html/sitemap.xsl. This makes the XML readable for humans while remaining valid for crawlers.
The Stylesheet (sitemap.xsl)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:html="http://www.w3.org/TR/REC-html40"
    xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>XML Sitemap - Professional Index</title>
        <style type="text/css">
          body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen-Sans, Ubuntu, Cantarell, "Helvetica Neue", sans-serif; color: #333; margin: 0; padding: 40px; background: #f9f9fb; }
          .container { max-width: 1000px; margin: 0 auto; background: #fff; padding: 30px; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.05); }
          h1 { color: #1a1a1a; font-size: 24px; margin-bottom: 10px; border-bottom: 2px solid #3498db; display: inline-block; padding-bottom: 5px; }
          p { color: #666; margin-bottom: 30px; }
          table { border-collapse: collapse; width: 100%; margin-top: 10px; }
          th { background: #f8f9fa; text-align: left; padding: 12px 15px; border-bottom: 2px solid #edf2f7; font-weight: 600; color: #4a5568; }
          td { padding: 12px 15px; border-bottom: 1px solid #edf2f7; word-break: break-all; }
          tr:hover { background: #f7fafc; }
          a { color: #3498db; text-decoration: none; }
          a:hover { text-decoration: underline; }
          .count-badge { background: #ebf8ff; color: #2b6cb0; padding: 2px 8px; border-radius: 12px; font-size: 12px; font-weight: 600; }
        </style>
      </head>
      <body>
        <div class="container">
          <h1>XML Sitemap</h1>
          <p>Generated for Google/Bing SEO. Total URLs indexed: <strong><xsl:value-of select="count(sitemap:urlset/sitemap:url)"/></strong></p>
          <table>
            <thead>
              <tr>
                <th width="75%">URL Location</th>
                <th width="25%">Images Detected</th>
              </tr>
            </thead>
            <tbody>
              <xsl:for-each select="sitemap:urlset/sitemap:url">
                <tr>
                  <td>
                    <a href="{sitemap:loc}"><xsl:value-of select="sitemap:loc"/></a>
                  </td>
                  <td>
                    <span class="count-badge">
                      <xsl:value-of select="count(image:image)"/> Images
                    </span>
                  </td>
                </tr>
              </xsl:for-each>
            </tbody>
          </table>
        </div>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
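Once both files are in place, a quick sanity check with xmllint (from Debian's libxml2-utils package) confirms they are well-formed:

sudo apt install libxml2-utils
xmllint --noout /var/www/html/sitemap.xml && echo "sitemap OK"
xmllint --noout /var/www/html/sitemap.xsl && echo "stylesheet OK"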
🤖 Automating with Systemd Path Units
Instead of manual execution, we use Debian's systemd to monitor the folder. Shortly after you sync from Publii, the sitemap regenerates.
Step 1: Create the Service (/etc/systemd/system/sitemap-gen.service)
[Unit]
Description=Fix Permissions and Generate Sitemap after Publii Sync
After=network.target

[Service]
Type=oneshot
User=root
Group=root
# Let the Publii sync settle before touching files ([Path] units have no delay option)
ExecStartPre=/bin/sleep 15
# Fix ownership and permissions before running the script
ExecStartPre=/bin/chown -R root:root /var/www/html
ExecStartPre=/usr/bin/find /var/www/html -type d -exec chmod 755 {} +
ExecStartPre=/usr/bin/find /var/www/html -type f -exec chmod 644 {} +
# Run the generator
ExecStart=/usr/bin/python3 /usr/local/bin/generate_sitemap.py

[Install]
WantedBy=multi-user.target
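Before wiring up the watcher, you can run the service once by hand and check the journal (expect a roughly 15-second pause from the settling sleep):

sudo systemctl daemon-reload
sudo systemctl start sitemap-gen.service
sudo journalctl -u sitemap-gen.service --no-pager -n 20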
Step 2: Create the Path Watcher (/etc/systemd/system/sitemap-gen.path)
Note: systemd path units have no DelaySec= option, so the 15-second settling delay lives in the service's sleep above.
[Unit]
Description=Watch /var/www/html for changes

[Path]
PathChanged=/var/www/html/

[Install]
WantedBy=multi-user.target
Step 3: Activate
sudo systemctl daemon-reload
sudo systemctl enable --now sitemap-gen.path
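To confirm the watcher fires end-to-end, check its status, touch a file in the web root (the file below is just an example), and follow the service log:

systemctl status sitemap-gen.path
sudo touch /var/www/html/index.html
sudo journalctl -u sitemap-gen.service -f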
✅ Summary Checklist
[ ] Dependency: python3-bs4 installed.
[ ] Script: generate_sitemap.py configured with your domain.
[ ] Style: sitemap.xsl placed in web root.
[ ] Automation: systemd path units enabled.
[ ] SEO: Publii's internal sitemap generator disabled.
Now, your site remains agile with relative URLs internally, while providing a rock-solid, image-rich sitemap to the world automatically.
