Mastering Publii Sitemaps: Absolute SEO with Relative URLs

Site Migration, SEO and sitemaps

Company: Personal Project

Publii + Relative URLs = Broken Sitemaps. Let's fix that. Using a native Python script and Systemd path monitoring on Debian, I’ve automated the generation of a professional, image-rich sitemap that bridges the gap between internal networking needs and external SEO requirements.

When building a static site with Publii, choosing "Relative URLs" is often a necessity for internal networking or split-DNS environments. However, this setting breaks sitemap.xml: the sitemap protocol requires fully-qualified absolute URLs, so search engines ignore relative entries.
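To see the problem concretely: a `<loc>` value is only valid if it carries both a scheme and a host, which a quick standard-library check makes obvious (the URLs below are illustrative placeholders):

```python
from urllib.parse import urlparse

def is_valid_sitemap_loc(loc):
    """Sitemap <loc> values must be fully-qualified absolute URLs."""
    parts = urlparse(loc)
    return bool(parts.scheme) and bool(parts.netloc)

print(is_valid_sitemap_loc("/blog/post/"))                        # what Publii emits with relative URLs
print(is_valid_sitemap_loc("https://yourdomain.com/blog/post/"))  # what search engines require
```

The first call returns `False`, the second `True`, which is exactly the gap the script below closes.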

This guide provides a professional, secure, and fully automated solution using Python and Systemd on a Debian/lighttpd stack.


🛠 The Solution: A Native Python Post-Processor

We bypass the internal Publii generator in favor of a custom Python script. This script "crawls" your local directory, extracts every image path, and prepends your public domain to create a Google-compliant sitemap.
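The core path-to-URL mapping can be sketched in a few lines; `PUBLIC_URL`, `SITE_DIR`, and the example path are placeholders mirroring the variables in the full script:

```python
import os

SITE_DIR = "/var/www/html"
PUBLIC_URL = "https://yourdomain.com"  # placeholder: your real domain

def to_absolute(full_path):
    """Map a local file path under SITE_DIR to its public, absolute URL."""
    rel = os.path.relpath(full_path, SITE_DIR).replace("\\", "/")
    clean = rel.replace("index.html", "")  # index.html collapses to its directory
    if not clean:
        return f"{PUBLIC_URL}/"           # the site root keeps its trailing slash
    return f"{PUBLIC_URL}/{clean}".rstrip("/")

print(to_absolute("/var/www/html/blog/post/index.html"))
# -> https://yourdomain.com/blog/post
```

This is the same transformation the generator applies to every `.html` file it finds.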

1. Prerequisites

Install the only required dependency for secure HTML parsing:

Bash
sudo apt update && sudo apt install python3-bs4
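To confirm the parser behaves as expected, here is a minimal, self-contained extraction example (the HTML snippet and image URLs are made up for illustration):

```python
from bs4 import BeautifulSoup

# A tiny stand-in for a Publii-generated page
html = '<p><img src="/media/posts/1/cat.jpg"> <img src="https://cdn.example.com/logo.png" alt=""></p>'

soup = BeautifulSoup(html, "html.parser")
srcs = [img.get("src") for img in soup.find_all("img")]
print(srcs)  # ['/media/posts/1/cat.jpg', 'https://cdn.example.com/logo.png']
```

The generator below uses exactly this `find_all('img')` pattern against every page in the web root.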

2. The Python Generator (generate_sitemap.py)

Save this script as /usr/local/bin/generate_sitemap.py. Be sure to update the PUBLIC_URL and SITE_DIR variables.

Note: Change https://yourdomain.com to your real domain before running the script.

Python

import os
import pwd
import grp

from bs4 import BeautifulSoup

# --- CONFIGURATION ---
SITE_DIR = "/var/www/html"
PUBLIC_URL = "https://yourdomain.com"
OUTPUT_FILE = os.path.join(SITE_DIR, "sitemap.xml")
EXCLUDE_FOLDERS = {'assets', 'cgi-bin', 'tmp', '404', 'tags', 'authors'}
# ---------------------

def generate():
    items = []
    for root, dirs, files in os.walk(SITE_DIR):
        # Skip excluded folders (pruned in place so os.walk never descends into them)
        dirs[:] = [d for d in dirs if d not in EXCLUDE_FOLDERS]
        for file in files:
            if not file.endswith(".html"):
                continue
            full_path = os.path.join(root, file)
            rel_path = os.path.relpath(full_path, SITE_DIR).replace("\\", "/")

            # Clean URL formatting: drop index.html and trailing slashes
            clean_path = rel_path.replace("index.html", "")
            page_url = f"{PUBLIC_URL}/{clean_path}".rstrip("/")
            if not clean_path:
                page_url = f"{PUBLIC_URL}/"

            images = []
            try:
                with open(full_path, 'r', encoding='utf-8') as f:
                    soup = BeautifulSoup(f, 'html.parser')
                for img in soup.find_all('img'):
                    src = img.get('src')
                    if not src:
                        continue
                    if src.startswith('http'):
                        img_url = src
                    elif src.startswith('/'):
                        img_url = f"{PUBLIC_URL}{src}"
                    else:
                        # Non-rooted src values are resolved against the site root
                        img_url = f"{PUBLIC_URL}/{src}"
                    images.append(img_url)
            except Exception as e:
                print(f"Error reading {full_path}: {e}")

            items.append({'loc': page_url, 'images': list(set(images))})

    xml = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<?xml-stylesheet type="text/xsl" href="/sitemap.xsl"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
        'xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">'
    ]
    for item in items:
        xml.append('  <url>')
        xml.append(f'    <loc>{item["loc"]}</loc>')
        for img in item['images']:
            xml.append('    <image:image>')
            xml.append(f'      <image:loc>{img}</image:loc>')
            xml.append('    </image:image>')
        xml.append('  </url>')
    xml.append('</urlset>')

    with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
        f.write('\n'.join(xml))

    # Force ownership of the generated file to www-data
    uid = pwd.getpwnam("www-data").pw_uid
    gid = grp.getgrnam("www-data").gr_gid
    os.chown(OUTPUT_FILE, uid, gid)
    os.chmod(OUTPUT_FILE, 0o644)

if __name__ == "__main__":
    generate()
    print(f"Sitemap successfully generated at {OUTPUT_FILE}")
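After a run, it is worth sanity-checking that the output is well-formed XML before pointing Google at it. A small stdlib helper does the job; the demo below writes a throwaway two-entry sitemap (a stand-in for /var/www/html/sitemap.xml) and counts its entries:

```python
import tempfile
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def count_urls(sitemap_path):
    """Parse a sitemap file and return the number of <url> entries.

    Raises ET.ParseError if the file is not well-formed XML.
    """
    root = ET.parse(sitemap_path).getroot()
    return len(root.findall("sm:url", NS))

# Demo against a throwaway file with two illustrative entries
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><loc>https://yourdomain.com/</loc></url>'
    '<url><loc>https://yourdomain.com/blog/post</loc></url>'
    '</urlset>'
)
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write(sample)
    path = f.name

print(count_urls(path))  # 2
```

Point `count_urls` at the real generated file to verify it parses and that the page count matches expectations.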

3. Professional Styling (sitemap.xsl)

To prevent the "No style information" browser error, save this file to /var/www/html/sitemap.xsl. This makes the XML readable for humans while remaining valid for crawlers.

The Stylesheet (sitemap.xsl)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
                xmlns:html="http://www.w3.org/TR/REC-html40"
                xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <html xmlns="http://www.w3.org/1999/xhtml">
            <head>
                <title>XML Sitemap - Professional Index</title>
                <style type="text/css">
                    body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen-Sans, Ubuntu, Cantarell, "Helvetica Neue", sans-serif; color: #333; margin: 0; padding: 40px; background: #f9f9fb; }
                    .container { max-width: 1000px; margin: 0 auto; background: #fff; padding: 30px; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.05); }
                    h1 { color: #1a1a1a; font-size: 24px; margin-bottom: 10px; border-bottom: 2px solid #3498db; display: inline-block; padding-bottom: 5px; }
                    p { color: #666; margin-bottom: 30px; }
                    table { border-collapse: collapse; width: 100%; margin-top: 10px; }
                    th { background: #f8f9fa; text-align: left; padding: 12px 15px; border-bottom: 2px solid #edf2f7; font-weight: 600; color: #4a5568; }
                    td { padding: 12px 15px; border-bottom: 1px solid #edf2f7; word-break: break-all; }
                    tr:hover { background: #f7fafc; }
                    a { color: #3498db; text-decoration: none; }
                    a:hover { text-decoration: underline; }
                    .count-badge { background: #ebf8ff; color: #2b6cb0; padding: 2px 8px; border-radius: 12px; font-size: 12px; font-weight: 600; }
                </style>
            </head>
            <body>
                <div class="container">
                    <h1>XML Sitemap</h1>
                    <p>Generated for Google/Bing SEO. Total URLs indexed: <strong><xsl:value-of select="count(sitemap:urlset/sitemap:url)"/></strong></p>
                    <table>
                        <thead>
                            <tr>
                                <th width="75%">URL Location</th>
                                <th width="25%">Images Detected</th>
                            </tr>
                        </thead>
                        <tbody>
                            <xsl:for-each select="sitemap:urlset/sitemap:url">
                                <tr>
                                    <td>
                                        <a href="{sitemap:loc}"><xsl:value-of select="sitemap:loc"/></a>
                                    </td>
                                    <td>
                                        <span class="count-badge">
                                            <xsl:value-of select="count(image:image)"/> Images
                                        </span>
                                    </td>
                                </tr>
                            </xsl:for-each>
                        </tbody>
                    </table>
                </div>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

🤖 Automating with Systemd Path Units

Instead of running the script manually, we use a systemd path unit to monitor the folder. The moment you sync from Publii, the sitemap regenerates.

Step 1: Create the Service (/etc/systemd/system/sitemap-gen.service)

Ini
[Unit]
Description=Fix Permissions and Generate Sitemap after Publii Sync
After=network.target

[Service]
Type=oneshot
User=root
Group=root

# Fix ownership and permissions before running the script
ExecStartPre=/bin/chown -R root:root /var/www/html
ExecStartPre=/usr/bin/find /var/www/html -type d -exec chmod 755 {} +
ExecStartPre=/usr/bin/find /var/www/html -type f -exec chmod 644 {} +

# Run the generator
ExecStart=/usr/bin/python3 /usr/local/bin/generate_sitemap.py

[Install]
WantedBy=multi-user.target

Step 2: Create the Path Watcher (/etc/systemd/system/sitemap-gen.path)

Ini
[Unit]
Description=Watch /var/www/html for changes

[Path]
PathChanged=/var/www/html/
# Note: PathChanged= watches this directory only, not its subdirectories.
# DelaySec= is not a valid [Path] option; rate-limit triggers as a debounce instead
# (the TriggerLimit* settings for path units require a reasonably recent systemd):
TriggerLimitIntervalSec=15
TriggerLimitBurst=1

[Install]
WantedBy=multi-user.target

Step 3: Activate

Bash
sudo systemctl daemon-reload
sudo systemctl enable --now sitemap-gen.path

✅ Summary Checklist

  • [ ] Dependency: python3-bs4 installed.

  • [ ] Script: generate_sitemap.py configured with your domain.

  • [ ] Style: sitemap.xsl placed in web root.

  • [ ] Automation: systemd path units enabled.

  • [ ] SEO: Publii's internal sitemap generator disabled.

Now, your site remains agile with relative URLs internally, while providing a rock-solid, image-rich sitemap to the world automatically.
