Mastering Publii Sitemaps: Absolute SEO with Relative URLs

Site Migration, SEO and sitemaps

Company: Personal Project

Publii + Relative URLs = Broken Sitemaps. Let's fix that. Using a native Python script and Systemd path monitoring on Debian, I’ve automated the generation of a professional, image-rich sitemap that bridges the gap between internal networking needs and external SEO requirements.

When building a static site with Publii, choosing "Relative URLs" is often a necessity for internal networking or split-DNS environments. However, this setting breaks sitemap.xml: the sitemap protocol requires fully-qualified absolute URLs, so search engines ignore relative entries.
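To see the problem concretely: a `<loc>` value is only valid if it carries both a scheme and a host, which a quick standard-library check makes obvious (the URLs below are illustrative placeholders):

```python
from urllib.parse import urlparse

def is_valid_sitemap_loc(loc):
    """Sitemap <loc> values must be fully-qualified absolute URLs."""
    parts = urlparse(loc)
    return bool(parts.scheme) and bool(parts.netloc)

print(is_valid_sitemap_loc("/blog/post/"))                        # what Publii emits with relative URLs
print(is_valid_sitemap_loc("https://yourdomain.com/blog/post/"))  # what search engines require
```

The first call returns `False`, the second `True`, which is exactly the gap the script below closes.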

This guide provides a professional, secure, and fully automated solution using Python and Systemd on a Debian/lighttpd stack.


🛠 The Solution: A Native Python Post-Processor

We bypass the internal Publii generator in favor of a custom Python script. This script "crawls" your local directory, extracts every image path, and prepends your public domain to create a Google-compliant sitemap.
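The core path-to-URL mapping can be sketched in a few lines; `PUBLIC_URL`, `SITE_DIR`, and the example path are placeholders mirroring the variables in the full script:

```python
import os

SITE_DIR = "/var/www/html"
PUBLIC_URL = "https://yourdomain.com"  # placeholder: your real domain

def to_absolute(full_path):
    """Map a local file path under SITE_DIR to its public, absolute URL."""
    rel = os.path.relpath(full_path, SITE_DIR).replace("\\", "/")
    clean = rel.replace("index.html", "")  # index.html collapses to its directory
    if not clean:
        return f"{PUBLIC_URL}/"           # the site root keeps its trailing slash
    return f"{PUBLIC_URL}/{clean}".rstrip("/")

print(to_absolute("/var/www/html/blog/post/index.html"))
# -> https://yourdomain.com/blog/post
```

This is the same transformation the generator applies to every `.html` file it finds.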

1. Prerequisites

Install the only required dependency for secure HTML parsing:

Bash
sudo apt update && sudo apt install python3-bs4
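To confirm the parser behaves as expected, here is a minimal, self-contained extraction example (the HTML snippet and image URLs are made up for illustration):

```python
from bs4 import BeautifulSoup

# A tiny stand-in for a Publii-generated page
html = '<p><img src="/media/posts/1/cat.jpg"> <img src="https://cdn.example.com/logo.png" alt=""></p>'

soup = BeautifulSoup(html, "html.parser")
srcs = [img.get("src") for img in soup.find_all("img")]
print(srcs)  # ['/media/posts/1/cat.jpg', 'https://cdn.example.com/logo.png']
```

The generator below uses exactly this `find_all('img')` pattern against every page in the web root.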

2. The Python Generator (generate_sitemap.py)

Save this script as /usr/local/bin/generate_sitemap.py. Be sure to update the PUBLIC_URL and SITE_DIR variables.

Note: Change https://yourdomain.com to your real domain before running the script.

Python

import os
import pwd
import grp

from bs4 import BeautifulSoup

# --- CONFIGURATION ---
SITE_DIR = "/var/www/html"
PUBLIC_URL = "https://yourdomain.com"
OUTPUT_FILE = os.path.join(SITE_DIR, "sitemap.xml")
EXCLUDE_FOLDERS = {'assets', 'cgi-bin', 'tmp', '404', 'tags', 'authors'}
# ---------------------

def generate():
    items = []
    for root, dirs, files in os.walk(SITE_DIR):
        # Skip excluded folders (pruned in place so os.walk never descends into them)
        dirs[:] = [d for d in dirs if d not in EXCLUDE_FOLDERS]
        for file in files:
            if not file.endswith(".html"):
                continue
            full_path = os.path.join(root, file)
            rel_path = os.path.relpath(full_path, SITE_DIR).replace("\\", "/")

            # Clean URL formatting: drop index.html and trailing slashes
            clean_path = rel_path.replace("index.html", "")
            page_url = f"{PUBLIC_URL}/{clean_path}".rstrip("/")
            if not clean_path:
                page_url = f"{PUBLIC_URL}/"

            images = []
            try:
                with open(full_path, 'r', encoding='utf-8') as f:
                    soup = BeautifulSoup(f, 'html.parser')
                for img in soup.find_all('img'):
                    src = img.get('src')
                    if not src:
                        continue
                    if src.startswith('http'):
                        img_url = src
                    elif src.startswith('/'):
                        img_url = f"{PUBLIC_URL}{src}"
                    else:
                        # Non-rooted src values are resolved against the site root
                        img_url = f"{PUBLIC_URL}/{src}"
                    images.append(img_url)
            except Exception as e:
                print(f"Error reading {full_path}: {e}")

            items.append({'loc': page_url, 'images': list(set(images))})

    xml = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<?xml-stylesheet type="text/xsl" href="/sitemap.xsl"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
        'xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">'
    ]
    for item in items:
        xml.append('  <url>')
        xml.append(f'    <loc>{item["loc"]}</loc>')
        for img in item['images']:
            xml.append('    <image:image>')
            xml.append(f'      <image:loc>{img}</image:loc>')
            xml.append('    </image:image>')
        xml.append('  </url>')
    xml.append('</urlset>')

    with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
        f.write('\n'.join(xml))

    # Force ownership of the generated file to www-data
    uid = pwd.getpwnam("www-data").pw_uid
    gid = grp.getgrnam("www-data").gr_gid
    os.chown(OUTPUT_FILE, uid, gid)
    os.chmod(OUTPUT_FILE, 0o644)

if __name__ == "__main__":
    generate()
    print(f"Sitemap successfully generated at {OUTPUT_FILE}")
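After a run, it is worth sanity-checking that the output is well-formed XML before pointing Google at it. A small stdlib helper does the job; the demo below writes a throwaway two-entry sitemap (a stand-in for /var/www/html/sitemap.xml) and counts its entries:

```python
import tempfile
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def count_urls(sitemap_path):
    """Parse a sitemap file and return the number of <url> entries.

    Raises ET.ParseError if the file is not well-formed XML.
    """
    root = ET.parse(sitemap_path).getroot()
    return len(root.findall("sm:url", NS))

# Demo against a throwaway file with two illustrative entries
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><loc>https://yourdomain.com/</loc></url>'
    '<url><loc>https://yourdomain.com/blog/post</loc></url>'
    '</urlset>'
)
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write(sample)
    path = f.name

print(count_urls(path))  # 2
```

Point `count_urls` at the real generated file to verify it parses and that the page count matches expectations.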

3. Professional Styling (sitemap.xsl)

To prevent the "No style information" browser error, save this file to /var/www/html/sitemap.xsl. This makes the XML readable for humans while remaining valid for crawlers.

The Stylesheet (sitemap.xsl)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
                xmlns:html="http://www.w3.org/TR/REC-html40"
                xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <html xmlns="http://www.w3.org/1999/xhtml">
            <head>
                <title>XML Sitemap - Professional Index</title>
                <style type="text/css">
                    body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen-Sans, Ubuntu, Cantarell, "Helvetica Neue", sans-serif; color: #333; margin: 0; padding: 40px; background: #f9f9fb; }
                    .container { max-width: 1000px; margin: 0 auto; background: #fff; padding: 30px; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.05); }
                    h1 { color: #1a1a1a; font-size: 24px; margin-bottom: 10px; border-bottom: 2px solid #3498db; display: inline-block; padding-bottom: 5px; }
                    p { color: #666; margin-bottom: 30px; }
                    table { border-collapse: collapse; width: 100%; margin-top: 10px; }
                    th { background: #f8f9fa; text-align: left; padding: 12px 15px; border-bottom: 2px solid #edf2f7; font-weight: 600; color: #4a5568; }
                    td { padding: 12px 15px; border-bottom: 1px solid #edf2f7; word-break: break-all; }
                    tr:hover { background: #f7fafc; }
                    a { color: #3498db; text-decoration: none; }
                    a:hover { text-decoration: underline; }
                    .count-badge { background: #ebf8ff; color: #2b6cb0; padding: 2px 8px; border-radius: 12px; font-size: 12px; font-weight: 600; }
                </style>
            </head>
            <body>
                <div class="container">
                    <h1>XML Sitemap</h1>
                    <p>Generated for Google/Bing SEO. Total URLs indexed: <strong><xsl:value-of select="count(sitemap:urlset/sitemap:url)"/></strong></p>
                    <table>
                        <thead>
                            <tr>
                                <th width="75%">URL Location</th>
                                <th width="25%">Images Detected</th>
                            </tr>
                        </thead>
                        <tbody>
                            <xsl:for-each select="sitemap:urlset/sitemap:url">
                                <tr>
                                    <td>
                                        <a href="{sitemap:loc}"><xsl:value-of select="sitemap:loc"/></a>
                                    </td>
                                    <td>
                                        <span class="count-badge">
                                            <xsl:value-of select="count(image:image)"/> Images
                                        </span>
                                    </td>
                                </tr>
                            </xsl:for-each>
                        </tbody>
                    </table>
                </div>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

🤖 Automating with Systemd Path Units

Instead of running the script manually, we use a systemd path unit to monitor the folder. The moment you sync from Publii, the sitemap regenerates.

Step 1: Create the Service (/etc/systemd/system/sitemap-gen.service)

Ini
[Unit]
Description=Fix Permissions and Generate Sitemap after Publii Sync
After=network.target

[Service]
Type=oneshot
User=root
Group=root

# Fix ownership and permissions before running the script
ExecStartPre=/bin/chown -R root:root /var/www/html
ExecStartPre=/usr/bin/find /var/www/html -type d -exec chmod 755 {} +
ExecStartPre=/usr/bin/find /var/www/html -type f -exec chmod 644 {} +

# Run the generator
ExecStart=/usr/bin/python3 /usr/local/bin/generate_sitemap.py

[Install]
WantedBy=multi-user.target

Step 2: Create the Path Watcher (/etc/systemd/system/sitemap-gen.path)

Ini
[Unit]
Description=Watch /var/www/html for changes

[Path]
PathChanged=/var/www/html/
# Note: PathChanged= watches this directory only, not its subdirectories.
# DelaySec= is not a valid [Path] option; rate-limit triggers as a debounce instead
# (the TriggerLimit* settings for path units require a reasonably recent systemd):
TriggerLimitIntervalSec=15
TriggerLimitBurst=1

[Install]
WantedBy=multi-user.target

Step 3: Activate

Bash
sudo systemctl daemon-reload
sudo systemctl enable --now sitemap-gen.path

✅ Summary Checklist

  • [ ] Dependency: python3-bs4 installed.

  • [ ] Script: generate_sitemap.py configured with your domain.

  • [ ] Style: sitemap.xsl placed in web root.

  • [ ] Automation: systemd path units enabled.

  • [ ] SEO: Publii's internal sitemap generator disabled.

Now, your site remains agile with relative URLs internally, while providing a rock-solid, image-rich sitemap to the world automatically.
