Børre Stenseth

PDF

What

We want to produce a well-formatted paper printout of our Olympic results. One way to do this is to write a suitable CSS file and attach it to an HTML rendering of the results. In this case we choose instead to produce PDF (Adobe Portable Document Format).

We will do this in three different ways:

  • using XSL-FO and FOP, see the module XSL-FO
  • using Prince-XML [1]
  • using Python, lxml [2] and Prince-XML

XSL-FO and FOP

We do the job as a two-step operation:

[Figure: Transformation from XML to XSL-FO and preparation of the PDF]
  1. We write an XSLT transformation, xml-to-fo.xslt, which produces a file marked up as XSL-FO (formatting objects), olympic.fo.
  2. We use a standard program, FOP, which among many other things can produce PDF from XSL-FO documents.

The XSLT transformation, xml-to-fo.xslt, looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="xml" version="1.0"
              encoding="ISO-8859-1" indent="yes"/>
<xsl:template match="/">
  <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
  <!-- layout for the first page -->
  <fo:simple-page-master master-name="coverpage"
                page-height="29.7cm"
                page-width="21cm"
                margin-top="7cm"
                margin-bottom="2cm"
                margin-left="2.5cm"
                margin-right="2.5cm">
    <fo:region-body margin-top="3cm"  margin-bottom="1.5cm"/>
    <fo:region-before extent="2cm"/>
    <fo:region-after extent="1.5cm"/>
  </fo:simple-page-master>
  <!-- layout for all other pages -->
  <fo:simple-page-master master-name="pages"
                page-height="29.7cm"
                page-width="21cm"
                margin-top="1cm"
                margin-bottom="2cm"
                margin-left="2.5cm"
                margin-right="2.5cm">
    <fo:region-body margin-top="1cm"  margin-bottom="1cm"/>
    <fo:region-before extent="2cm"/>
    <fo:region-after extent="1.5cm"/>
  </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- filling the front page -->
  <fo:page-sequence master-reference="coverpage">
  <fo:flow flow-name="xsl-region-body">
  <fo:block font-weight="bold" font-size="28pt"
            line-height="38pt" font-family="Times">
 Olympiske resultater
  </fo:block>
  <fo:block font-weight="normal" font-size="13pt"
            line-height="15pt" font-family="Times">
  Sprintøvelsene i de siste olympiadene
  </fo:block>
    </fo:flow>
    </fo:page-sequence>
  <!-- doing all olympics in turn -->
  <xsl:apply-templates select="/IOC/OlympicGame">
    <xsl:sort select="@year"/>
  </xsl:apply-templates>
</fo:root>
</xsl:template>
<xsl:template match="//OlympicGame">
    <fo:page-sequence  master-reference="pages">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-weight="bold" font-size="18pt"
                line-height="28pt" font-family="Times"
                padding-top="0cm"
                border-bottom-color="black"
                border-bottom-style="solid">
        <xsl:element name="fo:external-graphic">
           <xsl:attribute name="src">
            <xsl:value-of select="@place"/>.gif</xsl:attribute>
        </xsl:element>
      </fo:block>
              <xsl:apply-templates select="event">
              <xsl:sort select="@dist"/>
              </xsl:apply-templates>
      </fo:flow>
    </fo:page-sequence>
</xsl:template>
<xsl:template match="//event">
  <fo:block font-weight="bold" font-size="12pt"
           line-height="14pt" font-family="Times"
            padding-top="1cm"  >
         <xsl:value-of select="@dist"/>
  </fo:block>
  <xsl:apply-templates select="athlet">
  <xsl:sort  data-type="number" select="result"/>
  </xsl:apply-templates>
</xsl:template>
<xsl:template match="//athlet">
  <fo:block font-size="10pt" line-height="14pt"
           font-family="Times" >
      <xsl:value-of select="name"/>,
      <xsl:value-of select="nation"/>   :
      <xsl:value-of select="result"/>
  </fo:block>
</xsl:template>
</xsl:stylesheet>
Result: https://borres.hiof.no/wep/xslt/ol/xml2pdf/olympic.fo
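The two page masters in the stylesheet differ only in their margins, so it can help to work out how much room each one leaves for the body text. The following is a small sketch of that arithmetic, not part of the original module; the function name body_area is an assumption made for illustration, and the numbers are copied from the fo:simple-page-master and fo:region-body elements above.

```python
from decimal import Decimal


def body_area(page, page_margin, body_margin):
    """Width and height (in cm) left for fo:region-body.

    page        -- (width, height) of the page
    page_margin -- (top, bottom, left, right) on fo:simple-page-master
    body_margin -- (top, bottom) on fo:region-body
    """
    width, height = (Decimal(v) for v in page)
    top, bottom, left, right = (Decimal(v) for v in page_margin)
    body_top, body_bottom = (Decimal(v) for v in body_margin)
    return (width - left - right,
            height - top - bottom - body_top - body_bottom)


A4 = ("21", "29.7")
# values taken from the "coverpage" and "pages" masters in xml-to-fo.xslt
cover = body_area(A4, ("7", "2", "2.5", "2.5"), ("3", "1.5"))
pages = body_area(A4, ("1", "2", "2.5", "2.5"), ("1", "1"))
print("coverpage body (w, h):", cover)
print("pages body (w, h):", pages)
```

With these values the cover page leaves a body area of 16 cm by 16.2 cm, and the other pages 16 cm by 24.7 cm.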

We assume that FOP is installed in the directory fop, and can then call FOP from the command line like this:

 c:\fop\fop.bat olympic.fo olympic.pdf
Result: https://borres.hiof.no/wep/xslt/ol/xml2pdf/olympic.pdf
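If the FOP step is to be scripted rather than typed by hand, the command line can be assembled in Python. This is only a sketch under stated assumptions: the function names fop_command and run_fop are made up for illustration, the default path c:\fop\fop.bat is the one assumed in the text above, and the explicit -fo and -pdf options are used instead of the shorter positional form.

```python
import subprocess
from pathlib import PureWindowsPath


def fop_command(fo_file, fop_executable=r"c:\fop\fop.bat"):
    """Build the argument list for turning an XSL-FO file into a PDF.

    The PDF gets the same name as the input, with .pdf as extension.
    """
    pdf_file = str(PureWindowsPath(fo_file).with_suffix(".pdf"))
    return [fop_executable, "-fo", fo_file, "-pdf", pdf_file]


def run_fop(fo_file):
    """Run FOP and return its exit code (requires FOP to be installed)."""
    return subprocess.call(fop_command(fo_file))


# only show the command here; call run_fop("olympic.fo") to execute it
print(fop_command("olympic.fo"))
```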

PRINCE

We do the following:

[Figure: From XML via HTML to PDF using Prince]

The transformation that produces the HTML is essentially the same as the one used in the module XML2HTML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" 
     version="1.0" 
     encoding="ISO-8859-1" 
     indent="yes"
     doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"  
     doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="/">
    <html>
      <head>
         <title>Olympics</title>
       </head>
       
       <body>
         <h1>Resultater sprint</h1>
         <p>fra de siste olympiske leker</p>
         <hr/>
         <xsl:apply-templates select="IOC/OlympicGame">
         <xsl:sort select="@year" order="ascending"/>
         </xsl:apply-templates>
        </body>
        
    </html>
</xsl:template>
<xsl:template match="OlympicGame">
  <table cellpadding="10">
    <tr>
      <td>
      <xsl:element name="img">
          <xsl:attribute name="src">
            <xsl:value-of select="@place"/>.gif</xsl:attribute>
          <xsl:attribute name="alt">
             <xsl:value-of select="@place"/></xsl:attribute>
      </xsl:element>
         </td>
         <td>
         <h1><xsl:value-of select="@place"/> <br/>
         <xsl:value-of select="@year"/></h1>
         </td>
      </tr>
  </table>
  <table cellpadding="10" border="0" cellspacing="0">
    <tr>
      <xsl:apply-templates select="event"/>
    </tr>
  </table>
</xsl:template>
<xsl:template match="//event">
  <td valign="top">
    <h2><xsl:value-of select="@dist"/></h2>
     <xsl:apply-templates select="athlet">
     <xsl:sort  data-type="number" select="result"/>
     </xsl:apply-templates>
  </td>
</xsl:template>
<xsl:template match="athlet">
   <p><xsl:value-of select="name"/><br/>
   <xsl:value-of select="nation"/><br/>
   <xsl:value-of select="result"/></p>
</xsl:template>
</xsl:stylesheet>

The result, prepared.html, looks like this:

Result: https://borres.hiof.no/wep/xslt/ol/xml2pdf/ol2pdfprince/prepared.html
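Both stylesheets so far sort the athletes with data-type="number". The reason is that xsl:sort compares values as text unless told otherwise, and text order is not the same as numeric order for times. The sketch below shows the same effect in Python; the three times are made-up values, not taken from the result file.

```python
# hypothetical sprint times, as they would appear as text in <result>
results = ["10.02", "9.84", "9.95"]

# text comparison: "1" sorts before "9", so 10.02 comes first
as_text = sorted(results)

# numeric comparison, corresponding to data-type="number"
as_number = sorted(results, key=float)

print("text order:   ", as_text)
print("numeric order:", as_number)
```

Sorted as text, the slowest time ends up first; sorted as numbers, the fastest time leads the list, which is what we want in a result list.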

The CSS file that describes the layout of the PDF file is very simple (printpages.css):

@page { size: A4;
        margin: 100pt 40pt 40pt 90pt;
        @top-left {
            content:"demo";
        }
        @top-right {
            content:"Markup og Web";
            font-size:24px;
        }
        @bottom-right {
          content: counter(page);
            font-style: italic;
            font-size:11px;
          border-top-style:solid;
          border-top-width:thin;
        }
        @bottom-left {
          content:"B. Stenseth";
            font-style: italic;
            font-size:11px;
          border-top-style:solid;
          border-top-width:thin;
        }
}
h1{page-break-before:always}

The result, prepared.pdf, looks like this:

Result: https://borres.hiof.no/wep/xslt/ol/xml2pdf/ol2pdfprince/prepared.pdf

Python, lxml and Prince

We will carry out the following combined transformation:

[Figure: From XML via HTML to PDF using Prince, all controlled from Python]

We begin by writing a simple transformation, tohtml.xsl, which produces an HTML file. The main difference between this transformation and the one in the previous section is that this time we also generate a table of contents.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" 
     version="1.0"  omit-xml-declaration="yes"
     encoding="UTF-8" 
     indent="no"/>
     
<xsl:template match="/">
<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html>
    </xsl:text>
<html>
    <head>
        <title>Olympiade</title>
        <link href="olscreen.css" rel="stylesheet" />
        <link href="olprint.css" rel="stylesheet" />
        <link href="olprojection.css" rel="stylesheet" />
    </head>    
    <body>
    <div id="heading">Olympiske sprintresultater</div>
    <xsl:call-template name="toc"/>
    <xsl:apply-templates select="IOC/OlympicGame">
        <xsl:sort select="@year" order="ascending"/>
    </xsl:apply-templates>
    </body>
</html>
</xsl:template>
<xsl:template match="OlympicGame">
    <xsl:element name="a">
        <xsl:attribute name="name"><xsl:value-of select="@place"/></xsl:attribute>
        <h1>
            <xsl:value-of select="@place"/> - <xsl:value-of select="@year"/>
        </h1>
    </xsl:element>
    <div>
        <xsl:element name="img">
        <xsl:attribute name="src"><xsl:value-of select="@place"/>.gif</xsl:attribute>
        <xsl:attribute name="alt"><xsl:value-of select="@place"/></xsl:attribute>
        </xsl:element>
    </div>
<xsl:apply-templates select="event"/>
</xsl:template>

<xsl:template match="event">
    <h2><xsl:value-of select="@dist"/></h2>
    <xsl:apply-templates select="athlet">
        <xsl:sort data-type="number" select="result"/>
    </xsl:apply-templates>
</xsl:template>
<xsl:template match="athlet">
<div class="athlet">
<p><xsl:value-of select="name"/></p>
<p><xsl:value-of select="nation"/></p>
<p><xsl:value-of select="result"/></p>
</div>
</xsl:template>
<xsl:template name="toc">
    <xsl:element name="div">
    <xsl:attribute name="id">maintoc</xsl:attribute>
    
    <div class="tocheader">Innhold</div>
    
    <xsl:for-each select="//OlympicGame">
        <xsl:sort select="@year" order="ascending"/>
        <div class="toclevel1">
            <xsl:element name="a">
                <xsl:attribute name="href">#<xsl:value-of select="@place"/></xsl:attribute>
                <xsl:value-of select="@place"/>
            </xsl:element>
        </div>        
    </xsl:for-each>
    </xsl:element>
</xsl:template>
</xsl:stylesheet>

The result of the transformation, bok.html, looks like this:

Result: https://borres.hiof.no/wep/xslt/ol/xml2pdf/ol2pdfprince2/bok.html

Next we write a few Python modules that carry out the combined operation:

  1. transformation XML -> HTML
  2. a call to Prince to produce the PDF

The transformation itself, transform.py, is done like this:
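The toc template pairs each heading anchor (the name attribute, set to @place) with a link whose href is "#" followed by the same @place value, with the games sorted by year. The sketch below mirrors that logic using only the Python standard library. It is an illustration, not part of the module: the function name toc_entries is an assumption, and the sample data uses placeholder place names, while the element and attribute names follow the stylesheet.

```python
import xml.etree.ElementTree as ET

# placeholder data; element and attribute names as used in tohtml.xsl
SAMPLE = """<IOC>
  <OlympicGame place="CityB" year="2004"/>
  <OlympicGame place="CityA" year="2000"/>
</IOC>"""


def toc_entries(xml_text):
    """Return (href, label) pairs for the table of contents, oldest game first."""
    root = ET.fromstring(xml_text)
    # years are compared as text, like xsl:sort without a data-type
    games = sorted(root.iter("OlympicGame"), key=lambda g: g.get("year"))
    return [("#" + g.get("place"), g.get("place")) for g in games]


for href, label in toc_entries(SAMPLE):
    print(href, label)
```

Comparing the years as text works here because they all have four digits.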

"""
Transforming XML to HTML using lxml
"""
from lxml import etree
def produce(xmlfile,xsltfile):
    xmlTree=etree.parse(xmlfile)
    xsltTree=etree.parse(xsltfile)
    transform=etree.XSLT(xsltTree)
    resultTree=transform(xmlTree)
    return str(resultTree)

In the module makesingle.py below, it is the function doSinglePageJob that uses Prince.

"""
The purpose of this module is to make a PDF-files from a HTML file
One to One
The converterengine is PrinceXML
Parameters to this module when run from the commandline
is any number of HTML-files.
The PDF files will have same name, but pdf as extension
"""
import subprocess
import sys
import utils
import transform

#--------------------
# fixed paths and logging
""" catalog """
cat='c:\\web\\dw\\olymp\\'
""" prince path """
princepath='c:\\fixed\\prince\\engine\\bin\\prince.exe'
""" log file """
logfile=cat+'ol2pdfprince2\\princelog.txt'
""" print log after job """
printlog=False
""" full report """
verbose=False
""" all stylesheets """
stylesheets=[cat+'ol2pdfprince2\\olsheet1.css']
""" erase log file """
def eraseLog():
    utils.storeTextFile(logfile,'')
"""
Do one page to one page
"""
def doSinglePageJob(infile,outfile):
        print(infile+' -> '+outfile)
        # an option and its value must be separate list items,
        # otherwise Prince receives them as one argument
        params=[princepath,infile,'-o',outfile,'--log='+logfile]
        if verbose:
            params.append('-v')
        for style in stylesheets:
            params.extend(['-s',style])
        print('making: '+outfile)
        subprocess.call(params)

#--------------------------------
if __name__=="__main__":   
    T=transform.produce(cat+'all_results.xml',
                        cat+'ol2pdfprince2\\tohtml.xsl')
    utils.storeTextFile(cat+'ol2pdfprince2\\bok.html',T)
    doSinglePageJob(cat+'ol2pdfprince2\\bok.html',
                    cat+'ol2pdfprince2\\bok.pdf')
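When subprocess is given a list, every list item reaches Prince as exactly one argument, so an option and its value are best kept as separate items. The sketch below builds such a list without running anything. The function name prince_command and its parameters are assumptions for illustration; it assumes an executable called prince on the path, and the options -o, -s, -v and --log are the ones used in the module above.

```python
def prince_command(infile, outfile, stylesheets=(), logfile=None,
                   verbose=False, prince="prince"):
    """Build the argument list for one HTML -> PDF conversion."""
    params = [prince, infile, "-o", outfile]
    if logfile:
        params.append("--log=" + logfile)
    if verbose:
        params.append("-v")
    for style in stylesheets:
        # option and value as two separate items
        params += ["-s", style]
    return params


cmd = prince_command("bok.html", "bok.pdf", stylesheets=["olsheet1.css"])
print(" ".join(cmd))
```

The returned list can be passed directly to subprocess.call.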
        
    
    

The module utils contains only two functions for file access:

""" load a text file """
def getTextFile(filename):
    try:
        # UTF-8 matches the output encoding declared in tohtml.xsl
        with open(filename,'r',encoding='utf-8') as file:
            return file.read()
    except (OSError,UnicodeError):
        print('Error reading file ',filename)
        return ''
""" store a text file """
def storeTextFile(filename,txt):
    try:
        with open(filename,'w',encoding='utf-8') as file:
            file.write(txt)
    except (OSError,UnicodeError):
        print('Trouble writing to: '+filename)

The style sheet used for the PDF production, olsheet1.css, looks like this:

@page { size: A4;
        margin: 100pt 40pt 40pt 90pt;
        @top-left {
            content:url(http://www.ia.hiof.no/~borres/common/gfx/printlogo.gif);
        }
        @top-right {
            content: string(doctitle);
            font-size:18px;
        }
        @bottom-right {
          content: counter(page);
            font-style: italic;
            font-size:11px;
          border-top-style:solid;
          border-top-width:thin;
        }
        @bottom-left {
          content:"B. Stenseth";
            font-style: italic;
            font-size:11px;
          border-top-style:solid;
          border-top-width:thin;
        }
}
@page:first {
        @top-left {
            content:url(http://www.ia.hiof.no/~borres/common/gfx/printlogo_txt.gif);
        }
        @top-right {
            content:"";
            font-size:24px;
        }
}
#heading{margin-top:50px;margin-bottom:50px;font-weight:bold;font-size:36px}
h1 { string-set: doctitle content() }
h1,h2 {page-break-before:always}
#maintoc a::after { content: leader(".") target-counter(attr(href), page); } 
.tocheader{margin-top:50px;margin-bottom:50px;font-weight:bold;font-size:20px}
.toclevel1{margin-left:20px;line-height:150%}
.athlet p{line-height:70%}
.athlet :first-child{font-weight:bold}

/* linking NB: sequence is important */
a:link     {color:black;text-decoration:none}
a:visited {color:black;text-decoration:none}
a:hover   {color:black;text-decoration:none}
a:active  {color:black;text-decoration:none}

The style sheets below are used for screen, print and projection (F11 in Opera):

@media screen
{
    h1{color:red}
}
@media print
{
    h1{color:blue;page-break-before:always}
}
@media projection
{
    #maintoc,img{display:none}
    h1,h2{page-break-before:always}
    
    .athlet {line-height:60%;margin-left:150px;margin-top:40px;}
    .athlet :first-child{font-weight:bold;font-size:20px}
    
    #heading{margin-left:150px;margin-top:150px;font-size:46px}
}

The result, bok.pdf, looks like this:

Result: https://borres.hiof.no/wep/xslt/ol/xml2pdf/ol2pdfprince2/bok.pdf
References
  1. Prince XML. Prince. www.princexml.com/ 14-03-2014
  2. lxml - XML and HTML with Python. lxml.de/ 03-03-2014
  3. FOP. apache.org. xmlgraphics.apache.org/fop/ 24-02-2014