Python
Børre Stenseth

Python HTML / XML

What

Some exercises in processing text

HTML from a simple text file

The program reads a simple text file and places its content inside a pre tag on a minimalist HTML page. The HTML page is stored as a string template in the program.

Input file: frej1.txt
Output file: frej1.html
Python code:

"""
This is a module that reads a text file,
wraps it in a very simple HTML skeleton
and produces an HTML page
"""
#------------------------
# HTML-fragments
HTML_PAGE="""<html>
<head>
    <title>a page</title>
</head>
<body>
<h1> Farbror Frej:</h1>
<pre>
%s
</pre>
</body>
</html>
"""
#-----------------------
# filenames
infile='frej1.txt'
outfile=infile.replace('.txt','.html')

#------------------------
# Read / write text files
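def getTextFile(filename):
    """Return the content of a text file, or None if reading fails."""
    try:
        # 'with' closes the file even if reading fails
        with open(filename,'r',encoding='utf-8') as file:
            return file.read()
    except (OSError,UnicodeError):
        print ('Trouble reading: ',filename)
        return None
def storeTextFile(filename,txt):
    """Write txt to a text file, and report if writing fails."""
    try:
        with open(filename,'w',encoding='utf-8') as file:
            file.write(txt)
    except OSError:
        print ('Trouble writing to: ',filename)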
#------------------------
# do the job
def doit():
    txt=getTextFile(infile)
    if txt is not None:
        txt=HTML_PAGE%txt
        print (txt)
        storeTextFile(outfile,txt)
doit()
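
One caveat with the program above: if the text file contains characters such as <, > or &, a browser may interpret them as markup even inside a pre tag. The following is a minimal sketch of one way to guard against that, using html.escape from the standard library. The helper name wrap_in_pre and the shortened template PAGE are assumptions made for this illustration and are not part of the original program.

```python
import html

# Shortened template, assumed for this illustration only
PAGE = "<html><body><pre>\n%s\n</pre></body></html>"

def wrap_in_pre(txt):
    """Escape HTML special characters, then insert the text in the page."""
    return PAGE % html.escape(txt)

# The comparison operators and the ampersand come out as entities
print(wrap_in_pre("if a < b & b > c:"))
```

In doit() the same idea would amount to escaping txt before it is inserted into HTML_PAGE.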

Transformation: CSV to XML

The module boktoxml performs some of the basic operations involved in creating an XML file from a comma-separated file:

  • opens and reads a file in a civilized manner
  • splits the content into lines, and discards meaningless lines
  • splits each line into its comma-separated parts
  • uses the parts to produce XML elements
  • writes everything back to a file in a civilized manner
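
The splitting and filtering steps in the list above can be sketched in isolation as follows. The function name parse_lines, its expected_fields parameter and the sample text are assumptions made for this illustration; the module itself does this work inline.

```python
def parse_lines(text, expected_fields=9):
    """Return a list of field lists, one per usable line."""
    records = []
    for line in text.split('\n'):
        line = line.strip()
        # discard empty lines and comment lines
        if len(line) < 2 or line.startswith('//'):
            continue
        pcs = line.split(',')
        # discard lines with the wrong number of fields
        if len(pcs) != expected_fields:
            continue
        records.append(pcs)
    return records

# Made-up sample with three fields per record, for illustration only
sample = "// a comment\n\nTitle,Author,1999\nbroken line\n"
print(parse_lines(sample, expected_fields=3))
# prints [['Title', 'Author', '1999']]
```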

Input file: bokliste.txt
Output file: bokliste.xml (if your browser can handle it)
Python code:

"""
 Transform a comma-separated (CSV) file to XML
 Input data as lines:
 title,author,publisher,year,isbn,pages,course,category,comment
"""
#----------------------------
# XML-skeletons
# a template for an XML fragment
XMLFragment="""
<book isbn="%s" pages="%s">
      <title>%s</title>
      <course>%s</course>
      <category>%s</category>
      <author>%s</author>
      <publisher>%s</publisher>
      <year>%s</year>
      <comment>%s</comment>
</book>
"""
# a template for a complete xml-file
XMLFile="""<?xml version="1.0" encoding="utf-8"?>
<booklist>
%s
</booklist>
"""
#------------------------
# Read / write text files
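def getTextFile(filename):
    """Return the content of a text file, or None if reading fails."""
    try:
        # 'with' closes the file even if reading fails
        with open(filename,'r',encoding='utf-8') as file:
            return file.read()
    except (OSError,UnicodeError):
        print ('Trouble reading: ',filename)
        return None
def storeTextFile(filename,txt):
    """Write txt to a text file, and report if writing fails."""
    try:
        with open(filename,'w',encoding='utf-8') as file:
            file.write(txt)
    except OSError:
        print ('Trouble writing to: ',filename)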
#--------------------------------
# produce and save XML
def makeXML(filename='bokliste.txt'):
    # read the text file
    text=getTextFile(filename)
    # nothing to do if reading failed (None) or the file is empty
    if not text:
        return
    content=''
    # pick out the lines
    lines=text.split('\n')
    for line in lines:
        # strip() returns a new string, so keep the result
        line=line.strip()
        # drop empty lines and comment lines
        if(len(line)<2):
            continue
        if(line[0:2]=='//'):
            continue
        # we have a book line, find the parts
        pcs=line.split(',')
        if(len(pcs)!=9):
            print ('ignore:' , line)
            continue
        content+=XMLFragment%(pcs[4],pcs[5],pcs[0],pcs[6],
                              pcs[7],pcs[1],pcs[2],pcs[3],pcs[8])

    storeTextFile(filename.replace('.txt','.xml'),XMLFile%content)
makeXML()
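
String templates like the one above produce malformed XML if a field contains characters such as & or <. As an alternative, here is a hedged sketch that builds the same element structure with xml.etree.ElementTree from the standard library, which escapes such characters when serializing. The function name book_element and the sample record are assumptions made for this illustration; this is not how the module above works.

```python
import xml.etree.ElementTree as ET

def book_element(pcs):
    """Build one <book> element from the nine fields of a line."""
    (title, author, publisher, year, isbn,
     pages, course, category, comment) = pcs
    book = ET.Element('book', isbn=isbn, pages=pages)
    # same child order as in the XMLFragment template
    for tag, value in (('title', title), ('course', course),
                       ('category', category), ('author', author),
                       ('publisher', publisher), ('year', year),
                       ('comment', comment)):
        ET.SubElement(book, tag).text = value
    return book

# Made-up sample record, for illustration only
sample = ['Tom & Jerry', 'A. Author', 'A Publisher', '2000',
          '0-000', '100', 'A course', 'A category', 'none']
root = ET.Element('booklist')
root.append(book_element(sample))
# The ampersand in the title is serialized as &amp;
print(ET.tostring(root, encoding='unicode'))
```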

Transformation: CSV to HTML

The module boktohtml takes the same text file as in the example above and transforms it into an HTML file that shows a list of books with their authors.

Input file: bokliste.txt
Output file: bokliste.html
Python code:

"""
 Transform a comma-separated (CSV) file to HTML
 Input data as lines:
 title,author,publisher,year,isbn,pages,course,category,comment
"""
#----------------------------
# HTML-skeletons
# a template for a complete html-file
HTMLFile="""<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8"/>
    <title>books</title>
    <style>
     li{margin-top:10px;}
     .fat{font-weight:bold; color:red}
    </style>
</head>
<body>
<h1>Bokliste</h1>
<ul>
%s
</ul>
</body>
</html>
"""
# a template for an HTML fragment, one book (title and author)
HTMLFragment="""
    <li>
        <div class="fat">%s</div>
        <div>%s</div>
    </li>
"""
#------------------------
# Read / write text files
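def getTextFile(filename):
    """Return the content of a text file, or None if reading fails."""
    try:
        # 'with' closes the file even if reading fails
        with open(filename,'r',encoding='utf-8') as file:
            return file.read()
    except (OSError,UnicodeError):
        print ('Trouble reading: ',filename)
        return None
def storeTextFile(filename,txt):
    """Write txt to a text file, and report if writing fails."""
    try:
        with open(filename,'w',encoding='utf-8') as file:
            file.write(txt)
    except OSError:
        print ('Trouble writing to: ',filename)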
#--------------------------------
# produce and save HTML
def makeHTML(filename='bokliste.txt'):
    # read the input file
    text=getTextFile(filename)
    if not text:
        return
    content=''
    # pick up lines
    lines=text.split('\n')
    for line in lines:
        line=line.strip()
        # drop too short lines
        if(len(line)<2):
            continue
        # drop commentlines
        if(line[0:2]=='//'):
            continue
        # We have a line , find elements
        pcs=line.split(',')
        # acceptable ?
        if(len(pcs)!=9):
            print ('ignoring:' , line)
            continue
        content+=HTMLFragment%(pcs[0].strip(),pcs[1].strip())
    
    storeTextFile(filename.replace('.txt','.html'),
                  HTMLFile%content)
makeHTML()
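
Splitting on ',' as done above works only as long as no field contains a comma. Whether bokliste.txt uses quoted fields is not known here, so the following is only a sketch under that assumption: it compares the naive split with csv.reader from the standard library, which understands quoted fields. The sample data is made up for this illustration.

```python
import csv
import io

# Made-up sample data; the second title contains a comma inside quotes
sample = 'Plain title,Author One\n"Title, with comma",Author Two\n'

# Naive approach: split every line on commas
naive = [line.split(',') for line in sample.splitlines()]
# csv.reader keeps a quoted field together
parsed = list(csv.reader(io.StringIO(sample)))

print(len(naive[1]))   # 3 pieces: the quoted title was split in two
print(parsed[1])       # ['Title, with comma', 'Author Two']
```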