XML Tutorial – Learn to use XML for the Raspberry Pi and Arduino! #2

This series of articles discusses the use of XML in applications for the Raspberry Pi and Arduino. Part One introduced XML and the format of its data structures. In Part Two we cover building and parsing XML in Python, while in Part Three we will show how XML is used as a communications protocol for a client/server application, RasPiConnect. RasPiConnect is an iPad/iPhone app that connects to any number of Raspberry Pis and displays information from them via a defined XML interface.

 

What do we mean by parsing?

Parsing refers to the syntactic analysis of the XML input into its component parts so that code can act on the result. In other words, the program is “reading” the XML to find the values it is looking for, paying attention to proper syntax and form. XML syntax includes a nested hierarchy of elements: each level of the hierarchy is fully enclosed within the level above it. In our example below, each <XMLCOMMAND> object is fully enclosed (“nested”) in <XMLObjectXMLRequests>. You can extend this nesting as deep as you like. When you write parsing code, this nesting usually results in Python for loops that iterate through all the objects at one level of the hierarchy.
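As a minimal sketch of how nesting maps onto loops, the code below walks an arbitrarily deep tree with Python's built-in xml.etree.ElementTree. The document and its <A>, <B> and <C> tags are made up, and walk() is our own hypothetical helper: its for loop visits every child at one level, and the recursive call descends to the next level.

```python
import xml.etree.ElementTree as ET

# A small, made-up document with three levels of nesting
doc = ET.fromstring("<A><B><C>1</C><C>2</C></B><B><C>3</C></B></A>")

def walk(element, depth=0, out=None):
    """Return (depth, tag) pairs for every element, in document order."""
    if out is None:
        out = []
    out.append((depth, element.tag))
    # One for loop per level: iterate over the direct children of this element
    for child in element:
        walk(child, depth + 1, out)
    return out

# Print the hierarchy, indenting each tag by its depth
for depth, tag in walk(doc):
    print("  " * depth + tag)
```

When the depth is fixed and known, as in this article's example, one for loop per level is usually clearer; recursion helps when the depth is not known in advance.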

Options for Parsing XML in Python

There are many different packages for parsing XML in Python: the standard library alone includes xml.dom.minidom, xml.sax and xml.etree.ElementTree, and third-party packages such as lxml are also available. We will be using xml.etree.ElementTree, a simple-to-use, fast XML tree library that ships with Python. It is somewhat limited in features, but for straightforward XML message parsing it is hard to beat.

What do you need to know about ElementTree? Only a few calls are needed to parse simple XML: ET.fromstring(), findall(), find() and the text attribute. Each is illustrated below.

Python Example Code

 import xml.etree.ElementTree as ET
 incomingXML = """
 <XMLObjectXMLRequests>
  <XMLCOMMAND>
   <OBJECTSERVERID>W-1</OBJECTSERVERID>
   <OBJECTNAME>StatusWebView</OBJECTNAME>
   <OBJECTTYPE>1</OBJECTTYPE>
   <OBJECTID>7</OBJECTID>
  </XMLCOMMAND>
  <XMLCOMMAND>
   <OBJECTSERVERID>M-2</OBJECTSERVERID>
   <OBJECTNAME>Processes</OBJECTNAME>
   <OBJECTTYPE>64</OBJECTTYPE>
   <OBJECTID>0</OBJECTID>
  </XMLCOMMAND>
 </XMLObjectXMLRequests>"""
 # fromstring() parses the string and returns the root element, <XMLObjectXMLRequests>
 root = ET.fromstring(incomingXML)
 print(incomingXML)
 # iterate through every <XMLCOMMAND> child of the root
 for element in root.findall('XMLCOMMAND'):
     print('XMLCOMMAND')
     # find() locates a child by tag name; .text holds its value as a string
     print('OBJECTNAME:', element.find('OBJECTNAME').text)
     print('OBJECTTYPE:', element.find('OBJECTTYPE').text)
     print('OBJECTSERVERID:', element.find('OBJECTSERVERID').text)
     print('OBJECTID:', element.find('OBJECTID').text)

Set up the ElementTree data

After importing ElementTree and storing the XML in a string (you could equally be reading it from a file or a web request), we first set up the root of the XML hierarchy: ET.fromstring() parses the string and returns the root element, which here is <XMLObjectXMLRequests>.
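Reading the same XML from a file uses ET.parse(), which returns an ElementTree object; its getroot() method gives the root element that ET.fromstring() returns directly. This sketch writes a trimmed copy of the example XML to a hypothetical file, requests.xml, in a temporary directory so that it runs on its own.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml_text = """<XMLObjectXMLRequests>
 <XMLCOMMAND>
  <OBJECTSERVERID>W-1</OBJECTSERVERID>
  <OBJECTNAME>StatusWebView</OBJECTNAME>
 </XMLCOMMAND>
</XMLObjectXMLRequests>"""

with tempfile.TemporaryDirectory() as tmpdir:
    # Write the XML to a hypothetical file name inside a temporary directory
    path = os.path.join(tmpdir, "requests.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml_text)
    # From a file: parse() returns an ElementTree; getroot() gives the root element
    root_from_file = ET.parse(path).getroot()

# From a string (for example, the body of a web request): the root comes back directly
root_from_string = ET.fromstring(xml_text)

print(root_from_file.tag, root_from_string.tag)
```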

Iterate through the list

We know from looking at the XML that <XMLObjectXMLRequests> contains a number of <XMLCOMMAND> objects. We visit each one with a for loop over root.findall('XMLCOMMAND'); given a plain tag name, findall() returns a list of all the direct children of the root that have that tag.
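findall() also accepts the limited XPath syntax that ElementTree supports, so you can reach a nested level without writing one loop per level. In this sketch, using a trimmed copy of the example XML, the path 'XMLCOMMAND/OBJECTNAME' steps down one level per '/', while './/OBJECTNAME' and iter() search at any depth.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<XMLObjectXMLRequests>
 <XMLCOMMAND><OBJECTNAME>StatusWebView</OBJECTNAME></XMLCOMMAND>
 <XMLCOMMAND><OBJECTNAME>Processes</OBJECTNAME></XMLCOMMAND>
</XMLObjectXMLRequests>""")

# A plain tag name matches direct children of root only
commands = root.findall('XMLCOMMAND')
not_direct = root.findall('OBJECTNAME')  # empty: OBJECTNAME is a grandchild

# A path steps down one level per '/'
names = [e.text for e in root.findall('XMLCOMMAND/OBJECTNAME')]

# './/' and iter() both search every level below the starting element
names_anywhere = [e.text for e in root.findall('.//OBJECTNAME')]
names_iter = [e.text for e in root.iter('OBJECTNAME')]

print(len(commands), not_direct, names, names_anywhere, names_iter)
```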

Parse the individual items

In the interior of the for loop, we now parse the individual elements of each <XMLCOMMAND> object. Here we use the Element.find() method to locate a child element by its tag name and read its value from the text attribute. Note that the code reads the child elements in a different order from the one in which they appear in the XML. Because find() looks each child up by its tag name, the code works whatever order the sender writes sibling elements in. The parser itself does preserve order: findall() returns matching elements in document order, so the first <XMLCOMMAND> in the message is the first one returned.
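The example above happens to write both <XMLCOMMAND> objects with their children in the same order. This sketch writes them in different orders to show that lookup by tag name still works, and converts OBJECTTYPE and OBJECTID with int(), because .text is always a string. read_command() is our own hypothetical helper name.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<XMLObjectXMLRequests>
 <XMLCOMMAND>
  <OBJECTSERVERID>W-1</OBJECTSERVERID>
  <OBJECTNAME>StatusWebView</OBJECTNAME>
  <OBJECTTYPE>1</OBJECTTYPE>
  <OBJECTID>7</OBJECTID>
 </XMLCOMMAND>
 <XMLCOMMAND>
  <OBJECTID>0</OBJECTID>
  <OBJECTTYPE>64</OBJECTTYPE>
  <OBJECTNAME>Processes</OBJECTNAME>
  <OBJECTSERVERID>M-2</OBJECTSERVERID>
 </XMLCOMMAND>
</XMLObjectXMLRequests>""")

def read_command(element):
    """Look up each child by tag name, so sibling order does not matter."""
    return {
        'serverid': element.find('OBJECTSERVERID').text,
        'name': element.find('OBJECTNAME').text,
        # .text is always a string; convert numeric fields explicitly
        'type': int(element.find('OBJECTTYPE').text),
        'id': int(element.find('OBJECTID').text),
    }

# findall() returns the commands in document order
commands = [read_command(e) for e in root.findall('XMLCOMMAND')]
for command in commands:
    print(command)
```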

Expected elements can be missing from objects. When an element might be missing, you must check for it, typically with an if statement. If you do not, you risk a Python exception: find() returns None rather than an element when the tag is absent, so reading .text from that result raises an AttributeError. The text attribute can also be None when the element is present but empty (for example <OBJECTNAME/>). If you are using strings as values, you will probably want to set your string variable to “” (an empty string) rather than leaving it as a Python None. Forgetting these checks is a very common mistake in writing ElementTree code.

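 child = element.find('XXXX')  # None if <XXXX> is missing
 value = ""  # default for a missing or empty element
 if child is not None and child.text is not None:
     value = child.text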
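An alternative to the explicit if statement is Element.findtext(), which returns a default you supply when no matching element exists, and an empty string when the element exists but has no text. This sketch compares both approaches on a command with a missing <OBJECTID> and an empty <OBJECTNAME/>; safe_text() is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring(
    "<XMLCOMMAND><OBJECTNAME/><OBJECTTYPE>64</OBJECTTYPE></XMLCOMMAND>")

def safe_text(parent, tag):
    """Return the child's text, or "" if the child is missing or empty."""
    child = parent.find(tag)  # None if the element is missing
    if child is None or child.text is None:
        return ""
    return child.text

# The unsafe pattern: find() returns None, so .text raises AttributeError
try:
    element.find('OBJECTID').text
except AttributeError as err:
    print('missing element:', err)

print(repr(safe_text(element, 'OBJECTID')))      # missing element
print(repr(safe_text(element, 'OBJECTNAME')))    # empty element
print(repr(element.findtext('OBJECTID', '')))    # missing: default is returned
print(repr(element.findtext('OBJECTNAME', '')))  # empty: "" is returned
print(repr(element.findtext('OBJECTTYPE', '')))  # present element
```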

Uses for XML in Python programs

XML is used extensively in the software industry, in applications ranging from HL7 messages in healthcare and the Simple Object Access Protocol (SOAP) for client-server information exchange to Microsoft Word files (a .docx file is a zip archive of XML documents). The key advantages of using XML are its use across different systems, readability, extensibility and the ability to edit the XML in a text editor.

Programmers often use XML to read and write configuration files on disk, which speeds debugging and development. It also makes it easier to set up test suites for programs, as you can read from disk the same XML structures you would send across the Internet in a web request. The extensibility of XML allows you to add new parameters and structures to your Python programs while maintaining backwards compatibility. Part Three of this series will show how this is done in Python.
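The introduction mentions building XML as well as parsing it. Here is a minimal sketch that builds a configuration with ET.Element() and ET.SubElement(), serializes it with ET.tostring(), and reads it back. The tag names (CONFIG, SERVERID, POLLSECONDS, LOGLEVEL) and the helpers build_config() and read_config() are hypothetical. LOGLEVEL stands in for a parameter added in a later version: the reader supplies a default so older files without it still load. This is one simple way to stay backwards compatible, not necessarily the approach Part Three takes.

```python
import xml.etree.ElementTree as ET

def build_config(server_id, poll_seconds):
    """Build a small configuration document (hypothetical tag names)."""
    config = ET.Element('CONFIG')
    ET.SubElement(config, 'SERVERID').text = server_id
    # .text must be a string when serializing, so convert numbers
    ET.SubElement(config, 'POLLSECONDS').text = str(poll_seconds)
    return ET.tostring(config, encoding='unicode')

def read_config(xml_text):
    """Parse the configuration; LOGLEVEL is optional for older files."""
    root = ET.fromstring(xml_text)
    return {
        'serverid': root.findtext('SERVERID', ''),
        'poll_seconds': int(root.findtext('POLLSECONDS', '60')),
        # Newer parameter: fall back to a default when an old file lacks it
        'loglevel': root.findtext('LOGLEVEL', 'INFO'),
    }

old_style = build_config('W-1', 30)
print(old_style)
print(read_config(old_style))
```

To save the configuration to disk rather than to a string, you can wrap the root element in ET.ElementTree() and call its write() method with a file name.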

Conclusion

XML is wordy and as a result uses a fair bit of disk space to store and memory to process. In the Raspberry Pi world and across the Internet this generally does not matter. However, on microcontrollers such as the Arduino, RAM is at a premium, so a more compact and simpler format such as JSON is more appropriate. If disk space is at a premium, XML compresses extremely well because of all the repeated tag names.
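To check the claim that XML's repeated tag names compress well, this sketch builds a made-up message with 100 <XMLCOMMAND> objects and compresses it with the standard zlib module. The message content is invented for illustration, and the exact ratio will depend on your own data.

```python
import xml.etree.ElementTree as ET
import zlib

# Build a made-up message in which every command repeats the same tag names
root = ET.Element('XMLObjectXMLRequests')
for i in range(100):
    command = ET.SubElement(root, 'XMLCOMMAND')
    ET.SubElement(command, 'OBJECTSERVERID').text = 'W-%d' % i
    ET.SubElement(command, 'OBJECTNAME').text = 'StatusWebView'
    ET.SubElement(command, 'OBJECTTYPE').text = '1'
    ET.SubElement(command, 'OBJECTID').text = str(i)

raw = ET.tostring(root)             # serialized message as bytes
compressed = zlib.compress(raw, 9)  # level 9: best compression

print(len(raw), len(compressed), round(len(raw) / len(compressed), 1))
```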

XML is easy to read, parse and debug for beginners and seasoned programmers alike.