Wednesday, April 2, 2014

Extracting Data from XML files using XSL and XPath...1

Having written sundry stories and all, I thought I would share some knowledge of XSLT, XML and XPath that I picked up in my previous project. In particular, if you want to get data out of XML files, do refer to the attached doc.

Introduction

XML stylesheets are a powerful tool for converting data from an XML file into a more readable or otherwise better format. The word ‘better’ here means that we can also create a fresh XML file or an HTML file that can be opened in a browser. The use of XML files for storing data is currently on the rise in the industry. Hence, to turn that data into a readable format we need a scripting language that reads the data from the XML file and writes it out in a readable format, most often a comma-separated values (CSV) file that can be opened in Microsoft Excel.

Writing code to handle XML transformations in XSLT is much easier than in any other commonly used programming language. But the XSLT language has such a different syntax and processing model from classical programming languages that it takes time to grasp all of XSLT's subtle nuances.

XSLT (Extensible Stylesheet Language Transformations) is a powerful scripting language that we can use for this purpose. It works in tandem with a transformation tool to produce the output in a readable format: the data is read from the XML file, the style is taken from the XML stylesheet, and the data in that style is written to an output file. The output file can be a CSV file, an XML file or an HTML file. The diagram below shows how this works (source: wiki):


In the above diagram, XML input is the XML document, XSLT Code is the code written in the XML stylesheet language, and XSLT Processor is the compiler/interpreter or any other tool that can produce the output from the first two.
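To make the flow concrete, here is a minimal Python sketch of the same idea: XML goes in, a selection rule picks out the data, and CSV comes out. It is only an illustration under stated assumptions. It uses the limited XPath subset in Python's standard `xml.etree.ElementTree` module rather than a real XSLT processor, and the sample document with its `library`, `book`, `title` and `price` names is made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; the element and attribute names are invented
# for illustration and do not come from any real project.
XML_INPUT = """\
<library>
  <book id="b1"><title>XSLT Basics</title><price>25.00</price></book>
  <book id="b2"><title>XPath in Practice</title><price>30.50</price></book>
</library>
"""


def xml_to_csv(xml_text):
    """Select every <book> with an XPath expression and emit one CSV row per book."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "title", "price"])
    # "./book" is an XPath location path relative to the root <library> element.
    for book in root.findall("./book"):
        writer.writerow(
            [book.get("id"), book.findtext("title"), book.findtext("price")]
        )
    return out.getvalue()


csv_text = xml_to_csv(XML_INPUT)
print(csv_text, end="")
```

An XSLT stylesheet would express the same selection declaratively, for instance with an XPath expression such as /library/book inside a template, and the XSLT processor would take the place of the Python function.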

Available XSLT Processors


Several tools (XSLT processors) are currently available on the market:
1.    XMLStarlet – This tool is released under the MIT license. It is built on the libxslt library and so supports XSLT 1 only; there is no news of when support for XSLT 2 or XSLT 3 will commence. It is a free tool and primarily works as an interpreter.
2.    Altova XMLSpy – This is a commercial tool. As of now it supports only XSLT 1 and XSLT 2.
3.    Xalan – It is released under the Apache license. It supports only XSLT 1.
4.    Saxon – This is also an XSLT tool that is freely available. It is run from the command line. The commercial version of the software is available in two editions: Saxon-PE 9.5 and Saxon-EE 9.5. Both of these support XSLT 1, XSLT 2 and XSLT 3.

XSLT is Turing-complete. This means that, given sufficient memory, XSLT can compute anything that any other Turing-complete language (such as C++) can compute. Its capabilities are immense if we can fathom them.
