Saturday, April 5, 2014

Extracting Data from XML files using XSL and XPath...8

Comments in HTML file

Sometimes we need comments in the HTML file that we generate from the XML file, for example when the HTML file will later be opened and edited by hand. Comments then help us understand its contents. We cannot add such comments to the output by writing <!-- and --> in the stylesheet, because the XSLT processor (here, XMLStarlet) treats anything between those markers as a comment in the stylesheet itself and does not copy it to the output. XSL has a comment tag for this purpose. Hence, to add a comment to the generated file, we can use the construct below (the comment text must not contain two consecutive hyphens, which XML does not allow inside a comment):
<xsl:comment> your comment text here </xsl:comment>
An example is given below:
In the above code snippet, we are doing something to get data for entity ‘DC10’. This is being put in some table column (indicated by <td> tag). Note the comment at the top. The output HTML will look something like this:

The ‘./DC10’ expression in the XSL code produced the value 536 in the HTML.
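
As a minimal sketch of the construct described in this section, the stylesheet below writes a comment ahead of a table cell that holds the value of DC10. The element name 'record' and the overall layout are assumptions made for illustration; they are not taken from the original files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- This is a stylesheet comment: it is NOT copied to the output. -->
  <xsl:template match="/">
    <html>
      <body>
        <table>
          <!-- 'record' is a hypothetical element name; use the parent of DC10 in your XML. -->
          <xsl:for-each select="//record">
            <tr>
              <!-- xsl:comment writes a real comment into the generated HTML. -->
              <xsl:comment> Value of DC10 for this record </xsl:comment>
              <td><xsl:value-of select="./DC10"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

For a record whose DC10 element contains 536, the generated row would hold the comment followed by a <td> cell containing 536.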

Important points to remember

1. Understanding the XML structure is very important; do not start coding until you understand it well. Draw a hierarchy diagram of the XML tags to make the structure clear. A good way to learn the structure is to go through the file manually in Notepad++. Besides this, there is a command that can be used:

xml el table.xml

The above command, when run from the command prompt in the XMLStarlet folder, prints the element structure of the table.xml file. The output can look somewhat like below:

I have marked the structure of one of the files on which I ran the command. The ‘el’ command is an XMLStarlet command (just as ‘tr’ is).
2. If one understands the XML structure, data can be handled and extracted from almost any XML.
3. The developer must understand basic DOS commands. It is better to prepare a batch file within the XMLStarlet folder. The batch file can contain code something like this:

del Transition_Flag_DC23_TPData.csv
xml tr ALL_Test_CasesDC2-3Transition.xsl  test.xml >> Transition_Flag_DC23_TPData.csv

This code deletes the old CSV file. After that it runs the xml transform command on ‘test.xml’ using the ‘ALL_Test_CasesDC2-3Transition.xsl’ file, writing the output to a new ‘Transition_Flag_DC23_TPData.csv’ file.
4. It is also better to keep the XSLT and XML files in the XMLStarlet folder. The batch file is then easy to run, because it operates in that folder and produces the desired output file there.
5. Sometimes XMLStarlet refuses to parse the XML file, stating that it contains an error. In that case you will have to edit the XML file in Notepad++ so that any tags causing the problem are either deleted or corrected. If XMLStarlet fails because of the XSLT, the XSLT code has to be edited instead.

Advantages and Disadvantages of XML and XSLT
Advantages
1. XSLT applies user defined transformations to an XML document and the output can be HTML, XML, or any other structured document. So it is easy to merge XML data into presentation.
2. XSLT uses XPath to locate elements and attributes within an XML document. This is a more convenient way to traverse an XML document than the traditional way of walking it with a scripting language.
3. Because the data (XML document) is separated from the presentation (XSLT), it is easy to change the output format at any time without touching the code-behind.
4. The output is generated at a fast pace.
5. XML is platform independent.

Disadvantages
1. It is difficult to implement complicated business rules in XSLT.
2. Changing the value of a variable inside a loop is difficult in XSLT, because a variable cannot be reassigned once it has been set.
3. Development and maintenance of code in XSLT is difficult as it is different from traditional programming languages such as C, C++ and Java.
4. Using XSLT has a performance penalty in some cases, as its engine (e.g. XMLStarlet) doesn’t optimize the code by using caching techniques the way a traditional compiler does.
5. When processing XML files, XSLT must load the entire document into memory. With Xalan, this consumes roughly 10x the size of the input file (Saxon has an alternative DOM implementation that uses less memory). If any of your input datasets grow beyond a couple of hundred megabytes, your XSLT processor might just flake out. XSLT 3.0 addresses this by adding support for streaming.
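
To illustrate disadvantage 2, here is a minimal sketch of the usual XSLT 1.0 workaround: instead of updating a variable inside a loop, a named template calls itself and passes the updated value along as a parameter. The template name 'running-total' and the 'item' elements are hypothetical names chosen for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Start the recursion with every 'item' element (hypothetical name). -->
  <xsl:template match="/">
    <xsl:call-template name="running-total">
      <xsl:with-param name="items" select="//item"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Prints a running total, one line per item. -->
  <xsl:template name="running-total">
    <xsl:param name="items"/>
    <xsl:param name="total" select="0"/>
    <xsl:if test="$items">
      <!-- A new variable is created on each call; nothing is reassigned. -->
      <xsl:variable name="new-total" select="$total + number($items[1])"/>
      <xsl:value-of select="$new-total"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Recurse on the remaining items, carrying the total forward. -->
      <xsl:call-template name="running-total">
        <xsl:with-param name="items" select="$items[position() &gt; 1]"/>
        <xsl:with-param name="total" select="$new-total"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Assuming an input with three item elements containing 1, 2 and 3, this prints 1, 3 and 6 on separate lines. When only the final total is needed, the XPath sum() function is simpler than recursion.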

Important resources/links

1. http://www.xsltfunctions.com/xsl/
2. http://stackoverflow.com/ is a website where users can ask questions related to XSLT.
3. http://www.w3schools.com/xsl/
4. XSLT Cookbook is a good book to refer to for XSLT coding.
