Friday, April 4, 2014

Extracting Data from XML files using XSL and XPath...3

Usage of XSLT along with examples

The coding that has been done for Release Testing metrics contains the following codes.

XML Structure

You must understand the structure of XML very well. Unless that is done the code will not work and writing code could be an unnecessary struggle. Some XML files can be really complex, so please take time to understand the structure. That’s why ‘Notepad++’ is highly recommended.

Declaration of XML Version

The beginning

All XSLT code must start with <?xml version> tag. Note that we can also have some comments that start with <!-- and end with -->. Everything within <!--and --> will be ignored by the XML Starlet interpreter.

The XML Namespace and the Header of csv file

Top of the csv file

The XML namespace declaration
is only used as a unique string to identify the namespace for XSLT. If we do not provide this exact string as the namespace URI for the xsl namespace prefix, the XSLT processor will simply ignore <xsl:template>, <xsl:for-each>, <xsl:value-of>, and other elements with the xsl prefix since it will not recognize them as XSLT actions.
Theheader is a necessary part of XSLT coding if you are getting output in csv format. The template match tag actually defines the first row here. The first row of the csv file will contain column names - Datasource,Repository Id,Repository Name,Test Plan ID etc.

Conditions for the output csv file – ‘xsl:for-each’ tag

The conditions will depend upon the XML file structure and the requirements that define the output. The XML file that we used had the structure in the following hierarchical format:
<datasource>
<!--- datasource details -->
<repository>
<!---repository details -->
<testplan>
<!--- ‘testplan’ details (custom attributes at test plan level) -->
<build>
<!-- build details -->
<platform>
<!-- platform details -->
<testcase>
<!—test case level custom attributes -->
</testcase>
</platform>
</build>
</testplan>
</repository>
</datasource>

The example XML can be seen here:

<?xml version="1.0" encoding="UTF-8"?>
<View>
  <view_name>TLTestExecutions</view_name>
  <results>
    <datasource>
      <datasource_id>19</datasource_id>
      <datasource_description>Testlink bpd</datasource_description>
      <repository>
        <repository_name><![CDATA[DC8]]></repository_name>
        <testplan>
          <testplan_name><![CDATA[2013-R3-Dec: R19358: DEV/INT]]></testplan_name>
          <testplan_custom>
            <testplan_custom_name><![CDATA[tp_phase]]></testplan_custom_name>
            <testplan_custom_label><![CDATA[Phase]]></testplan_custom_label>
            <testplan_custom_value><![CDATA[2013-R3-Dec]]></testplan_custom_value>
          </testplan_custom>
          <testplan_custom>
            <testplan_custom_name><![CDATA[tp_sub_phase]]></testplan_custom_name>
            <testplan_custom_label><![CDATA[Business Process]]></testplan_custom_label>
            <testplan_custom_value><![CDATA[Payments]]></testplan_custom_value>
          </testplan_custom>
          <testplan_custom>
            <testplan_custom_name><![CDATA[tp_test_type]]></testplan_custom_name>
            <testplan_custom_label><![CDATA[Test Type]]></testplan_custom_label>
            <testplan_custom_value><![CDATA[Integration]]></testplan_custom_value>
          </testplan_custom>
          <testplan_custom>
            <testplan_custom_name><![CDATA[tp_trac_field_workeffort_origin]]></testplan_custom_name>
            <testplan_custom_label><![CDATA[TP Work Effort Originating]]></testplan_custom_label>
            <testplan_custom_value><![CDATA[R19358]]></testplan_custom_value>
          </testplan_custom>
          <testplan_custom>
            <testplan_custom_name><![CDATA[tp_trac_field_workeffort_owning]]></testplan_custom_name>
            <testplan_custom_label><![CDATA[TP Work Effort Owning]]></testplan_custom_label>
            <testplan_custom_value><![CDATA[R19358]]></testplan_custom_value>
          </testplan_custom>
          <testplan_custom>
            <testplan_custom_name><![CDATA[tp_trac_url]]></testplan_custom_name>
            <testplan_custom_label><![CDATA[TP Trac URL]]></testplan_custom_label>
            <testplan_custom_value><![CDATA[https://lt00h0000000004.opr.statefarm.org/trac/bpd/]]></testplan_custom_value>
          </testplan_custom>
          <build>
            <build_name><![CDATA[Build 2 May 22]]></build_name>
            <platform>
              <platform_name><![CDATA[Batch 1]]></platform_name>
              <testcase>
                <testcase_name><![CDATA[GL - Auto and Fire Side]]></testcase_name>
                <tcversion>
                  <tcversion_id>382630</tcversion_id>
                  <tcversion_version>1</tcversion_version>
                  <tcversion_executiontype>Manual</tcversion_executiontype>
                  <tcexecution>
                    <tcexecution_status>p</tcexecution_status>
                  </tcexecution>
                </tcversion>
              </testcase>
              <testcase>
                <testcase_name><![CDATA[Internal Cash Batch - Day 1 - For US & Canada]]></testcase_name>
                <tcversion>
                  <tcversion_executiontype>Manual</tcversion_executiontype>
                  <tcexecution>
                    <tcexecution_id>9358</tcexecution_id>
                    <tcexecution_status>p</tcexecution_status>
                  </tcexecution>
                </tcversion>
              </testcase>
              <testcase>
                <testcase_name><![CDATA[RD0ZA410 Report Validation - - US & Canada]]></testcase_name>
                <tcversion>
                  <tcversion_executiontype>Manual</tcversion_executiontype>
                  <tcexecution>
                    <tcexecution_status>d</tcexecution_status>
                  </tcexecution>
                </tcversion>
              </testcase>
</build>
</testplan>
</repository>
</datasource>
</results>

</View>

So, one can understand that a datasource can contain several repositories, which can contain several testplans and so on. 

For Phase 0 we needed the counts at the level of test plans depending on several conditions. So, the condition went like this:



The xsl:for-each command reads a tag (‘testplan’ in this case) and applies the condition specified within the square brackets to that tag. The above expression:

‘//repository/testplan’

is actually an XPath expression. XPath defines the path to traverse within an XML to reach a node (to perform any action on that node). It indicates the hierarchy (or navigation) of the XML file at hand.

Notice the ‘//’. This actually takes the control directly to ‘testplan’ tag. You can actually use ‘//’ only to accomplish the above. Hence, for the above code snippet ‘//repository/testplan’is same as ‘//testplan’. The conditions will depend on the business requirement of the report. The‘testplan’ contains testplan_custom, which contains testplan_custom_name and testplan_custom_value. Hence, the condition for ‘tp_phase’ custom attribute when its value = ‘Phase 0’. Substring function actually checks whether the testplan_custom_value has ‘ICP_Ph0.’ as its first (indicated by 1) 8 characters. Read the below statement as:
substring(testplan_custom_value, 1,8)
get me 8 characters from ‘testplan_custom_value’ starting from position 1 (first character). Here ‘testplan_custom_value’ is just an XPath expression. It can also be a string or a variable. It does not have a ‘.’ or ‘/’ etc. because we are doing the operation within testplan/testplan_custom.

The other condition is on the ‘tp_sub_phase’ which must also be true. The last condition is ‘../../datasource_description != 'Testlink Internet'’. This was because one of the groups(denoted by a datasource) wanted to have an exception. Notice the construct – ‘../../’. ‘../’ construct means accessing the parent tag. In the current for-each we are at the ‘testplan’ level, but we need to check a condition at the datasource level. As it is visible from the XML structure, datasource is 2 levels up the ‘testplan’. Hence, we needed ‘../../’to access datasource_description, which is a tag within datasource. When all the conditions (separated by ‘and’ & ‘or’) will be true, the data within that ‘testplan’ tag will be read.

Getting the values –‘xsl:value-of’ tag

When the above conditions are true, we can read the contents within the ‘testplan’ tag. Actually, not just the values within the ‘testplan’ tag, but also above it (repository and datasource). How? By using the ‘../’ construct, as it is shown below:



‘../../datasource_description’ and ‘../repository_id’ are XPath expressions that evaluate to a value (within an XML tag). Notice the construct – ‘<xsl:value-of select="','"/>’; it is used to insert a ‘,’ between two respective values. The comma is contained in ‘’ because it is a string. So, datasource_description is followed by a comma (,) and that is followed by repository_id. Note that the XPath expressions need not be in quotes since they are not strings, even though they might evaluate to strings.

Remember that the output file is a csv file. What if we have a repository name containing a comma? In that case, the columns will shift towards right when the csv file will be opened in Microsoft Excel. There is a workaround to avoid and eliminate commas in evaluated XPath expressions, if they are present in a value. It will be discussed under “Tackling the comma ‘,’” section.

Getting the values – conditional ‘xsl:value-of’ tag

The tag works the same was as above, except that we can put certain conditions using square brackets.

The XML structure is as follows:



In the value-of tag we are checking that if the ‘testplan_custom_name’ is ‘tp_test_type’ output the corresponding ‘testplan_custom_value’. Notice the usage of brackets. The ‘./’ construct is used before ‘testplan_custom’ to indicate that we are looking at the current level of ‘for-each’ – ‘testplan’ tag. How to get just the ‘testplan_id’? Simple, <xsl:value-of select="./testplan_id"/> will work for that.

No comments:

Post a Comment