10. XPath Cheat Sheet
HTML is treated as a tree of nodes. The topmost element of the tree is called the root element. XPath is used to navigate an HTML document and to query for specific elements (the terms element, node, widget, and tag are used interchangeably in this document). XPath defines 7 node types, but only 3 of them are of interest here: element, attribute, and text.
    <html>
      <head>
        <title>website title</title>
      </head>
      <body>
        <table color="white">
          <tr number="first">
            <th>First Name</th>
            <th>Last Name</th>
            <th>Age</th>
          </tr>
          <tr number="second">
            <td>Jill</td>
            <td>Smith</td>
            <td>50</td>
          </tr>
          <tr number="third">
            <td>Dave</td>
            <td>Fisher</td>
            <td>32</td>
          </tr>
        </table>
      </body>
    </html>
- <html>: root element node
- <body>, <tr>, <table>: element nodes
- color="white": attribute node
- First Name, Jill, 32: text nodes
Relationship between nodes:
Parent: Each element or node has one parent with the exception of the root node. In the HTML above, html is the parent of head and body.
Children: Each element may have zero, one, or more children. In the HTML above, each td has no child elements (only a text node), head has one child called title, the first tr has 3 th children, and each of the other tr elements has 3 td children.
Siblings: Children with the same parent are called siblings. In the above example, head and body are siblings, and all tr nodes are siblings of each other.
Ancestors: A node’s parent, parent of parent, etc. are called ancestors. In the above example:
<td>Dave</td> has the ancestors of <tr number="third">, <table color="white">, <body>, <html>.
Descendants: A node’s children, children’s children, etc. are called descendants. In the above example:
<td>Dave</td>, <tr number="third">, <table color="white">, <body> are all descendants of <html>.
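
The relationships above can be checked programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module on a compact copy of the HTML above. The names SAMPLE, parent_of, and dave are illustrative only. ElementTree elements do not store a reference to their parent, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample HTML (well-formed, so an XML parser accepts it)
SAMPLE = """<html>
  <head><title>website title</title></head>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)

# Children: iterating an element yields its direct child elements
children_of_html = [child.tag for child in root]

# Parent: build a child -> parent lookup for the whole tree
parent_of = {child: parent for parent in root.iter() for child in parent}

# Siblings: the other children of the same parent
head = root.find("head")
siblings_of_head = [el.tag for el in parent_of[head] if el is not head]

# Ancestors: follow the parent links upward from <td>Dave</td>
dave = next(td for td in root.iter("td") if td.text == "Dave")
ancestors = []
node = dave
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

# Descendants: every element below the root (iter() yields the root first)
descendants_of_html = [el.tag for el in root.iter()][1:]

print(children_of_html)   # ['head', 'body']
print(siblings_of_head)   # ['body']
print(ancestors)          # ['tr', 'table', 'body', 'html']
```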
The most frequently used XPath expressions and their meanings are listed in the following table:
XPath | Description |
---|---|
/ | At the start of an XPath: selects from the root node (used in absolute XPaths). In the middle of an XPath: selects children of the current node |
// | At the start of an XPath: selects matching nodes anywhere in the HTML (used in relative XPaths). In the middle of an XPath: selects descendants of the current node |
. | Selects current node |
.. | Selects the parent of current node |
@ | Selects attributes |
Here are some examples of the above expressions:
XPath | Description |
---|---|
/html | Selects the root element html |
//tr | Selects all the tr elements |
//tr[@number='first'] | Selects the first tr in the above HTML because its number attribute has the value first |
//tr[@number='first']/.. | Selects the table node. It first finds the tr element whose number attribute is first and then goes to its immediate parent |
//table//td | Selects all the td elements. It first finds the table node and then finds all of its descendant td nodes |
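
As a hedged illustration, the example expressions above can be tried with Python's standard xml.etree.ElementTree module. ElementTree implements only a subset of XPath and does not accept absolute paths on an element, so each expression is written with a leading "." (for example .//tr instead of //tr). The SAMPLE string is a compact copy of the HTML above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<html>
  <head><title>website title</title></head>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)   # corresponds to /html: the root element

# //tr : every tr element in the document
all_rows = root.findall(".//tr")

# //tr[@number='first'] : the tr whose number attribute equals 'first'
first_row = root.find(".//tr[@number='first']")

# //tr[@number='first']/.. : the parent of that tr
parent_of_first_row = root.find(".//tr[@number='first']/..")

# //table//td : every td that is a descendant of the table
cell_texts = [td.text for td in root.findall(".//table//td")]

print(root.tag)                  # html
print(len(all_rows))             # 3
print(first_row.get("number"))   # first
print(parent_of_first_row.tag)   # table
print(cell_texts)                # ['Jill', 'Smith', '50', 'Dave', 'Fisher', '32']
```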
If an expression matches several nodes, you can use a number in brackets to refer to a specific one. Some examples:
XPath | Description |
---|---|
(//table/tr)[1] | Selects the first row (i.e. tr) of the table |
(//table/tr)[last()] | Selects the last row of the table |
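
A small sketch of positional selection, again assuming Python's standard xml.etree.ElementTree module. ElementTree does not support the parenthesized form (//table/tr)[1], so the sketch uses tr[1] and tr[last()], which index rows within each parent table. With the single table in the sample the result is the same, but the two forms can differ when a page has several tables.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<html>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)

# XPath positions are 1-based: tr[1] is the first row of its parent table
first_row = root.find(".//table/tr[1]")

# last() refers to the final tr child of the table
last_row = root.find(".//table/tr[last()]")

# Plain Python indexing on the full result list is 0-based
rows = root.findall(".//table/tr")

print(first_row.get("number"))   # first
print(last_row.get("number"))    # third
print(rows[0] is first_row)      # True
```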
The following wildcards are available in XPath:
Wildcard | Description |
---|---|
* | Matches any element node |
@* | Matches any attribute node |
node() | Matches any node of any kind |
Here are some examples of the above wildcards:
Wildcard | Description |
---|---|
//table/* | Selects the children of the table element |
//tr[@*] | Selects all the tr nodes with at least one attribute |
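
The wildcard examples can be approximated in Python's standard xml.etree.ElementTree module, shown here as a sketch only. ElementTree understands * for child elements but not the @* predicate, so the second expression is emulated with a plain Python filter on each element's attrib dictionary. That filter is a stand-in for //tr[@*], not an XPath feature.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<html>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr><td>No attributes on this row</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)

# //table/* : all child elements of the table, whatever their tag
table_children = [el.tag for el in root.findall(".//table/*")]

# //tr[@*] : tr elements that carry at least one attribute
# (emulated: keep the rows whose attrib dictionary is non-empty)
rows_with_attributes = [tr.get("number") for tr in root.iter("tr") if tr.attrib]

print(table_children)         # ['tr', 'tr', 'tr']
print(rows_with_attributes)   # ['first', 'second']
```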
Finding a Node Relative to Another Node
Sometimes a node does not have an id or name, or, if it does, the value is dynamically generated. Elements in a table are a good example of this scenario. In these cases, the best way to find the element is to build an XPath based on some other element on the page that is constant or has an id or name. To write these relative XPaths, you need to know the following axes:
Axis Name | Description |
---|---|
ancestor | Selects all the ancestors of the current node |
ancestor-or-self | Selects all the ancestors of the current node plus the current node itself |
attribute | Selects all the attributes of the current node |
child | Selects all children of the current node |
descendant | Selects all descendants of the current node |
following-sibling | Selects all siblings after the current node |
parent | Selects the parent of the current node |
preceding-sibling | Selects all siblings before the current node |
Some examples of axes are outlined below:
XPath | Description |
---|---|
//td[contains(text(), 'Smith')]/preceding-sibling::td[1] | Selects the td immediately before the td that contains the text Smith. In the HTML above, this is the td with the text Jill |
//td[contains(text(), 'Smith')]/ancestor::table[1] | Selects the nearest table ancestor of the td that contains the text Smith |
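
To make the two axis examples concrete, the sketch below walks the tree by hand in Python. The standard xml.etree.ElementTree module does not implement XPath axes or contains(), so this is an emulation of what the expressions return, not a way of evaluating them. A full XPath 1.0 engine would accept the expressions directly. The helper names parent_of, smith, and nearest_table are illustrative.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<html>
  <body>
    <table color="white">
      <tr number="first"><th>First Name</th><th>Last Name</th><th>Age</th></tr>
      <tr number="second"><td>Jill</td><td>Smith</td><td>50</td></tr>
      <tr number="third"><td>Dave</td><td>Fisher</td><td>32</td></tr>
    </table>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)
parent_of = {child: parent for parent in root.iter() for child in parent}

# //td[contains(text(), 'Smith')] : the anchor cell located by its text
smith = next(td for td in root.iter("td") if "Smith" in (td.text or ""))

# preceding-sibling::td[1] : the closest td sibling that comes before it
siblings = list(parent_of[smith])
position = siblings.index(smith)
preceding_tds = [el for el in siblings[:position] if el.tag == "td"]
previous_cell = preceding_tds[-1]

# ancestor::table[1] : climb the parent links until a table is reached
nearest_table = smith
while nearest_table is not None and nearest_table.tag != "table":
    nearest_table = parent_of.get(nearest_table)

print(previous_cell.text)           # Jill
print(nearest_table.get("color"))   # white
```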
Some XPath best practices:
- Start all of your XPaths with //. This finds an element anywhere in the HTML page regardless of where it is located, so the XPath does not depend on other elements on the page.
- Use contains() instead of = when searching for text inside an element, because = fails if there is white space around the text you are looking for. Examples:
  - To find a td whose text includes Jill, for example <td>Jill</td>, use this XPath: //td[contains(text(), 'Jill')]
  - To find a link whose text includes MyText, for example <a>MyText</a>, use this XPath: //a[contains(text(), 'MyText')]
- Do not use following or preceding by themselves, as they do not work well with IE. following-sibling and preceding-sibling are fine to use.
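
The white space problem behind the contains() advice can be reproduced with a short sketch. It assumes Python 3.7 or later, where the standard xml.etree.ElementTree module supports the exact-text predicate [.='text']. ElementTree has no contains(), so the partial match is emulated with Python's in operator. The padded cell is a made-up example, not taken from the HTML above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the first cell has spaces around its text
doc = ET.fromstring("<table><tr><td> Jill </td><td>Smith</td></tr></table>")

# Exact comparison, similar to //td[text()='Jill']: the padded cell is missed
exact_matches = doc.findall(".//td[.='Jill']")

# Partial comparison, similar to //td[contains(text(), 'Jill')]
partial_matches = [td for td in doc.iter("td") if "Jill" in (td.text or "")]

print(len(exact_matches))     # 0
print(len(partial_matches))   # 1
```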
More XPath examples:
XPath | Description |
---|---|
//input[@value='Continue' and @name='btnContinue'] | Select an input element whose value attribute is 'Continue' and whose name attribute is 'btnContinue' |
//*[@id='someId'] | Select any HTML element whose id is equal to someId. Note that the id is case sensitive and must match the whole value. |
//a[@href='someUrl'] | Select a link (HTML anchor element) with attribute href matching 'someUrl'. Case sensitive |
//a[text()='Exact Match Case sensitive'] | Select a link (HTML anchor element) whose visible text matches 'Exact Match Case sensitive' exactly. |
//a[contains(text(),'Partial Match Visible Text')] | Select a link (Html anchor element) with visible text containing text 'Partial Match Visible Text'. |
//a[@title='Some Title text'] | Select a link (Html anchor element) with attribute title matching 'Some Title text'. Case sensitive |
//*[contains(text(),'$SOME_DYNAMIC_PARAM')] /following::a[contains(text(),'Relative Link to previous link')] | Relative match. This is for dynamic elements where you locate the dynamic element and relative to that element you select the element you want to interact with. |
(//a[@href='Login.asp'])[1] | Find the first link that has the attribute href='Login.asp'. For the second match use [2], for the third use [3], and use [last()-1] for the one before the last. |
//table[@id='myTable']/tr[last()] | Select the last row of the table with id='myTable' |
//a[contains(@title,'bob')] | Find a link that contains 'bob' in its title attribute. |
//a[starts-with(@title,'bob')] | Find the link that has the title that starts with 'bob'. |
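
A few of the attribute-based examples can be sketched with Python's standard xml.etree.ElementTree module. The PAGE fragment is hypothetical and exists only for illustration. ElementTree does not support the and operator or the contains() and starts-with() functions, so the sketch chains two predicates in place of and, and uses Python string methods for the partial matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment for illustration only
PAGE = """<form>
  <input name="btnContinue" value="Continue"/>
  <input name="btnCancel" value="Cancel"/>
  <a href="Login.asp" title="bob login">Log in</a>
  <a href="Login.asp" title="login for bob">Log in again</a>
</form>"""

page = ET.fromstring(PAGE)

# //input[@value='Continue' and @name='btnContinue']
# (chained predicates stand in for 'and', which ElementTree lacks)
continue_button = page.find(".//input[@value='Continue'][@name='btnContinue']")

# (//a[@href='Login.asp'])[1] : first match in document order
login_links = page.findall(".//a[@href='Login.asp']")
first_login_link = login_links[0]

# //a[contains(@title,'bob')] : emulated with the 'in' operator
title_contains_bob = [a.text for a in page.iter("a") if "bob" in a.get("title", "")]

# //a[starts-with(@title,'bob')] : emulated with str.startswith
title_starts_with_bob = [
    a.text for a in page.iter("a") if a.get("title", "").startswith("bob")
]

print(continue_button.get("name"))   # btnContinue
print(first_login_link.text)         # Log in
print(title_contains_bob)            # ['Log in', 'Log in again']
print(title_starts_with_bob)         # ['Log in']
```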