Class DomNode

java.lang.Object
org.htmlunit.html.DomNode
All Implemented Interfaces:
Serializable, Cloneable, Node
Direct Known Subclasses:
DomCharacterData, DomDocumentFragment, DomDocumentType, DomNamespaceNode, DomProcessingInstruction, SgmlPage

public abstract class DomNode extends Object implements Cloneable, Serializable, Node
Base class for nodes in the HTML DOM tree. This class is modeled after the W3C DOM specification, but does not implement it.
  • Field Details

    • READY_STATE_UNINITIALIZED

      public static final String READY_STATE_UNINITIALIZED
      A ready state constant (state 1).
    • READY_STATE_LOADING

      public static final String READY_STATE_LOADING
      A ready state constant (state 2).
    • READY_STATE_LOADED

      public static final String READY_STATE_LOADED
      A ready state constant (state 3).
    • READY_STATE_INTERACTIVE

      public static final String READY_STATE_INTERACTIVE
      A ready state constant (state 4).
    • READY_STATE_COMPLETE

      public static final String READY_STATE_COMPLETE
      A ready state constant (state 5).
    • PROPERTY_ELEMENT

      public static final String PROPERTY_ELEMENT
      The name of the "element" property. Used when watching property change events.
  • Constructor Details

    • DomNode

      protected DomNode(SgmlPage page)
      Creates a new instance.
      Parameters:
      page - the page which contains this node
  • Method Details

    • setStartLocation

      public void setStartLocation(int startLineNumber, int startColumnNumber)
      Sets the line and column numbers in the source page where the DOM node starts.
      Parameters:
      startLineNumber - the line number where the DOM node starts
      startColumnNumber - the column number where the DOM node starts
    • setEndLocation

      public void setEndLocation(int endLineNumber, int endColumnNumber)
      Sets the line and column numbers in the source page where the DOM node ends.
      Parameters:
      endLineNumber - the line number where the DOM node ends
      endColumnNumber - the column number where the DOM node ends
    • getStartLineNumber

      public int getStartLineNumber()
      Returns the line number in the source page where the DOM node starts.
      Returns:
      the line number in the source page where the DOM node starts
    • getStartColumnNumber

      public int getStartColumnNumber()
      Returns the column number in the source page where the DOM node starts.
      Returns:
      the column number in the source page where the DOM node starts
    • getEndLineNumber

      public int getEndLineNumber()
      Returns the line number in the source page where the DOM node ends.
      Returns:
      0 if no information on the line number is available (for instance for nodes dynamically added), -1 if the end tag has not yet been parsed (during page loading)
    • getEndColumnNumber

      public int getEndColumnNumber()
      Returns the column number in the source page where the DOM node ends.
      Returns:
      0 if no information on the column number is available (for instance for nodes dynamically added), -1 if the end tag has not yet been parsed (during page loading)
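      The two sentinel return values above are easy to mistake for real positions. A minimal sketch of how calling code might interpret them follows; the class and method names are hypothetical and are not part of HtmlUnit.

```java
final class EndLocationSketch {

    /**
     * Describes an end line or end column value using the sentinel
     * meanings documented above (0 and -1). Hypothetical helper.
     */
    static String describe(int value) {
        if (value == 0) {
            return "unknown (for instance, a dynamically added node)";
        }
        if (value == -1) {
            return "end tag not parsed yet";
        }
        return "position " + value;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, -1, 12}) {
            System.out.println(v + " -> " + describe(v));
        }
    }
}
```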
    • getPage

      public SgmlPage getPage()
      Returns the page that contains this node.
      Returns:
      the page that contains this node
    • getHtmlPageOrNull

      public HtmlPage getHtmlPageOrNull()
      Returns the page that contains this node, or null if that page is not an HtmlPage.
      Returns:
      the HtmlPage that contains this node, or null
    • getOwnerDocument

      public Document getOwnerDocument()
      Specified by:
      getOwnerDocument in interface Node
    • setScriptableObject

      public void setScriptableObject(org.htmlunit.javascript.HtmlUnitScriptable scriptObject)
      INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
      Sets the JavaScript object that corresponds to this node. This is not guaranteed to be set even if there is a JavaScript object for this DOM node.
      Parameters:
      scriptObject - the JavaScript object
    • getLastChild

      public DomNode getLastChild()
      Specified by:
      getLastChild in interface Node
    • getParentNode

      public DomNode getParentNode()
      Specified by:
      getParentNode in interface Node
    • setParentNode

      protected void setParentNode(DomNode parent)
      Sets the parent node.
      Parameters:
      parent - the parent node
    • getIndex

      public int getIndex()
      Returns this node's index within its parent's child nodes (zero-based).
      Returns:
      this node's index within its parent's child nodes (zero-based)
    • getPreviousSibling

      public DomNode getPreviousSibling()
      Specified by:
      getPreviousSibling in interface Node
    • getNextSibling

      public DomNode getNextSibling()
      Specified by:
      getNextSibling in interface Node
    • getFirstChild

      public DomNode getFirstChild()
      Specified by:
      getFirstChild in interface Node
    • isAncestorOf

      public boolean isAncestorOf(DomNode node)
      Returns true if this node is an ancestor of the specified node.
      Parameters:
      node - the node to check
      Returns:
      true if this node is an ancestor of the specified node
    • isAncestorOfAny

      public boolean isAncestorOfAny(DomNode... nodes)
      Returns true if this node is an ancestor of any of the specified nodes.
      Parameters:
      nodes - the nodes to check
      Returns:
      true if this node is an ancestor of any of the specified nodes
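      An ancestor check of this kind can be sketched as a walk up the parent chain. The example below uses the JDK's org.w3c.dom API rather than HtmlUnit so that it runs without extra dependencies; the helper names are hypothetical. It also assumes that a node is not its own ancestor, which this page does not state either way.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class AncestorCheckSketch {

    /** True if candidate appears somewhere on node's parent chain. */
    static boolean isAncestorOf(Node candidate, Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p == candidate) {
                return true;
            }
        }
        return false;
    }

    /** True as soon as candidate is an ancestor of one of the nodes. */
    static boolean isAncestorOfAny(Node candidate, Node... nodes) {
        for (Node n : nodes) {
            if (isAncestorOf(candidate, n)) {
                return true;
            }
        }
        return false;
    }

    /** Builds html > (head, body > p) and runs three checks. */
    static boolean[] demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element head = doc.createElement("head");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            doc.appendChild(html);
            html.appendChild(head);
            html.appendChild(body);
            body.appendChild(p);
            return new boolean[] {
                isAncestorOf(body, p),          // body is above p
                isAncestorOf(p, body),          // the reverse does not hold
                isAncestorOfAny(body, head, p)  // one match is enough
            };
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```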
    • getNamespaceURI

      public String getNamespaceURI()
      Specified by:
      getNamespaceURI in interface Node
    • getLocalName

      public String getLocalName()
      Specified by:
      getLocalName in interface Node
    • getPrefix

      public String getPrefix()
      Specified by:
      getPrefix in interface Node
    • hasChildNodes

      public boolean hasChildNodes()
      Specified by:
      hasChildNodes in interface Node
    • getChildNodes

      public DomNodeList<DomNode> getChildNodes()
      Specified by:
      getChildNodes in interface Node
    • isSupported

      public boolean isSupported(String namespace, String featureName)
      Not yet implemented.
      Specified by:
      isSupported in interface Node
    • normalize

      public void normalize()
      Specified by:
      normalize in interface Node
    • getBaseURI

      public String getBaseURI()
      Specified by:
      getBaseURI in interface Node
    • compareDocumentPosition

      public short compareDocumentPosition(Node other)
      Specified by:
      compareDocumentPosition in interface Node
    • getAncestors

      public List<Node> getAncestors()
      INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
      Gets the ancestors of the node.
      Returns:
      a list of the ancestors with the root at the first position
    • getTextContent

      public String getTextContent()
      Specified by:
      getTextContent in interface Node
    • setTextContent

      public void setTextContent(String textContent)
      Specified by:
      setTextContent in interface Node
    • isSameNode

      public boolean isSameNode(Node other)
      Specified by:
      isSameNode in interface Node
    • lookupPrefix

      public String lookupPrefix(String namespaceURI)
      Not yet implemented.
      Specified by:
      lookupPrefix in interface Node
    • isDefaultNamespace

      public boolean isDefaultNamespace(String namespaceURI)
      Not yet implemented.
      Specified by:
      isDefaultNamespace in interface Node
    • lookupNamespaceURI

      public String lookupNamespaceURI(String prefix)
      Not yet implemented.
      Specified by:
      lookupNamespaceURI in interface Node
    • isEqualNode

      public boolean isEqualNode(Node arg)
      Not yet implemented.
      Specified by:
      isEqualNode in interface Node
    • getFeature

      public Object getFeature(String feature, String version)
      Not yet implemented.
      Specified by:
      getFeature in interface Node
    • getUserData

      public Object getUserData(String key)
      Specified by:
      getUserData in interface Node
    • setUserData

      public Object setUserData(String key, Object data, UserDataHandler handler)
      Specified by:
      setUserData in interface Node
    • hasAttributes

      public boolean hasAttributes()
      Specified by:
      hasAttributes in interface Node
    • getAttributes

      public NamedNodeMap getAttributes()
      Specified by:
      getAttributes in interface Node
    • isDisplayed

      public boolean isDisplayed()

      Returns true if this node is displayed and can be visible to the user (ignoring screen size, scrolling limitations, color, font-size, or overlapping nodes).

      NOTE: If CSS is disabled, this method does not take this element's style into consideration!

      Returns:
      true if the node is visible to the user, false otherwise
    • mayBeDisplayed

      public boolean mayBeDisplayed()
      Returns true if nodes of this type can ever be displayed, false otherwise. Examples of nodes that can never be displayed are <head>, <meta>, <script>, etc.
      Returns:
      true if nodes of this type can ever be displayed, false otherwise
    • asNormalizedText

      public String asNormalizedText()
      Returns a normalized textual representation of this element that represents what would be visible to the user if this page was shown in a web browser. Whitespace is normalized like in the browser and block tags are separated by '\n'.
      Returns:
      a normalized textual representation of this element
    • getVisibleText

      public String getVisibleText()
      Returns a textual representation of this element in the same way as the Selenium/WebDriver WebElement#getText() method does.
      See get-element-text and dfn-bot-dom-getvisibletext. Note: this is different from asNormalizedText().
      Returns:
      a textual representation of this element that represents what would be visible to the user if this page was shown in a web browser
    • asXml

      public String asXml()
      Returns a string representation as an XML document of this element and all its children (recursively).
      The charset used in the XML header is the current page encoding, but the result is still a string. If you write it to a file, make sure to use the same encoding.
      This serializes the current state of the DOM tree. This implies that the content of noscript tags is usually serialized as a string, because the content is converted during parsing (if JavaScript was enabled at that time).
      Returns:
      the XML string
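      The encoding note above matters once the string leaves memory. The sketch below uses only the JDK, with a literal string standing in for the result of asXml(); the class and method names are hypothetical. It writes the text with one charset and reads it back with another, which shows why the bytes on disk must use the charset named in the XML header.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class XmlEncodingSketch {

    /** Writes xml to a temp file with one charset and decodes it with another. */
    static String writeThenRead(String xml, Charset writeWith, Charset readWith) {
        try {
            Path file = Files.createTempFile("asxml-sketch", ".xml");
            try {
                Files.write(file, xml.getBytes(writeWith));
                return new String(Files.readAllBytes(file), readWith);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for asXml() output on a page whose encoding is ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><p>caf\u00e9</p>";
        Charset declared = StandardCharsets.ISO_8859_1;

        // Same charset as the header: the text survives the round trip.
        System.out.println(writeThenRead(xml, declared, declared).equals(xml));
        // A reader trusting a different charset sees damaged text.
        System.out.println(writeThenRead(xml, declared, StandardCharsets.UTF_8).equals(xml));
    }
}
```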
    • printXml

      protected void printXml(String indent, PrintWriter printWriter)
      Recursively writes the XML data for the node tree starting at node.
      Parameters:
      indent - white space to indent child nodes
      printWriter - writer where child nodes are written
    • printChildrenAsXml

      protected void printChildrenAsXml(String indent, PrintWriter printWriter)
      Recursively writes the XML data for the node tree starting at node.
      Parameters:
      indent - white space to indent child nodes
      printWriter - writer where child nodes are written
    • getNodeValue

      public String getNodeValue()
      Specified by:
      getNodeValue in interface Node
    • cloneNode

      public DomNode cloneNode(boolean deep)
      Specified by:
      cloneNode in interface Node
    • getScriptableObject

      public <T extends org.htmlunit.javascript.HtmlUnitScriptable> T getScriptableObject()
      INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.

      Returns the JavaScript object that corresponds to this node, lazily initializing a new one if necessary.

      The logic of when and where the JavaScript object is created needs a clean up: functions using a DOM node's JavaScript object should not have to check if they should create it first.

      Type Parameters:
      T - the object type
      Returns:
      the JavaScript object that corresponds to this node
    • appendChild

      public DomNode appendChild(Node node)
      Specified by:
      appendChild in interface Node
    • insertBefore

      public Node insertBefore(Node newChild, Node refChild)
      Specified by:
      insertBefore in interface Node
    • insertBefore

      public void insertBefore(DomNode newNode)
      Inserts the specified node as a new child node before this node into the child relationship this node is a part of. If the specified node is this node, this method is a no-op.
      Parameters:
      newNode - the new node to insert
    • removeChild

      public Node removeChild(Node child)
      Specified by:
      removeChild in interface Node
    • removeAllChildren

      public void removeAllChildren()
      Removes all of this node's children.
    • parseHtmlSnippet

      public void parseHtmlSnippet(String source) throws SAXException, IOException
      INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
      Parses the specified HTML source code, appending the resulting content at the specified target location.
      Parameters:
      source - the HTML code extract to parse
      Throws:
      IOException - in case of error
      SAXException - in case of error
    • remove

      public void remove()
      Removes this node from all relationships with other nodes.
    • detach

      protected void detach()
      INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
      Detach this node from all relationships with other nodes. This is the first step of a move.
    • basicRemove

      protected void basicRemove()
      Cuts off all relationships this node has with siblings and parents.
    • replaceChild

      public Node replaceChild(Node newChild, Node oldChild)
      Specified by:
      replaceChild in interface Node
    • replace

      public void replace(DomNode newNode)
      Replaces this node with another node. If the specified node is this node, this method is a no-op.
      Parameters:
      newNode - the node to replace this one
    • quietlyRemoveAndMoveChildrenTo

      public void quietlyRemoveAndMoveChildrenTo(DomNode destination)
      INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
      Quietly removes this node and moves its children to the specified destination. "Quietly" means that no node events are fired. This method is not appropriate for most use cases. It should only be used in specific cases for HTML parsing hackery.
      Parameters:
      destination - the node to which this node's children should be moved before this node is removed
    • checkChildHierarchy

      protected void checkChildHierarchy(Node newChild) throws DOMException
      Check for insertion errors for a new child node. This is overridden by derived classes to enforce which types of children are allowed.
      Parameters:
      newChild - the new child node that is being inserted below this node
      Throws:
      DOMException - HIERARCHY_REQUEST_ERR: Raised if this node is of a type that does not allow children of the type of the newChild node, or if the node to insert is one of this node's ancestors or this node itself, or if this node is of type Document and the DOM application attempts to insert a second DocumentType or Element node. WRONG_DOCUMENT_ERR: Raised if newChild was created from a different document than the one that created this node.
    • onAddedToPage

      protected void onAddedToPage()
      Lifecycle method invoked whenever a node is added to a page. Intended to be overridden by nodes which need to perform custom logic when they are added to a page. This method is recursive, so if you override it, please be sure to call super.onAddedToPage().
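      Why the super call matters can be shown with a small stand-in model. The classes below are hypothetical and are not HtmlUnit types; the sketch assumes that the recursion descends into child nodes, so an override that skips the super call would stop the notification at that node.

```java
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {

    /** Hypothetical stand-in for a DOM node with a recursive lifecycle hook. */
    static class SketchNode {
        final String name;
        final List<String> log;
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        SketchNode add(SketchNode child) {
            children.add(child);
            return this;
        }

        /** Base hook: forwards the notification to every child. */
        protected void onAddedToPage() {
            for (SketchNode child : children) {
                child.onAddedToPage();
            }
        }
    }

    /** Subclass with custom logic that still keeps the recursion alive. */
    static class LoggingNode extends SketchNode {
        LoggingNode(String name, List<String> log) {
            super(name, log);
        }

        @Override
        protected void onAddedToPage() {
            log.add("added " + name);
            super.onAddedToPage(); // without this call the children are never notified
        }
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        SketchNode root = new LoggingNode("div", log)
            .add(new LoggingNode("span", log))
            .add(new LoggingNode("a", log));
        root.onAddedToPage();
        return log;
    }
}
```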
    • onAllChildrenAddedToPage

      public void onAllChildrenAddedToPage(boolean postponed)
      Lifecycle method invoked after a node and all its children have been added to a page, during parsing of the HTML. Intended to be overridden by nodes which need to perform custom logic after they and all their child nodes have been processed by the HTML parser. This method is not recursive, and the default implementation is empty, so there is no need to call super.onAllChildrenAddedToPage() if you implement this method.
      Parameters:
      postponed - whether to use PostponedAction or not
    • onAddedToDocumentFragment

      protected void onAddedToDocumentFragment()
      Lifecycle method invoked whenever a node is added to a document fragment. Intended to be overridden by nodes which need to perform custom logic when they are added to a fragment. This method is recursive, so if you override it, please be sure to call super.onAddedToDocumentFragment().
    • getChildren

      public final Iterable<DomNode> getChildren()
      Returns:
      an Iterable over the children of this node
    • getDescendants

      public final Iterable<DomNode> getDescendants()
      Returns an Iterable that will recursively iterate over all of this node's descendants, including DomText elements, DomComment elements, etc. If you want to iterate only over HtmlElement descendants, please use getHtmlElementDescendants().
      Returns:
      an Iterable that will recursively iterate over all of this node's descendants
    • getHtmlElementDescendants

      public final Iterable<HtmlElement> getHtmlElementDescendants()
      Returns an Iterable that will recursively iterate over all of this node's HtmlElement descendants. If you want to iterate over all descendants (including DomText elements, DomComment elements, etc.), please use getDescendants().
      Returns:
      an Iterable that will recursively iterate over all of this node's HtmlElement descendants
    • getDomElementDescendants

      public final Iterable<DomElement> getDomElementDescendants()
      Returns an Iterable that will recursively iterate over all of this node's DomElement descendants. If you want to iterate over all descendants (including DomText elements, DomComment elements, etc.), please use getDescendants().
      Returns:
      an Iterable that will recursively iterate over all of this node's DomElement descendants
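      The difference between iterating over all descendants and over element descendants only can be illustrated with the JDK's org.w3c.dom API, used here in place of HtmlUnit so the sketch runs without extra dependencies. The class and method names are hypothetical, and the depth-first order is an assumption of the sketch, not something this page specifies.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DescendantsSketch {

    /** Collects every descendant of node, depth-first, parents before children. */
    static void collect(Node node, List<Node> out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            out.add(child);
            collect(child, out);
        }
    }

    /** Returns {all descendants, element descendants only} for a small fragment. */
    static int[] counts() {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                    "<div>text<!--note--><p><b>bold</b></p></div>")))
                .getDocumentElement();
            List<Node> all = new ArrayList<>();
            collect(root, all);
            int elements = 0;
            for (Node n : all) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    elements++; // text and comment nodes are skipped here
                }
            }
            return new int[] {all.size(), elements};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

      In this fragment the full traversal visits two text nodes, one comment, and two elements, while the element-only view keeps just the p and b elements.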
    • getReadyState

      public String getReadyState()
      Returns this node's ready state (IE only).
      Returns:
      this node's ready state
    • setReadyState

      public void setReadyState(String state)
      Sets this node's ready state (IE only).
      Parameters:
      state - this node's ready state
    • getByXPath

      public <T> List<T> getByXPath(String xpathExpr)
      Evaluates the specified XPath expression from this node, returning the matching elements.
      Note: This implies that '.' points to this node, but general axes such as '//' still search the whole document. For example, to get all descendant h1 nodes of the current node you have to use './/h1' instead of '//h1', because the latter matches all h1 nodes of the whole document.
      Type Parameters:
      T - the expected type
      Parameters:
      xpathExpr - the XPath expression to evaluate
      Returns:
      the elements which match the specified XPath expression
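      The scoping rule in the note above is standard XPath behavior, so it can be demonstrated with the JDK's javax.xml.xpath API instead of HtmlUnit. This is an illustrative sketch with hypothetical class and method names; the context node plays the role of the node on which getByXPath would be called.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathScopeSketch {

    // Two sections with one h1 each; only the first div is used as context node.
    static final String XML =
        "<body><div id='a'><h1>A</h1></div><div id='b'><h1>B</h1></div></body>";

    /** Counts the nodes matched by the expression when evaluated from the first div. */
    static int countFromFirstDiv(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node firstDiv = (Node) xpath.evaluate("/body/div[1]", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expression, firstDiv, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Relative: only the h1 inside the context div.
        System.out.println(".//h1 -> " + countFromFirstDiv(".//h1"));
        // Absolute: every h1 in the document, regardless of the context node.
        System.out.println("//h1  -> " + countFromFirstDiv("//h1"));
    }
}
```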
    • getByXPath

      public List<?> getByXPath(String xpathExpr, org.htmlunit.xpath.xml.utils.PrefixResolver resolver)
      Evaluates the specified XPath expression from this node, returning the matching elements.
      Parameters:
      xpathExpr - the XPath expression to evaluate
      resolver - the prefix resolver to use for resolving namespace prefixes, or null
      Returns:
      the elements which match the specified XPath expression
    • getFirstByXPath

      public <X> X getFirstByXPath(String xpathExpr)
      Evaluates the specified XPath expression from this node, returning the first matching element, or null if no node matches the specified XPath expression.
      Type Parameters:
      X - the expression type
      Parameters:
      xpathExpr - the XPath expression
      Returns:
      the first element matching the specified XPath expression
    • getFirstByXPath

      public <X> X getFirstByXPath(String xpathExpr, org.htmlunit.xpath.xml.utils.PrefixResolver resolver)
      Evaluates the specified XPath expression from this node, returning the first matching element, or null if no node matches the specified XPath expression.
      Type Parameters:
      X - the expression type
      Parameters:
      xpathExpr - the XPath expression
      resolver - the prefix resolver to use for resolving namespace prefixes, or null
      Returns:
      the first element matching the specified XPath expression
    • getCanonicalXPath

      public String getCanonicalXPath()

      Returns the canonical XPath expression which identifies this node, for instance "/html/body/table[3]/tbody/tr[5]/td[2]/span/a[3]".

      WARNING: This sort of automated XPath expression is often quite bad at identifying a node, as it is highly sensitive to changes in the DOM tree.

      Returns:
      the canonical XPath expression which identifies this node
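      The fragility described in the warning can be seen with a simplified path builder. The sketch below uses the JDK's org.w3c.dom API and is not HtmlUnit's implementation; it assumes a position index is added only when several siblings share the same name, as the sample expression above suggests. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class CanonicalXPathSketch {

    /** Builds a positional path such as /html/body/p[2]/a for an attached element. */
    static String pathOf(Element element) {
        Node parent = element.getParentNode();
        if (parent == null) {
            return element.getNodeName(); // detached element: no absolute path
        }
        String prefix = parent instanceof Element ? pathOf((Element) parent) : "";
        String name = element.getNodeName();
        int position = 0;
        int sameName = 0;
        for (Node sib = parent.getFirstChild(); sib != null; sib = sib.getNextSibling()) {
            if (sib instanceof Element && sib.getNodeName().equals(name)) {
                sameName++;
                if (sib == element) {
                    position = sameName;
                }
            }
        }
        return prefix + "/" + name + (sameName > 1 ? "[" + position + "]" : "");
    }

    /** Returns the path of the same anchor before and after an unrelated insertion. */
    static String[] demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                    "<html><body><p/><p><a/></p></body></html>")));
            Element a = (Element) doc.getElementsByTagName("a").item(0);
            String before = pathOf(a);
            // Insert an unrelated paragraph at the top of the body.
            Node body = doc.getElementsByTagName("body").item(0);
            body.insertBefore(doc.createElement("p"), body.getFirstChild());
            return new String[] {before, pathOf(a)};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

      The anchor itself never changes, yet its path shifts from p[2] to p[3] after the insertion, which is why a stored positional path can silently point at a different node later.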
    • notifyIncorrectness

      protected void notifyIncorrectness(String message)
      Notifies the registered IncorrectnessListener of something that is not fully correct.
      Parameters:
      message - the notification to send to the registered IncorrectnessListener
    • addDomChangeListener

      public void addDomChangeListener(DomChangeListener listener)
      Adds a DomChangeListener to the listener list. The listener is registered for all descendants of this node.
      Parameters:
      listener - the DOM structure change listener to be added
    • removeDomChangeListener

      public void removeDomChangeListener(DomChangeListener listener)
      Removes a DomChangeListener from the listener list. The listener is deregistered for all descendants of this node.
      Parameters:
      listener - the DOM structure change listener to be removed
    • fireNodeAdded

      protected void fireNodeAdded(DomNode parentNode, DomNode addedNode)
      Support for reporting DOM changes. This method can be called when a node has been added, and it will send the appropriate DomChangeEvent to any registered DomChangeListeners.

      Note that this method recursively calls this node's parent's fireNodeAdded(DomNode, DomNode).

      Parameters:
      parentNode - the parent of the node that was changed
      addedNode - the node that has been added
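      The recursive call to the parent is what lets a listener registered on one node observe changes anywhere below it. The following stand-in model sketches that propagation; the classes are hypothetical, use a plain BiConsumer instead of DomChangeListener, and do not reproduce HtmlUnit's event objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

final class ChangeBubblingSketch {

    /** Hypothetical stand-in for a DOM node that reports additions upward. */
    static class SketchNode {
        final String name;
        SketchNode parent;
        final List<BiConsumer<SketchNode, SketchNode>> listeners = new ArrayList<>();

        SketchNode(String name) {
            this.name = name;
        }

        void append(SketchNode child) {
            child.parent = this;
            fireNodeAdded(this, child);
        }

        /** Notifies local listeners, then hands the same event to the parent. */
        void fireNodeAdded(SketchNode parentNode, SketchNode addedNode) {
            for (BiConsumer<SketchNode, SketchNode> listener : listeners) {
                listener.accept(parentNode, addedNode);
            }
            if (parent != null) {
                parent.fireNodeAdded(parentNode, addedNode);
            }
        }
    }

    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        SketchNode body = new SketchNode("body");
        SketchNode div = new SketchNode("div");
        // A single listener on body, none on div.
        body.listeners.add((p, added) -> seen.add(added.name + " added to " + p.name));
        body.append(div);                   // direct child of body
        div.append(new SketchNode("span")); // grandchild, still reported to body's listener
        return seen;
    }
}
```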
    • addCharacterDataChangeListener

      public void addCharacterDataChangeListener(CharacterDataChangeListener listener)
      Adds a CharacterDataChangeListener to the listener list. The listener is registered for all descendants of this node.
      Parameters:
      listener - the character data change listener to be added
    • removeCharacterDataChangeListener

      public void removeCharacterDataChangeListener(CharacterDataChangeListener listener)
      Removes a CharacterDataChangeListener from the listener list. The listener is deregistered for all descendants of this node.
      Parameters:
      listener - the character data change listener to be removed
    • fireCharacterDataChanged

      protected void fireCharacterDataChanged(DomCharacterData characterData, String oldValue)
      Support for reporting Character Data changes.

      Note that this method recursively calls this node's parent's fireCharacterDataChanged(org.htmlunit.html.DomCharacterData, java.lang.String).

      Parameters:
      characterData - the character data which is changed
      oldValue - the old value
    • fireNodeDeleted

      protected void fireNodeDeleted(DomNode parentNode, DomNode deletedNode)
      Support for reporting DOM changes. This method can be called when a node has been deleted, and it will send the appropriate DomChangeEvent to any registered DomChangeListeners.

      Note that this method recursively calls this node's parent's fireNodeDeleted(DomNode, DomNode).

      Parameters:
      parentNode - the parent of the node that was changed
      deletedNode - the node that has been deleted
    • querySelectorAll

      public DomNodeList<DomNode> querySelectorAll(String selectors)
      Retrieves all element nodes from descendants of the starting element node that match any selector within the supplied selector strings.
      Parameters:
      selectors - one or more CSS selectors separated by commas
      Returns:
      list of all found nodes
    • getSelectorList

      protected org.htmlunit.cssparser.parser.selector.SelectorList getSelectorList(String selectors, WebClient webClient) throws IOException
      Returns the SelectorList.
      Parameters:
      selectors - the selectors
      webClient - the WebClient
      Returns:
      the SelectorList
      Throws:
      IOException - if an error occurs
    • querySelector

      public <N extends DomNode> N querySelector(String selectors)
      Returns the first element within the document that matches the specified group of selectors.
      Type Parameters:
      N - the node type
      Parameters:
      selectors - one or more CSS selectors separated by commas
      Returns:
      null if no matches are found; otherwise, it returns the first matching element
    • isAttachedToPage

      public boolean isAttachedToPage()
      INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
      Indicates if this node is currently attached to the page.
      Returns:
      true if the page is an ancestor of this node
    • processImportNode

      public void processImportNode(org.htmlunit.javascript.host.dom.Document doc)
      INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
      Lifecycle method to support special processing for js method importNode.
      Parameters:
      doc - the import target document
    • hasFeature

      public boolean hasFeature(BrowserVersionFeatures feature)
      INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
      Helper for a common call sequence.
      Parameters:
      feature - the feature to check
      Returns:
      true if the currently emulated browser has this feature.
    • handles

      public boolean handles(org.htmlunit.javascript.host.event.Event event)
      Indicates if the provided event can be applied to this node. Subclasses may override this.
      Parameters:
      event - the event
      Returns:
      false if the event can't be applied
    • getPreviousElementSibling

      public DomElement getPreviousElementSibling()
      Returns the previous sibling element node of this element. null if this element has no element sibling nodes that come before this one in the document tree.
      Returns:
      the previous sibling element node of this element. null if this element has no element sibling nodes that come before this one in the document tree
    • getNextElementSibling

      public DomElement getNextElementSibling()
      Returns the next sibling element node of this element. null if this element has no element sibling nodes that come after this one in the document tree.
      Returns:
      the next sibling element node of this element. null if this element has no element sibling nodes that come after this one in the document tree
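      Element sibling navigation differs from plain sibling navigation in that text and comment nodes are skipped. A minimal sketch of that behavior follows, written against the JDK's org.w3c.dom API rather than HtmlUnit; the class and helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ElementSiblingSketch {

    /** Next sibling that is an element, or null if there is none. */
    static Element nextElementSibling(Node node) {
        for (Node n = node.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Previous sibling that is an element, or null if there is none. */
    static Element previousElementSibling(Node node) {
        for (Node n = node.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Two list items separated by a text node and a comment. */
    static String demo() {
        try {
            Element ul = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                    "<ul><li id='a'/> text <!--c--><li id='b'/></ul>")))
                .getDocumentElement();
            Element a = (Element) ul.getFirstChild();
            Element b = nextElementSibling(a); // skips the text and the comment
            return b.getAttribute("id")
                + "," + (nextElementSibling(b) == null)
                + "," + previousElementSibling(b).getAttribute("id");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```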
    • closest

      public DomElement closest(String selectorString)
      Parameters:
      selectorString - the selector to test
      Returns:
      the selected DomElement or null.