REXML Tutorial
Why REXML?
- Ruby’s REXML library is part of the Ruby distribution, so using it requires no gem installations.
- REXML is fully maintained.
- REXML is mature, having been in use for many years.
To Include, or Not to Include?
REXML is a module. To use it, you must require it:
require 'rexml' # => true
If you do not also include it, you must fully qualify references to REXML:
REXML::Document # => REXML::Document
If you also include the module, you may optionally omit the REXML:: prefix:
include REXML
Document # => REXML::Document
REXML::Document # => REXML::Document
Preliminaries
All examples here assume that the following code has been executed:
require 'rexml'
include REXML
The source XML for many examples here is from file books.xml at w3schools.com. You may find it convenient to open that page in a new tab (Ctrl-click in some browsers).
Note that your browser may display the XML with modified whitespace and without the XML declaration, which in this case is:
<?xml version="1.0" encoding="UTF-8"?>
For convenience, we capture the XML into a string variable:
require 'open-uri'
source_string = URI.open('https://www.w3schools.com/xml/books.xml').read
And into a file:
File.write('source_file.xml', source_string)
Throughout these examples, variable doc will hold only the document derived from these sources:
doc = Document.new(source_string)
Parsing XML Source
Parsing a Document
Use method REXML::Document::new to parse XML source.
The source may be a string:
doc = Document.new(source_string)
Or an IO stream:
doc = File.open('source_file.xml', 'r') do |io|
  Document.new(io)
end
Method URI.open returns an IO-like object (a StringIO for a small response such as this one), so the source can be from a web page:
require 'open-uri'
io = URI.open("https://www.w3schools.com/xml/books.xml")
io.class # => StringIO
doc = Document.new(io)
For any of these sources, the returned object is an REXML::Document:
doc # => <UNDEFINED> ... </>
doc.class # => REXML::Document
Note: 'UNDEFINED' is the “name” displayed for a document, even though doc.name returns an empty string "".
A parsed document may produce REXML objects of many classes, but the two that are likely to be of greatest interest are REXML::Document and REXML::Element. These two classes are covered in great detail in this tutorial.
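The two kinds of source can be compared side by side without network access. This is a minimal sketch: the XML is a shortened, hypothetical stand-in for books.xml, StringIO stands in for a File or a URI.open result, and the script requires 'rexml/document' on the assumption that this entry point is available in your installation.

```ruby
require 'rexml/document'
require 'stringio'

# Hypothetical, shortened stand-in for the books.xml source.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <bookstore>
    <book category="cooking"><title lang="en">Everyday Italian</title></book>
  </bookstore>
XML

# Parse from a String.
doc_from_string = REXML::Document.new(xml)

# Parse from an IO-like object.
doc_from_io = REXML::Document.new(StringIO.new(xml))

p doc_from_string.class     # REXML::Document
p doc_from_string.root.name # "bookstore"
p doc_from_io.root.name     # "bookstore"
```

Either way the result is an REXML::Document whose root element is the same.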
Context (Parsing Options)
The context for parsing a document is a hash that influences the way the XML is read and stored.
The context entries are:
- :respect_whitespace: controls treatment of whitespace.
- :compress_whitespace: determines whether whitespace is compressed.
- :ignore_whitespace_nodes: determines whether whitespace-only nodes are to be ignored.
- :raw: controls treatment of special characters and entities.
See Element Context.
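As an illustration of one context entry, the sketch below parses the same small string with and without :ignore_whitespace_nodes. The XML and the variable names are hypothetical, and only the :all value is shown.

```ruby
require 'rexml/document'

xml = "<a> <b/> </a>"

# Default parsing keeps the whitespace-only text nodes around <b/>.
plain = REXML::Document.new(xml)

# With :ignore_whitespace_nodes set to :all, those nodes are not stored.
trimmed = REXML::Document.new(xml, {ignore_whitespace_nodes: :all})

p plain.root.children.size   # 3
p trimmed.root.children.size # 1
```

The context hash is passed as the second argument to REXML::Document::new and is propagated to the elements created during parsing.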
Exploring the Document
An REXML::Document object represents an XML document.
The object inherits from its ancestor classes:
- REXML::Child (includes module REXML::Node).
  - REXML::Parent (includes module Enumerable).
    - REXML::Element (includes module REXML::Namespace).
This section covers only those properties and methods that are unique to a document (that is, not inherited or included).
Document Properties
A document has several properties (other than its children):
- Document type.
- Node type.
- Name.
- Document.
- XPath.
- Document Type
A document may have a document type:
my_xml = '<!DOCTYPE foo>'
my_doc = Document.new(my_xml)
doc_type = my_doc.doctype
doc_type.class # => REXML::DocType
doc_type.to_s # => "<!DOCTYPE foo>"
- Node Type
A document also has a node type (always :document):
doc.node_type # => :document
- Name
A document has a name (always an empty string):
doc.name # => ""
- Document
Method REXML::Document#document returns self:
doc.document == doc # => true
An object of a different class (REXML::Element or REXML::Child) may have a document, which is the document to which the object belongs; if so, that document will be an REXML::Document object.
doc.root.document.class # => REXML::Document
- XPath
Method REXML::Element#xpath returns the string xpath to the element, relative to its most distant ancestor:
doc.root.class # => REXML::Element
doc.root.xpath # => "/bookstore"
doc.root.texts.first # => "\n\n"
doc.root.texts.first.xpath # => "/bookstore/text()"
If there is no ancestor, returns the expanded name of the element:
Element.new('foo').xpath # => "foo"
Document Children
A document may have children of these types:
- XML declaration.
- Root element.
- Text.
- Processing instructions.
- Comments.
- CDATA.
- XML Declaration
A document may have an XML declaration, which is stored as an REXML::XMLDecl object:
doc.xml_decl # => <?xml ... ?>
doc.xml_decl.class # => REXML::XMLDecl
Document.new('').xml_decl # => <?xml ... ?>
my_xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
my_doc = Document.new(my_xml)
xml_decl = my_doc.xml_decl
xml_decl.to_s # => "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>"
The version, encoding, and stand-alone values may be retrieved separately:
my_doc.version # => "1.0"
my_doc.encoding # => "UTF-8"
my_doc.stand_alone? # => "yes"
- Root Element
A document may have a single element child, called the root element, which is stored as an REXML::Element object; it may be retrieved with method root:
doc.root # => <bookstore> ... </>
doc.root.class # => REXML::Element
Document.new('').root # => nil
- Text
A document may have text passages, each of which is stored as an REXML::Text object:
doc.texts.each {|t| p [t.class, t] }
Output:
[REXML::Text, "\n"]
- Processing Instructions
A document may have processing instructions, which are stored as REXML::Instruction objects; printing each one as [class, instruction] gives output of this form:
[REXML::Instruction, <?p-i my-application ...?>]
[REXML::Instruction, <?p-i my-application ...?>]
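A minimal sketch that retrieves document-level processing instructions follows. The source XML is hypothetical: it assumes two instructions with the target my-application placed ahead of a root element, and it prints the target and content of each instruction rather than relying on a particular inspect format.

```ruby
require 'rexml/document'

# Hypothetical source: two processing instructions, then a root element.
my_xml = '<?my-application foo?><?my-application bar?><root/>'
my_doc = REXML::Document.new(my_xml)

# instructions returns the processing-instruction children of the document.
my_doc.instructions.each {|i| p [i.class, i.target, i.content] }
```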
- Comments
A document may have comments, which are stored as REXML::Comment objects:
my_xml = <<-EOT
  <!--foo-->
  <!--bar-->
EOT
my_doc = Document.new(my_xml)
my_doc.comments.each {|c| p [c.class, c] }
Output:
[REXML::Comment, #<REXML::Comment: @parent=<UNDEFINED> ... </>, @string="foo">]
[REXML::Comment, #<REXML::Comment: @parent=<UNDEFINED> ... </>, @string="bar">]
- CDATA
A document may have CDATA entries, which are stored as REXML::CData objects:
my_xml = <<-EOT
  <![CDATA[foo]]>
  <![CDATA[bar]]>
EOT
my_doc = Document.new(my_xml)
my_doc.cdatas.each {|cd| p [cd.class, cd] }
Output:
[REXML::CData, "foo"]
[REXML::CData, "bar"]
The payload of a document is a tree of nodes, descending from the root element:
doc.root.children.each do |child|
  p [child.class, child]
end
Output:
[REXML::Text, "\n\n"]
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='children'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Text, "\n\n"]
Exploring an Element
An REXML::Element object represents an XML element.
The object inherits from its ancestor classes:
- REXML::Child (includes module REXML::Node).
  - REXML::Parent (includes module Enumerable).
    - REXML::Element (includes module REXML::Namespace).
This section covers methods:
- Defined in REXML::Element itself.
- Inherited from REXML::Parent and REXML::Child.
- Included from REXML::Node.
Inside the Element
- Brief String Representation
Use method REXML::Element#inspect to retrieve a brief string representation.
doc.root.inspect # => "<bookstore> ... </>"
The ellipsis (...) indicates that the element has children. When there are no children, the ellipsis is omitted:
Element.new('foo').inspect # => "<foo/>"
If the element has attributes, those are also included:
doc.root.elements.first.inspect # => "<book category='cooking'> ... </>"
- Extended String Representation
Use inherited method REXML::Child#bytes to retrieve an extended string representation.
doc.root.bytes # => "<bookstore>\n\n<book category='cooking'>\n <title lang='en'>Everyday Italian</title>\n <author>Giada De Laurentiis</author>\n <year>2005</year>\n <price>30.00</price>\n</book>\n\n<book category='children'>\n <title lang='en'>Harry Potter</title>\n <author>J K. Rowling</author>\n <year>2005</year>\n <price>29.99</price>\n</book>\n\n<book category='web'>\n <title lang='en'>XQuery Kick Start</title>\n <author>James McGovern</author>\n <author>Per Bothner</author>\n <author>Kurt Cagle</author>\n <author>James Linn</author>\n <author>Vaidyanathan Nagarajan</author>\n <year>2003</year>\n <price>49.99</price>\n</book>\n\n<book category='web' cover='paperback'>\n <title lang='en'>Learning XML</title>\n <author>Erik T. Ray</author>\n <year>2003</year>\n <price>39.95</price>\n</book>\n\n</bookstore>"
- Node Type
Use method REXML::Element#node_type to retrieve the node type (always :element):
doc.root.node_type # => :element
- Raw Mode
Use method REXML::Element#raw to retrieve whether (true or nil) raw mode is set.
doc.root.raw # => nil
- Context
Use method REXML::Element#context to retrieve the context hash (see Element Context):
doc.root.context # => {}
Relationships
An element may have:
- Ancestors.
- Siblings.
- Children.
Ancestors
- Containing Document
Use method REXML::Element#document to retrieve the containing document, if any:
ele = doc.root.elements.first # => <book category='cooking'> ... </>
ele.document # => <UNDEFINED> ... </>
ele = Element.new('foo') # => <foo/>
ele.document # => nil
- Root Element
Use method REXML::Element#root to retrieve the root element:
ele = doc.root.elements.first # => <book category='cooking'> ... </>
ele.root # => <bookstore> ... </>
ele = Element.new('foo') # => <foo/>
ele.root # => <foo/>
- Root Node
Use method REXML::Element#root_node to retrieve the most distant ancestor, which is the containing document, if any, otherwise the root element:
ele = doc.root.elements.first # => <book category='cooking'> ... </>
ele.root_node # => <UNDEFINED> ... </>
ele = Element.new('foo') # => <foo/>
ele.root_node # => <foo/>
- Parent
Use inherited method REXML::Child#parent to retrieve the parent:
ele = doc.root # => <bookstore> ... </>
ele.parent # => <UNDEFINED> ... </>
ele = doc.root.elements.first # => <book category='cooking'> ... </>
ele.parent # => <bookstore> ... </>
Use included method REXML::Node#index_in_parent to retrieve the index of the element among all of its parent’s children (not just the element children). Note that, like the index for doc.root.elements[n], the returned index is 1-based; unlike that index, it counts children of all types, so the two values differ whenever text or other non-element nodes precede the element.
doc.root.children # =>
# ["\n\n",
#  <book category='cooking'> ... </>,
#  "\n\n",
#  <book category='children'> ... </>,
#  "\n\n",
#  <book category='web'> ... </>,
#  "\n\n",
#  <book category='web' cover='paperback'> ... </>,
#  "\n\n"]
ele = doc.root.elements[1] # => <book category='cooking'> ... </>
ele.index_in_parent # => 2
ele = doc.root.elements[2] # => <book category='children'> ... </>
ele.index_in_parent # => 4
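The two ways of counting can be checked on a small, self-contained element. This is a sketch under the assumption of a hypothetical three-child root (one text node followed by two elements); the names a and b are illustrative only.

```ruby
require 'rexml/document'

my_doc = REXML::Document.new('<root>text<a/><b/></root>')
root = my_doc.root

p root.children.size   # 3: one text node and two elements

a = root.elements[1]   # first element child (lookup counts elements only)
b = root.elements[2]   # second element child

# index_in_parent counts every child, including the leading text node.
p a.index_in_parent    # 2
p b.index_in_parent    # 3
```

Here the first element child reports position 2 because the text node occupies position 1.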
Siblings
- Next Element
Use method REXML::Element#next_element to retrieve the first following sibling that is itself an element (nil if there is none):
ele = doc.root.elements[1]
while ele do
  p [ele.class, ele]
  ele = ele.next_element
end
p ele
Output:
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
nil
- Previous Element
Use method REXML::Element#previous_element to retrieve the first preceding sibling that is itself an element (nil if there is none):
ele = doc.root.elements[4]
while ele do
  p [ele.class, ele]
  ele = ele.previous_element
end
p ele
Output:
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <book category='cooking'> ... </>]
nil
- Next Node
Use included method REXML::Node#next_sibling_node (or its alias next_sibling) to retrieve the first following node regardless of its class:
node = doc.root.children[0]
while node do
  p [node.class, node]
  node = node.next_sibling
end
p node
Output:
[REXML::Text, "\n\n"]
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='children'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Text, "\n\n"]
nil
- Previous Node
Use included method REXML::Node#previous_sibling_node (or its alias previous_sibling) to retrieve the first preceding node regardless of its class:
node = doc.root.children[-1]
while node do
  p [node.class, node]
  node = node.previous_sibling
end
p node
Output:
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='children'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Text, "\n\n"]
nil
Children
- Child Count
Use inherited method REXML::Parent#size to retrieve the count of nodes (of all types) in the element:
doc.root.size # => 9
- Child Nodes
Use inherited method REXML::Parent#children to retrieve an array of the child nodes (of all types):
doc.root.children # =>
# ["\n\n",
#  <book category='cooking'> ... </>,
#  "\n\n",
#  <book category='children'> ... </>,
#  "\n\n",
#  <book category='web'> ... </>,
#  "\n\n",
#  <book category='web' cover='paperback'> ... </>,
#  "\n\n"]
- Child at Index
Use method REXML::Element#[] to retrieve the child at a given numerical index, or nil if there is no such child:
doc.root[0] # => "\n\n"
doc.root[1] # => <book category='cooking'> ... </>
doc.root[7] # => <book category='web' cover='paperback'> ... </>
doc.root[8] # => "\n\n"
doc.root[-1] # => "\n\n"
doc.root[-2] # => <book category='web' cover='paperback'> ... </>
doc.root[50] # => nil
- Index of Child
Use method REXML::Parent#index to retrieve the zero-based child index of the given object, or size - 1 (the index of the last child) if there is no such child:
ele = doc.root # => <bookstore> ... </>
ele.index(ele[0]) # => 0
ele.index(ele[1]) # => 1
ele.index(ele[7]) # => 7
ele.index(ele[8]) # => 8
ele.index(ele[-1]) # => 8
ele.index(ele[-2]) # => 7
ele.index(ele[50]) # => 8
- Element Children
Use method REXML::Element#has_elements? to retrieve whether the element has element children:
doc.root.has_elements? # => true
REXML::Element.new('foo').has_elements? # => false
Use method REXML::Element#elements to retrieve the REXML::Elements object containing the element children:
eles = doc.root.elements
eles # => #<REXML::Elements:0x000001ee2848e960 @element=<bookstore> ... </>>
eles.size # => 4
eles.each {|e| p [e.class, e] }
Output:
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
Note that while in this example all the element children of the root element have the same name, 'book', that is not true of all documents; a root element (or any other element) may have any mixture of child elements.
- CDATA Children
Use method REXML::Element#cdatas to retrieve a frozen array of CDATA children:
my_xml = <<-EOT
  <root>
    <![CDATA[foo]]>
    <![CDATA[bar]]>
  </root>
EOT
my_doc = REXML::Document.new(my_xml)
cdatas = my_doc.root.cdatas
cdatas.frozen? # => true
cdatas.map {|cd| cd.class } # => [REXML::CData, REXML::CData]
- Comment Children
Use method REXML::Element#comments to retrieve a frozen array of comment children:
my_xml = <<-EOT
  <root>
    <!--foo-->
    <!--bar-->
  </root>
EOT
my_doc = REXML::Document.new(my_xml)
comments = my_doc.root.comments
comments.frozen? # => true
comments.map {|c| c.class } # => [REXML::Comment, REXML::Comment]
comments.map {|c| c.to_s } # => ["foo", "bar"]
- Processing Instruction Children
Use method REXML::Element#instructions to retrieve a frozen array of processing instruction children:
my_xml = <<-EOT
  <root>
    <?target0 foo?>
    <?target1 bar?>
  </root>
EOT
my_doc = REXML::Document.new(my_xml)
instrs = my_doc.root.instructions
instrs.frozen? # => true
instrs.map {|i| i.class } # => [REXML::Instruction, REXML::Instruction]
instrs.map {|i| i.to_s } # => ["<?target0 foo?>", "<?target1 bar?>"]
- Text Children
Use method REXML::Element#has_text? to retrieve whether the element has text children:
doc.root.has_text? # => true
REXML::Element.new('foo').has_text? # => false
Use method REXML::Element#texts to retrieve a frozen array of text children:
my_xml = '<root><a/>text<b/>more<c/></root>'
my_doc = REXML::Document.new(my_xml)
texts = my_doc.root.texts
texts.frozen? # => true
texts.map {|t| t.class } # => [REXML::Text, REXML::Text]
texts.map {|t| t.to_s } # => ["text", "more"]
- Parenthood
Use inherited method REXML::Parent#parent? to retrieve whether the element is a parent; always returns true; only REXML::Child#parent? returns false.
doc.root.parent? # => true
Element Attributes
Use method REXML::Element#has_attributes? to return whether the element has attributes:
ele = doc.root # => <bookstore> ... </>
ele.has_attributes? # => false
ele = ele.elements.first # => <book category='cooking'> ... </>
ele.has_attributes? # => true
Use method REXML::Element#attributes to return the hash containing the attributes for the element. Each hash key is a string attribute name; each hash value is an REXML::Attribute object.
ele = doc.root # => <bookstore> ... </>
attrs = ele.attributes # => {}
ele = ele.elements.first # => <book category='cooking'> ... </>
attrs = ele.attributes # => {"category"=>category='cooking'}
attrs.size # => 1
attr_name = attrs.keys.first # => "category"
attr_name.class # => String
attr_value = attrs.values.first # => category='cooking'
attr_value.class # => REXML::Attribute
Use method REXML::Element#[] to retrieve the string value for a given attribute, which may be given as either a string or a symbol:
ele = doc.root.elements.first # => <book category='cooking'> ... </>
attr_value = ele['category'] # => "cooking"
attr_value.class # => String
ele['nosuch'] # => nil
Use method REXML::Element#attribute to retrieve the REXML::Attribute object for a named attribute, optionally qualified by a namespace:
my_xml = "<root xmlns:a='a' a:x='a:x' x='x'/>"
my_doc = REXML::Document.new(my_xml)
my_doc.root.attribute("x") # => x='x'
my_doc.root.attribute("x", "a") # => a:x='a:x'
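The difference between the string value and the attribute object can be seen in a short, self-contained sketch. The element and its attribute names are hypothetical, chosen to echo the books example.

```ruby
require 'rexml/document'

my_doc = REXML::Document.new("<book category='cooking' cover='paperback'/>")
book = my_doc.root

p book.has_attributes?        # true
p book['category']            # "cooking"
p book['nosuch']              # nil

# attribute returns an REXML::Attribute object; value gives its string.
attr = book.attribute('cover')
p attr.class                  # REXML::Attribute
p attr.value                  # "paperback"

# Attribute names, via the Hash interface of the attributes object.
p book.attributes.keys.sort   # ["category", "cover"]
```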
Whitespace
Use method REXML::Element#ignore_whitespace_nodes to determine whether whitespace nodes were ignored when the XML was parsed; returns true if so, nil otherwise.
Use method REXML::Element#whitespace to determine whether whitespace is respected for the element; returns true if so, false otherwise.
Namespaces
Use method REXML::Element#namespace to retrieve the string namespace URI for the element, which may derive from one of its ancestors:
xml_string = <<-EOT
  <root>
    <a xmlns='1' xmlns:y='2'>
      <b/>
      <c xmlns:z='3'/>
    </a>
  </root>
EOT
d = Document.new(xml_string)
b = d.elements['//b']
b.namespace # => "1"
b.namespace('y') # => "2"
b.namespace('nosuch') # => nil
Use method REXML::Element#namespaces to retrieve a hash of all defined namespaces in the element and its ancestors:
xml_string = <<-EOT
  <root>
    <a xmlns:x='1' xmlns:y='2'>
      <b/>
      <c xmlns:z='3'/>
    </a>
  </root>
EOT
d = Document.new(xml_string)
d.elements['//a'].namespaces # => {"x"=>"1", "y"=>"2"}
d.elements['//b'].namespaces # => {"x"=>"1", "y"=>"2"}
d.elements['//c'].namespaces # => {"x"=>"1", "y"=>"2", "z"=>"3"}
Use method REXML::Element#prefixes to retrieve an array of the string prefixes (names) of all defined namespaces in the element and its ancestors:
xml_string = <<-EOT
  <root>
    <a xmlns:x='1' xmlns:y='2'>
      <b/>
      <c xmlns:z='3'/>
    </a>
  </root>
EOT
d = Document.new(xml_string, {compress_whitespace: :all})
d.elements['//a'].prefixes # => ["x", "y"]
d.elements['//b'].prefixes # => ["x", "y"]
d.elements['//c'].prefixes # => ["x", "y", "z"]
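Inherited and locally declared namespaces can be inspected together on one element. The sketch below reuses the shape of the XML above on a single line; it sorts the prefixes before printing so that the result does not depend on ordering.

```ruby
require 'rexml/document'

xml = "<root><a xmlns:x='1' xmlns:y='2'><b/><c xmlns:z='3'/></a></root>"
d = REXML::Document.new(xml)
c = d.elements['//c']

p c.namespaces      # prefix => URI pairs, inherited ones included
p c.prefixes.sort   # ["x", "y", "z"]
p c.namespace('x')  # "1", declared on the ancestor <a>
p c.namespace('z')  # "3", declared on <c> itself
```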
Traversing
You can use certain methods to traverse children of the element. Each child that meets given criteria is yielded to the given block.
- Traverse All Children
Use inherited method REXML::Parent#each (or its alias each_child) to traverse all children of the element:
doc.root.each {|child| p [child.class, child] }
Output:
[REXML::Text, "\n\n"]
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='children'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web'> ... </>]
[REXML::Text, "\n\n"]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Text, "\n\n"]
- Traverse Element Children
Use method REXML::Element#each_element to traverse only the element children of the element:
doc.root.each_element {|e| p [e.class, e] }
Output:
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
- Traverse Element Children with Attribute
Use method REXML::Element#each_element_with_attribute with the single argument attr_name to traverse each element child that has the given attribute:
my_doc = Document.new '<a><b id="1"/><c id="2"/><d id="1"/><e/></a>'
my_doc.root.each_element_with_attribute('id') {|e| p [e.class, e] }
Output:
[REXML::Element, <b id='1'/>]
[REXML::Element, <c id='2'/>]
[REXML::Element, <d id='1'/>]
Use the same method with a second argument value to traverse each element child that has the given attribute and value:
my_doc.root.each_element_with_attribute('id', '1') {|e| p [e.class, e] }
Output:
[REXML::Element, <b id='1'/>]
[REXML::Element, <d id='1'/>]
Use the same method with a third argument max to traverse no more than the given number of element children:
my_doc.root.each_element_with_attribute('id', '1', 1) {|e| p [e.class, e] }
Output:
[REXML::Element, <b id='1'/>]
Use the same method with a fourth argument xpath to traverse only those element children that match the given xpath:
my_doc.root.each_element_with_attribute('id', '1', 2, '//d') {|e| p [e.class, e] }
Output:
[REXML::Element, <d id='1'/>]
- Traverse Element Children with Text
Use method REXML::Element#each_element_with_text with no arguments to traverse those element children that have text:
my_doc = Document.new '<a><b>b</b><c>b</c><d>d</d><e/></a>'
my_doc.root.each_element_with_text {|e| p [e.class, e] }
Output:
[REXML::Element, <b> ... </>]
[REXML::Element, <c> ... </>]
[REXML::Element, <d> ... </>]
Use the same method with the single argument text to traverse those element children that have exactly that text:
my_doc.root.each_element_with_text('b') {|e| p [e.class, e] }
Output:
[REXML::Element, <b> ... </>]
[REXML::Element, <c> ... </>]
Use the same method with additional second argument max to traverse no more than the given number of element children:
my_doc.root.each_element_with_text('b', 1) {|e| p [e.class, e] }
Output:
[REXML::Element, <b> ... </>]
Use the same method with additional third argument xpath to traverse only those element children that also match the given xpath:
my_doc.root.each_element_with_text('b', 2, '//c') {|e| p [e.class, e] }
Output:
[REXML::Element, <c> ... </>]
- Traverse Children’s Indexes
Use inherited method REXML::Parent#each_index to traverse all children’s indexes (not just those of element children):
doc.root.each_index {|i| print i }
Output:
012345678
- Traverse Children Recursively
Use included method REXML::Node#each_recursive to traverse all element descendants recursively, in document order; text and other non-element nodes are not yielded:
doc.root.each_recursive {|child| p [child.class, child] }
Output:
[REXML::Element, <book category='cooking'> ... </>]
[REXML::Element, <title lang='en'> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <year> ... </>]
[REXML::Element, <price> ... </>]
[REXML::Element, <book category='children'> ... </>]
[REXML::Element, <title lang='en'> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <year> ... </>]
[REXML::Element, <price> ... </>]
[REXML::Element, <book category='web'> ... </>]
[REXML::Element, <title lang='en'> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <year> ... </>]
[REXML::Element, <price> ... </>]
[REXML::Element, <book category='web' cover='paperback'> ... </>]
[REXML::Element, <title lang='en'> ... </>]
[REXML::Element, <author> ... </>]
[REXML::Element, <year> ... </>]
[REXML::Element, <price> ... </>]
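The visiting order of each_recursive is easier to see on a tiny tree. A minimal sketch, assuming a hypothetical document in which one element contains both text and a nested element:

```ruby
require 'rexml/document'

my_doc = REXML::Document.new('<a><b>text<c/></b><d/></a>')

# Collect the name of every node yielded by the recursive traversal.
names = []
my_doc.root.each_recursive {|node| names << node.name }

p names # ["b", "c", "d"]
```

Each element is yielded before its own descendants, and the text inside b does not appear in the result.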
Searching
You can use certain methods to search among the descendants of an element.
Use method REXML::Element#get_elements to retrieve an array of all elements that match the given xpath:
xml_string = <<-EOT
  <root>
    <a level='1'>
      <a level='2'/>
    </a>
  </root>
EOT
d = Document.new(xml_string)
d.root.get_elements('//a') # => [<a level='1'> ... </>, <a level='2'/>]
Use method REXML::Element#get_text with no argument to retrieve the first text node child of the element:
my_doc = Document.new "<p>some text <b>this is bold!</b> more text</p>"
text_node = my_doc.root.get_text
text_node.class # => REXML::Text
text_node.to_s # => "some text "
Use the same method with argument xpath to retrieve the first text node of the first element child that matches the xpath (here the integer 1 selects the first element child):
my_doc.root.get_text(1) # => "this is bold!"
Use method REXML::Element#text with no argument to retrieve the string from the first text node child of the element:
my_doc = Document.new "<p>some text <b>this is bold!</b> more text</p>"
text_node = my_doc.root.text
text_node.class # => String
text_node # => "some text "
Use the same method with argument xpath to retrieve the string from the first text node of the first element child that matches the xpath:
my_doc.root.text(1) # => "this is bold!"
Use included method REXML::Node#find_first_recursive to retrieve the first descendant element for which the given block returns a truthy value, or nil if none:
doc.root.find_first_recursive do |ele|
  ele.name == 'price'
end # => <price> ... </>
doc.root.find_first_recursive do |ele|
  ele.name == 'nosuch'
end # => nil
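The search methods can be tried together on one small document. This sketch reuses the paragraph example from above and passes the element name 'b' as the xpath, which is an assumption made here for illustration in place of the integer index.

```ruby
require 'rexml/document'

my_doc = REXML::Document.new('<p>some text <b>this is bold!</b> more text</p>')
root = my_doc.root

p root.get_text.class           # REXML::Text
p root.text                     # "some text "
p root.text('b')                # "this is bold!"
p root.get_elements('b').size   # 1

# find_first_recursive returns the first matching descendant element, or nil.
found = root.find_first_recursive {|ele| ele.name == 'b' }
missing = root.find_first_recursive {|ele| ele.name == 'nosuch' }
p found.text                    # "this is bold!"
p missing                       # nil
```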
Editing
Editing a Document
- Creating a Document
Create a new document with method REXML::Document::new:
doc = Document.new(source_string)
empty_doc = REXML::Document.new
- Adding to the Document
Add an XML declaration with method REXML::Document#add and an argument of type REXML::XMLDecl:
my_doc = Document.new
my_doc.xml_decl.to_s # => ""
my_doc.add(XMLDecl.new('2.0'))
my_doc.xml_decl.to_s # => "<?xml version='2.0'?>"
Add a document type with method REXML::Document#add and an argument of type REXML::DocType:
my_doc = Document.new
my_doc.doctype.to_s # => ""
my_doc.add(DocType.new('foo'))
my_doc.doctype.to_s # => "<!DOCTYPE foo>"
Add a node of any other REXML type with method REXML::Document#add and an argument that is not of type REXML::XMLDecl or REXML::DocType:
my_doc = Document.new
my_doc.add(Element.new('foo'))
my_doc.to_s # => "<foo/>"
Add an existing element as the root element with method REXML::Document#add_element:
ele = Element.new('foo')
my_doc = Document.new
my_doc.add_element(ele)
my_doc.root # => <foo/>
Create and add an element as the root element with method REXML::Document#add_element:
my_doc = Document.new
my_doc.add_element('foo')
my_doc.root # => <foo/>
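The individual additions above can be combined to build a document from scratch. A minimal sketch, assuming a hypothetical document named foo with an explicit version and encoding; the serialized form is printed rather than asserted, since its exact layout is not spelled out here.

```ruby
require 'rexml/document'

my_doc = REXML::Document.new
my_doc.add(REXML::XMLDecl.new('1.0', 'UTF-8')) # XML declaration
my_doc.add(REXML::DocType.new('foo'))          # document type
root = my_doc.add_element('foo')               # root element

p my_doc.version            # "1.0"
p my_doc.doctype.name       # "foo"
p my_doc.root.equal?(root)  # true
puts my_doc.to_s
```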
Editing an Element
Creating an Element
Create a new element with method REXML::Element::new:
ele = Element.new('foo') # => <foo/>
Setting Element Properties
Set the context for an element with method REXML::Element#context= (see Element Context):
ele.context # => nil
ele.context = {ignore_whitespace_nodes: :all}
ele.context # => {:ignore_whitespace_nodes=>:all}
Set the parent for an element with inherited method REXML::Child#parent=:
ele.parent # => nil
ele.parent = Element.new('bar')
ele.parent # => <bar/>
Set the text for an element with method REXML::Element#text=:
ele.text # => nil
ele.text = 'bar'
ele.text # => "bar"
Adding to an Element
Add a node as the last child with inherited method REXML::Parent#add (or its alias push):
ele = Element.new('foo') # => <foo/>
ele.push(Text.new('bar'))
ele.push(Element.new('baz'))
ele.children # => ["bar", <baz/>]
Add a node as the first child with inherited method REXML::Parent#unshift:
ele = Element.new('foo') # => <foo/>
ele.unshift(Element.new('bar'))
ele.unshift(Text.new('baz'))
ele.children # => ["baz", <bar/>]
Add an element as the last child with method REXML::Element#add_element:
ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_element(Element.new('baz'))
ele.children # => [<bar/>, <baz/>]
Add a text node as the last child with method REXML::Element#add_text:
ele = Element.new('foo') # => <foo/>
ele.add_text('bar')
ele.add_text(Text.new('baz'))
ele.children # => ["bar", "baz"]
Insert a node before a given node with method REXML::Parent#insert_before:
ele = Element.new('foo') # => <foo/>
ele.add_text('bar')
ele.add_text(Text.new('baz'))
ele.children # => ["bar", "baz"]
target = ele[1] # => "baz"
ele.insert_before(target, Text.new('bat'))
ele.children # => ["bar", "bat", "baz"]
Insert a node after a given node with method REXML::Parent#insert_after:
ele = Element.new('foo') # => <foo/>
ele.add_text('bar')
ele.add_text(Text.new('baz'))
ele.children # => ["bar", "baz"]
target = ele[0] # => "bar"
ele.insert_after(target, Text.new('bat'))
ele.children # => ["bar", "bat", "baz"]
Add an attribute with method REXML::Element#add_attribute:
ele = Element.new('foo') # => <foo/>
ele.add_attribute('bar', 'baz')
ele.add_attribute(Attribute.new('bat', 'bam'))
ele.attributes # => {"bar"=>bar='baz', "bat"=>bat='bam'}
Add multiple attributes with method REXML::Element#add_attributes:
ele = Element.new('foo') # => <foo/>
ele.add_attributes({'bar' => 'baz', 'bat' => 'bam'})
ele.add_attributes([['ban', 'bap'], ['bah', 'bad']])
ele.attributes # => {"bar"=>bar='baz', "bat"=>bat='bam', "ban"=>ban='bap', "bah"=>bah='bad'}
Add a namespace with method REXML::Element#add_namespace:
ele = Element.new('foo') # => <foo/>
ele.add_namespace('bar')
ele.add_namespace('baz', 'bat')
ele.namespaces # => {"xmlns"=>"bar", "baz"=>"bat"}
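Several of these methods are typically chained when building a small tree. The following is a sketch with hypothetical names (foo, bar, id); it relies on add_element returning the newly added child, and prints the serialized element without asserting its exact form.

```ruby
require 'rexml/document'

ele = REXML::Element.new('foo')
ele.add_attribute('id', '1')
bar = ele.add_element('bar')  # add_element returns the new child
bar.add_text('hello')         # text inside <bar>
ele.add_text('tail')          # text after <bar>, as a child of <foo>

p ele.children.size  # 2
p ele['id']          # "1"
p bar.text           # "hello"
puts ele.to_s
```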
Deleting from an Element¶ ↑
Delete a specific child object with inherited method REXML::Parent#delete
:
ele = Element.new('foo') # => <foo/> ele.add_element('bar') ele.add_text('baz') ele.children # => [<bar/>, "baz"] target = ele[1] # => "baz" ele.delete(target) # => "baz" ele.children # => [<bar/>] target = ele[0] # => <baz/> ele.delete(target) # => <baz/> ele.children # => []
Delete a child at a specific index with inherited method REXML::Parent#delete_at
:
ele = Element.new('foo') # => <foo/> ele.add_element('bar') ele.add_text('baz') ele.children # => [<bar/>, "baz"] ele.delete_at(1) ele.children # => [<bar/>] ele.delete_at(0) ele.children # => []
Delete all children meeting a specified criterion with inherited method REXML::Parent#delete_if
:
ele = Element.new('foo') # => <foo/> ele.add_element('bar') ele.add_text('baz') ele.add_element('bat') ele.add_text('bam') ele.children # => [<bar/>, "baz", <bat/>, "bam"] ele.delete_if {|child| child.instance_of?(Text) } ele.children # => [<bar/>, <bat/>]
Delete an element at a specific 1-based index with method REXML::Element#delete_element
:
ele = Element.new('foo') # => <foo/> ele.add_element('bar') ele.add_text('baz') ele.add_element('bat') ele.add_text('bam') ele.children # => [<bar/>, "baz", <bat/>, "bam"] ele.delete_element(2) # => <bat/> ele.children # => [<bar/>, "baz", "bam"] ele.delete_element(1) # => <bar/> ele.children # => ["baz", "bam"]
Delete a specific element with the same method:
ele = Element.new('foo') # => <foo/> ele.add_element('bar') ele.add_text('baz') ele.add_element('bat') ele.add_text('bam') ele.children # => [<bar/>, "baz", <bat/>, "bam"] target = ele.elements[2] # => <bat/> ele.delete_element(target) # => <bat/> ele.children # => [<bar/>, "baz", "bam"]
Delete an element matching an xpath using the same method:
ele = Element.new('foo') # => <foo/> ele.add_element('bar') ele.add_text('baz') ele.add_element('bat') ele.add_text('bam') ele.children # => [<bar/>, "baz", <bat/>, "bam"] ele.delete_element('./bat') # => <bat/> ele.children # => [<bar/>, "baz", "bam"] ele.delete_element('./bar') # => <bar/> ele.children # => ["baz", "bam"]
Delete an attribute by name with method REXML::Element#delete_attribute:
ele = Element.new('foo') # => <foo/>
ele.add_attributes({'bar' => 'baz', 'bam' => 'bat'})
ele.attributes # => {"bar"=>bar='baz', "bam"=>bam='bat'}
ele.delete_attribute('bam')
ele.attributes # => {"bar"=>bar='baz'}
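Deleting an attribute that does not exist is harmless. The following sketch (the attribute name `nope` is made up for illustration) shows that the call returns nil and leaves the existing attributes in place:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
ele.add_attributes({'bar' => 'baz', 'bam' => 'bat'})

# There is no attribute named 'nope', so nothing is removed.
result = ele.delete_attribute('nope')

attribute_count = ele.attributes.size
```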
Delete a namespace with method REXML::Element#delete_namespace:
ele = Element.new('foo') # => <foo/>
ele.add_namespace('bar')
ele.add_namespace('baz', 'bat')
ele.namespaces # => {"xmlns"=>"bar", "baz"=>"bat"}
ele.delete_namespace('xmlns')
ele.namespaces # => {"baz"=>"bat"}
ele.delete_namespace('baz')
ele.namespaces # => {}
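The argument to `delete_namespace` defaults to `'xmlns'`, so calling it with no argument removes the default namespace. A minimal sketch using the same illustrative names as above:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
ele.add_namespace('bar')        # default namespace (xmlns)
ele.add_namespace('baz', 'bat') # prefixed namespace (xmlns:baz)

# With no argument, the default namespace is the one removed.
ele.delete_namespace

remaining = ele.namespaces
```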
Remove an element from its parent with inherited method REXML::Child#remove:
ele = Element.new('foo') # => <foo/>
parent = Element.new('bar') # => <bar/>
parent.add_element(ele) # => <foo/>
parent.children.size # => 1
ele.remove # => <foo/>
parent.children.size # => 0
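Because `remove` returns the detached node itself, the node can then be attached elsewhere. The sketch below moves an element from one parent to another; the variable names `first_parent` and `second_parent` are illustrative only:

```ruby
require 'rexml/document'
include REXML

first_parent  = Element.new('bar')
second_parent = Element.new('bat')
ele = first_parent.add_element('foo')

# Detach the element; it no longer has a parent.
ele.remove
detached = ele.parent.nil?

# Attach the same element object to a different parent.
second_parent.add_element(ele)
```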
Replacing Nodes¶ ↑
Replace the node at a given 0-based index with inherited method REXML::Parent#[]=:
ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children # => [<bar/>, "baz", <bat/>, "bam"]
ele[2] = Text.new('bad') # => "bad"
ele.children # => [<bar/>, "baz", "bad", "bam"]
Replace a given node with another node with inherited method REXML::Parent#replace_child:
ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children # => [<bar/>, "baz", <bat/>, "bam"]
target = ele[2] # => <bat/>
ele.replace_child(target, Text.new('bah'))
ele.children # => [<bar/>, "baz", "bah", "bam"]
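Replacing a child also updates the parent links on both nodes. As a brief sketch (the names `old_child` and `new_child` are hypothetical), the replaced node ends up with no parent and the replacement is adopted:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
old_child = ele.add_element('bar')
new_child = Element.new('bat')

ele.replace_child(old_child, new_child)

# The old node is detached; the new node now belongs to ele.
old_parent = old_child.parent
new_parent = new_child.parent
```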
Replace self with a given node with inherited method REXML::Child#replace_with:
ele = Element.new('foo') # => <foo/>
ele.add_element('bar')
ele.add_text('baz')
ele.add_element('bat')
ele.add_text('bam')
ele.children # => [<bar/>, "baz", <bat/>, "bam"]
target = ele[2] # => <bat/>
target.replace_with(Text.new('bah'))
ele.children # => [<bar/>, "baz", "bah", "bam"]
Cloning¶ ↑
Create a shallow clone of an element with method REXML::Element#clone. The clone contains the name and attributes, but not the parent or children:
ele = Element.new('foo')
ele.add_attributes({'bar' => 0, 'baz' => 1})
ele.clone # => <foo bar='0' baz='1'/>
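To see what "shallow" means in practice, the following sketch clones an element that has both an attribute and a child; the attribute name `id` is chosen only for illustration. The clone has no children and no parent, and changing its attribute does not affect the original:

```ruby
require 'rexml/document'
include REXML

ele = Element.new('foo')
ele.add_attribute('id', '1')
ele.add_element('bar')

copy = ele.clone

# Changing the clone's attribute leaves the original's attribute alone.
copy.add_attribute('id', '2')

original_id = ele.attributes['id']
copy_id     = copy.attributes['id']
```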
Create a shallow clone of a document with method REXML::Document#clone. The XML declaration is copied; the document type and root element are not cloned:
my_xml = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE foo><root/>'
my_doc = Document.new(my_xml)
clone_doc = my_doc.clone
my_doc.xml_decl # => <?xml ... ?>
clone_doc.xml_decl # => <?xml ... ?>
my_doc.doctype.to_s # => "<!DOCTYPE foo>"
clone_doc.doctype.to_s # => ""
my_doc.root # => <root/>
clone_doc.root # => nil
Create a deep clone of an element with inherited method REXML::Parent#deep_clone. All nodes and attributes are copied:
doc.to_s.size # => 825
clone = doc.deep_clone
clone.to_s.size # => 825
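A deep clone is independent of the original: edits to the copy's descendants or attributes do not show up in the source tree. The sketch below uses a small inline document (not the books.xml source) so that it runs without network access:

```ruby
require 'rexml/document'
include REXML

original = Document.new('<foo a="1"><bar>baz</bar></foo>').root
copy = original.deep_clone

# Modify the copy's child text and attribute.
copy.elements['bar'].text = 'changed'
copy.add_attribute('a', '2')

original_text = original.elements['bar'].text
copy_text     = copy.elements['bar'].text
```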
Writing the Document¶ ↑
Write a document to an IO stream (defaults to $stdout) with method REXML::Document#write:
doc.write
Output:
<?xml version='1.0' encoding='UTF-8'?>
<bookstore>
<book category='cooking'>
  <title lang='en'>Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>
<book category='children'>
  <title lang='en'>Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>
<book category='web'>
  <title lang='en'>XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>
<book category='web' cover='paperback'>
  <title lang='en'>Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>
</bookstore>
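Any object that accepts `<<`, such as a File or StringIO, can serve as the output stream. As a sketch under simple assumptions (a tiny inline document named `small_doc`, not the books.xml source), this captures the output in memory, first in the default compact form and then with an indent of 2 passed as the second argument:

```ruby
require 'rexml/document'
require 'stringio'
include REXML

small_doc = Document.new('<root><a>1</a></root>')

# Default formatting: the document is written as parsed.
compact = StringIO.new
small_doc.write(compact)

# A non-negative indent selects pretty-printed output.
pretty = StringIO.new
small_doc.write(pretty, 2)
```

To write to a file instead, pass the stream yielded by `File.open(path, 'w')` in place of the StringIO.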