module URI
URI is a module providing classes to handle Uniform Resource Identifiers (RFC2396).
Features
- Uniform way of handling URIs.
- Flexibility to introduce custom URI schemes.
- Flexibility to have an alternate URI::Parser (or just different patterns and regexps).
Basic example
require 'uri'
uri = URI("http://foo.com/posts?id=30&limit=5#time=1305298413")
#=> #<URI::HTTP http://foo.com/posts?id=30&limit=5#time=1305298413>
uri.scheme    #=> "http"
uri.host      #=> "foo.com"
uri.path      #=> "/posts"
uri.query     #=> "id=30&limit=5"
uri.fragment  #=> "time=1305298413"
uri.to_s      #=> "http://foo.com/posts?id=30&limit=5#time=1305298413"
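The component readers shown above have matching writers, and to_s rebuilds the whole string from the current components. As a small sketch that uses only URI's own methods (the new path and the new limit value are arbitrary choices for illustration), the query can be decoded, edited, and re-encoded:

```ruby
require 'uri'

uri = URI("http://foo.com/posts?id=30&limit=5#time=1305298413")

# Component writers update the object in place.
uri.path = "/articles"
uri.fragment = nil

# Decode the query into a Hash (for a repeated name, the last value wins),
# change one value, and encode the Hash back into a query string.
params = URI.decode_www_form(uri.query).to_h
params["limit"] = "10"
uri.query = URI.encode_www_form(params)

puts uri.to_s  # http://foo.com/articles?id=30&limit=10
```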
Adding custom URIs
module URI
  class RSYNC < Generic
    DEFAULT_PORT = 873
  end
  register_scheme 'RSYNC', RSYNC
end
#=> URI::RSYNC
URI.scheme_list
#=> {"FILE"=>URI::File, "FTP"=>URI::FTP, "HTTP"=>URI::HTTP,
#    "HTTPS"=>URI::HTTPS, "LDAP"=>URI::LDAP, "LDAPS"=>URI::LDAPS,
#    "MAILTO"=>URI::MailTo, "RSYNC"=>URI::RSYNC}
uri = URI("rsync://rsync.foo.com")
#=> #<URI::RSYNC rsync://rsync.foo.com>
RFC References
A good place to view an RFC spec is www.ietf.org/rfc.html.
Here is a list of all related RFCs:
Class tree
- URI::Generic (in uri/generic.rb)
  - URI::File (in uri/file.rb)
  - URI::FTP (in uri/ftp.rb)
  - URI::HTTP (in uri/http.rb)
    - URI::HTTPS (in uri/https.rb)
  - URI::LDAP (in uri/ldap.rb)
    - URI::LDAPS (in uri/ldaps.rb)
  - URI::MailTo (in uri/mailto.rb)
- URI::Parser (in uri/common.rb)
- URI::REGEXP (in uri/common.rb)
  - URI::REGEXP::PATTERN (in uri/common.rb)
- URI::Util (in uri/common.rb)
- URI::Error (in uri/common.rb)
  - URI::InvalidURIError (in uri/common.rb)
  - URI::InvalidComponentError (in uri/common.rb)
  - URI::BadURIError (in uri/common.rb)
Copyright Info
- Author: Akira Yamada <akira@ruby-lang.org>
- Documentation: Akira Yamada <akira@ruby-lang.org>, Dmitry V. Sabanin <sdmitry@lrn.ru>, Vincent Batts <vbatts@hashbangbash.com>
- License: Copyright © 2001 akira yamada <akira@ruby-lang.org>. You can redistribute it and/or modify it under the same term as Ruby.
Constants
- DEFAULT_PARSER
- INITIAL_SCHEMES
- RFC2396_PARSER
- RFC3986_PARSER
- TBLENCURICOMP_
Public Class Methods
# File uri/common.rb, line 48
def self.const_missing(const)
  if value = RFC2396_PARSER.regexp[const]
    warn "URI::#{const} is obsolete. Use RFC2396_PARSER.regexp[#{const.inspect}] explicitly.", uplevel: 1 if $VERBOSE
    value
  else
    super
  end
end
Like URI.decode_www_form_component, except that '+' is preserved.
# File uri/common.rb, line 401
def self.decode_uri_component(str, enc=Encoding::UTF_8)
  _decode_uri_component(/%\h\h/, str, enc)
end
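The two decoders differ only in how they treat '+'. A minimal comparison on one input string (the string itself is arbitrary):

```ruby
require 'uri'

s = "a+b%20c"

# decode_uri_component decodes only %XX sequences, so '+' survives.
uri_style = URI.decode_uri_component(s)

# decode_www_form_component also turns '+' into a space.
form_style = URI.decode_www_form_component(s)

puts uri_style   # a+b c
puts form_style  # a b c
```

Use decode_uri_component for path segments and similar components, where '+' is a literal plus sign, and the form variant for application/x-www-form-urlencoded data.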
Returns name/value pairs derived from the given string str, which must be an ASCII string (otherwise ArgumentError is raised).
The method may be used to decode the body of Net::HTTPResponse object res for which res['Content-Type'] is 'application/x-www-form-urlencoded'.
The returned data is an array of 2-element subarrays; each subarray is a name/value pair (both are strings). Each returned string has encoding enc, and has had invalid byte sequences replaced via String#scrub.
A simple example:
URI.decode_www_form('foo=0&bar=1&baz') # => [["foo", "0"], ["bar", "1"], ["baz", ""]]
The returned strings have certain conversions, similar to those performed in URI.decode_www_form_component:
URI.decode_www_form('f%23o=%2F&b-r=%24&b+z=%40') # => [["f#o", "/"], ["b-r", "$"], ["b z", "@"]]
The given string may contain consecutive separators:
URI.decode_www_form('foo=0&&bar=1&&baz=2') # => [["foo", "0"], ["", ""], ["bar", "1"], ["", ""], ["baz", "2"]]
A different separator may be specified:
URI.decode_www_form('foo=0--bar=1--baz', separator: '--') # => [["foo", "0"], ["bar", "1"], ["baz", ""]]
# File uri/common.rb, line 576
def self.decode_www_form(str, enc=Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false)
  raise ArgumentError, "the input of #{self.name}.#{__method__} must be ASCII only string" unless str.ascii_only?
  ary = []
  return ary if str.empty?
  enc = Encoding.find(enc)
  str.b.each_line(separator) do |string|
    string.chomp!(separator)
    key, sep, val = string.partition('=')
    if isindex
      if sep.empty?
        val = key
        key = +''
      end
      isindex = false
    end
    if use__charset_ and key == '_charset_' and e = get_encoding(val)
      enc = e
      use__charset_ = false
    end
    key.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    if val
      val.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    else
      val = +''
    end
    ary << [key, val]
  end
  ary.each do |k, v|
    k.force_encoding(enc)
    k.scrub!
    v.force_encoding(enc)
    v.scrub!
  end
  ary
end
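Because decode_www_form returns an array rather than a Hash, repeated names are kept. A sketch of one way to collect every value per name (the grouping is ordinary Enumerable code, not part of URI; the query string is made up), together with the ASCII-only check from the source above:

```ruby
require 'uri'

pairs = URI.decode_www_form("tag=ruby&tag=uri&page=2")

# Group the pairs by name, then keep only the values for each name.
params = pairs.group_by(&:first).transform_values { |list| list.map(&:last) }

# Non-ASCII input is rejected with ArgumentError; it must be percent-encoded first.
rejected =
  begin
    URI.decode_www_form("name=caf\u00e9")
    false
  rescue ArgumentError
    true
  end

p params
```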
Returns a string decoded from the given URL-encoded string str.
The given string is first converted to a binary copy (Encoding::ASCII_8BIT, using String#b), then decoded (as below), and finally force-encoded to the given encoding enc.
The returned string:
- Preserves:
  - Characters '*', '.', '-', and '_'.
  - Characters in ranges 'a'..'z', 'A'..'Z', and '0'..'9'.
  Example:
    URI.decode_www_form_component('*.-_azAZ09')
    # => "*.-_azAZ09"
- Converts:
  - Character '+' to character ' '.
  - Each “percent notation” to the byte it represents.
  Example:
    URI.decode_www_form_component('Here+are+some+punctuation+characters%3A+%2C%3B%3F%3A')
    # => "Here are some punctuation characters: ,;?:"
Raises ArgumentError if str contains a '%' that is not followed by two hexadecimal digits.
Related: URI.decode_uri_component (preserves '+').
# File uri/common.rb, line 390
def self.decode_www_form_component(str, enc=Encoding::UTF_8)
  _decode_uri_component(/\+|%\h\h/, str, enc)
end
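Decoded bytes are interpreted in the encoding passed as enc, which defaults to Encoding::UTF_8. The sketch below decodes the same word from its UTF-8 and its ISO-8859-1 percent-encoding, and shows the ArgumentError raised for a stray '%' (all input strings are illustrative):

```ruby
require 'uri'

# The two bytes %C3%A9 form "é" under the default Encoding::UTF_8.
utf8 = URI.decode_www_form_component("caf%C3%A9")

# The single byte %E9 is "é" only when interpreted as ISO-8859-1.
latin1 = URI.decode_www_form_component("caf%E9", Encoding::ISO_8859_1)

# A '%' that is not followed by two hex digits is rejected.
invalid =
  begin
    URI.decode_www_form_component("100%")
    false
  rescue ArgumentError
    true
  end

puts utf8                    # café
puts latin1.encode("UTF-8")  # café
```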
Like URI.encode_www_form_component, except that ' ' (space) is encoded as '%20' (instead of '+').
# File uri/common.rb, line 396
def self.encode_uri_component(str, enc=nil)
  _encode_uri_component(/[^*\-.0-9A-Z_a-z]/, TBLENCURICOMP_, str, enc)
end
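In a URL path, '+' is a literal plus sign rather than a space, so encode_uri_component is usually the safer choice for path segments. A comparison on a hypothetical file name, with a made-up host:

```ruby
require 'uri'

name = "my report+v2.pdf"  # hypothetical file name

path_segment = URI.encode_uri_component(name)       # space becomes %20
form_value   = URI.encode_www_form_component(name)  # space becomes +

# Both escape a literal '+' as %2B, so neither result is ambiguous.
url = "https://example.com/files/#{path_segment}"
puts url  # https://example.com/files/my%20report%2Bv2.pdf
```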
Returns a URL-encoded string derived from the given Enumerable enum.
The result is suitable for use as form data for an HTTP request whose Content-Type is 'application/x-www-form-urlencoded'.
The returned string consists of the elements of enum, each converted to one or more URL-encoded strings, and all joined with character '&'.
Simple examples:
URI.encode_www_form([['foo', 0], ['bar', 1], ['baz', 2]])
# => "foo=0&bar=1&baz=2"
URI.encode_www_form({foo: 0, bar: 1, baz: 2})
# => "foo=0&bar=1&baz=2"
The returned string is formed using method URI.encode_www_form_component, which converts certain characters:
URI.encode_www_form('f#o': '/', 'b-r': '$', 'b z': '@')
# => "f%23o=%2F&b-r=%24&b+z=%40"
When enum is Array-like, each element ele is converted to a field:
- If ele is an array of two or more elements, the field is formed from its first two elements (and any additional elements are ignored):
    name = URI.encode_www_form_component(ele[0], enc)
    value = URI.encode_www_form_component(ele[1], enc)
    "#{name}=#{value}"
  Examples:
    URI.encode_www_form([%w[foo bar], %w[baz bat bah]])
    # => "foo=bar&baz=bat"
    URI.encode_www_form([['foo', 0], ['bar', :baz, 'bat']])
    # => "foo=0&bar=baz"
- If ele is an array of one element, the field is formed from ele[0]:
    URI.encode_www_form_component(ele[0])
  Example:
    URI.encode_www_form([['foo'], [:bar], [0]])
    # => "foo&bar&0"
- Otherwise the field is formed from ele:
    URI.encode_www_form_component(ele)
  Example:
    URI.encode_www_form(['foo', :bar, 0])
    # => "foo&bar&0"
The elements of an Array-like enum may be a mixture:
URI.encode_www_form([['foo', 0], ['bar', 1, 2], ['baz'], :bat])
# => "foo=0&bar=1&baz&bat"
When enum is Hash-like, each key/value pair is converted to one or more fields:
- If value is nil, the field is formed from key alone:
    URI.encode_www_form_component(key, enc)
  Example:
    URI.encode_www_form({foo: nil, bar: 1})
    # => "foo&bar=1"
- If value is Array-convertible, each element ele in value is paired with key to form a field:
    name = URI.encode_www_form_component(key, enc)
    value = URI.encode_www_form_component(ele, enc)
    "#{name}=#{value}"
  Example:
    URI.encode_www_form({foo: [:bar, 1], baz: [:bat, :bam, 2]})
    # => "foo=bar&foo=1&baz=bat&baz=bam&baz=2"
- Otherwise, key and value are paired to form a field:
    name = URI.encode_www_form_component(key, enc)
    value = URI.encode_www_form_component(value, enc)
    "#{name}=#{value}"
  Example:
    URI.encode_www_form({foo: 0, bar: 1, baz: 2})
    # => "foo=0&bar=1&baz=2"
The elements of a Hash-like enum may be a mixture:
URI.encode_www_form({foo: [0, 1], bar: 2})
# => "foo=0&foo=1&bar=2"
# File uri/common.rb, line 523
def self.encode_www_form(enum, enc=nil)
  enum.map do |k,v|
    if v.nil?
      encode_www_form_component(k, enc)
    elsif v.respond_to?(:to_ary)
      v.to_ary.map do |w|
        str = encode_www_form_component(k, enc)
        unless w.nil?
          str << '='
          str << encode_www_form_component(w, enc)
        end
      end.join('&')
    else
      str = encode_www_form_component(k, enc)
      str << '='
      str << encode_www_form_component(v, enc)
    end
  end.join('&')
end
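A typical use is building the query part of a request URI. This sketch (the endpoint and parameter names are hypothetical) exercises two branches of the source above: an Array value yields one field per element, and a nil value yields a bare name. Decoding the result with URI.decode_www_form recovers the pairs, with an empty string for the bare name:

```ruby
require 'uri'

params = { q: "ruby uri", tag: ["a&b", "c"], debug: nil }
query = URI.encode_www_form(params)
# "q=ruby+uri&tag=a%26b&tag=c&debug"

uri = URI("https://example.com/search")  # hypothetical endpoint
uri.query = query
puts uri

# Round trip: the nil value comes back as an empty string.
pairs = URI.decode_www_form(uri.query)
p pairs
```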
Returns a URL-encoded string derived from the given string str.
The returned string:
- Preserves:
  - Characters '*', '.', '-', and '_'.
  - Characters in ranges 'a'..'z', 'A'..'Z', and '0'..'9'.
  Example:
    URI.encode_www_form_component('*.-_azAZ09')
    # => "*.-_azAZ09"
- Converts:
  - Character ' ' to character '+'.
  - Any other byte to “percent notation”; the percent notation for byte b is '%%%02X' % b, so a multibyte character yields one notation per byte.
  Example:
    URI.encode_www_form_component('Here are some punctuation characters: ,;?:')
    # => "Here+are+some+punctuation+characters%3A+%2C%3B%3F%3A"
Encoding:
- If str has encoding Encoding::ASCII_8BIT, argument enc is ignored.
- Otherwise, if enc is given and is not Encoding::ASCII_8BIT, str is converted first to Encoding::UTF_8 (with suitable character replacements), and then to encoding enc; if enc is nil, the bytes of str are used as they are.
In all cases, the returned string has forced encoding Encoding::US_ASCII.
Related: URI.encode_uri_component (encodes ' ' as '%20').
# File uri/common.rb, line 357
def self.encode_www_form_component(str, enc=nil)
  _encode_uri_component(/[^*\-.0-9A-Z_a-z]/, TBLENCWWWCOMP_, str, enc)
end
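The enc argument matters only for non-ASCII text. The sketch below encodes the same UTF-8 word with and without enc: without it, the string's UTF-8 bytes are percent-encoded as they are; with Encoding::ISO_8859_1 the string is converted first, so "é" becomes the single byte %E9. It also shows that control characters get two hex digits:

```ruby
require 'uri'

word = "caf\u00e9"  # "café" as a UTF-8 string

as_utf8   = URI.encode_www_form_component(word)
as_latin1 = URI.encode_www_form_component(word, Encoding::ISO_8859_1)
tab       = URI.encode_www_form_component("\t")

puts as_utf8    # caf%C3%A9
puts as_latin1  # caf%E9
puts tab        # %09
```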
Returns a new object constructed from the given scheme, arguments, and default:
- The new object is an instance of URI.scheme_list[scheme.upcase] or, if no class is registered for that scheme, of default (URI::Generic unless given).
- The object is initialized by calling the class initializer using scheme and arguments. See URI::Generic.new.
Examples:
values = ['john.doe', 'www.example.com', '123', nil, '/forum/questions/', nil, 'tag=networking&order=newest', 'top']
URI.for('https', *values)
# => #<URI::HTTPS https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
URI.for('foo', *values, default: URI::HTTP)
# => #<URI::HTTP foo://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
# File uri/common.rb, line 145
def self.for(scheme, *arguments, default: Generic)
  const_name = scheme.to_s.upcase
  uri_class = INITIAL_SCHEMES[const_name]
  uri_class ||= if /\A[A-Z]\w*\z/.match?(const_name) && Schemes.const_defined?(const_name, false)
    Schemes.const_get(const_name, false)
  end
  uri_class ||= default
  return uri_class.new(scheme, *arguments)
end
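URI.for takes the components positionally after the scheme, in the order userinfo, host, port, registry, path, opaque, query, fragment, as in the example values above. A sketch with made-up values, showing the fallback to URI::Generic for an unregistered scheme and, for comparison, the keyword-style URI::HTTP.build:

```ruby
require 'uri'

# Positional components: userinfo, host, port, registry, path, opaque, query, fragment
a = URI.for('http', nil, 'example.com', nil, nil, '/x', nil, 'a=1', nil)

# No class is registered for 'example', so the default URI::Generic is used.
b = URI.for('example', nil, 'example.com', nil, nil, '/x', nil, nil, nil)

# Keyword-style construction for a known scheme.
c = URI::HTTP.build(host: 'example.com', path: '/x', query: 'a=1')

puts a, b, c
```

The port is left as nil in both positional calls; URI::HTTP then falls back to its default port 80, which to_s omits.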
Merges the given URI strings str per RFC 2396.
Each string in str is converted to an RFC3986 URI before being merged.
Examples:
URI.join("http://example.com/","main.rbx")
# => #<URI::HTTP http://example.com/main.rbx>
URI.join('http://example.com', 'foo')
# => #<URI::HTTP http://example.com/foo>
URI.join('http://example.com', '/foo', '/bar')
# => #<URI::HTTP http://example.com/bar>
URI.join('http://example.com', '/foo', 'bar')
# => #<URI::HTTP http://example.com/bar>
URI.join('http://example.com', '/foo/', 'bar')
# => #<URI::HTTP http://example.com/foo/bar>
# File uri/common.rb, line 233
def self.join(*str)
  DEFAULT_PARSER.join(*str)
end
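Relative references are resolved against the base, including ".." segments, while an absolute reference replaces the base entirely. URI#+ (an alias of URI#merge) resolves a single reference against an existing URI object in the same way. A sketch with made-up URLs:

```ruby
require 'uri'

base = "http://example.com/a/b/c"

sibling = URI.join(base, "d")                  # replaces the last segment
up      = URI.join(base, "../d")               # ".." is resolved
other   = URI.join(base, "https://other.org/") # absolute reference wins

# The same resolution through an existing URI object.
same = URI(base) + "../d"

puts sibling, up, other, same
```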
Returns a new URI object constructed from the given string uri:
URI.parse('https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top')
# => #<URI::HTTPS https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
URI.parse('http://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top')
# => #<URI::HTTP http://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top>
If uri may contain characters that are not valid in a URI, escape it first; otherwise URI.parse raises URI::InvalidURIError. The former URI.escape method was removed in Ruby 3.0; URI::RFC2396_PARSER.escape, or URI.encode_uri_component applied to individual components, can be used instead.
# File uri/common.rb, line 206
def self.parse(uri)
  DEFAULT_PARSER.parse(uri)
end
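A raw space is one example of a character that URI.parse rejects. One way to handle an untrusted piece, sketched here with a made-up host and file name, is to percent-encode just that component before assembling and parsing the string:

```ruby
require 'uri'

raw = "http://example.com/docs/annual report.pdf"

rejected =
  begin
    URI.parse(raw)
    false
  rescue URI::InvalidURIError
    true
  end

# Encode only the untrusted segment, then parse the assembled string.
segment = URI.encode_uri_component("annual report.pdf")
uri = URI.parse("http://example.com/docs/#{segment}")

puts uri.path  # /docs/annual%20report.pdf
```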
# File uri/common.rb, line 25
def self.parser=(parser = RFC3986_PARSER)
  remove_const(:Parser) if defined?(::URI::Parser)
  const_set("Parser", parser.class)
  remove_const(:REGEXP) if defined?(::URI::REGEXP)
  remove_const(:PATTERN) if defined?(::URI::PATTERN)
  if Parser == RFC2396_Parser
    const_set("REGEXP", URI::RFC2396_REGEXP)
    const_set("PATTERN", URI::RFC2396_REGEXP::PATTERN)
    Parser.new.pattern.each_pair do |sym, str|
      unless REGEXP::PATTERN.const_defined?(sym)
        REGEXP::PATTERN.const_set(sym, str)
      end
    end
  end
  Parser.new.regexp.each_pair do |sym, str|
    remove_const(sym) if const_defined?(sym)
    const_set(sym, str)
  end
end
Registers the given klass as the class to be instantiated when parsing a URI with the given scheme:
URI.register_scheme('MS_SEARCH', URI::Generic) # => URI::Generic
URI.scheme_list['MS_SEARCH']                   # => URI::Generic
Note that after calling String#upcase on scheme, it must be a valid constant name.
# File uri/common.rb, line 101
def self.register_scheme(scheme, klass)
  Schemes.const_set(scheme.to_s.upcase, klass)
end
Returns a hash of the defined schemes:
URI.scheme_list
# => {"MAILTO"=>URI::MailTo, "LDAPS"=>URI::LDAPS, "WS"=>URI::WS, "HTTP"=>URI::HTTP,
#     "HTTPS"=>URI::HTTPS, "LDAP"=>URI::LDAP, "FILE"=>URI::File, "FTP"=>URI::FTP}
Related: URI.register_scheme.
# File uri/common.rb, line 119
def self.scheme_list
  Schemes.constants.map { |name|
    [name.to_s.upcase, Schemes.const_get(name)]
  }.to_h
end
Returns a 9-element array representing the parts of the URI formed from the string uri; each array element is a string or nil:
names = %w[scheme userinfo host port registry path opaque query fragment]
values = URI.split('https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top')
names.zip(values)
# => [["scheme", "https"],
#     ["userinfo", "john.doe"],
#     ["host", "www.example.com"],
#     ["port", "123"],
#     ["registry", nil],
#     ["path", "/forum/questions/"],
#     ["opaque", nil],
#     ["query", "tag=networking&order=newest"],
#     ["fragment", "top"]]
# File uri/common.rb, line 192
def self.split(uri)
  DEFAULT_PARSER.split(uri)
end
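URI.split only slices the string: components come back as raw strings, and nothing is filled in from scheme defaults. Comparing it with URI.parse on the same made-up URL shows the difference for the port:

```ruby
require 'uri'

names = %i[scheme userinfo host port registry path opaque query fragment]
parts = names.zip(URI.split("http://example.com/a?b=1")).to_h

parsed = URI.parse("http://example.com/a?b=1")

p parts[:port]   # nil: the string contains no port
p parsed.port    # 80: URI::HTTP's default port
p parts[:query]  # "b=1"
```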
Private Class Methods
# File uri/common.rb, line 419
def self._decode_uri_component(regexp, str, enc)
  raise ArgumentError, "invalid %-encoding (#{str})" if /%(?!\h\h)/.match?(str)
  str.b.gsub(regexp, TBLDECWWWCOMP_).force_encoding(enc)
end
# File uri/common.rb, line 405
def self._encode_uri_component(regexp, table, str, enc)
  str = str.to_s.dup
  if str.encoding != Encoding::ASCII_8BIT
    if enc && enc != Encoding::ASCII_8BIT
      str.encode!(Encoding::UTF_8, invalid: :replace, undef: :replace)
      str.encode!(enc, fallback: ->(x){"&##{x.ord};"})
    end
    str.force_encoding(Encoding::ASCII_8BIT)
  end
  str.gsub!(regexp, table)
  str.force_encoding(Encoding::US_ASCII)
end