URI support for Ruby
Akira Yamada <akira@ruby-lang.org>
Akira Yamada <akira@ruby-lang.org>, Dmitry V. Sabanin <sdmitry@lrn.ru>
Copyright © 2001 akira yamada <akira@ruby-lang.org> You can redistribute it and/or modify it under the same term as Ruby.
$Id: uri.rb 11708 2007-02-12 23:01:19Z shyouhei $
See URI for documentation
Akira Yamada <akira@ruby-lang.org>
$Id: common.rb 16982 2008-06-07 20:11:00Z shyouhei $
You can redistribute it and/or modify it under the same term as Ruby.
URI::extract(str[, schemes][,&blk])
str
String to extract URIs from.
schemes
Limit URI matching to a specific schemes.
Extracts URIs from a string. If block given, iterates through all matched URIs. Returns nil if block given or array with matches.
require "uri" URI.extract("text here http://foo.example.org/bla and here mailto:test@example.com and here also.") # => ["http://foo.example.com/bla", "mailto:test@example.com"]
# File uri/common.rb, line 551
def self.extract(str, schemes = nil, &block)
if block_given?
str.scan(regexp(schemes)) { yield $& }
nil
else
result = []
str.scan(regexp(schemes)) { result.push $& }
result
end
end
URI::join(str[, str, ...])
str
String(s) to work with
Joins URIs.
require 'uri' p URI.join("http://localhost/","main.rbx") # => #<URI::HTTP:0x2022ac02 URL:http://localhost/main.rbx>
# File uri/common.rb, line 519
def self.join(*str)
u = self.parse(str[0])
str[1 .. -1].each do |x|
u = u.merge(x)
end
u
end
URI::parse(uri_str)
uri_str
String with URI.
Creates one of the URI's subclasses instance from the string.
Raised if URI given is not a correct one.
require 'uri' uri = URI.parse("http://www.ruby-lang.org/") p uri # => #<URI::HTTP:0x202281be URL:http://www.ruby-lang.org/> p uri.scheme # => "http" p uri.host # => "www.ruby-lang.org"
# File uri/common.rb, line 483
def self.parse(uri)
scheme, userinfo, host, port,
registry, path, opaque, query, fragment = self.split(uri)
if scheme && @@schemes.include?(scheme.upcase)
@@schemes[scheme.upcase].new(scheme, userinfo, host, port,
registry, path, opaque, query,
fragment)
else
Generic.new(scheme, userinfo, host, port,
registry, path, opaque, query,
fragment)
end
end
URI::regexp([match_schemes])
match_schemes
Array of schemes. If given, resulting regexp matches to URIs whose scheme is one of the match_schemes.
Returns a Regexp object which matches to URI-like strings. The Regexp object returned by this method includes arbitrary number of capture group (parentheses). Never rely on it's number.
require 'uri' # extract first URI from html_string html_string.slice(URI.regexp) # remove ftp URIs html_string.sub(URI.regexp(['ftp']) # You should not rely on the number of parentheses html_string.scan(URI.regexp) do |*matches| p $& end
# File uri/common.rb, line 593
def self.regexp(schemes = nil)
unless schemes
ABS_URI_REF
else
/(?=#{Regexp.union(*schemes)}:)#{PATTERN::X_ABS_URI}/xn
end
end
URI::split(uri)
uri
String with URI.
Splits the string on following parts and returns array with result:
* Scheme * Userinfo * Host * Port * Registry * Path * Opaque * Query * Fragment
require 'uri' p URI.split("http://www.ruby-lang.org/") # => ["http", nil, "www.ruby-lang.org", nil, nil, "/", nil, nil, nil]
# File uri/common.rb, line 380
def self.split(uri)
case uri
when ''
# null uri
when ABS_URI
scheme, opaque, userinfo, host, port,
registry, path, query, fragment = $~[1..-1]
# URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
# absoluteURI = scheme ":" ( hier_part | opaque_part )
# hier_part = ( net_path | abs_path ) [ "?" query ]
# opaque_part = uric_no_slash *uric
# abs_path = "/" path_segments
# net_path = "//" authority [ abs_path ]
# authority = server | reg_name
# server = [ [ userinfo "@" ] hostport ]
if !scheme
raise InvalidURIError,
"bad URI(absolute but no scheme): #{uri}"
end
if !opaque && (!path && (!host && !registry))
raise InvalidURIError,
"bad URI(absolute but no path): #{uri}"
end
when REL_URI
scheme = nil
opaque = nil
userinfo, host, port, registry,
rel_segment, abs_path, query, fragment = $~[1..-1]
if rel_segment && abs_path
path = rel_segment + abs_path
elsif rel_segment
path = rel_segment
elsif abs_path
path = abs_path
end
# URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
# relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
# net_path = "//" authority [ abs_path ]
# abs_path = "/" path_segments
# rel_path = rel_segment [ abs_path ]
# authority = server | reg_name
# server = [ [ userinfo "@" ] hostport ]
else
raise InvalidURIError, "bad URI(is not URI?): #{uri}"
end
path = '' if !path && !opaque # (see RFC2396 Section 5.2)
ret = [
scheme,
userinfo, host, port, # X
registry, # X
path, # Y
opaque, # Y
query,
fragment
]
return ret
end