Proxy Rotation in Ruby: The Patterns That Actually Work

Most Ruby developers reach for a simple array of proxy URLs when they first add rotation to a scraper: shuffle the list, pick one, make the request. This works until volume grows, at which point the scraper starts returning partial results or silent failures that are hard to trace. Getting rotation right is less about finding the correct gem and more about four independent problems: how to configure proxies in your HTTP library, when to rotate versus when to hold a session, how to detect blocks reliably, and how to track proxy health so degraded addresses stop burning requests.

Configuring a Proxy in Ruby’s HTTP Libraries

The three HTTP libraries most commonly used in Ruby scraping — Net::HTTP, Faraday, and HTTParty — each take a different approach to proxy configuration.

Net::HTTP exposes proxy parameters directly on the connection constructor:

require 'net/http'

http = Net::HTTP.new(
  'target-site.com', 443,
  'proxy.example.com', 8080, # proxy host and port
  'username', 'password'     # proxy credentials
)
http.use_ssl = true

response = http.get('/data')
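The rotation examples later in this article store each proxy as a single URL string, such as http://user:pass@proxy1.example.com:8080, whereas Net::HTTP.new takes host, port, user, and password as separate arguments. A minimal sketch of a converter follows. It assumes the credentials in the URL are not percent-encoded, and net_http_via is a hypothetical helper name rather than part of Net::HTTP:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a Net::HTTP object that goes through the proxy
# described by a URL like "http://user:pass@proxy1.example.com:8080".
# Assumption: user and password in the URL are not percent-encoded.
def net_http_via(proxy_url, target_host, target_port = 443, use_ssl: true)
  proxy = URI.parse(proxy_url)
  http = Net::HTTP.new(target_host, target_port,
                       proxy.host, proxy.port,      # proxy host and port
                       proxy.user, proxy.password)  # proxy credentials
  http.use_ssl = use_ssl
  http.open_timeout = 10 # seconds allowed to open the connection
  http.read_timeout = 30 # seconds allowed for each read
  http
end

http = net_http_via('http://user:pass@proxy1.example.com:8080', 'target-site.com')
# No connection is opened until a request is made, e.g. http.get('/data')
```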

Faraday accepts proxy configuration at the connection level. Explicit timeouts are important with proxies: without them, a slow or unresponsive address can tie up a thread for as long as the adapter's default timeouts allow.

require 'faraday'

conn = Faraday.new('https://target-site.com') do |f|
  f.proxy = 'http://username:password@proxy.example.com:8080'
  f.options.open_timeout = 10 # seconds allowed to open the connection to the proxy
  f.options.timeout = 30      # read timeout in seconds (per read, not for the whole response)
  f.adapter Faraday.default_adapter
end

response = conn.get('/data')

HTTParty sets the proxy at the class level by default, though it also supports per-request proxy options passed directly to the HTTP method. The class-level pattern is most common for straightforward scraping:

require 'httparty'

class Scraper
  include HTTParty
  http_proxy 'proxy.example.com', 8080, 'username', 'password'
  base_uri 'https://target-site.com'
end

response = Scraper.get('/data')
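For the per-request form, HTTParty takes the proxy as separate options (http_proxyaddr, http_proxyport, http_proxyuser, http_proxypass) rather than as one URL. A small sketch that converts a rotator-style proxy URL into those options. The helper name httparty_proxy_options is hypothetical, and it assumes the credentials are not percent-encoded:

```ruby
require 'uri'

# Hypothetical helper: turn "http://user:pass@host:port" into HTTParty's
# per-request proxy options.
# Assumption: user and password in the URL are not percent-encoded.
def httparty_proxy_options(proxy_url)
  proxy = URI.parse(proxy_url)
  {
    http_proxyaddr: proxy.host,
    http_proxyport: proxy.port,
    http_proxyuser: proxy.user,
    http_proxypass: proxy.password
  }
end

opts = httparty_proxy_options('http://user:pass@proxy2.example.com:8080')
# With the httparty gem loaded, a per-request call would look like:
#   HTTParty.get('https://target-site.com/data', opts)
```

Because the options are built per call, each request can use a different proxy without touching the class-level configuration.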

In each case the proxy is fixed per connection, class, or call, so the rotation logic lives above the library, not inside it.

The Pattern Most Developers Start With

The simplest rotation approach samples from an array on each request:

PROXIES = [
  'http://user:pass@proxy1.example.com:8080',
  'http://user:pass@proxy2.example.com:8080',
  'http://user:pass@proxy3.example.com:8080'
]

def fetch(url)
  proxy = PROXIES.sample
  Faraday.new(proxy: proxy).get(url)
end

This fails at scale for two reasons. First, a small pool causes the same addresses to reappear within the target site's detection window. If a site rate-limits by IP over a fifteen-minute period and you have six proxies making hundreds of requests per hour, each proxy surfaces repeatedly before the window clears: at 600 requests per hour, for example, every proxy handles about 25 requests in each fifteen-minute window. Second, random sampling gives you no control over distribution. In any given window some addresses are hit noticeably more often than others, which makes it difficult to track per-proxy error rates, respect usage limits on specific addresses, or identify which proxies are degraded. In a high-volume scraper, that loss of observability compounds into a reliability problem.
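The distribution point is easy to check directly. This sketch spreads 60 requests over six placeholder proxies, once with random sampling and once with round-robin; the seed only makes the random run repeatable:

```ruby
# Compare how random sampling and round-robin spread 60 requests over 6 proxies.
# Proxy names are placeholders; the fixed seed makes the random run repeatable.
proxies = (1..6).map { |i| "proxy#{i}" }
requests = 60

rng = Random.new(42)
random_counts = Array.new(requests) { proxies.sample(random: rng) }.tally
round_robin_counts = Array.new(requests) { |i| proxies[i % proxies.length] }.tally

puts "random:      #{proxies.map { |p| random_counts.fetch(p, 0) }.inspect}"
puts "round-robin: #{proxies.map { |p| round_robin_counts.fetch(p, 0) }.inspect}"
```

Round-robin gives every proxy exactly 10 requests; the random row will typically show some proxies above 10 and others below, and the spread changes from run to run without a fixed seed.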

A Thread-Safe Proxy Rotator

A round-robin rotator with a Mutex handles concurrent access and distributes load evenly across the pool:

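class ProxyRotator
  def initialize(proxies)
    raise ArgumentError, 'proxy list is empty' if proxies.empty?

    @proxies = proxies.dup.freeze # copy, so the caller's array is not frozen
    @index = 0
    @lock = Mutex.new
  end

  # Returns proxies in strict round-robin order. The Mutex makes the
  # read-and-increment of @index atomic across threads.
  def next_proxy
    @lock.synchronize do
      proxy = @proxies[@index % @proxies.length]
      @index += 1
      proxy
    end
  end
end

ROTATOR = ProxyRotator.new(PROXIES)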

Round-robin is preferable to random sampling when you need predictable per-address traffic volumes — it makes each proxy’s request count deterministic and easier to reason about when debugging block patterns or usage costs.

Rotation as Faraday Middleware

Faraday’s middleware stack is the cleanest integration point for rotation logic. A middleware component intercepts every outgoing request and injects the next proxy before the request is dispatched:

class RotatingProxyMiddleware < Faraday::Middleware
  def initialize(app, rotator)
    super(app)
    @rotator = rotator
  end

  def call(env)
    env[:request][:proxy] = { uri: URI.parse(@rotator.next_proxy) }
    @app.call(env)
  end
end

conn = Faraday.new('https://target-site.com') do |f|
  f.use RotatingProxyMiddleware, ROTATOR
  f.adapter Faraday.default_adapter
end

Scraper code makes normal Faraday calls with no knowledge of which proxy is active. Swapping the rotation strategy means changing the middleware, not the scraping logic. Note that proxy injection through env[:request] depends on the adapter in use — the built-in Net::HTTP adapter supports this pattern, but behaviour varies across other adapters. Verify against your specific setup before relying on this in production.

Sticky Sessions for Multi-Step Workflows

Some scraping tasks require sending several related requests from the same IP — paginating through server-side session state, following a multi-step product flow, or scraping content that varies based on prior interaction. Rotating the proxy between steps breaks session continuity.

A sticky session pool assigns a fixed proxy to each logical workflow and replaces it only on expiry:

class StickyProxyPool
  SESSION_TTL = 300 # seconds

  def initialize(proxies)
    @proxies = proxies
    @sessions = {}
    @lock = Mutex.new
  end

  # Returns the proxy pinned to session_id, assigning a new random one if the
  # session is unknown or its assignment is older than SESSION_TTL.
  def proxy_for(session_id)
    @lock.synchronize do
      entry = @sessions[session_id]
      if entry.nil? || Time.now - entry[:assigned_at] > SESSION_TTL
        @sessions[session_id] = { proxy: @proxies.sample, assigned_at: Time.now }
      end
      @sessions[session_id][:proxy]
    end
  end
end

The session ID is whatever identifies a logical unit of work: a product ID, a user flow, a scraping task key. Rotation happens at the workflow boundary, not between individual requests.
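One limitation of the pool as written is that @sessions only grows: entries for finished workflows stay in the hash indefinitely. The sketch below is a variant that measures age with a monotonic clock (so wall-clock adjustments do not shorten or extend sessions), accepts an injectable clock for testing, and prunes expired entries. PrunedStickyPool, the clock: parameter, prune_expired, and size are hypothetical names, not part of the original design:

```ruby
# Hypothetical variant of StickyProxyPool: monotonic clock (injectable for
# tests) plus prune_expired, so finished sessions do not accumulate.
class PrunedStickyPool
  SESSION_TTL = 300 # seconds

  def initialize(proxies, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @proxies = proxies
    @clock = clock
    @sessions = {}
    @lock = Mutex.new
  end

  def proxy_for(session_id)
    @lock.synchronize do
      now = @clock.call
      entry = @sessions[session_id]
      if entry.nil? || now - entry[:assigned_at] > SESSION_TTL
        entry = { proxy: @proxies.sample, assigned_at: now }
        @sessions[session_id] = entry
      end
      entry[:proxy]
    end
  end

  # Removes every expired session and returns how many were dropped.
  def prune_expired
    @lock.synchronize do
      now = @clock.call
      before = @sessions.size
      @sessions.delete_if { |_id, e| now - e[:assigned_at] > SESSION_TTL }
      before - @sessions.size
    end
  end

  def size
    @lock.synchronize { @sessions.size }
  end
end

# Usage with a fake clock that is advanced by hand.
fake_now = 0.0
pool = PrunedStickyPool.new(%w[proxyA proxyB proxyC], clock: -> { fake_now })
first = pool.proxy_for('product-42')
fake_now = 100.0
same = pool.proxy_for('product-42') # still inside the TTL, so the same proxy
fake_now = 1_000.0
removed = pool.prune_expired        # the session is now 1000 s old, so it is dropped
```

In a long-running worker, prune_expired would be called from a periodic task rather than on every request.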

Block Detection Beyond Status Codes

Relying on HTTP status codes alone to detect blocks misses a common and frustrating pattern: many sites return 200 with a CAPTCHA challenge, an access-denied message, or a bot-detection interstitial embedded in the body. A scraper that does not check the response body will record these as successes and write garbage data.

A blocked? helper that inspects both status and body catches these cases:

BLOCK_STATUSES = [403, 429, 503].freeze

BLOCK_PATTERNS = [
  /access.?denied/i,
  /captcha/i,
  /rate.?limit/i,
  /unusual.?traffic/i,
  /please.?verify/i
].freeze

def blocked?(response)
  return true if BLOCK_STATUSES.include?(response.status)

  BLOCK_PATTERNS.any? { |pat| response.body.to_s.match?(pat) }
end

The patterns here are starting points — build them from actual blocked responses observed against your target sites rather than assumptions. False positives on body patterns waste proxies; false negatives produce corrupt data. Logging the full response body when a block is detected makes calibration straightforward.
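For calibration it helps to log which rule fired, not only that one did. A minimal sketch, assuming the status codes and patterns are passed in (in practice, BLOCK_STATUSES and BLOCK_PATTERNS) and a standard Logger as the destination; block_reason and log_block are hypothetical names:

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: describes the first rule that matched, or returns nil
# if the response looks like a normal page.
def block_reason(status, body, statuses:, patterns:)
  return "status #{status}" if statuses.include?(status)

  pattern = patterns.find { |pat| body.to_s.match?(pat) }
  pattern && "body matched #{pattern.inspect}"
end

# Logs the reason at WARN and the full body at DEBUG for later review.
# Pass only the proxy host so credentials in proxy URLs stay out of logs.
def log_block(logger, url, proxy_host, reason, body)
  logger.warn("blocked url=#{url} proxy=#{proxy_host} reason=#{reason}")
  logger.debug(body.to_s)
end

statuses = [403, 429, 503]
patterns = [/access.?denied/i, /captcha/i]
captcha_page = '<h1>Please complete the CAPTCHA</h1>'

reason_a = block_reason(200, captcha_page, statuses: statuses, patterns: patterns)
reason_b = block_reason(429, '', statuses: statuses, patterns: patterns)
reason_c = block_reason(200, '<h1>Product list</h1>', statuses: statuses, patterns: patterns)

log_io = StringIO.new
log_block(Logger.new(log_io), '/data', 'proxy1.example.com', reason_a, captcha_page)
```

Grouping logged reasons by pattern makes it easy to see which rules fire on real blocks and which fire on legitimate pages.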

With a blocked? helper in place, the failure-based rotation function becomes:

def fetch(url, rotator, attempts: 3)
  attempts.times do
    proxy = rotator.next_proxy
    begin
      response = Faraday.new('https://target-site.com', proxy: proxy).get(url)
      return response unless blocked?(response)
    rescue Faraday::ConnectionFailed, Faraday::TimeoutError
      # Network-level failure: move on to the next proxy
    end
  end

  # Every attempt was either blocked or failed at the network level
  raise "No usable response for #{url} after #{attempts} attempts"
end

Proxy Health Scoring and Pool Maintenance

A round-robin or random rotator has no concept of proxy health — it will keep dispatching requests through an address that has been blocked, timed out repeatedly, or become unreachable. In production this surfaces as a sustained drop in success rate that is hard to attribute without per-proxy tracking.

A health-aware pool tracks failure counts per proxy, excludes addresses that exceed a failure threshold from rotation, and exposes a stats method for observability:

class ProxyPool
  FAILURE_THRESHOLD = 3

  # failures: recent-failure score that decides health (decays on success)
  # total_failures: lifetime failure count, used only for reporting
  HealthEntry = Struct.new(:url, :failures, :total_failures, :requests)

  def initialize(proxies)
    @entries = proxies.map { |u| HealthEntry.new(u, 0, 0, 0) }
    @lock = Mutex.new
  end

  def acquire
    @lock.synchronize do
      healthy = @entries.reject { |e| e.failures >= FAILURE_THRESHOLD }
      raise 'Proxy pool exhausted' if healthy.empty?

      entry = healthy.min_by(&:requests) # least-used first
      entry.requests += 1
      entry.url
    end
  end

  def mark_success(url)
    update(url) { |e| e.failures = [e.failures - 1, 0].max }
  end

  def mark_failure(url)
    update(url) do |e|
      e.failures += 1
      e.total_failures += 1
    end
  end

  def stats
    @lock.synchronize do
      @entries.map do |e|
        rate = e.requests.zero? ? 0.0 : e.total_failures.fdiv(e.requests).round(3)
        { url: e.url, requests: e.requests, failure_rate: rate,
          healthy: e.failures < FAILURE_THRESHOLD }
      end
    end
  end

  private

  def update(url)
    @lock.synchronize do
      entry = @entries.find { |e| e.url == url }
      raise ArgumentError, "unknown proxy: #{url}" unless entry

      yield entry
    end
  end
end

Usage pairs the pool with the blocked? check:

pool = ProxyPool.new(PROXIES)

def fetch_with_pool(url, pool)
  proxy = pool.acquire
  response = Faraday.new('https://target-site.com', proxy: proxy).get(url)

  if blocked?(response)
    pool.mark_failure(proxy)
    # Report the host only, so proxy credentials do not end up in logs
    raise "Blocked response via #{URI(proxy).host}"
  end

  pool.mark_success(proxy)
  response
rescue Faraday::Error
  # Connection failures and timeouts count against the proxy too
  pool.mark_failure(proxy)
  raise
end

The least-used selection inside acquire keeps load even as the pool degrades. The healthy set shrinks whenever a proxy crosses the threshold, so a fixed round-robin index over it would shift position and skip or repeat addresses; picking the healthy entry with the fewest requests keeps per-address counts level among the proxies that remain.

Calibrating FAILURE_THRESHOLD requires observing actual failure patterns against your target sites. A threshold of three handles transient network errors without discarding addresses prematurely; a threshold of one would burn proxies on single timeouts. Call pool.stats periodically — ideally logging it to your monitoring system — to track failure rates per address, identify proxies that are consistently degraded, and decide when the pool needs replenishing.
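One way to act on the periodic-stats advice is a small background reporter. A sketch, assuming a standard Logger as the destination (a monitoring client could replace it); report_pool_stats, start_pool_reporter, and FakePool are hypothetical names, and only the proxy host is logged so credentials embedded in the URLs stay out of the logs:

```ruby
require 'logger'
require 'stringio'
require 'uri'

# Hypothetical helper: one log line per proxy, from any object whose #stats
# returns hashes shaped like ProxyPool#stats.
def report_pool_stats(pool, logger)
  pool.stats.each do |s|
    host = URI(s[:url]).host # drop user:pass from the logged value
    logger.info(format('proxy=%s requests=%d failure_rate=%.3f healthy=%s',
                       host, s[:requests], s[:failure_rate], s[:healthy]))
  end
end

# Hypothetical background reporter; stop it with thread.kill.
def start_pool_reporter(pool, logger, interval: 60)
  Thread.new do
    loop do
      sleep interval
      report_pool_stats(pool, logger)
    end
  end
end

# Stand-in for a real ProxyPool, used only to show the output format.
FakePool = Struct.new(:rows) do
  def stats
    rows
  end
end

fake_pool = FakePool.new([
  { url: 'http://user:pass@proxy1.example.com:8080', requests: 10, failure_rate: 0.0, healthy: true }
])
out = StringIO.new
report_pool_stats(fake_pool, Logger.new(out))
```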

When to Stop Managing Rotation Yourself

The pool management problem scales faster than the scraping logic. For a Ruby scraper processing tens of thousands of pages per day, sourcing residential IPs in the volumes required to stay below per-address detection thresholds, validating them against target sites, and cycling addresses as they age out becomes substantial infrastructure work.

For high-volume production scrapers, the practical alternative is a managed rotation provider. Providers like NetNut offer rotating proxies backed by a large residential and ISP network, with rotation handled at the connection layer. Depending on account configuration, rotation can happen per request or per session; many managed providers support both modes. The Ruby integration reduces to a standard proxy configuration:

# Rotation is handled server-side.
# Your application connects through a single endpoint.
conn = Faraday.new('https://target-site.com') do |f|
  f.proxy = {
    uri: 'http://gw.netnut.io:7777',
    user: ENV['NETNUT_USER'],
    password: ENV['NETNUT_PASS']
  }
  f.options.open_timeout = 10
  f.options.timeout = 30
  f.adapter Faraday.default_adapter
end

Pool sizing, address validation, and rotation strategy are handled server-side. The blocked? check and response validation in your application code remain relevant — a managed provider handles IP rotation, but block detection and response quality checks stay in the scraper.

Matching Pattern to Task

Different scraping tasks call for different combinations of these patterns. High-volume stateless retrieval — price monitoring, rank tracking, availability checks — suits Faraday middleware with a ProxyPool and per-request rotation. Multi-step workflows need StickyProxyPool keyed by workflow ID with rotation at the boundary. Scrapers against targets with unpredictable block rates benefit from failure-based rotation inside fetch_with_pool, where mark_failure drives the health scoring.

These patterns are composable. A Sidekiq job doing stateless product checks can use RotatingProxyMiddleware backed by a ProxyPool, given a small adapter that exposes the pool's acquire method as next_proxy. A separate job running multi-page session scrapes uses StickyProxyPool. Both can draw on the same proxy list and share the same blocked? detection logic.
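RotatingProxyMiddleware calls next_proxy, while ProxyPool exposes acquire, so pairing them needs a thin adapter. A sketch, where PoolRotator and CountingPool are hypothetical names. The middleware still never calls mark_success or mark_failure, so health feedback has to come from code like fetch_with_pool:

```ruby
# Hypothetical adapter: gives any pool with #acquire (such as ProxyPool)
# the #next_proxy interface that RotatingProxyMiddleware expects.
class PoolRotator
  def initialize(pool)
    @pool = pool
  end

  def next_proxy
    @pool.acquire
  end
end

# Stand-in pool for illustration only (not thread-safe).
class CountingPool
  def initialize(urls)
    @urls = urls
    @calls = 0
  end

  def acquire
    url = @urls[@calls % @urls.length]
    @calls += 1
    url
  end
end

rotator = PoolRotator.new(CountingPool.new(%w[p1 p2]))
picked = Array.new(3) { rotator.next_proxy }
# With Faraday: f.use RotatingProxyMiddleware, PoolRotator.new(pool)
```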

Proxy rotation in Ruby is not a library selection problem — Net::HTTP, Faraday, and HTTParty all handle proxies cleanly. The decisions that determine whether a scraper holds up at scale are above the library: pool size and IP quality, rotation granularity relative to session requirements, block detection that reads the response body and not just the status code, and health tracking that removes degraded addresses before they corrupt your data or your metrics.
