Http.rb is Great
The Ruby ecosystem has many HTTP client gems to choose from. Some are built on libcurl (Typhoeus, Curb, Patron), some on Ruby’s Net::HTTP (RestClient, HTTParty, Nestful), some in pure Ruby (HTTPClient, Excon, http.rb), and some are wrappers around existing libraries (Faraday, HTTPI).
Over the years I’ve had the opportunity to try out most of these libraries, but ever since I discovered http.rb a year ago, it’s been my favourite HTTP client library. So, in this post I would like to talk about the features that make http.rb stand out for me.
Net::HTTP
Before we start, I would like to say a few words about Net::HTTP. Net::HTTP is the HTTP client that comes with the Ruby standard library, and it’s used in many places. People often reach for it when they want something simple, or they use it indirectly through gems like RestClient and HTTParty.
However, whenever I used Net::HTTP, I always had a bad time. Some of the reasons were:
- inconsistent and overly verbose API that’s tough to remember
- poorly designed internals making the code difficult to read
- having to handle low-level system call exceptions
To illustrate, in my previous job we used Net::HTTP for notifying webhooks, and this is the code that we ended up with:
def post_payload(callback_url, payload)
  uri = URI.parse(callback_url)

  options = { open_timeout: 15, read_timeout: 30 }
  options.merge!(use_ssl: true) if uri.scheme == "https"

  Net::HTTP.start(uri.host, uri.port, options) do |http|
    post = Net::HTTP::Post.new(uri.request_uri)
    post.body = JSON.dump(payload)
    post["Content-Type"] = "application/json"

    http.request(post)
  end
rescue SocketError,
       EOFError,
       IOError,
       SystemCallError,   # superclass for all Errno::* exceptions
       Timeout::Error,    # superclass for Net::ReadTimeout and Net::OpenTimeout
       Net::HTTPBadResponse,
       Net::HTTPHeaderSyntaxError,
       Net::ProtocolError,
       OpenSSL::SSL::SSLError
  # handle exception
end
The desire to make tasks like these easier was probably what motivated people to build HTTP client gems on top of Net::HTTP (RestClient, HTTParty, Nestful etc.). They wanted to improve the API, but didn’t want to reimplement the HTTP protocol. However, I found the Net::HTTP codebase to be very convoluted, and I always felt frustrated whenever I needed to read it. So I don’t think that building on top of Net::HTTP is a good design decision, because Net::HTTP is not a clean implementation of the HTTP protocol to begin with.
When creating http.rb, Tony Arcieri decided to rebuild the HTTP protocol natively in Ruby (and also created the Socketry gem to make working with TCP/UDP sockets easier). This allowed http.rb to have a fresh start, without the Net::HTTP baggage.
http.rb was born out of rage from using Net::HTTP
— Tony Arcieri, creator of http.rb
Refreshing API
One of the important goals of http.rb was to make the API easy to use. Let’s rewrite the previous Net::HTTP example of POSTing a JSON payload to a URL, this time using http.rb.
def post_payload(callback_url, payload)
  http = HTTP.timeout(connect: 15, read: 30)
  http.post(callback_url, json: payload)
rescue HTTP::Error
  # handle exception
end
Much easier! Unlike Net::HTTP, http.rb wraps all low-level exceptions into a nice exception hierarchy that’s easy to handle:
HTTP::Error
├── HTTP::ConnectionError
├── HTTP::RequestError
├── HTTP::ResponseError
├── HTTP::TimeoutError
└── HTTP::HeaderError
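This lets you rescue specific subclasses when you want to treat, say, timeouts differently from other failures. A minimal sketch (the URL is just a placeholder):

begin
  response = HTTP.timeout(connect: 1, read: 1).get("https://example.com")
rescue HTTP::TimeoutError
  # the request took too long – maybe retry later
rescue HTTP::ConnectionError
  # the server couldn't be reached
rescue HTTP::Error
  # any other http.rb failure
end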
Here is a more comprehensive example of interaction with http.rb objects:
response = HTTP.get("https://example.com")
response # => #<HTTP::Response/1.1 200 OK ...>
response.status # => #<HTTP::Response::Status 200 OK>
response.status.code # => 200
response.status.ok? # => true (200)
response.status.success? # => true (200..299)
response.headers # => #<HTTP::Headers {…}>
response.headers.to_h # => { "Content-Type"=>"text/html", ... }
response.body # => #<HTTP::Response::Body>
response.body.to_s # => "<!doctype html>..."
Where http.rb really shines is its chainable API for building request options. You can use it to build an HTTP client with default request options, and then make subsequent requests with it.
# Build an HTTP::Client with default request options
http = HTTP
  .headers("User-Agent" => "MyApp/1.0.0")
  .accept(:json)
  .basic_auth(user: "janko", pass: "secret")
  .via("proxy.com", 8080)
  .follow(max_hops: 2)

http.get("https://blog.com/posts")
http.get("https://blog.com/posts/1")
http.get("https://blog.com/posts/1/comments")
Streaming
One of the features of http.rb that I like very much is the support for streaming requests and responses. This is really useful when you need to transfer large amounts of data over the wire which you don’t want to load into memory all at once (think uploading and downloading large files).
Uploads
There are two ways you can stream content into the request body with http.rb.

One is providing an Enumerable object, where you can have #each lazily generate chunks of content for the request body. The easiest way is to create an Enumerator object, as shown in the sketch below.
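A minimal sketch of the Enumerable approach (the URL and chunk contents are placeholders):

# each string yielded by the Enumerator becomes a chunk of the request body
body = Enumerator.new do |yielder|
  yielder << "chunk 1"
  yielder << "chunk 2"
end

HTTP.post("http://example.com/upload", body: body)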
Another way is providing an IO-like object that implements IO#read. In this case http.rb will read the IO content in small chunks and write them to the request body. Notice that the object doesn’t have to be an actual File instance, which is very convenient in contexts such as Shrine, where the “file” to be uploaded can be a File, Tempfile, StringIO, ActionDispatch::Http::UploadedFile, Shrine::UploadedFile, Down::ChunkedIO, or any other object that implements IO#read.
HTTP.put("http://example.com/upload", body: io) # streaming upload
Multipart Form Data
Http.rb will stream request bodies in multipart form data format as well:
HTTP.post("http://example.com/upload", form: { file: HTTP::FormData::File.new(io) })
It uses the http-form_data gem to create an HTTP::FormData::Multipart object, which implements IO#read and generates the multipart form data body on-the-fly, so streaming works on the same principle as above. The implementation of the streaming functionality in http-form_data was inspired by the multipart-post gem.
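You can see this at work by using http-form_data directly. A minimal sketch, assuming http-form_data 2.x where form data objects are IO-like (the file name is a placeholder):

require "http/form_data"

form = HTTP::FormData.create(file: HTTP::FormData::File.new("file.txt"))

form.content_type    # => "multipart/form-data; boundary=..."
form.content_length  # size of the body that will be generated
form.read(16 * 1024) # generates and returns the next 16KB of the multipart body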
Even though the multipart-post gem exists, and Net::HTTP itself has multipart form data functionality, RestClient and HTTParty still decided to implement their own. RestClient writes the multipart form data body to disk before sending it (Net::HTTP does this as well), while HTTParty loads the whole body into memory. When uploading large files, both of these approaches require resource planning, so that you don’t risk running out of disk/memory. The advantage of http.rb’s streaming approach is that resource usage doesn’t grow with the size of the request body, so it’s suitable for large payloads.

What’s also great is that http-form_data is a generic gem which can be reused by any HTTP client library, so the Ruby ecosystem can standardise on this implementation instead of reinventing the wheel.
Downloads
When you make a request with http.rb, the response headers are immediately retrieved, but not the response body, giving you the chance to stream it if you want to.
# retrieves the whole response body as a string
HTTP.get("http://example.com/download").to_s
# streams the response body in chunks
response = HTTP.get("http://example.com/download")
response.body.each do |chunk|
  # ...
end
This API allowed me to easily implement on-demand downloads for Shrine, which is useful if you want to determine the MIME type of a remote file, but don’t want to download the whole file for that (MIME type can typically be determined from the first few kilobytes of the file).
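For example, reading just the first chunk of the body is often enough to inspect a file’s magic bytes. A minimal sketch (the URL is a placeholder, and the MIME detection step is left abstract):

response = HTTP.get("http://example.com/image.jpg")

# reads only the first chunk from the socket, without downloading the rest
first_chunk = response.body.readpartial

# inspect first_chunk's magic bytes to determine the MIME type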
Net::HTTP also supports streaming the response body, but the API is much more limiting due to having to wrap the streaming in the Net::HTTP.start block. I was eventually able to implement on-demand downloads for Net::HTTP as well, but I had to use Fibers to work around this limitation.
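For comparison, a rough sketch of response streaming with Net::HTTP (host and path are placeholders):

Net::HTTP.start("example.com", 80) do |http|
  http.request(Net::HTTP::Get.new("/download")) do |response|
    response.read_body do |chunk|
      # chunks are only accessible inside these nested blocks
    end
  end
end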
Persistent Connections
Http.rb supports persistent (keep-alive) connections, which allows you to reuse the same TCP socket for multiple requests to the same domain. This way you don’t have to pay the price of establishing a connection for each request, which can make a significant difference in performance.
HTTP.get("https://example.com").to_s # connect + write + read + close
HTTP.get("https://example.com").to_s # connect + write + read + close
HTTP.get("https://example.com").to_s # connect + write + read + close
HTTP.persistent("https://example.com") do |http|
  http.get("/").to_s # connect + write + read
  http.get("/").to_s # write + read
  http.get("/").to_s # write + read
end                  # close
# OR
http = HTTP.persistent("https://example.com")
http.get("/").to_s # connect + write + read
http.get("/").to_s # write + read
http.get("/").to_s # write + read
http.close # close
On a Heroku dyno, the first example takes about 1.1s, whereas the example that uses a persistent connection takes only 0.6s, which shows that the performance difference can be significant.
Net::HTTP also supports persistent connections, but requests have to be performed inside the Net::HTTP.start block (alternatively you can use net-http-persistent). HTTP client libraries built on top of libcurl automatically use persistent connections (as that feature is built into libcurl), so with them you don’t need to think about it.
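For reference, a minimal sketch of the Net::HTTP equivalent (the host is a placeholder):

# requests inside the start block reuse a single connection
Net::HTTP.start("example.com", 443, use_ssl: true) do |http|
  http.get("/") # connect + write + read
  http.get("/") # write + read
  http.get("/") # write + read
end             # close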
Timeouts
Like most HTTP client libraries, http.rb allows you to specify connect and read timeouts. The connect timeout limits the time for opening the TCP connection, while the read timeout limits the time of reading a single chunk of the response.
http = HTTP.timeout(connect: 1, read: 1)
http.get("http://example.com") # raises HTTP::TimeoutError
Net::HTTP also has connect and read timeouts, but its read timeout comes with a caveat – idempotent requests are automatically retried on a timeout error. This means that, if you specify a read timeout of 4 seconds, a request could raise a timeout error only after 8 seconds, because it was already retried once. See this article for more details.
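A minimal sketch of this behaviour, and how to turn the retry off (the host is a placeholder; Net::HTTP#max_retries= is available since Ruby 2.5):

http = Net::HTTP.new("example.com")
http.read_timeout = 4 # a stuck idempotent request can still take ~8 seconds,
                      # because Net::HTTP transparently retries it once
http.max_retries = 0  # disables the automatic retry (Ruby 2.5+)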
Write timeout
In addition to connect and read timeout, http.rb also comes with a write timeout. A write timeout limits the time it takes to write a single chunk of the request. It’s important to have this feature when sending requests with large bodies, as those require multiple write system calls.
HTTP.timeout(connect: 1, write: 1, read: 1)
There is a feature request to add a write timeout to Net::HTTP, but as of this writing it hasn’t been implemented yet. I’m not aware of any other Ruby HTTP client library that supports write timeouts.
Global timeout
Requests can be written and responses can be read in multiple write or read system calls. The default :read and :write timeouts limit the time of each individual operation. This means that if you set :connect, :read, and :write timeouts to 1 second each, the request could still potentially take longer than 3 seconds if multiple write or read calls are executed.
Http.rb has the ability to specify a global timeout, where you can limit the total amount of time the HTTP request can take. This is again most useful with large amounts of data, where potentially many write/read system calls will be executed.
# request can take 2 + 1 + 2 = 5 seconds in total
HTTP.timeout(:global, connect: 2, write: 1, read: 2)
It might be tempting to just wrap the whole HTTP call in a Timeout block, but remember, the Timeout API is dangerous. Http.rb implements read and write timeouts natively; it only uses Timeout for the connect timeout (as doing that natively is a bit more involved).
Compression
The HTTP/1.1 protocol supports compressing request and response bodies, which decreases network resource usage at the cost of the increased CPU usage needed for (de)compressing. This can improve speed when transferring large amounts of data, depending on how well the request/response bodies compress.
Http.rb has support for automatically compressing (“deflating”) request bodies:
HTTP.use(:auto_deflate)
  .post("http://example.com/upload", body: File.open("file.txt")) # compression
# POST /upload HTTP/1.1
# Content-Length: 53057934
# Content-Encoding: gzip <========
#
# [compressed content]
and automatically decompressing (“inflating”) response bodies:
HTTP.use(:auto_inflate)
  .get("http://example.com/download") # decompression
# HTTP/1.1 200 OK
# Content-Length: 53057934
# Content-Encoding: gzip <========
#
# [compressed content]
This works with streaming requests and responses. For regular requests the total size needs to be calculated first for setting the Content-Length request header, so in this case the compressed request body will be written to disk before it’s sent. But with chunked requests the request body will be compressed on-the-fly, as those don’t require the Content-Length request header.
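A minimal sketch of such a chunked upload, assuming an Enumerable body triggers chunked transfer encoding (the URL and file name are placeholders):

# an Enumerable body has no known size, so no Content-Length is set,
# the request is sent chunked, and :auto_deflate can gzip it on-the-fly
body = Enumerator.new do |yielder|
  File.foreach("file.txt") { |line| yielder << line }
end

HTTP.use(:auto_deflate).post("http://example.com/upload", body: body)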
Memory Usage
Ruby processes tend to consume a lot of memory. Ruby developers deal with this in various ways: tweaking Ruby’s GC settings, killing web workers once they reach a certain memory threshold, running Ruby processes with jemalloc, etc. However, I think there are still many opportunities for reducing the number of objects we allocate in the first place, which is the approach that Richard Schneeman (derailed_benchmarks & countless PRs), Sam Saffron (rack-mini-profiler, memory_profiler, flamegraph, RubyBench.org), and Eric Wong (Ruby commits) actively promote.
Eric Wong (Unicorn author and Ruby committer) recently talked about this in a ruby-talk thread titled “String memory use reduction techniques”. There, Eric states that string objects are often to blame for high memory usage in Ruby applications. He shows various techniques for limiting string allocations, as well as for deallocating strings that are no longer needed. After all, the less “garbage” there is, the better the garbage collector will perform.
It so happens that HTTP interactions can allocate a lot of strings, especially for large request and response bodies. I measured the memory usage of http.rb, Net::HTTP, RestClient, and HTTParty when uploading and downloading 10 MB of data. Here are the results:
| Library    | Uploading 10MB | Downloading 10MB |
|------------|----------------|------------------|
| http.rb    | 0.10 MB        | 0.2 MB           |
| Net::HTTP  | 0.02 MB        | 12.36 MB         |
| RestClient | 9.03 MB        | 12.57 MB         |
| HTTParty   | 40.03 MB       | 12.59 MB         |
In the uploading benchmark, we can see that http.rb and Net::HTTP memory usage is low, RestClient allocates 1x the request body size, and HTTParty allocates 4x the request body size. The http.rb memory usage is slightly higher than Net::HTTP’s, but it appears to be constant regardless of the request body size. Note that with uploads over SSL the memory usage will be much higher for each library, because Ruby’s OpenSSL::SSL::SSLSocket is currently very memory-inefficient (but there is a patch waiting to be merged).
In the downloading benchmark, http.rb has very low memory usage which stays the same regardless of the response body size, while the other libraries allocate approximately 1x the response body size (due to Net::HTTP). Note that this will be fixed in Ruby 2.6.0 due to Eric Wong’s recent patch, after which memory consumption drops to the same levels as http.rb.
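If you want to reproduce this kind of measurement, here is a minimal sketch using the memory_profiler gem (the URL is a placeholder, not the exact benchmark I ran):

require "memory_profiler"
require "http"

report = MemoryProfiler.report do
  HTTP.get("http://example.com/10mb").to_s
end

report.total_allocated_memsize # bytes allocated during the download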
Conclusion
I found http.rb to be a very impressive HTTP client library. It has a very nice easy-to-use API, good exception hierarchy, full streaming support, persistent connections, advanced timeout options, HTTP compression support and more.
I believe that one of the main things that helped http.rb shape up is that it implements the HTTP protocol natively instead of relying on Net::HTTP. This also spawned some reusable libraries – Socketry, http-form_data, and content-type – which is always a sure sign of good design in my book.
Since I maintain libraries for handling file uploads and downloads (Shrine, Down, tus-ruby-server), it’s important to me to have an HTTP client library that I can recommend. The streaming upload/download support and very low memory usage makes http.rb a great choice, especially when dealing with large files.
I encourage you to try http.rb on your next project!