nitlink

Nitlink

Nitlink is a nice, nitpicky gem for parsing Link headers, which sticks as closely as possible to Mark Nottingham’s parsing algorithm (from his most recent redraft of RFC 5988). That means it’s particularly good at handling weird edge cases, UTF-8 encoded parameters, URI resolution, boolean parameters and more. It also plays nicely with a bunch of popular HTTP client libraries, has an extensive test suite, and zero external dependencies.

Tested with Ruby versions from 1.9.3 up to 2.4.1. Ruby 2.0+ is fully supported, 1.9.3 has fully functional parsing, but the support for third-party clients is somewhat limited (because, for example, Net::HTTPResponse doesn’t expose the request URI in 1.9.3).

Installation
Usage
Feature comparison
API
Changelog
Developing Nitlink
Contributing
Future features
License
Author

Installation

Install the gem from RubyGems:

gem install nitlink

Or add it to your Gemfile and run bundle install

gem 'nitlink', '~> 1.1'

And you’re ready to go!

require 'httparty'
require 'nitlink/response'

HTTParty.get('https://www.w3.org/wiki/Main_Page').links.by_rel('last').target
 => #<URI::HTTPS https://www.w3.org/wiki/index.php?title=Main_Page&oldid=100698>

Usage

The most basic way to use Nitlink is to directly pass in a HTTP response from Net::HTTP:

require 'nitlink'
require 'net/http'
require 'awesome_print' # <- not required, just for this demo

link_parser = Nitlink::Parser.new
response = Net::HTTP.get_response(URI.parse 'https://api.github.com/search/code?q=addClass+user:mozilla')

links = link_parser.parse(response)
ap links

# =>
[
    [0] #<Nitlink::Link:0x7fcd09019158
        context = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user:mozilla>,
        relation_type = "next",
        target = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>,
        target_attributes = {}
    >,
    [1] #<Nitlink::Link:0x7fcd09011fe8
        context = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user:mozilla>,
        relation_type = "last",
        target = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>,
        target_attributes = {}
    >
]

links is actually a Nitlink::LinkCollection - an enhanced array which makes it convenient to grab a link based on its relation_type:

links.by_rel('next').target.to_s
#=> 'https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2'

Third-party clients

Nitlink also supports a large number of third-party HTTP clients:

Curb
Excon
Faraday
http.rb
httpclient
HTTParty
OpenURI (part of the standard lib)
Patron
REST Client
Typhoeus
Unirest - ⚠️ Deprecated, will be removed in Nitlink 2.0, see here.

You can pass a HTTP response from one of these libraries straight into the #parse method:

response = HTTParty.get('https://api.github.com/search/code?q=addClass+user:mozilla')
links = link_parser.parse(response)

For the extra lazy, you can instead require nitlink/response which decorates the various response objects from third-party clients with a new #links method, which returns the parsed Link headers from that response. nitlink/response must be required after the third-party client. (Note: Net::HTTPResponse also gets decorated, even though it’s not technically third-party).

require 'httparty'
require 'nitlink/response'

ap HTTParty.get('https://api.github.com/search/code?q=addClass+user:mozilla').links

# =>
[
    [0] #<Nitlink::Link:0x7fcd09019158
        context = #<URI::HTTPS https://api.github.com/search/code?q=addClass+user:mozilla>,
# ....

response.links is just syntactic sugar for calling Nitlink::Parser.new.parse(response)

Response as a hash

You can also pass the relevant response data as a hash (with keys as strings or symbols):

links = link_parser.parse({
  request_uri: 'https://api.github.com/search/code?q=addClass+user:mozilla',
  status: 200,
  headers: { 'Link' => '<https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>; rel="next", <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last"' }
})

Non-`GET` requests

For fully correct behavior, when the making a request using a HTTP method other than GET, specify the method type as the second argument of #parse:

response = HTTParty.post('https://api.github.com/search/code?q=addClass+user:mozilla')
links = link_parser.parse(response, 'POST')

This allows Nitlink to correctly set the context of links (resources fetched by a method other than GET or HEAD generally have an anonymous context) - but otherwise everything works OK if you don’t specify this.

Example: paginating Github search

Here we make an initial call to the Github API’s search endpoint then iterate through the pages of results using Link headers:

require 'nitlink'
require 'net/http'

link_parser = Nitlink::Parser.new
first_page = HTTParty.get('https://api.github.com/search/code?q=onwheel+user:mozilla')
links = link_parser.parse(first_page)

results = first_page.parsed_response['items']

while links.by_rel('next')
  response = HTTParty.get(links.by_rel('next').target)
  results += first_page.parsed_response['items']

  links = link_parser.parse(response)
end

Feature comparison

A few different Link header parsers (in various languages) already exist. Some of them are quite lovely :relaxed: ! Nitlink does its best to be as feature complete as possible; as far as I know it’s the first library to cover all the areas the spec (RFC 5988) sets out:

Feature	Nitlink	parse-link-header	link_header	li	weblinking	link-headers	backbone-paginator	http-link	node-http-link-header
Encoded params (per RFC 5987)	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>
URI resolution	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>
Establish link context	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>
Ignore quoted separators	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:white_check_mark:</p>
Parse “weird” headers^†	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>
Proper escaping	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>
Boolean attributes	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>
Ignore duplicate params	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>
Multiple relation types	<p align='center'>:white_check_mark:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:white_check_mark:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>	<p align='center'>:x:</p>

^† i.e. can it parse weird looking, but technically valid headers like <http://example.com/;;;,,,>; rel="next;;;,,, next"; a-zA-Z0-9!#$&+-.^_|~=!#$%&'()*+-./0-9:<=>?@a-zA-Z[]^_{|}~; title*=UTF-8'de'N%c3%a4chstes%20Kapitel?

API

Nitlink::Parser

`#parse(response, method = 'GET')` => `Nitlink::LinkCollection`

Accepts the following arguments:

response (required) - The HTTP response whose Link header you wish to parse. Can be any one of:
- An instance of Net::HTTPResponse and its subclasses
- An instance of Curl::Easy, Excon::Response, Faraday::Response, HTTP::Message, HTTP::Response, HTTParty::Response, Patron::Response, RestClient::Response or Typhoeus::Response (or Unirest::HttpResponse although this is deprecated, and will be removed in Nitlink 2.0)
- An instance of StringIO or Tempfile created by OpenURI’s Kernel#open method
- A Hash containing:
  - request_uri (String or URI) - the URI of the requested resource
  - status - the numerical status code of the response (e.g. 200)
  - headers (Hash or String) - headers can either be provided as a Hash of HTTP header fields (with keys being the field names and the values being the field values) or a raw HTTP header string (each field separated by CR-LF pairs). Only the Link and Content-Location headers are used by Nitlink. Nitlink treats field names case-insensitively.
    { headers: { 'Content-Location' => 'http://example.com' 'Link' => '</page/2>; rel=next' } } # Or { headers: "Content-Location: http://example.com\r\nLink: </page/2>; rel=next" }
method (optional, String) - The HTTP method used to make the request. Defaults to 'GET'. This is used to establish the correct identity (per RFC 7231, Section 3.1.4.1)

Returns a Nitlink::LinkCollection containing Nitlink::Link objects:

When the response contains no Link header an empty collection is returned
Links without a relation type (rel) specified are omitted
The links’ parameters are serialized into the Nitlink::Link’s target_attributes. For more details of the serialization see Nitlink::Link#target_attributes.

Where a link has more than one relation type, one entry per relation type is appended:

ap parser.parse({
  request_uri: 'http://example.com',
  status: 200,
  headers: { 'Link' => '</readme>; rel="about version-history"' }
})

[
    [0] #<Nitlink::Link:0x7fcda9330be8
        context = #<URI::HTTP http://example.com>,
        relation_type = "about",
        target = #<URI::HTTP http://example.com/readme>,
        target_attributes = {}
    >,
    [1] #<Nitlink::Link:0x7fcda9330bc0
        context = #<URI::HTTP http://example.com>,
        relation_type = "version-history",
        target = #<URI::HTTP http://example.com/readme>,
        target_attributes = {}
    >
]

If the Link header does not begin with "<", or "<" isn’t followed by ">" it’s considered malformed and unparseable - in which case a Nitlink::MalformedLinkHeaderError is thrown. If response is an instance of a class which Nitlink doesn’t know how to handle (e.g. from an unsupported third-party client) a Nitlink::UnknownResponseTypeError is thrown.

Nitlink::LinkCollection

An extension of Array with additional convenience methods for handling links based on their relation type.

`#by_rel(relation_type)` => `Nitlink::Link` or `nil`

Accepts the following argument:

relation_type (required, String or Symbol) - a single relation type which the returned link should represent (e.g. by_rel('terms-of-service') would find a link pointing to legal terms).

Returns a single Nitlink::Link object whose relation_type attribute matches the relation type provided, or nil if the collection doesn’t contain a matching link. If two links exist which match the provided relation type (this should never happen in practice), the first matching link in the collection is returned.

Raises an ArgumentError if the relation_type is blank.

`#to_h(options = { with_indifferent_access: true })` => `Nitlink::HashWithIndifferentAccess` or `Hash`

Also aliased as to_hash.

Accepts the following arguments:

options (optional, Hash) - When options[:with_indifferent_access] is truthy (as it is by default) the method returns a Nitlink::HashWithIndifferentAccess where each key is a relation type and each value is a Nitlink::Link. When options[:with_indifferent_access] is falsy it returns the equivalent Hash with string keys.

An empty collection will return an empty Nitlink::HashWithIndifferentAccess/Hash. If two links exist which match a given relation type, the value will be the first link in the collection.

Nitlink::Link

A Struct representing a single link with a specific relation type. It has four attributes:

context - the context of the link
target - where the linked resource is located
relation_type - the relation type, which identifies the semantics of the link
target_attributes - a set of key/value pairs that give additional information about the link

#<Nitlink::Link:0x7fcda89489a0
    context = #<URI::HTTP http://example.com>,
    target = #<URI::HTTP http://example.com/readme>,
    relation_type = "about",
    target_attributes = {
      "title" => "About us"
    }
>

`#context` => `URI` or `nil`

Returns the context of the link as a URI object. Usually this will be the same as the request URI, but may be modified by the anchor parameter or Content-Location header. Additionally some HTTP request methods or status codes result in an “anonymous” link context being assigned (represented by nil).

`#target` => `URI`

Returns the target of the link as a URI object. If the URI given in the Link header is relative, Nitlink resolves it (based on the request URI).

`#relation_type` => `String`

A single relation type, describing the kind of relationship this link represents. For example, "prev" would indicate that the target resource immediately precedes the context. It could also be an extension relation type (an absolute URI serialized as a string).

Relation types are always case-normalized to lowercase.

`#target_attributes` => `Hash`

Captures the values of the parameters that aren’t used to construct the context or target (i.e. other than rel and anchor) title, for example.

Parameters ending in * are decoded per RFC 5987, bis-03. Where decoding fails, the parameter is omitted.

Boolean parameters (e.g. crossorigin) have their values set to nil. Any backslash escaped characters within quoted parameter values are unescaped. The names of attributes are case-normalized to lowercase. Only the first occurrences of media, title, title* or type parameters are parsed, subsequent occurrences are ignored.

If no additional parameters exist, target_attributes is an empty hash.

ap parser.parse({
  request_uri: 'http://example.com',
  status: 200,
  headers: { 'Link' => %q{</about>; rel=about; title="About us"; title*=utf-8'en'About%20%C3%BCs; crossorigin} }
})

#=>
[
    [0] #<Nitlink::Link:0x7fcda9274bc8
        context = #<URI::HTTP http://example.com>,
        relation_type = "about",
        target = #<URI::HTTP http://example.com/about>,
        target_attributes = {
                  "title" => "About us",
                 "title*" => "About üs",
            "crossorigin" => nil
        }
    >
]

Nitlink::HashWithIndifferentAccess

Implements a hash where keys :foo and “foo” are considered to be the same. It’s closely modeled on Thor’s implementation (Thor::CoreExt::HashWithIndifferentAccess), except without the magic predicates.

Instances are largely interchangeable with ActiveSupport::HashWithIndifferentAccess, but doesn’t have the additional methods not present on Hash, like stringify_keys, symbolize_keys, regular_writer etc.

Changelog

Nitlink follows semantic versioning.

1.1.0 (1 September 2017)

Remove dependency on hashwithindifferentaccess gem, implement native HashWithIndifferentAccess class modeled on Thor’s implementation (see #1)
Add with_indifferent_access option to #to_h, and alias #to_h as #to_hash
Deprecate Unirest support (see #3)
Improve compatibility and build process for Ruby 1.9.3 and 2.0
Fix minor Curb regression

1.0.0 (7 November 2016)

Initial release

Developing Nitlink

Clone the git repo

  git clone git://github.com/alexpeattie/nitlink.git

Install dependencies

  cd nitlink
  bundle install

You can skip installing the various third-party HTTP clients Nitlink supports, to get up and running faster (some specs will fail)

  bundle install --without clients

Run the test suite

  bundle exec rspec

You can also generate a Simplecov coverage report by setting the COVERAGE environment variable:

  COVERAGE=true bundle exec rspec

Contributing

Pull requests are very welcome! Please try to follow these simple rules if applicable:

Fork it (https://github.com/alexpeattie/nitlink/fork)
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create a new Pull Request

Future features

Validate (non-extension) relation types against those listed in the official Link Relation Type Registry
Check the format of known parameters (e.g. type should be in the format foo/bar)
Detect the language of the linked resource (from hreflang or language information in an encoded title*)
Convert extension relation types to URIs
Support for parameter fallback
Add option to ignore links with anchor parameters
Better handling of duplicate parameters

License

Nitlink is released under the MIT license. (See License.md)

Author

Alex Peattie / alexpeattie.com / @alexpeattie

This site is open source. Improve this page.