War Against Bots

The Internet is a dangerous place. Fortunately, most attackers are unsophisticated.

In the previous article I gave you a high-level overview of the technology behind www.whoishiring.jobs, a new job site I launched a few weeks ago. This article is the first one covering the product in greater detail.

3, 2, 1 … Lift Off!

I announced the launch of the app on Twitter and Ruby on Rails Link Slack, and other developers were clearly enthusiastic about the idea. The next few days brought a steady stream of new accounts at an unexpectedly high pace. When the counter hit 200 accounts I was ecstatic!

The initial euphoria didn’t last long though. Checking the accounts table regularly, I noticed most accounts were left unconfirmed. My initial hypothesis was email deliverability issues. There were three factors that led me to believe this could be the case:

  1. I was sending emails from a new domain.
  2. The emails mentioned “jobs” and “hiring” a lot, which I speculated could be frequent spam keywords.
  3. Some test emails from that domain had been previously classified as spam.

I reached out to a few new accounts, but only one replied, saying he hadn’t received the confirmation email. He was able to sign in after I sent the link manually. At the same time, I asked for help on Twitter and discussed the problem with a few other engineers. Despite their generous help, I made no progress in either confirming or rejecting the email deliverability hypothesis. The fact that my email provider marked almost all those emails as delivered only added to the confusion.

I made a few unsuccessful attempts at resolving the problem: I lengthened the magic link validity, removed spam keywords from emails, and started sending out reminders to abandoned accounts. None of that helped.

Epiphany

Missing account confirmations weren’t the only problem I started experiencing right after launch. Another one was a steady stream of ActionController::ParameterMissing errors with params looking like this:

{
  "account": {},
  "account[email]": "<email address here>"
}

Given the tech stack is Rails and Hotwire, I couldn’t fathom how such a request could be sent from the signup form. I was expecting to see something along the lines of:

{
  "account": {
    "email": "<email address here>"
  }
}

My extensive efforts to reproduce the problem locally failed. I clearly had insufficient data, so I decided to dig for more. I took an email from an error report, looked it up in the logs, and then looked up all requests made from the same IP address. Eureka!

Most requests coming from that IP address, and a dozen other addresses I checked, were indicative of an automated vulnerability scan. I realized those unexplained errors and unconfirmed accounts were related. I was experiencing account spam.
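The log lookup described above can be sketched in a few lines of Ruby. This is only an illustration, not the exact procedure I followed: it assumes the standard Rails request line (Started VERB "path" for IP at ...), and the method name requests_by_ip as well as the sample log lines are made up for the example.

```ruby
# Matches the line Rails logs at the start of every request, e.g.
#   Started GET "/accounts/new" for 203.0.113.7 at 2023-05-01 10:00:00 +0000
# The pattern is not anchored, so log prefixes (timestamps, tags) are fine.
STARTED_LINE = /Started (?<verb>[A-Z]+) "(?<path>[^"]*)" for (?<ip>\S+) at/

# Returns a hash mapping each client IP to the "VERB path" strings it
# requested, in log order. Lines that are not request lines are skipped.
def requests_by_ip(log_lines)
  log_lines.each_with_object(Hash.new { |hash, ip| hash[ip] = [] }) do |line, grouped|
    match = STARTED_LINE.match(line)
    next unless match

    grouped[match[:ip]] << "#{match[:verb]} #{match[:path]}"
  end
end

# Made-up sample input; real lines would come from the production log file.
sample_log = [
  'Started GET "/ms-windows-store:/" for 203.0.113.7 at 2023-05-01 10:00:00 +0000',
  'Started POST "/accounts" for 203.0.113.7 at 2023-05-01 10:00:01 +0000'
]

requests_by_ip(sample_log).each do |ip, requests|
  puts "#{ip}: #{requests.join(', ')}"
end
```

Seeing every request from one address side by side is what makes a scan obvious: a signup POST surrounded by probes for paths the app never served.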

It turned out I was facing three interrelated problems: automated vulnerability scans, account spam, and unexplained signup errors. I decided to tackle them in that order.

Problem 1: Automated Vulnerability Scans

Vulnerability scanners were using a rather small and predictable set of URLs, which made building a blacklist trivial. I installed rack-attack and configured it to ban aggressively: 1 strike and you’re out for 1 hour. I simply don’t believe anyone types in a web site name and then totally accidentally follows it with /ms-windows-store:/.

The relevant bits from the rack-attack initializer are below. It was enough to eliminate almost all noise from logs. It was time to tackle account spam.

# This is where rack-attack keeps its state. It's not persisted yet,
# but is good enough for now.
Rack::Attack.cache.store = ActiveSupport::Cache::MemoryStore.new

# An array of regular expressions matching the URLs being scanned.
SUSPICOUS_PATHS = [
  %r{\A/ms-windows-store:/},
  # ...
].freeze

# A blacklist called "bots" ...
Rack::Attack.blocklist("bots") do |request|
  # ... that bans requests from an IP address for one hour after a
  # single offending request.
  Rack::Attack::Fail2Ban.filter("bot:#{request.ip}", maxretry: 1, findtime: 1.minute, bantime: 1.hour) do
    # If the request path matches any of the suspicious paths then
    # it's a bot.
    SUSPICOUS_PATHS.any? { _1.match?(request.path) }
  end
end

# Just send a blank 200 OK to all blacklisted clients.
Rack::Attack.blocklisted_responder = lambda do |_request|
  [200, {}, []]
end
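To make the "1 strike and you're out for 1 hour" policy concrete, here is a simplified in-memory model of it. It is not rack-attack's implementation and does not use its API; the class name OneStrikeBanList and the injectable clock are assumptions made purely for illustration.

```ruby
# A simplified model of a one-strike ban list. NOT rack-attack's code;
# it only illustrates the behavior configured in the initializer.
class OneStrikeBanList
  def initialize(ban_seconds:, clock: -> { Time.now })
    @ban_seconds = ban_seconds
    @clock = clock          # injectable so the example can control time
    @banned_until = {}      # IP address => time at which the ban expires
  end

  # Returns true when the request should be blocked. The given block says
  # whether the current request is an offense; it is only evaluated for
  # clients that are not currently banned.
  def blocked?(ip)
    now = @clock.call
    expiry = @banned_until[ip]
    return true if expiry && now < expiry

    @banned_until.delete(ip)      # drop an expired ban, if any
    return false unless yield

    @banned_until[ip] = now + @ban_seconds
    true
  end
end

current_time = Time.at(0)
bans = OneStrikeBanList.new(ban_seconds: 3600, clock: -> { current_time })

bans.blocked?("203.0.113.7") { true }   # offense: blocked and banned
bans.blocked?("203.0.113.7") { false }  # harmless request, but still banned
current_time += 3601
bans.blocked?("203.0.113.7") { false }  # ban expired: allowed again
```

Like the MemoryStore above, this state lives in a single process and is lost on restart.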

Problem 2: Account Spam

To combat account spam I decided to use a method I had applied successfully in the past: decoy fields. A decoy field is an input present in the form markup, but hidden via CSS, so that no user would see it or fill it in. Fortunately, most bots are dumb and if they stumble upon an input named login they’ll provide a value. The presence of that value is an indicator of an automated submission.
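For illustration, the markup of such a decoy field could be generated like this. It is a sketch rather than the code used on the site: the helper name decoy_field_html and the decoy-field CSS class are hypothetical, and the stylesheet is assumed to hide that class. Only the field name login comes from the actual form.

```ruby
require "erb"

# Builds the markup for a decoy text input. The "decoy-field" class is a
# hypothetical name; a CSS rule is assumed to hide it from human visitors.
def decoy_field_html(form_scope, field_name, css_class: "decoy-field")
  input_name = ERB::Util.html_escape("#{form_scope}[#{field_name}]")
  <<~HTML
    <div class="#{ERB::Util.html_escape(css_class)}" aria-hidden="true">
      <input type="text" name="#{input_name}" tabindex="-1" autocomplete="off">
    </div>
  HTML
end

puts decoy_field_html("account", "login")
```

The tabindex and autocomplete attributes are there to keep keyboard users and browser autofill from filling in the field by accident, which would otherwise flag a legitimate signup as spam.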

A few lines of code added to AccountsController were all that was necessary to defeat account spam. I also decided to tackle invalid submissions (problem 3) using the same code, but still didn’t know how to reproduce them.

class AccountsController < ApplicationController
  # If it's a spam request then pretend the account was created
  # successfully to avoid informing the attacker the attack has been
  # thwarted.
  before_action :render_fake_creation, if: :spam_request?

  private

  def spam_request?
    # A spam request contains unexpected params or ...
    params.key?("account[email]") ||
      params.key?("account[login]") ||

      # ... contains a value for the decoy field.
      params.dig(:account, :login).present?
  end

  def render_fake_creation = render partial: "created"
end

Problem 3: Invalid Requests

At this point I had mostly solved the problem of account spam and invalid requests. However, I don’t consider my job finished unless I understand what’s going on, and in this case I had to understand how to reproduce those ActionController::ParameterMissing errors. The goal was twofold: educating myself and enhancing the rack-attack blacklist.

I again reached out for help on Twitter and the Ruby on Rails Link Slack. A Rubyist named Dimo engaged in a conversation about the topic, and after some back and forth came up with an idea that allowed me to make progress. Let’s have another look at the params I was trying to reproduce:

{
  "account": {},
  "account[email]": "<email address here>"
}

It’s impossible to produce such params with a form submission, but it’s trivial with a JSON request. Running curl against localhost triggered the elusive exception, confirming the hypothesis. Dimo, you saved me hours of debugging. Thank you!
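A Ruby equivalent of that curl call might look like the sketch below. It builds a request but does not send it. The /accounts path, the localhost:3000 address, and the helper name build_malformed_signup_request are assumptions; the JSON body simply mirrors the params shown above, and the payload the bots actually sent may have differed.

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) a JSON POST whose body contains the flat
# "account[email]" key next to an empty "account" object.
# The default "/accounts" path is an assumption about the app's routes.
def build_malformed_signup_request(base_url, email, path: "/accounts")
  request = Net::HTTP::Post.new(URI.join(base_url, path))
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ "account" => {}, "account[email]" => email })
  request
end

request = build_malformed_signup_request("http://localhost:3000", "bot@example.com")
puts request.body

# To actually send it to a locally running app:
#   Net::HTTP.start("localhost", 3000) { |http| http.request(request) }
```

A form submission cannot produce this shape because a form field named account[email] is decoded into a nested account hash, whereas a JSON parser treats "account[email]" as an ordinary top-level key.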

I immediately applied this new knowledge to enhance the blocklist. Ideally, rack-attack would intercept ActionController::ParameterMissing and then add the requester to the blacklist, but the most recent version supported filtering on requests only. There was a pull request to block based on responses but it hadn’t been merged.

I decided to implement a simple predicate method, account_spam?, that would try to mimic request parsing as performed by Rails. All I had to do was instantiate ActionDispatch::Request and ActionController::Parameters and then call params.require(:account), as this was the method responsible for raising the error. The predicate method is below:

# Determines whether the request is an attempt at account spam.
def account_spam?(request)
  # We're interested only in POST requests against the account
  # creation endpoint.
  return false if request.path != Rails.application.routes.url_helpers.accounts_path
  return false unless request.post?

  # Try to reproduce what Rails does under the hood with strong parameters, so
  # that it's as close to the controller action as possible.
  action_dispatch_request = ActionDispatch::Request.new(request.env)
  params = ActionController::Parameters.new(action_dispatch_request.params)

  # If this raises an error then it IS account spam. If not ...
  params.require(:account)

  # ... then it's legit
  false
rescue ActionController::ParameterMissing
  true
end

Blocking such invalid requests was now a matter of calling account_spam? in the blocklist filter:

Rack::Attack.blocklist("bots") do |request|
  Rack::Attack::Fail2Ban.filter("bot:#{request.ip}", maxretry: 1, findtime: 1.minute, bantime: 1.hour) do
    # Block if it's account spam or ...
    account_spam?(request) ||
      # ... a vulnerability scan.
      SUSPICOUS_PATHS.any? { _1.match?(request.path) }
  end
end

Closing Thoughts

I emerged victorious from the battle with malicious actors on the Internet. That victory cost me a few hours spent trying to understand what was going on. I won’t call them “wasted”, as they were a great education and gave me know-how to apply in other projects.

In the next article, I’m going to cover something more pleasant: using Tailwind CSS to produce HTML + CSS only charts like the ones on the home page or the stats page.
