active_record, MySQL, and emoji

More and more, people are adopting emoji in their online communications. At TaskRabbit, we noticed that our users are starting to use emoji all over the place, from task descriptions to reviews.

There are some problems with supporting the emoji character set on our stack, which includes Rails 4.0 and MySQL. The main problem is that MySQL’s utf8 encoding only supports characters up to three bytes, while emoji require four. In MySQL 5.5, the utf8mb4 encoding was introduced, which allows for four-byte Multi-Byte (mb) strings… and therefore emoji work! The mysql2 gem introduced support for utf8mb4 about a year ago, but only recently did active_record (and rails) add support for this in rails 4.1.

Initially, we decided to ignore all emoji characters, literally stripping them out of strings with our demoji gem (Thanks Pablo!). However, with our new product launch in the UK, we thought it was time to actually address the problem. Here is what we learned:

Migrating MySQL from utf8 to utf8mb4

The good news is that the upgrade path from utf8 to utf8mb4 is easy. As we are adding bytes, the migration is really just a definition change at the table level. Nothing has to change with your existing data. This is a non-blocking, no-downtime migration. If you are using normal rails migrations, the character set of your VARCHAR columns will be based on the table’s encoding, so changing the table will change the column type. The bad news is that any text-type (or blob-type) columns will need to be explicitly changed.

Check out the migration steps:

  1. change the DB’s encoding entirely, so new tables will be created in utf8mb4
  2. alter all existing tables
  3. explicitly update text-type columns
class Utf8mb4 < ActiveRecord::Migration

  UTF8_PAIRS = {
    'users'     => 'notes',
    'comments'  => 'message'
    # ...
  }

  def self.up
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET utf8mb4;"

    ActiveRecord::Base.connection.tables.each do |table|
      execute "ALTER TABLE `#{table}` CHARACTER SET = utf8mb4;"
    end

    UTF8_PAIRS.each do |table, col|
      execute "ALTER TABLE `#{table}` CHANGE `#{col}` `#{col}` TEXT  CHARACTER SET utf8mb4  NULL;"
    end

  end

  def self.down
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET utf8;"

    ActiveRecord::Base.connection.tables.each do |table|
      execute "ALTER TABLE `#{table}` CHARACTER SET = utf8;"
    end

    UTF8_PAIRS.each do |table, col|
      execute "ALTER TABLE `#{table}` CHANGE `#{col}` `#{col}` TEXT  CHARACTER SET utf8  NULL;"
    end
  end
end

database.yml

The only change needed here is the encoding:

development:
  adapter: mysql2
  encoding: utf8mb4 # <---
  database: my_db_name
  username: root
  password: my_password
  host: 127.0.0.1
  port: 3306

Index Lengths

The last step is to deal with index lengths. With utf8mb4, each character can take up to four bytes, so an indexed VARCHAR(255) can exceed InnoDB’s 767-byte index key limit; 191 characters is the largest length that still fits (191 × 4 = 764 bytes). If you are on rails 4.1, you have nothing to worry about! The rest of us have a few options:

  1. monkeypatch activerecord
  2. change the index length within MySQL
  3. set the length to 191 within all index migrations

We chose #1 due to the simplicity of the solution: the monkeypatch below caps the default VARCHAR length at 191 characters. Check the links above for a detailed discussion of the problem.

module ActiveRecord
  module ConnectionAdapters
    class AbstractMysqlAdapter
      # 191 chars × 4 bytes = 764 bytes, which fits under InnoDB's 767-byte limit
      NATIVE_DATABASE_TYPES[:string] = { :name => "varchar", :limit => 191 }
    end
  end
end

And now you can emoji to your ❤’s content!

Read more at the source

A design for iOS push notifications

At TaskRabbit, we use push notifications heavily in our apps. The first
implementations of push were simple blocks of conditional logic in view
controllers that could respond to notifications. After we started adding many
notification events to the app, we realized the pain points of this design and
iterated on how to implement push.

A small APNS payload

As payloads from APNS are limited in size, we use a short string to indicate
what type of event should happen in the app: event_type. For most
notifications, the unique object id of the related model is also included.
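
For illustration, a payload might look roughly like this (the aps dictionary is Apple’s standard envelope; the event_type value and the id key shown here are our own illustrative assumptions):

{
  "aps": { "alert": "You have a new task assignment!" },
  "event_type": "task_assigned",
  "id": "4242"
}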

The payload is immediately serialized to a TRRemoteNotification – the
application’s representation of the event. We use an enum to represent the
remote event type which is serialized from the event_type string in the
payload. An enum is easy to work with and is defined in the app at build time.
This enum also serves to ensure that there is a finite set of event types.

Dispatching notifications via two simple protocols

@protocol TRUIRemoteNotificationProtocol <NSObject>
- (id)initWithNotification:(TRRemoteNotification *)notification;
@end

@protocol TRUIRemoteNotificationResponderProtocol  <NSObject>
- (BOOL)isFirstResponderForNotification:(TRRemoteNotification *)notification;
- (void)didReceiveNotification:(TRRemoteNotification *)notification;
@end

Notifications in navigation controller based apps

We dispatch notifications immediately after serializing a
TRRemoteNotification, by setting the notification on the navigation
controller.

@interface UINavigationController (TaskRabbit) <UINavigationControllerDelegate>

- (void)pushNotificationFirstResponder:(id
  <TRUIRemoteNotificationResponderProtocol>)responder;
- (__weak id
  <TRUIRemoteNotificationResponderProtocol>)popNotificationFirstResponder;

- (void)setNotification:(TRRemoteNotification *)notification
  animated:(BOOL)animated;

@end

In order to respond to notifications, an object must be pushed onto the
navigation controller’s responder stack. This responder chain is like
UIResponder, but for remote events.

Similar to UIResponder, we enumerate through the responder chain,
checking if each object can respond to the notification by calling
isFirstResponderForNotification:. If the object does respond, the
notification dispatch process is complete. If not, the next object in the
responder chain is given the opportunity to respond. Only one
TRUIRemoteNotificationResponderProtocol conformer can respond to a given
instance of a TRRemoteNotification.

Any object can sign up to respond to notifications by adding itself to the
responder chain. In our applications, only view controllers implement these
protocols and respond to notifications. viewWillAppear: is the point in the
view controller life-cycle where most view controllers will sign up.

Since a deallocated view controller cannot receive notifications and the
responder chain shouldn’t create a strong reference to the responder, an
NSPointerArray with weak references is the perfect collection for implementing
the responder chain. The original implementation used an NSMutableArray as the
data structure, which required each responder to be explicitly removed.

Building the same hierarchy a user would navigate to

If there is no responder for the first notification, the next step is to check
if we need to build a new navigation hierarchy.

We load the notification completely before presenting the new UI stack.

At this point in the notification’s life-cycle we need new view controllers to
display. We added an asynchronous factory method on UIViewController to start the
loading process. If the application is active, the user is not
notified until loading has completed.

@interface UIViewController (TRRemoteNotification)

+ (void)viewControllersForNotification:(TRRemoteNotification *)notification
completion:(void (^)(NSArray *viewControllers))completion;

@end

We save an array of UIViewController class names that conform to
TRUIRemoteNotificationProtocol so that we can build an array of view
controllers. Each TRRemoteNotificationEventType has a hierarchy of view
controller classes defined.

Beyond push – deep linking support and URL schemes.

After we implemented this pattern to respond to notifications, we realized we
wanted the same functionality when the app was opened from a URL. Using the
notification protocol to handle URLs was obvious because it only requires an
object id and event_type to load the model and present the UI. Since this
design only requires two strings, our URL scheme is simple and the deep links
are short.

Conclusion

We handle lots of push notification types as well as URL schemes in our apps.
This design isn’t a magic bullet for handling remote events, but it is easier
to follow and more robust than massive blocks of conditional logic.

Read more at the source

Offshore: Rails Remote Factories

Last year at TaskRabbit, we decided to go headlong into this Service Oriented Architecture thing. We ended up with several Rails and other Ruby apps that loosely depended on each other to work. While this was mostly good from a separation of concerns and organizational perspective, we found effective automated testing to be quite difficult.

Specifically, we have one core platform application whose API is the most used. We also allow other apps to have read-only access to its database. Several of the apps are more or less a client to that server. So while the satellite apps have their own test suite, the integration tests can only be correct if they are exercising the core API.

To handle this use case, we created a gem called Offshore. A normal test suite has factories and/or fixture data and it uses database transactions to reset the data for every test. Offshore brings this to the SOA world by providing the ability to use another application’s factories from your test suite, as well as handling the rollback between tests. Through the actual exercising of the other application, as well as a simplified usage pattern, we found that we trusted our integration tests much more than with alternative approaches.

Read more at the source

Testing iOS UI code with Kiwi

At TaskRabbit, we use automated testing in iOS,
Android, and web projects. When I first was talking to Julian about joining
TaskRabbit, I got super excited when I heard they were testing the iPhone app.
Testing wasn’t so popular in the iOS community at the time (not that this has
changed).

When I first started, the team was using
Cedar as a testing framework. I had used
Kiwi in past projects and was hooked. Kiwi is a treat to work with, is written
in pure Objective-C, and is built on top of SenTestKit/XCTest – check it out
here. We ended up switching to Kiwi to test
the app and the result has been nothing but positive.

Testing iOS UI code isn’t always so straightforward and requires special
considerations, so I wanted to share some techniques that we are using to make
testing more enjoyable.

Read more at the source

Percona Toolkit Adventures: pt-online-schema-change

Today at work, we had a couple of big migrations to run on two of our biggest and most important tables. Both migrations included adding new columns, which would incur table locks, which meant around an hour of downtime (including displaying our maintenance page) while running these migrations after 10 PM.

However, a while back we had found pt-online-schema-change, a nifty little tool that’s part of the Percona Toolkit that allows you to run these types of migrations without incurring downtime. Basically, exactly what we needed.

As a cherry-on-top-of-your-pie surprise, it turns out that it’s completely compatible with vanilla MySQL, which was perfect for us since our staging boxes run vanilla MySQL (as opposed to our production servers, which use Percona). Another super cool thing about it is that it works out of the box with Percona Cluster, which we’re currently using in production.

Long story short, we figured this would be a great time to test out the tool. So I set out to run both migrations on one of our staging boxes. The command looked something like:

pt-online-schema-change -uguy -ppwd --alter "ADD COLUMN new_field INT" D=staging_db1,t=users --execute

After chugging along quite nicely for a while, the tool was done. The new field was added to the users table without locking it whatsoever, and we were all happy enough to try it on production at night.

As these things usually go, from staging to production things didn’t work exactly the same. Basically, the box was set up a bit differently, and when trying to run the same command, it would error out with a somewhat cryptic error message:

pt-online-schema-change -uguy -ppwd --alter "ADD COLUMN new_field INT" D=staging_db1,t=users --execute

Can't use an undefined value as an ARRAY reference at /usr/bin/pt-online-schema-change line 7085.

I tried looking for the error online, and most, if not all, of the mentions were related to a changelog entry about Percona Toolkit a while back. After making sure we were on the latest version (we were), we continued debugging.

Debugging output to the rescue

What ended up providing much-needed clarity was running the same command as before, but with the PTDEBUG environment variable turned on to send verbose debugging output to STDERR, as recommended in the docs.

PTDEBUG=1 pt-online-schema-change -uguy -ppwd --alter "ADD COLUMN new_field INT" D=staging_db1,t=users --execute
# ...
# VersionCheck:7008 15008 Version check file percona-version-check in /tmp
# VersionCheck:7082 15008 Version check failed: Cannot open /tmp/percona-version-check: Permission denied at /usr/bin/pt-online-schema-change line 7120.
#

So there we have it. Turns out the problem all along was a simple permissions issue with our maintenance user. After fixing this, we ran the migrations without problem.

Read more at the source

Emojis break MySQL with UTF-8 encoding

We’ve been running into some Tasks that have emojis as part of their descriptions, which, it turns out, is very problematic for our UTF-8 MySQL databases.

Evan started looking for a solution and found that the answer is in using the UTF8MB4 encoding. Sadly, this encoding is not entirely supported by Rails 3.

After a bit of thinking, we decided that for now the best approach would be to write a little initializer that rescues from the specific exception that’s raised when you try to save text containing emoji and replaces the offending characters with blank spaces. It kinda sucks, but until proper support is introduced for Rails 3, we saw no way around it.

For the benefit of having an easy-to-reuse piece of code, we’ve abstracted that initializer into a gem called demoji. The code is pretty straightforward. You can check it out on the project’s GitHub page!
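
The core idea is easy to sketch. This is not the gem’s actual implementation, just a minimal illustration: characters above U+FFFF need four bytes in UTF-8, which is exactly what a utf8 MySQL database rejects, so we swap them for spaces.

# minimal sketch, not demoji's actual code
ASTRAL_PLANE = /[\u{10000}-\u{10FFFF}]/

def strip_emoji(text)
  text.gsub(ASTRAL_PLANE, ' ')
end

strip_emoji("Great job! 👍") # => "Great job!  "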

Read more at the source

Rails 4 Engines

At TaskRabbit, we have gone through a few iterations on how we make our app(s). In the beginning, there was the monolithic Rails app in the standard way with 100+ models and their many corresponding controllers and views. Then we moved to several apps with their own logic and often using the big one via API. Our newest project is a single “app” made up of several Rails engines. We have found that this strikes a great balance between the (initial) straightforwardness of the single Rails app and the modularity of the more service-oriented architecture.

We’ve talked about this approach with a few people and they often ask very specific questions about the tactics used to make this happen, so let’s go through it here and via a sample application.

Rails Engines

A Rails engine is basically a whole Rails app that lives in the container of another one. Put another way, as the docs note: an app itself is basically just an engine at the root level. Over the years, we’ve seen engines shipped as parts of gems such as devise or rails_admin. These examples show the power of engines by providing a large set of relatively self-contained functionality “mounted” into an app.
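
As a minimal sketch of what that mounting looks like (the Reviews engine is a made-up example):

# lib/reviews/engine.rb inside the engine
module Reviews
  class Engine < ::Rails::Engine
    # keep the engine's models, controllers, and routes in their own namespace
    isolate_namespace Reviews
  end
end

# config/routes.rb in the container app
Rails.application.routes.draw do
  mount Reviews::Engine, at: "/reviews"
end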

Read more at the source

ElasticDump

Intro

At TaskRabbit, we use Elasticsearch for a number of things (which include search, of course). In our development, we follow the normal pattern of having a few distinct environments which we use to build and test our code. The ‘acceptance’ environment is supposed to be a mirror of production, including having a copy of its data. However, we could not find a good tool to help us copy our Elasticsearch indices… so we made one!

Use

elasticdump works by sending an input to an output. Both can be either an Elasticsearch URL or a file.
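
A typical copy looks something like this (see the project’s README for the full set of flags; --type selects whether data or mappings are transferred):

$ elasticdump --input=http://staging-es:9200/my_index --output=my_index.json --type=data
$ elasticdump --input=my_index.json --output=http://localhost:9200/my_index --type=data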

Read more at the source

Resque Bus

At TaskRabbit, we are using Resque to do our background job processing. We’ve also gone one step further and used Redis and Resque to create an asynchronous message bus system that we call Resque Bus.

Redis / Resque

Redis is a single-threaded in-memory key/value store similar to memcached. Redis has other features like pub/sub and more advanced data structures, but the key feature that makes it an ideal storage engine for a queue and a message bus is that it can perform atomic operations. Atomic operations are the kind of operations you can expect to do on in-process data (like Array.pop or Array.splice) but in a way that keeps the data sane for everyone connected to the database.
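
For instance, with the redis-rb client, popping from a list is atomic, so two connected workers can never receive the same item (a toy illustration, not Resque’s internals):

require 'redis'

redis = Redis.new

# a producer appends a payload to the end of a list
redis.rpush("queue:example", '{"job":"demo"}')

# a consumer atomically removes and returns the head of the list;
# Redis's single-threaded event loop guarantees no double delivery
payload = redis.lpop("queue:example")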

Resque is a background queue built on top of Redis. There seem to be other options out there these days, but we are pretty happy with Resque and the associated tools/ecosystem. There is plenty of code in the resque codebase, but it all comes down to inserting JSON into a queue, popping it off, and executing code with that as an input.
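
The whole lifecycle fits in a few lines (the job name here is illustrative):

# a Resque job is a class with a queue name and a self.perform method
class ResizeImage
  @queue = :images

  def self.perform(image_id)
    # the actual work happens here
  end
end

# pushes {"class": "ResizeImage", "args": [42]} as JSON onto the images queue;
# a worker later pops it and calls ResizeImage.perform(42)
Resque.enqueue(ResizeImage, 42)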

Resque Bus

Resque Bus uses Resque to create an asynchronous message bus system. As we have created more applications with interdependencies, we have found it helpful to create something like this to loosely couple the worlds. There are several other possible solutions to this problem, but I really felt that it was important to use something that our team understood well for this piece of infrastructure that we could easily modify and maintain.

Application A publishes an event

Something happens in your application and you want to let the world know. In this case, you publish an event.

# business logic
ResqueBus.publish("user_created", "id" => 42, "first_name" => "John", "last_name" => "Smith")

# or do it later
ResqueBus.publish_at(1.hour.from_now, "user_created", "id" => 42, "first_name" => "John", "last_name" => "Smith")

Application B subscribes to events

If the same or a different application is interested in an event, it subscribes to it by name.

# initializer
ResqueBus.dispatch("app_b") do
  subscribe "user_created" do |attributes|
    # business logic
    NameCount.find_or_create_by_name(attributes["last_name"]).increment!
  end
end

How it works

The following is how this workflow is accomplished:

  • Application B subscribes to events (puts a hash in Redis saying what it is interested in)
  • Application A publishes the event (puts published hash as args in a Resque queue called resquebus_incoming with a class of Driver)
  • The Driver copies the event hash to 0 to N application queues based on subscriptions (arg hash now in app_b_default queue with a class of Rider)
  • The Rider in Application B executes the block given in the subscription
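
Since it’s all standard Resque under the hood, the job the Driver pops off resquebus_incoming is just JSON, roughly like this (the exact class name and internal attributes are the gem’s business, so treat this as an illustration):

{"class": "ResqueBus::Driver", "args": [{"bus_event_type": "user_created", "id": 42, "first_name": "John", "last_name": "Smith"}]}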

(Diagram: Redis Bus Data Flow)

Dedicated Setup

Each app needs to send its subscriptions to Redis

$ rake resquebus:subscribe

The incoming queue needs to be processed on a dedicated server or on all the app servers.

$ rake resquebus:driver resque:work

The subscription block is run inside a Resque worker which needs to be started for each app.

$ rake resquebus:setup resque:work

If you want retry to work for the subscribing app, or you are using the delayed publish_at syntax, you should run resque-scheduler

$ rake resque:scheduler

Combined Setup

The setup above is the most dedicated way to run it, but all that resquebus:driver and resquebus:setup really do is set the QUEUES environment variable. So you could run:

$ rake resque:work QUEUES=*

That would work only if you have a single app. While I believe this paradigm still adds value for a single app, it’s likely you have more than one app and the most important rule is to not allow Application C to process Application B’s queue, so that command would likely look more like this:

$ rake resque:work QUEUES=app_b_default,resquebus_incoming

It’s best practice to set your queue names, anyway. If you use resque-bus in the same Redis db as your “normal” Resque queues, then your full command set would probably look something like this:

$ rake resquebus:subscribe
$ rake resque:work QUEUES=high,app_b_default,medium,resquebus_incoming,low
$ rake resque:scheduler

It’s Just Resque

The above illustrates the primary reason that I like this system. It’s just Resque. While this may not be the most performant way to create a message bus, there are a number of good reasons to do it this way:

  • Nothing new to monitor or deploy
  • If used in a combined setup, you have nothing new to run
  • If it stops processing a queue (downtime, during deploy process), it catches back up easily
  • I understand what is going on (and resque has a simple data model in general)
  • It’s portable. Resque has been re-implemented in a number of languages beyond ruby (we use a node.js rider for example)
  • Many plugins already exist to add in extra capabilities (stats recording for example)

I feel that the “I understand what is going on” point sounds a little like NIH, but it’s just really important to me to fully know where this critical data lives.

Of course, because it’s just Resque, there are known issues to work through:

  • It’s relatively slow when compared with other systems. We’ve experimented with Node and Sidekiq to do the Driver role if this becomes an issue.
  • Redis does not have a good failover system so this adds a single point of failure to the system. We’ve been working on various techniques to mitigate this risk, including replication and failover tools (https://github.com/twitter/twemproxy).

Use Cases

The effect on our apps of all this publishing and subscribing ends up being one of focus. A request comes in to the web server and that code is in charge of accomplishing the primary mission, for example signing up a user. When this is finished, it publishes an event called user_created just in case other apps care.

Sometimes one or several apps do care. In the signup case, our marketing app subscribes and starts a campaign to onboard that user as effectively as it knows how, starting with a welcome email. Our analytics app subscribes and lets various external systems like Mixpanel know. Our admin search tool subscribes to get that user into the index. And so on.

Most of our data goes through certain states. For example, a Task goes from assigned to completed. Overall, we have found that publishing when the state changes is just about always the right thing to do, as sketched below. Some of those events have many subscribers. Many events are completely ignored (at the moment) and that is fine too.
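
A hedged sketch of what such a publish might look like from a model (the callback wiring is illustrative, not something the gem dictates):

class Task < ActiveRecord::Base
  after_save :publish_state_change

  private

  # publish a bus event whenever the state attribute changes
  def publish_state_change
    return unless state_changed?
    ResqueBus.publish("task_changed", "id" => id, "state" => state)
  end
end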

A few types of apps have evolved within this paradigm:

  • Rails apps that subscribe and publish in order to achieve their goals
  • Bus apps that are small and data-driven and have no UI
  • Logging and analytics apps that subscribe to record many events

Rails app communication

When a Task is posted on the site, the app publishes a task_opened event. This is a very important event and there are lots of subscribers. One of them is our Task-browsing app that helps TaskRabbits find the right job for them. It has its own codebase and storage designed to optimize this particular experience. When it receives the event about the new Task, it does all the calculations about who is right for the job and stores the results in the way it wants to optimize the browsing. It is also subscribed to events that indicate that the Task is no longer to be browsed by TaskRabbits. In these cases, it removes objects related to that Task from storage.

The separation described here between the two systems involved (Task posting and browsing) has had a few effects.

  • Local simplicity has increased. Each system does what it does with simpler code than if it was all combined into the same thing.
  • Local specialization has increased. For example, now that the browsing experience is separate in code, I feel better in choosing the right storage solution for that experience. When in one system, it feels like “yet another thing” added to something that’s already complicated.
  • Global complexity has increased. This separation has a cost and it is in the number of moving pieces. More moving pieces adds maintenance costs through mysterious bugs, time in runtime execution, and overall cognitive load on the developer. It’s case by case, but we believe it can be worth it.

Finally, note that this Rails app also publishes events about the new TaskRabbits that are relevant to the Task.

Bus Apps

Specifically, the browsing application publishes N events, each about a notification that should occur because of the new Task. We have a class of application which has no UI and just listens on the bus. We call the app listening for notification events Switchboard. Switchboard is an example of what I call a “bus app.” A bus app exists to subscribe to various events and take action based on the data. In this case, Switchboard receives an event that indicates that a text message needs to be sent, so it does so. Or it can look at the user’s preferences and decide not to send it.

With this approach, Switchboard is able to accomplish a few things effectively:

  • It is the only app that knows our Twilio credentials or how to format the HTTP call
  • It is the only one that knows that we even use Twilio or what phone number(s) to send from
  • It is the only app that decides what phone number to send to and/or how to look up a user’s preferences
  • It can have a drastically reduced memory profile compared to a normal Rails app in order to be able to process more effectively.
  • It provides a centralized choke point for all outgoing communications, making something like a staging whitelist easy to implement

In effect, ResqueBus and Switchboard create an asynchronous API. Simply knowing the terms of the API (what to publish) provides several benefits to the consuming apps:

  • They don’t have to know how to send text messages
  • They don’t have to know how to look up a user’s preferences or even phone number
  • They don’t have to change anything if we decide to send text messages differently
  • They can focus on the content of the message only
  • They will not be held up or crash if Twilio is having a problem of some sort

Loggers

As noted, all of these benefits of decentralization come at the cost of global complexity. It’s important to choose such architectural areas carefully, and clearly this approach is one that we’ve fully embraced. The addition of these “additional” moving pieces requires creation of new tools to mitigate the operational and cognitive overhead that they add. A good example that I read about recently was the ability Twitter has to trace a tweet through its whole lifecycle.

At TaskRabbit, the equivalent is an app called Metrics that subscribes to every single event. Case by case, the Metrics subscription adds some data to assist in querying later and stores each event. We store events in log files and, optionally, Elasticsearch. When combined with unique ids for each event that subscriptions can chain along if they republish, this provides the capability to trace any logical event through the system.

That was the original goal of the system, but it somewhat accidentally had several effects.

  • Again, the ability to trace a logical event throughout decoupled systems
  • Centralized logging capability a la SumoLogic for free (any app can publish random stuff to the bus)
  • With minor denormalization and well-crafted queries, realtime business dashboards and metrics a la Mixpanel or Google Analytics

Subscriptions

There are a few other ways to subscribe to receive events.

Any Attributes

The first version of Resque Bus only allowed subscribing via the event type, as shown above. While I found this covered the majority of use cases and was the easiest to understand, we found ourselves subscribing to events and then throwing them away if other attributes didn’t line up quite right. For example:

subscribe "task_changed" do |attributes|
  if attributes["state"] == 'opened'
    TaskIndex.write(attributes["id"])
  end
end

While this is fine, something didn’t sit quite right. It adds unnecessary load to the system that could have been avoided at the Driver level. The biggest realization is that bus_event_type is no different than any other attribute in the hash and doesn’t deserve special treatment.

In the current version of Resque Bus, this code is now:

subscribe "any_id_i_want", "bus_event_type" => "task_changed", "state" => "opened" do |attributes|
  TaskIndex.write(attributes["id"])
end

This ensures it never even makes it to this queue unless all of the attributes match. I felt it was important to keep the simple case simple (so it’s still possible), but in the implementation the first subscription is equivalent to this:

subscribe "task_changed", "bus_event_type" => "task_changed" do |attributes|
  if attributes["state"] == 'opened'
    TaskIndex.write(attributes["id"])
  end
end

Subscriber Mixin

It feels really powerful and magical to put code like this in a DSL in your initializer or other setup code. However, when we started creating apps that had many subscriptions, it got to be a little overwhelming. For this we created an Object mixin for subscription.

class TaskChangesSubscriber
  include ResqueBus::Subscriber
  subscribe :task_changed
  subscribe :changed_when_opened, "bus_event_type" => "task_changed", "state" => "opened"

  def task_changed(attributes)
    # gets called for all task changes
  end

  def changed_when_opened(attributes)
    # only gets called when state == "opened"
  end
end

This really cleaned up subscription-heavy apps.

Note: This subscribes when the class is loaded, so it needs to be in your load path or otherwise referenced/required during app initialization to work properly.

More to come

If people seem to like this approach and gem, we have lots of approaches and tools built on top of it that I’d be excited to make available. Let us know on GitHub that you like it by watching, starring, or creating issues with questions, etc.

Read more at the source