Background file uploads to S3 using Shrine

Oleksii Korondevych - 21.07.2017
shrine, file uploads, rails


One of our customers’ applications was completely paralyzed on February 28, when AWS S3 had a major outage, because the application’s key features rely on uploading an application-generated file to S3. To improve the customer experience, we decided to serve files from a cache (the database) while they are being uploaded to S3 in a background job.

We wrote a module that lets any ActiveRecord model upload a file to the DB cache first and then upload it to S3 in the background. We also added a service that returns either the S3 URL, if the file has already been uploaded, or a relative link to an endpoint that serves the file from the database.

We decided to replace Paperclip with Shrine, first of all because it can store files in the database out of the box and was designed with background processing in mind. It is also more flexible and has a rich plugin ecosystem.

What we need:

  1. Write migrations: one for the ActiveRecord model, adding the <attachment_name>_data and uploaded columns, and another for the uploader’s cache storage.
  2. Set up the Shrine storages and create an uploader class, where our attachments will be processed.
  3. Add a job that uploads a file to S3 and binds the current record to the binary_file where we store our cache.
  4. Add a service class that decides which storage to fetch the file from: S3 if it has been uploaded, or the cache otherwise.
  5. Write a class-like module that combines the abilities of the uploader and the file-retrieving service. We can include this module in any model that deals with uploads by passing the name of the attachment attribute as an argument.

Write the migrations

Add a migration for the model:

class AddAttributesToTableName < ActiveRecord::Migration
  def change
    change_table :table_name do |t|
      t.text :<attribute_name>_data
      t.boolean :uploaded, default: false
    end
  end
end

Add a migration for the storage:

class CreateBinaryFiles < ActiveRecord::Migration
  def change
    create_table :binary_files do |t|
      t.binary :content
      t.json :metadata
      t.references :attachable, polymorphic: true, index: true
      t.date :valid_till
    end
  end
end

Set up Shrine and implement the uploader’s behavior

Once Shrine is installed and all the needed plugins are added, we need to specify the storages.

# config/initializers/shrine.rb
def storage_cache
  Shrine::Storage::Sql.new(database: db_connect, table: :binary_files)
end

def storage_aws
  Shrine::Storage::S3.new(**s3_options)
end

Shrine uploads a file in two stages: cache and store. In the first stage, assigning a file to the model stores it synchronously in the DB (we chose the DB as the cache because our app runs on a few servers and we need a single place for temporarily stored files). After validation and processing in our uploader, the second stage starts and the file is promoted to S3 storage in a background job.
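To make the two stages more concrete, here is a minimal plain-Ruby sketch of the idea. It does not use Shrine at all: MemoryStorage, FailingStorage and promote are hypothetical names invented for illustration, and the failing storage only imitates an unavailable S3.

```ruby
# In-memory stand-in for a storage (illustrative, not a Shrine class).
class MemoryStorage
  def initialize
    @files = {}
  end

  def upload(id, content)
    @files[id] = content
  end

  def open(id)
    @files.fetch(id)
  end

  def exists?(id)
    @files.key?(id)
  end
end

# Imitates a remote storage that is currently unavailable.
class FailingStorage < MemoryStorage
  def upload(_id, _content)
    raise IOError, "storage unavailable"
  end
end

# Stage 2: copy the file from cache to store. Returns the name of the
# storage that should serve the file afterwards.
def promote(id, cache:, store:)
  store.upload(id, cache.open(id))
  :store
rescue IOError
  :cache # the cached copy keeps serving reads until a retry succeeds
end

cache = MemoryStorage.new
cache.upload("1.pdf", "%PDF-1.4 ...") # stage 1: synchronous caching

puts promote("1.pdf", cache: cache, store: MemoryStorage.new)  # store
puts promote("1.pdf", cache: cache, store: FailingStorage.new) # cache
```

The point of the sketch is that a failed promotion leaves the cached copy untouched, so reads can continue from the cache while the upload is retried.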

# app/uploader/our_uploader.rb
class OurUploader < Shrine
  plugin :backgrounding

  # storage_cache and storage_aws are defined in config/initializers/shrine.rb
  self.storages = { cache: storage_cache, store: storage_aws }

  # This job is enqueued after the file has been cached.
  Attacher.promote { |data| OurUploadingJob.perform_later(data) }
end

Add a job to upload files in the background

# app/jobs/our_uploading_job.rb
class OurUploadingJob < ActiveJob::Base
  def perform(data)
    attacher = OurUploader::Attacher.load(data)
    record   = attacher.record
    cached   = attacher.get
    pdf_id   = cached.id.split('.').first.to_i
    binary   = BinaryFile.find(pdf_id)
    record.binary_file = binary # Here we bind current record with 'binary_files' to have a fallback mechanism.
    record.save

    promoted = attacher.promote(cached, action: :store)

    if promoted
      record.update_column(:uploaded, true) # Mark the record as uploaded once the file has reached S3.
      binary.update_column(:valid_till, 30.days.from_now) # The binary file will be removed from the DB 30 days after the upload.
    end
  end
end
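Two details of the job can be tried in isolation. The sketch below uses plain Ruby only; binary_file_id and valid_till are hypothetical helper names, and the "<row id>.<extension>" format of the cached file id is an assumption inferred from the split('.') call in the job above.

```ruby
require "date"

# Extracts the binary_files row id from a cached file id such as "42.pdf".
# Assumption: cached ids look like "<row id>.<extension>".
def binary_file_id(cached_id)
  Integer(cached_id.split(".").first, 10)
end

# Date after which the cached copy may be deleted.
def valid_till(uploaded_on, retention_days = 30)
  uploaded_on + retention_days
end

puts binary_file_id("42.pdf")          # 42
puts valid_till(Date.new(2017, 7, 21)) # 2017-08-20
```

Unlike to_i, which silently returns 0 for a non-numeric prefix, Integer raises ArgumentError, so an unexpected id format would surface as a failed job rather than as a lookup of row 0.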

Add the file-retrieving service

# app/services/some_get_attachment.rb
class SomeGetAttachment < ActiveInteraction::Base
  def execute
    if document.uploaded?
      fetch_from_s3
    else
      fetch_from_fallback_storage
    end
  end

  private

  def fetch_from_fallback_storage
    if document.binary_file
      fetch_from_binary
    elsif attacher.cached?
      fetch_from_cache
    end
  end
end
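The lookup order used by the service can be expressed as a small pure function. This is only a sketch: Document is a hypothetical Struct standing in for the ActiveRecord model, and the returned symbols are illustrative labels rather than anything the real service returns.

```ruby
# Stand-in for the model; the field names are assumptions for illustration.
Document = Struct.new(:uploaded, :binary_file, :cached, keyword_init: true)

# Returns the source that should serve the file, or nil when none has it.
def attachment_source(document)
  return :s3 if document.uploaded
  return :binary_file if document.binary_file
  return :shrine_cache if document.cached

  nil
end

puts attachment_source(Document.new(uploaded: true, binary_file: "row", cached: true)).inspect   # :s3
puts attachment_source(Document.new(uploaded: false, binary_file: "row", cached: true)).inspect  # :binary_file
puts attachment_source(Document.new(uploaded: false, binary_file: nil, cached: true)).inspect    # :shrine_cache
puts attachment_source(Document.new(uploaded: false, binary_file: nil, cached: false)).inspect   # nil
```

S3 is preferred whenever the uploaded flag is set; the bound binary_files row is the first fallback, and the uploader's cache is the last.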

Write the Attacher module

# app/models/concerns/our_attacher.rb
class OurAttacher < Module
  module InstanceMethods

    # The file-retrieving service provides a link to the uploaded file.
    def readable_url
      # Ask SomeGetAttachment about file
    end

    def downloadable_url
      # Ask SomeGetAttachment about file
    end
  end

  # Here we use module_eval method to have the ability to pass any attribute name when we include the uploader
  # and also add the association with `binary_files` to the model.
  def initialize(column_name)
    @column_name = column_name

    module_eval <<-RUBY, __FILE__, __LINE__ + 1
      def self.included(model)
        model.include InstanceMethods
        model.include OurUploader[@column_name]

        model.has_one :binary_file, as: :attachable
      end
    RUBY
  end
end
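The "class that inherits from Module" technique used by OurAttacher can be tried without Rails or Shrine. In the sketch below, Attachable, Invoice and the generated URL format are all hypothetical; it uses define_method and the included hook instead of module_eval, which is a substitution made here for brevity and not the code from this article.

```ruby
# A parameterized module: Attachable.new(:report) builds an anonymous module
# whose methods depend on the argument.
class Attachable < Module
  def initialize(name)
    super() # Module#initialize takes no positional arguments
    @name = name
    # Instance method named after the attribute, e.g. #report_url.
    define_method(:"#{name}_url") { "/attachments/#{name}/#{id}" }
  end

  # Called on this module instance whenever a class includes it.
  def included(model)
    super
    name = @name
    model.define_singleton_method(:attachment_name) { name }
  end
end

class Invoice
  include Attachable.new(:report)

  attr_reader :id

  def initialize(id)
    @id = id
  end
end

puts Invoice.new(7).report_url # /attachments/report/7
puts Invoice.attachment_name   # report
```

Because every call to new returns a separate module, two models can include the same behaviour with different attribute names without affecting each other.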

Finally, include the OurAttacher module in a model.

# app/models/some_model.rb
class SomeModel < ActiveRecord::Base
  include OurAttacher.new(:<attribute_name>)
end

Conclusion

As a result, we have a stable web application with a universal upload/retrieval mechanism that can be applied to any ActiveRecord model. The same approach can be used in any application thanks to Shrine’s modular nature: it ships with many plugins and lets you add your own, which makes the upload process much easier and more flexible.

