
Universal Postgres Exports

The number of new accounts is on the rise, the platform is generating more revenue than before, and operations wants to get a sense of the numbers. This is a perfect opportunity to break out some SQL and generate a report. Of course, the C-levels all crunch their numbers in Excel, so you can't simply hand them a dump of non-portable tabulated data. This calls for the ubiquity of CSV!

Fire up a PostgreSQL REPL connected to your development environment and get to work on a rough signup funnel.

psql widgets_development

Your platform lets account owners create and launch new embeddable widgets. An important measure of success is how many new accounts then go on to create and launch one or more widgets. You write up the funnel, bundling account owners into monthly cohorts:

WITH widgeters AS (
  SELECT id,
         date_trunc('month', created_at) AS created_at,
         NULLIF(widgets_count, 0) AS widgets_count,
         (SELECT NULLIF(COUNT(*), 0)
            FROM widgets
            WHERE widgets.account_id = accounts.id
            AND widgets.launched_at IS NOT NULL) AS launched_count
  FROM accounts
  WHERE created_at > current_timestamp - '1 year'::interval
)

SELECT created_at,
       COUNT(*)              AS signed_up,
       COUNT(widgets_count)  AS made_widget,
       COUNT(launched_count) AS launched_widget
  FROM widgeters
  GROUP BY created_at
  ORDER BY created_at;

That works. The data looks right to you. Now it is time to export it. No problem. Modify the query by wrapping it in a COPY statement:

COPY (
  -- previous query
) TO '/tmp/cohorts.csv' WITH CSV HEADER;

That outputs a perfectly formatted CSV onto the local file system. After taking a quick glance at the output you notice that your development data isn't remotely accurate: it is mangled and out of date. Grabbing data from production would be much more useful. Again, no problem: turn the query into a .sql file that can be run against the remote database.

psql "postgres://username:password@widgets-server/widgets_production" -f cohorts.sql

Hmm, PostgreSQL didn't like that. It informs you that only superusers may COPY to a file, and even then the file would land on the server's file system, leaving you to scp it back to your local machine. Fortunately, the COPY command can also output to STDOUT. Update the output in cohorts.sql:

COPY (
  -- previous query
) TO STDOUT WITH CSV HEADER;

Now the results can be piped into a local file directly:

psql "postgres://username:password@widgets-server/widgets_production" \
     -f cohorts.sql \
     > cohorts-production.csv

Export It Again. And Again.

As expected, operations liked the export. Now they want another one, but they want to be able to generate it themselves right from the admin section. Assuming the server is running Ruby (a safe assumption for the time being), you may be tempted to reach for the CSV library. After all, it is right there in the standard library. However, that may require rewriting your perfectly functional query in ActiveRecord. It would also involve pulling all of the data back from the server and instantiating objects for every row and column. There is a better way!

Every database connection adapter has the means to execute a SQL query and read back the results. Ruby's pg adapter is no exception. With a very thin wrapper around a database connection you can generate exports from the same query file right on the server.

First, the wrapper class:

class PostgresCSVWriter
  attr_reader :adapter

  def initialize(adapter = ActiveRecord::Base)
    @adapter = adapter
  end

  def connection
    # The raw PG::Connection underneath ActiveRecord's adapter
    adapter.connection.raw_connection
  end

  # Streams CSV rows into io, which can be any object that responds to <<
  def write_rows(query, io: '')
    connection.copy_data(build_copy_query(query)) do
      while row = connection.get_copy_data
        io << row
      end
    end

    io
  end

  private

  def build_copy_query(query)
    %(COPY (#{query}) TO STDOUT WITH DELIMITER ',' CSV HEADER)
  end
end
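
For a quick sanity check, the writer can stream straight to STDOUT from a console session. The one-off query here is just a stand-in, not one of the report queries:

writer = PostgresCSVWriter.new

# Prints "one,two\n1,2\n" to the terminal
writer.write_rows('SELECT 1 AS one, 2 AS two', io: $stdout)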

Please note that while the query itself is interpolated, it is not escaped, and it is therefore vulnerable to injection attacks. This is not suitable for user-generated queries.
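
If the admin section ever needs to accept a query name from user input, one cautious approach is an allowlist of known files. The constant and helper here are assumptions, not part of the original design:

# Hypothetical allowlist guarding which files may be read and interpolated
ALLOWED_QUERIES = %w[cohorts.sql].freeze

def safe_query(filename)
  raise ArgumentError, "unknown query: #{filename}" unless ALLOWED_QUERIES.include?(filename)

  IO.read(Rails.root.join('queries', filename))
end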

With the base writer in place you can now write an exporter that loads an existing .sql query and hands back a string suitable for streaming to the client.

require 'postgres_csv_writer'

class PostgresCSVExporter
  def export(filename)
    writer.write_rows(query(filename))
  end

  def query(filename)
    IO.read(Rails.root.join('queries', filename))
  end

  private

  def writer
    PostgresCSVWriter.new
  end
end
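
Wiring the exporter into the admin section can then be a single controller action. This sketch assumes a controller and route that aren't part of the original setup:

class Admin::ExportsController < ApplicationController
  # GET /admin/exports/cohorts (hypothetical route, shown for illustration)
  def show
    csv = PostgresCSVExporter.new.export('cohorts.sql')

    send_data csv, filename: 'cohorts.csv', type: 'text/csv'
  end
end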

Exports are now as simple as PostgresCSVExporter.new.export('cohorts.sql'). Every export lives in its own .sql file which can be edited and executed natively. For larger exports, this approach will build up multi-megabyte strings, which is probably undesirable. In those situations you may reach for Tempfile and send the file contents rather than a large string.
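
A minimal sketch of that Tempfile variant, relying on write_rows accepting any object that responds to <<:

require 'tempfile'

file = Tempfile.new(['cohorts', '.csv'])
query = IO.read(Rails.root.join('queries', 'cohorts.sql'))

# Rows land on disk instead of accumulating in memory
PostgresCSVWriter.new.write_rows(query, io: file)
file.rewind

# From a controller you could then send_file(file.path, type: 'text/csv')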

Now you can keep all of your queries written in completely portable SQL. The queries can be accessed from any other tech stack without the need for an ORM or an intermediate representation. Let the team know their exports are now just one click away.