Data at Frame.io
At a growing startup like Frame.io, we have a huge volume of data coming in — over 67 million events per month. Efficient analysis is crucial to remain the market leader, and we use a whole host of tools:
- GUIs like Looker, Redash and PSequel
- Internal analytics and query parallelization microservices
- Custom AI slackbots we’ve built for spam/intrusion detection and sales lead spotting
This means that our data warehouse is being constantly hit by tons of queries — both on Cron schedules and from employees exploring our datasets.
Managing this volume of queries can be tough.
- PostgreSQL tables can lock up — trying to swap/delete a table while someone else is querying it, for example, will cause them to lock up. Ideally, analytics tables should only ever be incrementally updated, but sometimes design constraints prevent this.
- SQL GUIs can randomly time-out and its useful to quickly inspect the DB to see if they are still running or not.
- Using a combination of
psqlto run queries in the background requires monitoring. Dima is great for that.
- An accidental SELECT without a LIMIT on a multi-million row table can crash a SQL GUI.
- Sometimes a big query just isn’t what you wanted. Psequel doesn’t have a cancel button, and hosted GUIs like Redash and Looker have Cancel buttons that often become unresponsive for long running queries.
To help deal with this, I built a small command-line tool called dima. This originally stood for “Database Integrity Monitoring & Analysis”, but now its just really “dima”. It’s very straightforward and has turned out to be super useful, so I got the OK to open source it.
Dima is a Python script that basically lets you quickly summarize or inspect the
pg_stat_activity internal table in Postgres via the psycopg2 library. If you want to kill a query it will call
pg_terminate on the PID.
Check out https://github.com/Frameio/homebrew-dima.
pip install dima-db or
brew tap Frameio/homebrew-dima && brew install dima
dima — a summary of running queries
dima show [PID] —inspect a specific running query
dima rm [-f] Some PID or Keyword — kills query/queries