Tools To Check Out

This board is meant to be a place to keep any interesting tools I come across, along with any notes on each tool as I check it out.

SQL Mesh

What is it? Per their website:

SQLMesh is an open source data transformation framework that brings the best practices of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python. It is created and maintained by Tobiko Data, a company founded by data leaders from Airbnb, Apple, and Netflix.

Why am I interested?

It seems like an interesting alternative to a tool like Synapse or Fabric.

plotnine

Python plotting library based on the grammar of graphics (similar to R's ggplot2).

Satyrn

Satyrn is a Mac-based alternative to Jupyter. Some nice features I've found so far:

  • It has a really clean interface with few distractions
  • It uses most of the same keyboard shortcuts as Jupyter
  • The keyboard shortcuts are listed right away in an intro notebook
  • The autocomplete is really fast, and the tool overall feels pretty quick.

Some things that have been a bit of a challenge:

  • I ended up setting up a venv and then pointing Satyrn at the Python in its bin directory to get a kernel with some packages (I wouldn't mind having a default environment with the option to pip install)
  • Not entirely sure how to use black (I set up the path to it, but didn't find the keyboard shortcut).
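A minimal sketch of the venv workaround using only the standard library. It assumes Satyrn just needs the path to an interpreter inside the venv's bin directory; the `create_kernel_env` helper and the `~/.venvs/satyrn` location are made up for illustration.

```python
import venv
from pathlib import Path


def create_kernel_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its Python interpreter.

    Hypothetical helper: the returned path is what I'd paste into Satyrn's
    kernel/interpreter setting (an assumption, not a documented Satyrn API).
    """
    env_path = Path(env_dir).expanduser().resolve()
    # symlinks=True mirrors what `python -m venv` does on macOS/Linux;
    # with_pip=True bootstraps pip so packages can be installed into the env later.
    venv.create(str(env_path), symlinks=True, with_pip=with_pip)
    # On macOS (and Linux) the interpreter lives in <env>/bin/python.
    return env_path / "bin" / "python"


# Usage (creates a real environment in your home directory):
#   python_path = create_kernel_env("~/.venvs/satyrn")
#   then install packages with: <python_path> -m pip install <package>
```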

Things to Try

  • Get an OpenAI API key for ChatGPT and try the built-in help

Tauri

Like Electron, but with a Rust backend.

Marimo

Reactive notebook option.

Supabase

Postgres database service with built-in auth.

Great Expectations

Data quality validation tool.
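To get a feel for the idea before trying the real library, here is a plain-Python sketch of expectation-style checks: each check is declared up front, run against the data, and reports the rows that fail. The helper names (`expect_not_null`, `expect_between`, `validate`) are made up here and only loosely echo Great Expectations' own expectations such as `expect_column_values_to_not_be_null`; this is not its API.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[list[Row]], list[int]]


def expect_not_null(column: str) -> Check:
    """Check that fails rows where `column` is missing or None."""
    def check(rows: list[Row]) -> list[int]:
        return [i for i, row in enumerate(rows) if row.get(column) is None]
    return check


def expect_between(column: str, low: float, high: float) -> Check:
    """Check that fails rows whose non-null `column` value lies outside [low, high]."""
    def check(rows: list[Row]) -> list[int]:
        return [
            i for i, row in enumerate(rows)
            if row.get(column) is not None and not low <= row[column] <= high
        ]
    return check


def validate(rows: list[Row], suite: dict[str, Check]) -> dict[str, list[int]]:
    """Run every check in the suite; return {check name: failing row indices} for failures only."""
    results = {name: check(rows) for name, check in suite.items()}
    return {name: failed for name, failed in results.items() if failed}


orders = [
    {"id": 1, "amount": 25.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": -5.0},
]
suite = {
    "amount_not_null": expect_not_null("amount"),
    "amount_in_range": expect_between("amount", 0, 10_000),
}
print(validate(orders, suite))  # {'amount_not_null': [1], 'amount_in_range': [2]}
```

The nice part of the declarative style is that the suite is just data, so it can be versioned and rerun on every new batch.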

Kolo

Invert a trace and get a working integration test in fifteen minutes.

Difftastic

Difftastic is a CLI diff tool that compares files based on their syntax, not line-by-line. Difftastic produces accurate diffs that are easier for humans to read.
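A toy illustration of why syntax-aware diffing helps, using only Python's standard library. This is not how Difftastic works internally (it's a CLI run as `difft old_file new_file`); it just contrasts a line-based diff with a comparison of parsed syntax trees for a change that only reformats code.

```python
import ast
import difflib

OLD = "total = compute(a, b)\n"
NEW = "total = compute(\n    a,\n    b,\n)\n"


def changed_lines(old: str, new: str) -> list[str]:
    """Line-based diff: the added/removed lines a classic `diff` would report."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def same_syntax(old: str, new: str) -> bool:
    """Structural check: do both versions parse to the same syntax tree?

    ast.dump omits line/column info by default, so pure reformatting compares equal.
    """
    return ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))


print(len(changed_lines(OLD, NEW)))  # 5 lines flagged by the line diff
print(same_syntax(OLD, NEW))         # True: nothing changed structurally
```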

Quarto

An open-source scientific and technical publishing system

SQL Flow

SQL visualization tool

FastHTML

Modern web applications in pure Python

LanceDB

LanceDB is an open-source vector database for AI that's designed to store, manage, query and retrieve embeddings on large-scale multi-modal data. The core of LanceDB is written in Rust 🦀 and is built on top of Lance, an open-source columnar data format designed for performant ML workloads and fast random access.
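At its core, querying a vector database means finding the stored embeddings closest to a query embedding. The sketch below does that by brute force with cosine similarity in plain Python. The toy 3-dimensional vectors and the `search` helper are assumptions for illustration; LanceDB itself uses the Lance format and can use an approximate index instead of scanning every row.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(store: dict[str, list[float]], query: list[float], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to `query` (exhaustive scan)."""
    ranked = sorted(store.items(), key=lambda item: cosine(item[1], query), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy "embeddings"; real ones come from a model and have hundreds of dimensions.
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.0, 0.2, 0.95],
}
print(search(store, [1.0, 0.1, 0.0], k=2))  # ['cat', 'kitten']
```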

Sanity RSS Plugin

RSS plugin for the Sanity CMS.

Datasette

Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.
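Datasette serves SQLite databases, so the quickest way to try it is to build a small `.db` file. A minimal sketch with the standard-library `sqlite3` module; the `tools.db` filename, the `tools` table, and its rows are made up for illustration.

```python
import sqlite3


def build_demo_db(path: str) -> int:
    """Create (or recreate) a `tools` table in a SQLite file and return its row count."""
    rows = [
        ("Datasette", "Explore and publish data"),
        ("Difftastic", "Syntax-aware diffs"),
        ("SQLGlot", "SQL parser and transpiler"),
    ]
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS tools")
            conn.execute("CREATE TABLE tools (name TEXT PRIMARY KEY, description TEXT)")
            conn.executemany("INSERT INTO tools (name, description) VALUES (?, ?)", rows)
        return conn.execute("SELECT COUNT(*) FROM tools").fetchone()[0]
    finally:
        conn.close()


# Usage: build_demo_db("tools.db")
```

Then `datasette tools.db` should serve it locally (by default on port 8001) for browsing, with the JSON API alongside.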

SeaweedFS

Potential local S3 option.

Deltabase

Polars + Delta Lake

Stumpy

Time series analysis library built around the matrix profile.
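STUMPY's main output is the matrix profile: for every length-`m` window of a time series, the z-normalized distance to its closest match elsewhere in the series. Low values flag repeated patterns (motifs) and high values flag anomalies (discords). Below is a naive O(n²·m) sketch in plain Python to show the idea; STUMPY's `stumpy.stump(T, m)` computes the same kind of result far faster. The exclusion-zone width of ceil(m/4) is an assumption, not taken from STUMPY's code.

```python
import math


def znorm(x: list[float]) -> list[float]:
    """Z-normalize a window (mean 0, std 1); constant windows become all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd if sd else 0.0 for v in x]


def naive_matrix_profile(ts: list[float], m: int) -> tuple[list[float], list[int]]:
    """For each length-m window, return the distance to its nearest non-trivial
    match and that match's start index."""
    windows = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    excl = math.ceil(m / 4)  # assumed exclusion zone: skip near-overlapping windows
    profile, index = [], []
    for i, wi in enumerate(windows):
        best, best_j = math.inf, -1
        for j, wj in enumerate(windows):
            if abs(i - j) <= excl:  # also skips the trivial self-match i == j
                continue
            d = math.dist(wi, wj)  # Euclidean distance between z-normalized windows
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


# A short series where the pattern 0, 1, 2, 1 appears twice (starting at 0 and at 7).
series = [0, 1, 2, 1, 9, 4, 7, 0, 1, 2, 1, 3]
profile, index = naive_matrix_profile(series, m=4)
print(index[0], index[7])  # 7 0: each occurrence's nearest neighbour is the other one
```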

LLMIO

llm io

SQLGlot

SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine. It can be used to format SQL or translate between 21 different dialects like DuckDB, Presto / Trino, Spark / Databricks, Snowflake, and BigQuery. It aims to read a wide variety of SQL inputs and output syntactically and semantically correct SQL in the targeted dialects.

Marimo

marimo is an open-source reactive notebook for Python — reproducible, git-friendly, executable as a script, and shareable as an app.
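The reactive part is the interesting bit: marimo works out which variables each cell defines and reads, builds a dependency graph, and re-runs the cells that depend on whatever changed. Below is a toy version of that idea built on the standard library's `graphlib`. The cells are hypothetical and list their definitions and references by hand, whereas marimo derives them by statically analyzing the code. This is a sketch, not marimo's implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook: each cell lists the names it defines and the names it reads.
CELLS = {
    "load": {"defines": {"data"}, "refs": set(), "code": "data = [3, 1, 2]"},
    "sort": {"defines": {"ordered"}, "refs": {"data"}, "code": "ordered = sorted(data)"},
    "total": {"defines": {"total"}, "refs": {"data"}, "code": "total = sum(data)"},
}


def cells_to_rerun(cells: dict, changed: str) -> list[str]:
    """Return the changed cell plus every cell that transitively reads its outputs, in run order."""
    owner = {name: cid for cid, cell in cells.items() for name in cell["defines"]}
    # Each cell depends on the cells that define the names it reads.
    deps = {cid: {owner[r] for r in cell["refs"] if r in owner} for cid, cell in cells.items()}
    order = list(TopologicalSorter(deps).static_order())  # dependencies come first
    stale = {changed}
    for cid in order:
        if deps[cid] & stale:
            stale.add(cid)
    return [cid for cid in order if cid in stale]


def run(cells: dict, changed: str, namespace: dict) -> dict:
    """Re-execute only the affected cells in a shared namespace."""
    for cid in cells_to_rerun(cells, changed):
        exec(cells[cid]["code"], namespace)
    return namespace


ns = run(CELLS, "load", {})
CELLS["load"]["code"] = "data = [10, 20]"  # "edit" the first cell
ns = run(CELLS, "load", ns)                 # sort and total re-run automatically
print(ns["ordered"], ns["total"])           # [10, 20] 30
```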
