<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Posts on Sam&#39;s Blog</title>
    <link>/posts/</link>
    <description>Recent content in Posts on Sam&#39;s Blog</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Sat, 04 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="/posts/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Review of Hex</title>
      <link>/posts/hex_review/</link>
      <pubDate>Sat, 04 Apr 2026 00:00:00 +0000</pubDate>
      
      <guid>/posts/hex_review/</guid>
      <description>&lt;h2 id=&#34;hex-review&#34;&gt;Hex Review&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s Easter Sunday. After a long day of chocolate mogging and stealing eggs from small children, I&amp;rsquo;ve decided to check out &lt;a href=&#34;https://hex.tech/&#34;&gt;Hex&lt;/a&gt;, which calls itself &amp;ldquo;The AI Analytics Platform for your whole team&amp;rdquo;. I&amp;rsquo;m going to toss a small CSV into it; if it does well, I might hit it with a 20GB Parquet file.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not going to connect a data warehouse to it (this turned out to be wrong; I ended up connecting to &lt;a href=&#34;https://motherduck.com/&#34;&gt;MotherDuck&lt;/a&gt;). Instead, I&amp;rsquo;m going to treat this as the classic &amp;ldquo;There&amp;rsquo;s a dashboard, but I need to download the underlying data and re-manipulate it&amp;rdquo; scenario.&lt;/p&gt;</description>
      <content>&lt;h2 id=&#34;hex-review&#34;&gt;Hex Review&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s Easter Sunday. After a long day of chocolate mogging and stealing eggs from small children, I&amp;rsquo;ve decided to check out &lt;a href=&#34;https://hex.tech/&#34;&gt;Hex&lt;/a&gt;, which calls itself &amp;ldquo;The AI Analytics Platform for your whole team&amp;rdquo;. I&amp;rsquo;m going to toss a small CSV into it; if it does well, I might hit it with a 20GB Parquet file.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not going to connect a data warehouse to it (this turned out to be wrong; I ended up connecting to &lt;a href=&#34;https://motherduck.com/&#34;&gt;MotherDuck&lt;/a&gt;). Instead, I&amp;rsquo;m going to treat this as the classic &amp;ldquo;There&amp;rsquo;s a dashboard, but I need to download the underlying data and re-manipulate it&amp;rdquo; scenario.&lt;/p&gt;
&lt;p&gt;I signed up for a free trial and started off with a prompt and some data.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I have not read a single guide, looked at any template or read any docs&lt;/li&gt;
&lt;li&gt;I have no idea how Hex works on the backend&lt;/li&gt;
&lt;li&gt;I&amp;rsquo;m just going to figure it out as I go and bumble my way through&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;the-data&#34;&gt;The data&lt;/h3&gt;
&lt;p&gt;I gave it a ~740-line CSV with historical March Madness data.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Season&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Round&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Team&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Odds&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Result&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;2026&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;TCU&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;120&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;2026&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Ohio State&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;-142&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;0&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;2026&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Troy&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;650&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;0&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;2026&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Nebraska&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;-1000&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;1&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;the-prompt&#34;&gt;The Prompt&lt;/h3&gt;
&lt;p&gt;I&amp;rsquo;ve added the prompt below, but the TL;DR is that we want to evaluate the &amp;ldquo;accuracy&amp;rdquo; of pre-game odds for March Madness from a CSV containing 5 years of historical data. The prompt isn&amp;rsquo;t great, and I didn&amp;rsquo;t even provide extra context about what I was working on.&lt;/p&gt;
&lt;details class=&#34;spoiler&#34;&gt;
  &lt;summary&gt;Click to view the entire prompt word for word&lt;/summary&gt;
  &lt;div class=&#34;spoiler-content&#34;&gt;
    &lt;p&gt;I have captured historical odds on NCAA march madness tournaments. The CSV contains data with the year, round, team, odds (in american odds) and result (1 for win 0 for loss). The data is not perfectly clean, some of the odds formatting may be wrong or missing, not all years have the correct number of games.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d like to visualize the accuracy of these predictions on a discrete vent over the years have been. Use something like brier scores and/or log loss to do this. If you have a better way feel free to do that as well. Split this by round as well as there are many blowouts or &amp;ldquo;easy&amp;rdquo; to call games in the beginning of the tournament.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d also like to measure the competitiveness of these games over the years. Have the odds gotten closer together on avg over the years, how is this changes if we look by round?&lt;/p&gt;

  &lt;/div&gt;
&lt;/details&gt;
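&lt;p&gt;For anyone who hasn&amp;rsquo;t worked with these metrics, here&amp;rsquo;s a minimal Python sketch of what the prompt is asking for. This is not the code Hex generated. It assumes the standard American-odds conversion: a positive line like +120 implies 100 / (120 + 100), and a negative line like -142 implies 142 / (142 + 100). It scores the four example rows from the table above. These implied probabilities still include the bookmaker&amp;rsquo;s margin. &lt;code&gt;remove_vig&lt;/code&gt; is a hypothetical helper that shows one common way to rescale a two-sided line so it sums to 1.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;
```python
import math


def implied_probability(american_odds):
    """Convert American odds to the bookmaker's implied win probability (margin included)."""
    if american_odds > 0:
        # Underdog: +120 means a 100 stake returns 120 in profit.
        return 100 / (american_odds + 100)
    # Favorite: -142 means you stake 142 to win 100.
    return -american_odds / (-american_odds + 100)


def remove_vig(p_a, p_b):
    """Hypothetical helper: rescale a two-sided line so both sides sum to 1."""
    total = p_a + p_b
    return p_a / total, p_b / total


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 results."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


# The four example rows from the table above: (team, odds, result).
rows = [("TCU", 120, 1), ("Ohio State", -142, 0), ("Troy", 650, 0), ("Nebraska", -1000, 1)]
probs = [implied_probability(odds) for _, odds, _ in rows]
results = [result for _, _, result in rows]

print(f"Brier score: {brier_score(probs, results):.4f}")
print(f"Log loss:    {log_loss(probs, results):.4f}")
# Treating +120 / -142 as one two-sided line is an assumption for illustration only.
print(remove_vig(implied_probability(120), implied_probability(-142)))
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Lower is better for both metrics. The Brier score is the mean squared error of the probabilities. Log loss punishes confident misses much harder, which matters for heavy favorites like -1000 that occasionally lose.&lt;/p&gt;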
&lt;h3 id=&#34;dropping-in&#34;&gt;Dropping in&lt;/h3&gt;
&lt;p&gt;After issuing the prompt and attaching the csv, I immediately found myself in a &amp;ldquo;project&amp;rdquo; or, more descriptively, a &amp;ldquo;notebook style environment&amp;rdquo;. I&amp;rsquo;m prompted to give my project a name and watch as some LLM gets to work on the right side of my screen. I can give the project a status and category, which is nice as I could see these things getting unwieldy pretty fast if you had 20-30 people doing &amp;ldquo;analysis&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/Initial_Drop_IN.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;the-first-cell&#34;&gt;The first cell&lt;/h3&gt;
&lt;p&gt;There&amp;rsquo;s a collapsible cell with a &lt;code&gt;SELECT *&lt;/code&gt; of my data displayed as a table. The column structure is there, and I can even create a new column with a formula that references existing columns if I want. If you&amp;rsquo;ve read my blog, you&amp;rsquo;ll know I don&amp;rsquo;t like &lt;code&gt;SELECT *&lt;/code&gt;, but this doesn&amp;rsquo;t seem to be auto-materialized for all new projects, and this is a csv, so no harm, no foul.&lt;/p&gt;
&lt;p&gt;I can rewrite the SQL in the cell to produce a new data-frame, but I can also filter it manually via the UI. Applying the filter via the UI updates the &amp;ldquo;relation&amp;rdquo; (I don&amp;rsquo;t really know what to call this yet). It also looks like it updates the visualizations/cells that depend on the relation (the LLM has been hacking away below).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/Add_comment.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;p&gt;Importantly, comments can be added, replied to, reacted to, and resolved within a cell. Maybe more importantly, cells can be triggered in DAG fashion, with downstream cells taking dependencies on upstream ones. This is good. It&amp;rsquo;s starting to feel like &lt;a href=&#34;https://marimo.io/&#34;&gt;Marimo&lt;/a&gt;, which was recently acquired by &lt;a href=&#34;https://marimo.io/blog/joining-coreweave&#34;&gt;CoreWeave&lt;/a&gt;. Hex isn&amp;rsquo;t open source, though, and I assumed you don&amp;rsquo;t really own your notebooks since they live in cloud land&amp;hellip; this is actually wrong. You can &lt;a href=&#34;https://learn.hex.tech/docs/explore-data/projects/import-export&#34;&gt;import/export notebooks&lt;/a&gt; as &lt;code&gt;.ipynb&lt;/code&gt; or in Hex&amp;rsquo;s own &lt;code&gt;yaml&lt;/code&gt; format.&lt;/p&gt;
&lt;p&gt;Speaking of DAGs, there&amp;rsquo;s this graph view, which is nice. The graph shows me that Hex has decided to &lt;code&gt;SELECT *&lt;/code&gt; from my csv, written some pandas transformations, and then created three charts. I can&amp;rsquo;t figure out how to pop the graph view out (full screen) or change its size&amp;hellip; thus the blurry image. In general, every square inch of my screen is packed with features.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/Graph_View.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
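&lt;p&gt;To make the DAG idea concrete, here&amp;rsquo;s a rough Python sketch of reactive re-execution. It is a generic illustration and assumes nothing about Hex&amp;rsquo;s actual internals. Each cell lists the cells it reads from. When one cell changes, only that cell and everything downstream of it re-run, in dependency order. The cell names are hypothetical stand-ins for the graph above: the &lt;code&gt;SELECT *&lt;/code&gt; cell, the pandas transforms, and three charts. The sketch uses the standard library&amp;rsquo;s &lt;code&gt;graphlib.TopologicalSorter&lt;/code&gt; (Python 3.9+).&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;
```python
from graphlib import TopologicalSorter

# Hypothetical cell graph mirroring the graph view: each cell maps to the
# set of cells it reads from (its upstream dependencies).
CELL_DEPS = {
    "select_csv": set(),
    "pandas_transforms": {"select_csv"},
    "chart_accuracy": {"pandas_transforms"},
    "chart_log_loss": {"pandas_transforms"},
    "chart_competitiveness": {"pandas_transforms"},
}


def downstream_of(changed, deps):
    """Return the changed cell plus every cell that transitively depends on it."""
    dirty = {changed}
    grew = True
    while grew:
        grew = False
        for cell, parents in deps.items():
            if cell not in dirty and not dirty.isdisjoint(parents):
                dirty.add(cell)
                grew = True
    return dirty


def rerun_order(changed, deps):
    """Cells to re-execute after `changed` is edited, upstream before downstream."""
    dirty = downstream_of(changed, deps)
    return [cell for cell in TopologicalSorter(deps).static_order() if cell in dirty]


print(rerun_order("pandas_transforms", CELL_DEPS))
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Editing the pandas cell re-runs it and the three charts but leaves the &lt;code&gt;SELECT *&lt;/code&gt; cell alone.&lt;/p&gt;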
&lt;h3 id=&#34;the-llm--context&#34;&gt;The LLM &amp;amp; Context&lt;/h3&gt;
&lt;p&gt;Moving over to the right pane, there&amp;rsquo;s a chat window with an LLM. The chat window can host multiple threads, and conversations older than 90 days are deleted. Unlike the graph view, I can make this pane basically eat the entire screen.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/LLM_Conversation.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;p&gt;I quickly find out who&amp;rsquo;s baking this cookie (Claude), but it won&amp;rsquo;t share its system prompt/instructions. After a bit of back and forth, Claude tells me I should be using Guides. These live inside &amp;ldquo;context studio&amp;rdquo;, which seems to be exactly how context is managed in Hex. This made me a little nervous at first, as it looked as though you&amp;rsquo;d need to embed/maintain context in Hex. However, they allow connections from outside sources (e.g. &lt;a href=&#34;https://learn.hex.tech/docs/agent-management/context-management/guides#programmatically-upload-guides-in-ci&#34;&gt;context in GitHub updated via GitHub Actions&lt;/a&gt;). If relevant context is haphazardly stored in a bunch of people&amp;rsquo;s brains, though, this is still of little use.&lt;/p&gt;
&lt;h3 id=&#34;but-wtf-is-the-compute-here&#34;&gt;But wtf is the compute here&lt;/h3&gt;
&lt;p&gt;So now I&amp;rsquo;m curious how this SQL is getting executed. I ask in my little Claude thread, and the weights and biases tell me &amp;ldquo;it&amp;rsquo;s querying the uploaded OddsMM.csv file directly using the embedded DuckDB engine&amp;rdquo;. DuckDB&amp;hellip; tight. I can confirm this in the &lt;a href=&#34;https://learn.hex.tech/docs/explore-data/cells/sql-cells/sql-cells-introduction&#34;&gt;SQL Cells Introduction&lt;/a&gt;. It also seems that you can &lt;a href=&#34;https://learn.hex.tech/docs/explore-data/cells/using-jinja&#34;&gt;parameterize SQL&lt;/a&gt; using Jinja, which is a huge win in my book. It also looks like it integrates well with &lt;a href=&#34;https://learn.hex.tech/docs/connect-to-data/data-connections/dbt-integration&#34;&gt;dbt Cloud&lt;/a&gt;, which is solid. dbt, or something like it, is &lt;strong&gt;pretty simply&lt;/strong&gt; how modern data teams should be authoring transformations; there&amp;rsquo;s just not much of a question at this point. I wouldn&amp;rsquo;t want to be orchestrating transforms inside of Hex, though. I&amp;rsquo;d be doing this in a data warehouse, orchestrating with Dagster and managing transformations with dbt, and only after that connecting to Hex.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/The_Env.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;p&gt;Opening &amp;ldquo;Environment&amp;rdquo; in the left toolbox shows I&amp;rsquo;m using a Medium instance with 8GB of memory and 4 CPUs. They have a beefy 128GB instance, but I&amp;rsquo;m getting the feeling that if you need this inside of Hex, you may also need a better data model (yes, dude who&amp;rsquo;s visualizing some ridiculous telematics data, feel free to rip into me for that comment). I&amp;rsquo;ve also got access to Python 3.9 through 3.12, and a bunch of Python packages come pre-installed (no &lt;a href=&#34;https://pola.rs/&#34;&gt;polars&lt;/a&gt;&amp;hellip; which is big sad), but it seems you can just add packages to your venv with uv.&lt;/p&gt;
&lt;h3 id=&#34;other-staff-in-the-left-most-toolbox&#34;&gt;Other stuff in the leftmost toolbox&lt;/h3&gt;
&lt;p&gt;There&amp;rsquo;s a full revision history with version checkpoints and a large space dedicated to a search feature, which seems to be able to &lt;code&gt;grep&lt;/code&gt; through your current workspace. The variables section stores all the built-in variables, as well as any variables and dataframes we create in our Python code. There&amp;rsquo;s also a data browser, which acts as a functional data catalogue for assets inside the workspace. I opened up the column types inferred from my csv. &lt;code&gt;Team&lt;/code&gt; &amp;amp; &lt;code&gt;Odds&lt;/code&gt; are labelled as having type &amp;ldquo;object&amp;rdquo;&amp;hellip; which is a bit weird, but I guess they are strings&amp;hellip; so arrays&amp;hellip; so&amp;hellip; objects? Idk, I&amp;rsquo;m stretching here for why that doesn&amp;rsquo;t say &amp;ldquo;varchar()&amp;rdquo;. Most likely this is the pandas dtype. Pandas stores string columns as &lt;code&gt;object&lt;/code&gt; by default, and a numeric-looking column like &lt;code&gt;Odds&lt;/code&gt; falls back to &lt;code&gt;object&lt;/code&gt; if any value fails to parse as a number.&lt;/p&gt;
&lt;h3 id=&#34;back-in-the-middle&#34;&gt;Back in the middle&lt;/h3&gt;
&lt;p&gt;Coming back to center stage with our data/assets, there&amp;rsquo;s a nice little chart telling us what is taking the longest inside of our &amp;ldquo;DAG&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/Gantt_Chart.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;p&gt;Cool feature to have, rather than sprinkling &lt;code&gt;%%time&lt;/code&gt; through some sprawling notebook.&lt;/p&gt;
&lt;p&gt;Continuing on, Claude has written a little pandas to create some new data-frames based on the aggregations I requested. These data-frames are available for view in the data browser. These have become materialized relations (I&amp;rsquo;m pretty sure) that I can query with SQL in a new cell and are also referenced in the visualizations.&lt;/p&gt;
&lt;h3 id=&#34;visualizations&#34;&gt;Visualizations&lt;/h3&gt;
&lt;p&gt;Speaking of visualizations, Hex seems to be using &lt;a href=&#34;https://hex.tech/blog/vegafusion/&#34;&gt;Vega&lt;/a&gt;, following some sort of acqui-hire of the VegaFusion maintainer. The visualizations seem fine and responsive. There&amp;rsquo;s a suite of UI tools to edit them a bit (both style and substance), and you get access to the generated SQL, which seems to build the relation required for each one.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/Accuracy_Relation.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;p&gt;From the screenshot, you can see that &lt;code&gt;accuracy_by_season_round&lt;/code&gt; is referenced. It was created by some pandas in the cell above and exposed in the data browser. I&amp;rsquo;d like to have full control of my charts as code&amp;hellip; and it seems like I can easily do that. I created a cell that references that same data-frame but builds a basic chart with &lt;code&gt;plotly&lt;/code&gt;, and it works.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/Plotly.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;make-an-app&#34;&gt;Make an&amp;hellip; App?&lt;/h3&gt;
&lt;p&gt;The other main thing you can do is switch out of notebook mode and make an &amp;ldquo;app&amp;rdquo;. I turned my analysis into an &lt;strong&gt;&lt;a href=&#34;https://app.hex.tech/019d6027-9e08-711c-9103-96dd87d47720/app/March-Madness-Prediction-Analysis-032ucMa6UHL5XP99tslaMb/latest&#34;&gt;app&lt;/a&gt;&lt;/strong&gt; (if you&amp;rsquo;re reading this a while from now, it will most likely not be available&amp;hellip; sorry not sorry). The app seems to expose the visualizations at an endpoint, and they can refresh on a schedule. You can export it, take snapshots, or turn it into a presentation. So apps are basically interactive dashboards that you might expose to an end user, rather than tossing them a notebook and saying &amp;ldquo;run this and look at cell 8&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;These apps can be added to a &amp;ldquo;library&amp;rdquo; of published knowledge that can be &amp;ldquo;&lt;a href=&#34;https://learn.hex.tech/docs/organize-content/statuses-categories#endorsed-statuses&#34;&gt;endorsed&lt;/a&gt;&amp;rdquo;. I don&amp;rsquo;t know how this solves for people just endorsing a bunch of stuff. I don&amp;rsquo;t think there&amp;rsquo;s a re-endorsement system where you periodically have to say &amp;ldquo;yeah, this is still good&amp;rdquo;. Without one, all of this manual tagging typically loses meaning. It&amp;rsquo;s like people flagging workloads as &amp;ldquo;business critical&amp;rdquo; to get access to resources faster&amp;hellip; it&amp;rsquo;s a short time before everything is &amp;ldquo;business critical&amp;rdquo;. Some kind of continuous mechanism to confirm endorsement might be nice (or just annoying enough that people would disregard it&amp;hellip; it&amp;rsquo;s a hard problem, admittedly).&lt;/p&gt;
&lt;h3 id=&#34;competitors&#34;&gt;Competitors&lt;/h3&gt;
&lt;p&gt;It looks like &lt;a href=&#34;https://deepnote.com/&#34;&gt;Deepnote&lt;/a&gt; offers something similar, but I&amp;rsquo;ve never used it. It seems like every data warehouse in the sky has something like this. &lt;a href=&#34;https://posthog.com/&#34;&gt;PostHog&amp;rsquo;s&lt;/a&gt; platform is similar but more focused on web analytics. &lt;a href=&#34;https://clickhouse.com/&#34;&gt;ClickHouse&amp;rsquo;s&lt;/a&gt; cloud data warehouse offers a SQL notebook and dashboard experience. &lt;a href=&#34;https://github.com/marimo-team/marimo&#34;&gt;Marimo&lt;/a&gt; is a great notebook-based analytics tool that&amp;rsquo;s open source (and recently, CoreWeave is in the picture).&lt;/p&gt;
&lt;p&gt;This is in &lt;strong&gt;NO&lt;/strong&gt; way an exhaustive list of the space, but I have to say&amp;hellip; Hex is very complete. They are clearly thinking hard about how to nail &amp;ldquo;last mile&amp;rdquo; analytics and make upstream integration simple. If I were sitting in a small shop and we needed analysts/data scientists to produce some standard reports, version them over time, and sprinkle LLMs into the workflow, I&amp;rsquo;m probably reaching for Hex.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m seriously trying to figure out the value proposition of something like Looker, PowerBI, Tableau, etc&amp;hellip; as opposed to Hex.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/Tableau.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;p&gt;People either play around with the filters a bit or they actually just want the underlying data for the dashboard&amp;hellip; which Hex quickly provides access to and lineage for.&lt;/p&gt;
&lt;h3 id=&#34;increasing-the-data-size&#34;&gt;Increasing the data size&lt;/h3&gt;
&lt;p&gt;I had some larger parquet files from my &lt;a href=&#34;https://substack.com/home/post/p-190557745&#34;&gt;Prediction Markets&lt;/a&gt; article. I thought they would be a good test of whether Hex could quickly reproduce a few of the charts. They are big enough that I can&amp;rsquo;t just drop them into the UI.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Kalshi Markets dataset is ~2 GB &amp;amp; 30 million rows with data on all historical markets&lt;/li&gt;
&lt;li&gt;The Kalshi Trades dataset is ~11 GB &amp;amp; 203 million rows with data on every single trade to ever occur on Kalshi&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I loaded my local copies to MotherDuck by establishing a connection to a new database I set up.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/MotherDuck.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;p&gt;Then it took about 30 seconds to give Hex a Read/Write access token, and I could start writing SQL against my tables.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Hex_images/DataBrowse.png&#34; alt=&#34;Hex Intro&#34;&gt;&lt;/p&gt;
&lt;p&gt;I gave it a pitiful prompt/context, and in under 5 minutes it produced some charts that you can view &lt;a href=&#34;https://app.hex.tech/019d6027-9e08-711c-9103-96dd87d47720/app/Kalshi-032ulYqucX78NzabnOThQS/latest&#34;&gt;here&lt;/a&gt;. Again, doing this well on this much data, in under 5 minutes, with limited context, is quite impressive. You can directly compare these to a few charts in the actual article&amp;hellip; they get the point across. I&amp;rsquo;m sure that with a bit of effort I could get the ones inside Hex to look quite nice.&lt;/p&gt;
&lt;details class=&#34;spoiler&#34;&gt;
  &lt;summary&gt;The Prompt&lt;/summary&gt;
  &lt;div class=&#34;spoiler-content&#34;&gt;
    &lt;p&gt;Please use the Kalshi data assets (markets &amp;amp; trades) to derive a series of visualizations.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Show me weekly trade volume in a bar chart&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Show me contracts per trade month over month avg and median&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Show me a stacked bar chart displaying total trade volume by month with the bars segmented into the implied probability of each trade in cohorts of 0-20%, 20-40%, 40-60%, 60-80%, and 80-100%&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

  &lt;/div&gt;
&lt;/details&gt;
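&lt;p&gt;The third request boils down to bucketing plus a grouped sum. Here&amp;rsquo;s a minimal Python sketch of the table behind that stacked bar chart. It assumes each trade carries three fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a month&lt;/li&gt;
&lt;li&gt;a yes price in cents (Kalshi contracts pay out $1, so a 55 cent price reads as roughly a 55% implied probability)&lt;/li&gt;
&lt;li&gt;a contract count, used here as &amp;ldquo;volume&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The field layout and sample trades are hypothetical, not the real dataset&amp;rsquo;s schema.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;
```python
from collections import defaultdict

COHORT_WIDTH = 20  # percentage points per cohort: 0-20, 20-40, ..., 80-100


def cohort_label(price_cents):
    """Map a yes price in cents (read as implied probability %) to a cohort label.

    Cohorts are left-closed, so 20 lands in 20-40%; 100 is folded into 80-100%.
    """
    index = min(price_cents // COHORT_WIDTH, 4)
    low = index * COHORT_WIDTH
    return f"{low}-{low + COHORT_WIDTH}%"


def volume_by_month_and_cohort(trades):
    """Sum contracts per (month, cohort): one bar segment per key in the chart."""
    totals = defaultdict(int)
    for month, price_cents, contracts in trades:
        totals[(month, cohort_label(price_cents))] += contracts
    return dict(totals)


# Hypothetical trades: (month, yes price in cents, contracts traded).
sample_trades = [
    ("2025-01", 12, 50),
    ("2025-01", 55, 10),
    ("2025-01", 58, 5),
    ("2025-02", 91, 200),
    ("2025-02", 20, 7),
]

for (month, cohort), volume in sorted(volume_by_month_and_cohort(sample_trades).items()):
    print(month, cohort, volume)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Where the boundaries fall (left-closed here) is a choice worth making explicitly. Otherwise a trade priced exactly at 20, 40, 60, or 80 cents can silently land in different cohorts depending on who wrote the query.&lt;/p&gt;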
&lt;h3 id=&#34;overall-thoughts&#34;&gt;Overall Thoughts&lt;/h3&gt;
&lt;p&gt;I&amp;rsquo;m impressed. It&amp;rsquo;s a well-thought-out product. It&amp;rsquo;s definitely much more than just a &amp;ldquo;Claude wrapper&amp;rdquo;, and they are clearly trying to integrate things that modern data teams need. For a quick proof of concept on a personal project, I could easily see myself tossing some data in here with some prompting/context, as long as I had access to the elevated features of a &amp;ldquo;Team Plan&amp;rdquo;. I do think this is geared more toward enterprise, as it&amp;rsquo;s way overkill for a hobby project. I can pretty much get all the features I&amp;rsquo;d want out of this with Claude Code, Marimo, and a little know-how. But they &lt;a href=&#34;https://hex.tech/blog/i-tried-to-vibe-code-hex/&#34;&gt;know this&lt;/a&gt; and can cater to the &amp;ldquo;Yeah, we don&amp;rsquo;t want to maintain that, but we want that functionality&amp;rdquo; crowd.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m always a little wary of notebook development for large teams. How do you stop people from building data assets in parallel that describe the &amp;ldquo;same&amp;rdquo; thing but end up with different results? This is a perennial problem, and Hex seems to be great at reducing it compared to people hand-waving with Jupyter notebooks and pivot tables. And holy FUCK, can we please realize that there are better things out there than isolated local notebooks performing data wrangling, visualization, and presentation&amp;hellip; they do NOT scale&amp;hellip; use Hex.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m trying to think of other headwinds here. I mean, this is a lot of product. I&amp;rsquo;m familiar with these concepts, but dropping this in front of 20 PMs is a different story. If the ideal state is to tell them &amp;ldquo;Hey, you can make your own dashboards and data assets now&amp;rdquo;, there would be a significant amount of churn/up-skilling, whereas Data Science could probably get on board with this very quickly. You still need to solve for upstream: Hex isn&amp;rsquo;t going to fix your broken medallion architecture and redundant/stale tables. Notebooks are a classic footgun. If people don&amp;rsquo;t use the tools that Hex provides (dbt integration, Jinja templating, version control, data catalogue, etc&amp;hellip;), you could easily end up in spaghetti mode, just like with anything else. Hex can lead your team to the data lake, but they can&amp;rsquo;t make you&amp;hellip;&lt;/p&gt;
</content>
    </item>
    
    <item>
      <title>SQL Buckets</title>
      <link>/posts/sql_buckets/</link>
      <pubDate>Sun, 05 Oct 2025 00:00:00 +0000</pubDate>
      
      <guid>/posts/sql_buckets/</guid>
      <description>&lt;h3 id=&#34;case-statements&#34;&gt;Case Statements:&lt;/h3&gt;
&lt;p&gt;A common occurrence when writing SQL is needing to bucket some data and look at how it is distributed across said buckets.&lt;/p&gt;
&lt;p&gt;Typically I&amp;rsquo;d write (or have an LLM spit out) something like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CASE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; COLUMN_TO_BUCKET &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;a. &amp;lt; 5&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; COLUMN_TO_BUCKET &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;b. &amp;gt;=5 - &amp;lt;10&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; COLUMN_TO_BUCKET &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;15&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;c. &amp;gt;=10 - &amp;lt;15&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            ETC...
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;ELSE&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;u. &amp;gt;=100&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;END&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; COLUMN_TO_BUCKET_DIST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      , &lt;span style=&#34;color:#66d9ef&#34;&gt;SUM&lt;/span&gt;(THIS_IS_ANNOYING) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; ANNOYED
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I have to prefix each bucket label with a letter/number so the buckets sort correctly, which makes writing and updating the lengthy case statement that much more annoying. I have to write all this because, unlike &lt;a href=&#34;https://docs.snowflake.com/en/sql-reference/functions/width_bucket&#34;&gt;other OLAP DBMSs&lt;/a&gt;, Redshift doesn&amp;rsquo;t have a bucketing function that I&amp;rsquo;m aware of.&lt;/p&gt;</description>
      <content>&lt;h3 id=&#34;case-statements&#34;&gt;Case Statements:&lt;/h3&gt;
&lt;p&gt;A common occurrence when writing SQL is needing to bucket some data and look at how it is distributed across said buckets.&lt;/p&gt;
&lt;p&gt;Typically I&amp;rsquo;d write (or have an LLM spit out) something like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CASE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; COLUMN_TO_BUCKET &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;a. &amp;lt; 5&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; COLUMN_TO_BUCKET &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;b. &amp;gt;=5 - &amp;lt;10&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; COLUMN_TO_BUCKET &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;15&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;c. &amp;gt;=10 - &amp;lt;15&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            ETC...
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;ELSE&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;u. &amp;gt;=100&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;END&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; COLUMN_TO_BUCKET_DIST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      , &lt;span style=&#34;color:#66d9ef&#34;&gt;SUM&lt;/span&gt;(THIS_IS_ANNOYING) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; ANNOYED
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I have to prefix each bucket label with a letter/number so the buckets sort correctly, which makes writing and updating the lengthy case statement that much more annoying. I have to write all this because, unlike &lt;a href=&#34;https://docs.snowflake.com/en/sql-reference/functions/width_bucket&#34;&gt;other OLAP DBMSs&lt;/a&gt;, Redshift doesn&amp;rsquo;t have a bucketing function that I&amp;rsquo;m aware of.&lt;/p&gt;
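&lt;p&gt;For reference, here&amp;rsquo;s a small Python sketch of what a &lt;code&gt;WIDTH_BUCKET(expr, min_value, max_value, num_buckets)&lt;/code&gt;-style function computes. It follows the usual semantics as I understand them from PostgreSQL/Snowflake:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;equal-width buckets are numbered from 1&lt;/li&gt;
&lt;li&gt;values below &lt;code&gt;min_value&lt;/code&gt; get 0&lt;/li&gt;
&lt;li&gt;values at or above &lt;code&gt;max_value&lt;/code&gt; get &lt;code&gt;num_buckets + 1&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because the result is an integer, it sorts correctly on its own, which is exactly what the &lt;code&gt;a.&lt;/code&gt;/&lt;code&gt;b.&lt;/code&gt; prefixes are faking. &lt;code&gt;bucket_label&lt;/code&gt; is a hypothetical helper for turning the number back into a readable range.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;
```python
import math


def width_bucket(value, min_value, max_value, num_buckets):
    """Equal-width bucket number for value, in the style of SQL WIDTH_BUCKET.

    Buckets are numbered 1..num_buckets over [min_value, max_value);
    values below the range get 0 and values at or above max_value get
    num_buckets + 1. None passes through, like NULL in SQL.
    """
    if value is None:
        return None
    if min_value > value:
        return 0
    if value >= max_value:
        return num_buckets + 1
    width = (max_value - min_value) / num_buckets
    # Floating-point rounding can nudge values that sit exactly on an edge.
    return math.floor((value - min_value) / width) + 1


def bucket_label(bucket, min_value, max_value, num_buckets):
    """Hypothetical helper: readable range label for a bucket number."""
    width = (max_value - min_value) / num_buckets
    if bucket == 0:
        return f"below {min_value:g}"
    if bucket == num_buckets + 1:
        return f">= {max_value:g}"
    low = min_value + (bucket - 1) * width
    return f"[{low:g}, {low + width:g})"


# Same shape as the CASE statement above: 5-unit buckets from 0 to 100.
for days in [0, 4, 5, 14, 99.9, 100, 250]:
    b = width_bucket(days, 0, 100, 20)
    print(days, b, bucket_label(b, 0, 100, 20))
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The bucket number doubles as the sort key, so results grouped on it can be ordered without any prefix characters in the label.&lt;/p&gt;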
&lt;p&gt;Or do I&amp;hellip;&lt;/p&gt;
&lt;h3 id=&#34;solution&#34;&gt;Solution:&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s say I have a table that records when each row was created and when it was last updated. I&amp;rsquo;ve never used this table before. I&amp;rsquo;m interested in the distribution of the number of days between creation and last update, to see how common it is for a record to be updated &lt;code&gt;n&lt;/code&gt; days after it was created.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; TEMP &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; DELTA_DIST &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; A_UNIQUE_ID_OF_A_ROW
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , DATEDIFF(&lt;span style=&#34;color:#66d9ef&#34;&gt;DAY&lt;/span&gt;, CREATION_DATE, LAST_UPDATED) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; CD_LD
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; MY_TABLE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;WHERE&lt;/span&gt; A_FILTER &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;2025-01-01&amp;#39;&lt;/span&gt;::DATE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;AND&lt;/span&gt; ANOTHER_FILTER &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Cool, now we want to see how these records are distributed across some buckets!&lt;/p&gt;
&lt;p&gt;We could do this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CASE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;a. &amp;lt; 5 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;b. 5-10 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;15&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;c. 10-15 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;20&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;d. 15-20 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;25&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;e. 20-25 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;30&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;f. 25-30 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;35&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;g. 30-35 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;40&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;h. 35-40 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;45&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;i. 40-45 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;50&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;j. 45-50 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;55&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;k. 50-55 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;60&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;l. 55-60 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;65&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;m. 60-65 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;70&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;n. 65-70 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;75&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;o. 70-75 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;80&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;p. 75-80 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;85&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;q. 80-85 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;90&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;r. 85-90 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;95&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;s. 90-95 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;100&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;t. 95-100 Day&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;ELSE&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;u. &amp;gt;100&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;END&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; CD_LD_DIST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;, &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; A_UNIQUE_ID_OF_A_ROW) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; NUM_ID
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; DELTA_DIST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That is pretty annoying&amp;hellip; it is verbose, and if you&amp;rsquo;ve done this before, you&amp;rsquo;ll know the catch-all bucket at the end tends to collect a spike of records that you&amp;rsquo;ll want to investigate further. So I came up with the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; LPAD(FLOOR(COLUMN_TO_BUCKET &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;-- Creates the sortable digits at the beginning
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       FLOOR(COLUMN_TO_BUCKET &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Generates the first day in the range
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       (FLOOR(COLUMN_TO_BUCKET &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;-- Generates the last day in the range (+4 as I&amp;#39;m bucketing by 5 days at a time)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; Days&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; COLUMN_TO_BUCKET_DIST &lt;span style=&#34;color:#75715e&#34;&gt;-- Adds the Days string at the end for clarity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;     , &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; THING_TO_AGGREGATE) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; AGGREGATED_THING
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; TABLE_I_WANT_TO_DISTRIBUTE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can swap in whichever column we want to bucket and change the bucket size, and the number of buckets grows with the data instead of being hard-coded. This does use &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/r_LPAD.html&#34;&gt;LPAD&lt;/a&gt;, which isn&amp;rsquo;t part of standard SQL (SQL Server, for one, doesn&amp;rsquo;t have it), while &lt;code&gt;FLOOR()&lt;/code&gt; is standard. Regardless, let&amp;rsquo;s break it down in detail in case you need to rewrite it for a different SQL dialect.&lt;/p&gt;
&lt;ul&gt;
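&lt;li&gt;FLOOR: rounds a number down to the nearest whole number (1.4 becomes 1)&lt;/li&gt;
&lt;li&gt;LPAD: pads a string on the left with a fill character up to a given length, truncating it if it&amp;rsquo;s already longer (&lt;code&gt;LPAD(&amp;#39;3&amp;#39;, 2, &amp;#39;0&amp;#39;)&lt;/code&gt; returns &lt;code&gt;&amp;#39;03&amp;#39;&lt;/code&gt;)&lt;/li&gt;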
&lt;/ul&gt;
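&lt;p&gt;If you want to poke at the two building blocks on their own, a scratch query like the one below does the job. The last column is the case that matters for this trick: a two-character pad only keeps labels sortable for bucket indexes 00 to 99, so an index of 100 or more gets cut to &amp;lsquo;10&amp;rsquo; and sorts in next to bucket 10. The results in the comments are what PostgreSQL returns; I&amp;rsquo;d expect Redshift to match, but it&amp;rsquo;s worth running it on your own cluster.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;-- Scratch query: FLOOR and LPAD in isolation
SELECT FLOOR(7 / 5::FLOAT)     AS FLOORED     -- 7 / 5.0 = 1.4, rounded down to 1
     , LPAD(&amp;#39;3&amp;#39;, 2, &amp;#39;0&amp;#39;)     AS PADDED      -- &amp;#39;03&amp;#39;
     , LPAD(&amp;#39;100&amp;#39;, 2, &amp;#39;0&amp;#39;)   AS TRUNCATED;  -- &amp;#39;10&amp;#39;: longer strings are cut to the target length
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;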
&lt;p&gt;For an input of 1, this is how the expression builds up, step by step, from 1 to &amp;lsquo;00. 0-4 Days&amp;rsquo;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- FLOOR of 1/5 = 0 and cast to text
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- LPAD adds a leading zero to single digits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; LPAD(FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;00&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Just cosmetic, appending &amp;#34;. &amp;#34; (a period and a space)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; LPAD(FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;00. &amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- FLOOR(x / 5) returns 0, 1, 2, 3, ..., n as x moves into each successive 5-day range
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- And if we multiply by 5 we get the start of our bucket
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;       &lt;span style=&#34;color:#75715e&#34;&gt;-- 0 * 5 = 0 (for 0-4)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;       &lt;span style=&#34;color:#75715e&#34;&gt;-- 1 * 5 = 5 (for 5-9)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;       &lt;span style=&#34;color:#75715e&#34;&gt;-- etc...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; LPAD(FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;00. 0&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Now we add another cosmetic piece (a dash to separate the start and end of the bucket)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; LPAD(FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;00. 0-&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Remember how we generated the start of the bucket?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- If we know the bucket is going to span 5 days (inclusive of start and end) we just add 4 to the start
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; LPAD(FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; (FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;00. 0-4&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Add the final cosmetic piece &amp;#39; Days&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; LPAD(FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; (FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; Days&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;00. 0-4 Days&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We&amp;rsquo;ve taken an input of CD_LD = 1 and turned it into a sortable, bucketable (probably not a word) label: &amp;lsquo;00. 0-4 Days&amp;rsquo;.&lt;/p&gt;
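&lt;p&gt;Before pointing this at a real table, it&amp;rsquo;s worth pushing a few boundary values through the expression. Here&amp;rsquo;s a minimal sketch, assuming Redshift/PostgreSQL syntax and the &lt;code&gt;||&lt;/code&gt; concatenation operator used in the capped version below; &lt;code&gt;SAMPLE_VALUES&lt;/code&gt; is just a throwaway CTE made up for the check. Values 0 and 4 should share the first bucket, 5 should open the second, and 99 should land in the last regular bucket (index 19).&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;-- Hypothetical sanity check: a few boundary values run through the label expression
WITH SAMPLE_VALUES AS (
    SELECT 0 AS CD_LD
    UNION ALL SELECT 4
    UNION ALL SELECT 5
    UNION ALL SELECT 99
)
SELECT CD_LD
     , LPAD(FLOOR(CD_LD / 5::FLOAT)::TEXT, 2, &amp;#39;0&amp;#39;)
       || &amp;#39;. &amp;#39; || FLOOR(CD_LD / 5::FLOAT) * 5
       || &amp;#39;-&amp;#39; || (FLOOR(CD_LD / 5::FLOAT) * 5 + 4)
       || &amp;#39; Days&amp;#39; AS CD_LD_DIST
FROM SAMPLE_VALUES
ORDER BY CD_LD;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;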
&lt;h3 id=&#34;im-worried-about-an-unconstrained-series-of-buckets-how-do-i-cap-it&#34;&gt;I&amp;rsquo;m worried about an unconstrained series of buckets&amp;hellip; how do I cap it?&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt;  &lt;span style=&#34;color:#75715e&#34;&gt;-- Define our buckets when we are below our threshold
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;       &lt;span style=&#34;color:#66d9ef&#34;&gt;CASE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;100&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt; LPAD(FLOOR(CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       FLOOR(CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       (FLOOR(CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; Days&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       &lt;span style=&#34;color:#75715e&#34;&gt;-- Define the final catch-all bucket once the threshold is reached
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;       &lt;span style=&#34;color:#66d9ef&#34;&gt;ELSE&lt;/span&gt; LPAD(FLOOR(&lt;span style=&#34;color:#ae81ff&#34;&gt;100&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;gt;=100 Days&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;END&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; CD_LD_DIST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; A_UNIQUE_ID_OF_A_ROW) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; NUM_ID
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; DELTA_DIST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;i-want-to-change-the-increment-size&#34;&gt;I want to change the increment size&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Change increment size from 5 days to 7 days
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- 5 -&amp;gt; 7
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- 4 -&amp;gt; 6
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; LPAD(FLOOR(CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       FLOOR(CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Generates the first day in the range
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       (FLOOR(CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;-- Generates the last day in the range  (+6 as I&amp;#39;m bucketing by 7 days at a time)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; Days&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; CD_LD_DIST &lt;span style=&#34;color:#75715e&#34;&gt;-- Adds the Days string at the end for clarity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;      , &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; A_UNIQUE_ID_OF_A_ROW) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; NUM_ID
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; DELTA_DIST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;i-have-more-than-99-buckets-the-two-digit-solution-wont-work&#34;&gt;I have more than 99 buckets, so the two-digit solution won&amp;rsquo;t work:&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Pad an additional 0 with LPAD
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- LPAD truncates longer strings (e.g. in Postgres), so a width of 2 would turn bucket 100 into 10
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; LPAD(FLOOR(CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT)::TEXT, &lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;-- 2 -&amp;gt; 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;. &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       FLOOR(CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       (FLOOR(CD_LD &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;::FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;) 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; Days&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; CD_LD_DIST &lt;span style=&#34;color:#75715e&#34;&gt;-- Adds the Days string at the end for clarity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;      , &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; A_UNIQUE_ID_OF_A_ROW) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; NUM_ID
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; DELTA_DIST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Hope that&amp;rsquo;s helpful! Maybe there&amp;rsquo;s a better solution; I didn&amp;rsquo;t do an exhaustive search!&lt;/p&gt;
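&lt;p&gt;If editing every 5 and 4 by hand gets old, one option is to set the bucket width in a single place. This is a rough sketch, not a tested drop-in. It assumes your warehouse supports CTEs and &lt;code&gt;CROSS JOIN&lt;/code&gt; (most do), and &lt;code&gt;PARAMS&lt;/code&gt; and &lt;code&gt;BUCKET_SIZE&lt;/code&gt; are made-up names. The hard-coded &lt;code&gt;+ 4&lt;/code&gt; (or &lt;code&gt;+ 6&lt;/code&gt;) becomes &lt;code&gt;+ BUCKET_SIZE - 1&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;-- Sketch: set the bucket width once (PARAMS and BUCKET_SIZE are hypothetical names)
WITH PARAMS AS (
    SELECT 7 AS BUCKET_SIZE -- The only number to change
)
SELECT LPAD(FLOOR(CD_LD / P.BUCKET_SIZE::FLOAT)::TEXT, 2, &amp;#39;0&amp;#39;)
            || &amp;#39;. &amp;#39; ||
       FLOOR(CD_LD / P.BUCKET_SIZE::FLOAT) * P.BUCKET_SIZE -- First day in the range
            || &amp;#39;-&amp;#39; ||
       (FLOOR(CD_LD / P.BUCKET_SIZE::FLOAT) * P.BUCKET_SIZE + P.BUCKET_SIZE - 1) -- Last day in the range
            || &amp;#39; Days&amp;#39; AS CD_LD_DIST
     , COUNT(DISTINCT A_UNIQUE_ID_OF_A_ROW) AS NUM_ID
FROM DELTA_DIST
CROSS JOIN PARAMS P
GROUP BY 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Setting &lt;code&gt;BUCKET_SIZE&lt;/code&gt; to 5 should reproduce the 5-day labels, minus the &amp;gt;100 catch-all bucket. Past 99 buckets, the LPAD width still needs bumping from 2 to 3.&lt;/p&gt;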
</content>
    </item>
    
    <item>
      <title>Your Data Warehouse Ignores Your Primary Key</title>
      <link>/posts/redshift_pk/</link>
      <pubDate>Thu, 19 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>/posts/redshift_pk/</guid>
      <description>&lt;h2 id=&#34;the-agenda&#34;&gt;The agenda&amp;hellip;&lt;/h2&gt;
&lt;p&gt;Today we&amp;rsquo;re digging into the curious case of &amp;ldquo;&lt;a href=&#34;https://en.wikipedia.org/wiki/Primary_key&#34;&gt;Primary Keys&lt;/a&gt;&amp;rdquo; in modern data warehouses (think Redshift, BigQuery, Snowflake). On the menu:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A high-stakes request from our VP&lt;/li&gt;
&lt;li&gt;What happens when a &lt;code&gt;PRIMARY KEY&lt;/code&gt; doesn&amp;rsquo;t do what you think it does&lt;/li&gt;
&lt;li&gt;A hands-on experiment in Postgres to see why OLAP systems choose not to enforce a PK&lt;/li&gt;
&lt;li&gt;A dash of data quality, some opinions, and a tangent or two&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;a-little-background&#34;&gt;A little background&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s take Redshift as an example. In Redshift, primary keys are &lt;em&gt;&lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html&#34;&gt;informational only&lt;/a&gt;&lt;/em&gt;; they are not enforced. However, the query optimizer will trust our declaration and use it to create more efficient execution plans. How &lt;em&gt;silly&lt;/em&gt; of Redshift to trust us like that.&lt;/p&gt;</description>
      <content>&lt;h2 id=&#34;the-agenda&#34;&gt;The agenda&amp;hellip;&lt;/h2&gt;
&lt;p&gt;Today we&amp;rsquo;re digging into the curious case of &amp;ldquo;&lt;a href=&#34;https://en.wikipedia.org/wiki/Primary_key&#34;&gt;Primary Keys&lt;/a&gt;&amp;rdquo; in modern data warehouses (think Redshift, BigQuery, Snowflake). On the menu:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A high-stakes request from our VP&lt;/li&gt;
&lt;li&gt;What happens when a &lt;code&gt;PRIMARY KEY&lt;/code&gt; doesn&amp;rsquo;t do what you think it does&lt;/li&gt;
&lt;li&gt;A hands-on experiment in Postgres to see why OLAP systems choose not to enforce a PK&lt;/li&gt;
&lt;li&gt;A dash of data quality, some opinions, and a tangent or two&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;a-little-background&#34;&gt;A little background&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s take Redshift as an example. In Redshift, primary keys are &lt;em&gt;&lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html&#34;&gt;informational only&lt;/a&gt;&lt;/em&gt;; they are not enforced. However, the query optimizer will trust our declaration and use it to create more efficient execution plans. How &lt;em&gt;silly&lt;/em&gt; of Redshift to trust us like that.&lt;/p&gt;
&lt;p&gt;This is not shocking to the seasoned data warehouse connoisseur, but to the uninitiated it may look unnerving. For anyone unfamiliar, here&amp;rsquo;s what I mean.&lt;/p&gt;
&lt;h2 id=&#34;our-vp-needs-to-know&#34;&gt;Our VP needs to know&lt;/h2&gt;
&lt;p&gt;Our VP needs to know whether Bob is hungry to understand whether our startup is doing well or badly.&lt;/p&gt;
&lt;p&gt;Yikes, we deleted all our data when the U.S. &lt;a href=&#34;https://www.spacecom.mil/&#34;&gt;Space Command&lt;/a&gt; was investigating us for fraud! We can&amp;rsquo;t tell our VP that we don&amp;rsquo;t know anything about Bob, so let&amp;rsquo;s insert some random records into our data warehouse. We&amp;rsquo;ll need to start from scratch!&lt;/p&gt;
&lt;p&gt;First, let&amp;rsquo;s make a new table and define a primary key.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- REMEMBER! When Storing strings make sure to take up as much space as possible 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Defining these columns as CHAR pads strings with blanks to ensure we waste as much space as we can 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Everyone will love you for this, and you&amp;#39;ll probably get promoted!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; CHECK123 (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;SOME_ID CHAR(&lt;span style=&#34;color:#ae81ff&#34;&gt;4096&lt;/span&gt;) &lt;span style=&#34;color:#66d9ef&#34;&gt;PRIMARY&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;KEY&lt;/span&gt;, &lt;span style=&#34;color:#75715e&#34;&gt;-- &amp;lt;- Ahh, a nice PK
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;AN_ATTRIBUTE CHAR(&lt;span style=&#34;color:#ae81ff&#34;&gt;4096&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ANOTHER_ATTRIBUTE CHAR(&lt;span style=&#34;color:#ae81ff&#34;&gt;4096&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;SOME_NUMBER INT4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alright, let&amp;rsquo;s manually insert our correct data!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;INTO&lt;/span&gt; CHECK123
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;VALUES&lt;/span&gt; (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;abc123&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Bob&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Hungry&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#75715e&#34;&gt;-- &amp;lt;- Hmmm...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;     , (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;wow&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Big Tom&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Fast&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;86&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;this&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Dale&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Thoughtful&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;is&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Doug&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Tired&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;9&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;pretty&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Hammer time&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Happy&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;annoying&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;The bug unit&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Angry&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;abc123&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Whopper&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Sad&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;); &lt;span style=&#34;color:#75715e&#34;&gt;-- &amp;lt;- Hmmm...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now let&amp;rsquo;s run the important query our VP needs. We don&amp;rsquo;t have time for data quality checks, so let&amp;rsquo;s throw it into Excel and send it everywhere.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; SOME_ID
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CHECK123
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; SOME_ID;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;abc123,&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- UH OH!!!!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;wow,&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;this,&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;is&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pretty,&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;annoying,&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; SOME_ID &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CHECK123;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;abc123 &lt;span style=&#34;color:#75715e&#34;&gt;-- UH OH!!!!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;wow
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;this
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;is&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pretty
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;annoying
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;abc123  &lt;span style=&#34;color:#75715e&#34;&gt;-- UH OH!!!!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Oh no&amp;hellip; we have duplicate records even after writing &lt;code&gt;SELECT DISTINCT&lt;/code&gt;. &lt;em&gt;But I thought &lt;code&gt;SELECT DISTINCT&lt;/code&gt; would remove duplicates?&lt;/em&gt; Well, yes, but&amp;hellip; we declared SOME_ID as a PRIMARY KEY, effectively telling Redshift, &amp;ldquo;I solemnly swear this column contains no duplicates.&amp;rdquo; The query optimizer, trusting us completely, chooses the most efficient execution plan: it just scans the column and hands it back without performing a costly deduplication step. Why would it? We already promised it was unique.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Imagine someone gives you a book with &lt;a href=&#34;https://www.youtube.com/watch?v=EJR1H5tf5wE&#34;&gt;1,000,000&lt;/a&gt; unsorted names in it and tells you they are unique. Then they say, &lt;em&gt;&amp;ldquo;cross out any duplicate names and hand the book back to me&amp;rdquo;&lt;/em&gt;. Your initial reaction is to throw the book straight back, because checking one million names for uniqueness sounds&amp;hellip; well&amp;hellip; awful.&lt;/p&gt;&lt;/blockquote&gt;
&lt;div class=&#34;book-throw-animation-container&#34; style=&#34;max-width: 500px; margin: 1.5em auto; text-align: center;&#34;&gt;
    &lt;svg xmlns=&#34;http://www.w3.org/2000/svg&#34; viewBox=&#34;0 0 400 150&#34;&gt;
        &lt;defs&gt;
            &lt;g id=&#34;figure-def&#34;&gt;
                &lt;circle r=&#34;8&#34; cx=&#34;0&#34; cy=&#34;84&#34; fill=&#34;currentColor&#34;/&gt;
                &lt;line x1=&#34;0&#34; y1=&#34;92&#34; x2=&#34;0&#34; y2=&#34;105&#34; stroke=&#34;currentColor&#34; stroke-width=&#34;2&#34;/&gt;
                &lt;line x1=&#34;0&#34; y1=&#34;105&#34; x2=&#34;-8&#34; y2=&#34;120&#34; stroke=&#34;currentColor&#34; stroke-width=&#34;2&#34;/&gt;
                &lt;line x1=&#34;0&#34; y1=&#34;105&#34; x2=&#34;8&#34; y2=&#34;120&#34; stroke=&#34;currentColor&#34; stroke-width=&#34;2&#34;/&gt;
                &lt;line x1=&#34;0&#34; y1=&#34;98&#34; x2=&#34;-10&#34; y2=&#34;105&#34; stroke=&#34;currentColor&#34; stroke-width=&#34;2&#34;/&gt;
            &lt;/g&gt;
            &lt;rect id=&#34;book-def&#34; width=&#34;15&#34; height=&#34;20&#34; rx=&#34;2&#34; fill=&#34;#c57436&#34;/&gt;
        &lt;/defs&gt;

        &lt;line x1=&#34;10&#34; y1=&#34;120&#34; x2=&#34;390&#34; y2=&#34;120&#34; stroke=&#34;currentColor&#34; stroke-width=&#34;1&#34; stroke-linecap=&#34;round&#34;/&gt;

        &lt;g id=&#34;thrower&#34;&gt;
            &lt;use href=&#34;#figure-def&#34; x=&#34;50&#34; y=&#34;0&#34;/&gt;
            &lt;line x1=&#34;50&#34; y1=&#34;98&#34; x2=&#34;60&#34; y2=&#34;105&#34; stroke=&#34;currentColor&#34; stroke-width=&#34;2&#34; id=&#34;throwing-arm&#34;&gt;
                &lt;animateTransform id=&#34;anim-arm-swing&#34; attributeName=&#34;transform&#34; type=&#34;rotate&#34; from=&#34;0 50 98&#34; to=&#34;-45 50 98&#34; dur=&#34;0.5s&#34; begin=&#34;indefinite&#34; fill=&#34;freeze&#34; keyTimes=&#34;0; 0.6; 1&#34; values=&#34;0 50 98; -60 50 98; -45 50 98&#34;/&gt;
            &lt;/line&gt;
        &lt;/g&gt;

        
        &lt;g id=&#34;receiver-container&#34;&gt;
            
            &lt;use href=&#34;#figure-def&#34; x=&#34;260&#34; y=&#34;0&#34; id=&#34;receiver&#34;&gt;
                
                &lt;line x1=&#34;250&#34; y1=&#34;105&#34; x2=&#34;260&#34; y2=&#34;98&#34; stroke=&#34;currentColor&#34; stroke-width=&#34;2&#34;/&gt;
                
                &lt;animateTransform id=&#34;anim-fall-over&#34; attributeName=&#34;transform&#34; type=&#34;rotate&#34; from=&#34;0 268 120&#34; to=&#34;-90 268 120&#34; dur=&#34;0.5s&#34; begin=&#34;indefinite&#34; fill=&#34;freeze&#34; calcMode=&#34;spline&#34; keyTimes=&#34;0; 1&#34; keySplines=&#34;0.5 0 0.5 1&#34;/&gt;
            &lt;/use&gt;
        &lt;/g&gt;

        &lt;g id=&#34;book-wrapper&#34;&gt;
            &lt;use href=&#34;#book-def&#34; x=&#34;0&#34; y=&#34;0&#34;/&gt;
            
            &lt;text x=&#34;7.5&#34; y=&#34;-5&#34; text-anchor=&#34;middle&#34; font-family=&#34;sans-serif&#34; font-size=&#34;10px&#34; fill=&#34;currentColor&#34;&gt;book of names&lt;/text&gt;

            
            &lt;animateMotion id=&#34;anim-book-flight&#34; path=&#34;M 65 95 Q 155 50 250 95&#34; dur=&#34;1s&#34; begin=&#34;indefinite&#34; fill=&#34;freeze&#34; rotate=&#34;auto&#34;/&gt;
            
            &lt;animateMotion id=&#34;anim-book-drop&#34; path=&#34;M 250 95 L 250 100&#34; dur=&#34;0.3s&#34; begin=&#34;indefinite&#34; fill=&#34;freeze&#34;/&gt;
            &lt;animate id=&#34;anim-book-visibility&#34; attributeName=&#34;visibility&#34; from=&#34;hidden&#34; to=&#34;visible&#34; dur=&#34;0.1s&#34; begin=&#34;indefinite&#34; fill=&#34;freeze&#34;/&gt;
        &lt;/g&gt;
    &lt;/svg&gt;
&lt;/div&gt;

&lt;script&gt;
(function() {
    function setupAnimation(container) {
        const receiverNode = container.querySelector(&#39;#receiver&#39;);
        const pristineReceiver = receiverNode.cloneNode(true);
        const receiverContainer = container.querySelector(&#39;#receiver-container&#39;);

        const armNode = container.querySelector(&#39;#throwing-arm&#39;);
        const pristineArm = armNode.cloneNode(true);
        const throwerGroup = container.querySelector(&#39;#thrower&#39;);

        function resetAnimation() {
            const currentReceiver = container.querySelector(&#39;#receiver&#39;);
            const currentArm = container.querySelector(&#39;#throwing-arm&#39;);
            
            if (currentReceiver) receiverContainer.replaceChild(pristineReceiver.cloneNode(true), currentReceiver);
            if (currentArm) throwerGroup.replaceChild(pristineArm.cloneNode(true), currentArm);

            const bookWrapper = container.querySelector(&#39;#book-wrapper&#39;);
            bookWrapper.style.visibility = &#39;hidden&#39;;
            bookWrapper.querySelector(&#39;#anim-book-flight&#39;).endElement();
            bookWrapper.querySelector(&#39;#anim-book-drop&#39;).endElement();
        }

        function runAnimationSequence() {
            resetAnimation();
            
            const armSwing = container.querySelector(&#39;#anim-arm-swing&#39;);
            const fallOver = container.querySelector(&#39;#anim-fall-over&#39;);
            const bookFlight = container.querySelector(&#39;#anim-book-flight&#39;);
            const bookDrop = container.querySelector(&#39;#anim-book-drop&#39;);
            const bookWrapper = container.querySelector(&#39;#book-wrapper&#39;);

            setTimeout(() =&gt; { bookWrapper.style.visibility = &#39;visible&#39;; }, 100);
            setTimeout(() =&gt; { armSwing.beginElement(); }, 200);
            setTimeout(() =&gt; { bookFlight.beginElement(); }, 400);
            setTimeout(() =&gt; {
                fallOver.beginElement();
                bookDrop.beginElement();
            }, 1400);
        }
        
        const observer = new IntersectionObserver((entries) =&gt; {
            entries.forEach(entry =&gt; {
                if (entry.isIntersecting) {
                    runAnimationSequence();
                    entry.target.animationInterval = setInterval(() =&gt; {
                         if (entry.target.isConnected) { runAnimationSequence(); }
                    }, 4000);
                } else {
                    clearInterval(entry.target.animationInterval);
                    resetAnimation();
                }
            });
        }, { threshold: 0.5 });

        observer.observe(container);
    }

    document.querySelectorAll(&#39;.book-throw-animation-container&#39;).forEach(setupAnimation);
})();
&lt;/script&gt;
&lt;p&gt;In this case, the query optimizer is just handing us the book back. If we were to run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;EXPLAIN&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; SOME_ID &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CHECK123
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We&amp;rsquo;d get something back like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;XN SEQ SCAN &lt;span style=&#34;color:#66d9ef&#34;&gt;on&lt;/span&gt; check123 (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;07&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;304&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That&amp;rsquo;s just a SEQ SCAN handing us back the entire column because we&amp;rsquo;ve already told Redshift these values are unique. In contrast, what if we scan a non-PK column for uniqueness:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; AN_ATTRIBUTE &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CHECK123;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And there you have it; now we get:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;XN &lt;span style=&#34;color:#66d9ef&#34;&gt;Unique&lt;/span&gt;  (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;09&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;304&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt; XN Seq Scan &lt;span style=&#34;color:#66d9ef&#34;&gt;on&lt;/span&gt; check123 (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;07&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;304&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You&amp;rsquo;ll find &lt;code&gt;XN Unique&lt;/code&gt; in query plans when &lt;code&gt;DISTINCT&lt;/code&gt; or &lt;code&gt;UNION&lt;/code&gt; (which implies distinctness, unlike &lt;code&gt;UNION ALL&lt;/code&gt;) is used. Here the DBMS has to build a unique result set from values &lt;strong&gt;it does not know&lt;/strong&gt; are unique. I go on a tangent about this below if you&amp;rsquo;re interested.&lt;/p&gt;
&lt;details class=&#34;spoiler&#34;&gt;
  &lt;summary&gt;Click to view a tangent on XN Unique &amp;amp; XN HashAggregate&lt;/summary&gt;
  &lt;div class=&#34;spoiler-content&#34;&gt;
    &lt;p&gt;If we have an index on the column, deduplication can be fast, since an index like a B-tree already keeps the values in sorted order. (The Postgres docs on &lt;a href=&#34;https://www.postgresql.org/docs/current/btree.html#BTREE-DEDUPLICATION&#34;&gt;B-tree indexes&lt;/a&gt; even have a section on deduplication, although that section is about compacting duplicate index entries rather than answering &lt;code&gt;DISTINCT&lt;/code&gt;.) But what if we don&amp;rsquo;t have an index&amp;hellip; Redshift doesn&amp;rsquo;t even have indexes&amp;hellip; Redshift just keeps metadata (min/max values) for each &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt.html&#34;&gt;block&lt;/a&gt; written to disk, called Zone Maps.&lt;/p&gt;
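&lt;p&gt;If you want to peek at these zone maps yourself, a query along these lines against the &lt;code&gt;STV_BLOCKLIST&lt;/code&gt; and &lt;code&gt;STV_TBL_PERM&lt;/code&gt; system tables should list the min/max values Redshift keeps for each block of &lt;code&gt;CHECK123&lt;/code&gt;. Treat it as a sketch: I&amp;rsquo;m assuming superuser access (&lt;code&gt;STV_BLOCKLIST&lt;/code&gt; is restricted), and for non-numeric columns &lt;code&gt;minvalue&lt;/code&gt;/&lt;code&gt;maxvalue&lt;/code&gt; hold the first eight characters encoded as a 64-bit integer rather than a readable string.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;-- Sketch: per-block zone map metadata for CHECK123 (assumes superuser access).
-- STV_TBL_PERM has one row per slice, hence the DISTINCT on the table id.
SELECT b.col,        -- zero-based column number
       b.blocknum,
       b.num_values,
       b.minvalue,   -- smallest value in the block (encoded for non-numeric types)
       b.maxvalue    -- largest value in the block (encoded for non-numeric types)
FROM stv_blocklist b
JOIN (SELECT DISTINCT id FROM stv_tbl_perm WHERE TRIM(name) = &amp;#39;check123&amp;#39;) t
  ON b.tbl = t.id
ORDER BY b.col, b.blocknum;
&lt;/code&gt;&lt;/pre&gt;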
&lt;p&gt;Redshift &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/c-the-query-plan.html&#34;&gt;docs&lt;/a&gt; note that &lt;code&gt;XN Unique&lt;/code&gt; &amp;ldquo;Removes duplicates for SELECT DISTINCT queries and UNION queries.&amp;rdquo; This gets weird, because I&amp;rsquo;d assume this operator sorts the column and plucks out the unique values, yet there is no &lt;code&gt;XN Sort&lt;/code&gt; in that plan. Meanwhile, &lt;code&gt;XN Sort&lt;/code&gt; &amp;ldquo;Evaluates the ORDER BY clause and other sort operations, such as sorts required by &lt;code&gt;SELECT DISTINCT&lt;/code&gt; queries.&amp;rdquo; Our &lt;code&gt;SELECT DISTINCT&lt;/code&gt; query above doesn&amp;rsquo;t have an &lt;code&gt;XN Sort&lt;/code&gt;; it only contains an &lt;code&gt;XN Unique&lt;/code&gt;. Maybe &lt;code&gt;XN Unique&lt;/code&gt; implies an &lt;code&gt;XN Sort&lt;/code&gt;? I generated an explain plan for a relation whose rows were 100% unsorted according to &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/r_STV_TBL_PERM.html&#34;&gt;STV_TBL_PERM&lt;/a&gt;, and that plan did NOT include an &lt;code&gt;XN Sort&lt;/code&gt; either&amp;hellip;&lt;/p&gt;
&lt;p&gt;I think a couple of things could be happening:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The optimizer can choose between different strategies to perform our &lt;code&gt;SELECT DISTINCT&lt;/code&gt; and in these cases it&amp;rsquo;s actually choosing a hashing strategy but just labeling it as &lt;code&gt;XN Unique&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;XN Unique&lt;/code&gt; implies a sorting operation, and we&amp;rsquo;re just not being shown that in the plan.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For example, if we take:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; AN_ATTRIBUTE &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CHECK123;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And re-write it as:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; AN_ATTRIBUTE &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CHECK123 &lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; AN_ATTRIBUTE;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running &lt;code&gt;EXPLAIN&lt;/code&gt; on the rewrite gives us:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;XN HashAggregate  (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;09&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;09&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;304&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt; XN Seq Scan &lt;span style=&#34;color:#66d9ef&#34;&gt;on&lt;/span&gt; check123 (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;07&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;304&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This looks similar to the &lt;code&gt;XN Unique&lt;/code&gt; plan, but the costs differ: &lt;code&gt;XN Unique&lt;/code&gt; has &lt;code&gt;cost=0.00..0.09&lt;/code&gt; while &lt;code&gt;XN HashAggregate&lt;/code&gt; has &lt;code&gt;cost=0.09..0.09&lt;/code&gt;. The first number is the relative cost of returning the first row for the operation (its startup cost). The second number, 0.09 in both cases, is the relative cost of completing the operation. So I&amp;rsquo;d surmise that, depending on how we write this query:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;XN HashAggregate&lt;/code&gt; is a blocking operator designed for GROUP BY clauses; it must consume its entire input to correctly compute aggregate functions like COUNT() before returning any rows.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;XN Unique&lt;/code&gt; is a specialized operator for SELECT DISTINCT that can implement a streaming (non-blocking) hash algorithm, allowing it to output unique rows as soon as they are encountered.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This seems to explain why Unique can have a zero startup cost while HashAggregate&amp;rsquo;s startup cost reflects processing its entire input.&lt;/p&gt;
&lt;p&gt;On another note, the only way I&amp;rsquo;ve been able to get an &lt;code&gt;XN Sort&lt;/code&gt; operator is by explicitly using &lt;code&gt;ORDER BY&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;EXPLAIN&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;DISTINCT&lt;/span&gt; AN_ATTRIBUTE &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CHECK123 &lt;span style=&#34;color:#66d9ef&#34;&gt;ORDER&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; AN_ATTRIBUTE;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;XN Merge  (cost=1000000000000.19..1000000000000.20 rows=7 width=304)               
  Merge Key: an_attribute                                                          
  -&amp;gt;  XN Network  (cost=1000000000000.19..1000000000000.20 rows=7 width=304)       
        Send to leader                                                             
        -&amp;gt;  XN Sort  (cost=1000000000000.19..1000000000000.20 rows=7 width=304)    
              Sort Key: an_attribute                                               
              -&amp;gt;  XN Unique  (cost=0.00..0.09 rows=7 width=304)                    
                    -&amp;gt;  XN Seq Scan on check123  (cost=0.00..0.07 rows=7 width=304)
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;Anyway, our VP is pissed because we sent trash, but why would an OLAP DBMS do something like this?&lt;/p&gt;
&lt;h2 id=&#34;pk-in-name-only&#34;&gt;PK in name only&lt;/h2&gt;
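&lt;p&gt;We can see that our primary keys exist in name only. The DBMS is not going to enforce them; it&amp;rsquo;s going to assume they are right, and, as the &lt;code&gt;SELECT DISTINCT SOME_ID&lt;/code&gt; plan above showed, the planner will even skip deduplication because of them. If you want them enforced, you&amp;rsquo;ll need to write that logic yourself.&lt;/p&gt;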
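&lt;p&gt;In practice, &amp;ldquo;write that logic yourself&amp;rdquo; tends to mean two things: auditing for duplicates after the fact, and filtering them out before they land. Here&amp;rsquo;s a minimal sketch against our &lt;code&gt;CHECK123&lt;/code&gt; table. &lt;code&gt;CHECK123_STAGING&lt;/code&gt; is a hypothetical staging table with the same columns, and the guarded insert assumes nothing else is writing to &lt;code&gt;CHECK123&lt;/code&gt; at the same time and that the staging rows are already deduplicated among themselves:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;-- 1. Audit: which SOME_ID values appear more than once?
SELECT SOME_ID, COUNT(*) AS n
FROM CHECK123
GROUP BY SOME_ID
HAVING COUNT(*) &amp;gt; 1;

-- 2. Guarded load: only insert staged rows whose SOME_ID is not already present.
--    CHECK123_STAGING is a hypothetical staging table with the same columns.
INSERT INTO CHECK123
SELECT s.*
FROM CHECK123_STAGING s
WHERE NOT EXISTS (
    SELECT 1 FROM CHECK123 t WHERE t.SOME_ID = s.SOME_ID
);
&lt;/code&gt;&lt;/pre&gt;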
&lt;p&gt;So why would the thoughtful, well-paid Engineers, TPMs, Directors, and VPs building these cloud data warehouses let people designate primary keys that aren&amp;rsquo;t enforced? I&amp;rsquo;ll boil the arguments down to the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;/posts/redshift_pk/#write-performance&#34;&gt;Write performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;/posts/redshift_pk/#these-constraints-should-be-enforced-upstream&#34;&gt;These constraints &lt;em&gt;should&lt;/em&gt; be enforced upstream&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;write-performance&#34;&gt;Write performance&lt;/h3&gt;
&lt;p&gt;What do we mean by write performance, and can we actually take a look at a DBMS and see this impact on performance?&lt;/p&gt;
&lt;p&gt;Yes, we can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;spin up Postgres inside of a container&lt;/li&gt;
&lt;li&gt;create tables with/without PK&lt;/li&gt;
&lt;li&gt;write data to said tables&lt;/li&gt;
&lt;li&gt;measure performance&lt;/li&gt;
&lt;li&gt;and even look a little deeper at what Postgres is doing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A couple of caveats first, so I don&amp;rsquo;t get dunked on&amp;hellip;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Single Node vs. MPP&lt;/strong&gt;: I&amp;rsquo;m running Postgres inside a container on my &lt;del&gt;[kindle](Insert sponsored link)&lt;/del&gt; Mac, which is completely different from the orchestration/networking effort required for a cloud data warehouse with dozens of compute nodes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Row-Oriented vs. Columnar&lt;/strong&gt;: Postgres is row-oriented, meaning an entire row (SOME_ID, AN_ATTRIBUTE, ANOTHER_ATTRIBUTE, SOME_NUMBER) is stored together. Redshift uses columnar storage: each column is stored in its own block(s). This is fantastic for aggregates like &lt;code&gt;SELECT SUM(SOME_NUMBER)&lt;/code&gt; but makes single-row operations inefficient. If we wanted to update or insert a row in Redshift while enforcing the primary key, we&amp;rsquo;d have to:
&lt;ul&gt;
&lt;li&gt;Search the id column&amp;rsquo;s blocks to find whether the record we&amp;rsquo;re updating exists (if it doesn&amp;rsquo;t, insert it instead)&lt;/li&gt;
&lt;li&gt;Then find the corresponding values in every other column&amp;rsquo;s blocks&lt;/li&gt;
&lt;li&gt;All just to reconstruct one tuple for an update&amp;hellip; the opposite of what columnar stores are designed for&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This leads into the &lt;a href=&#34;/posts/redshift_pk/#these-constraints-should-be-enforced-upstream&#34;&gt;second point&lt;/a&gt; people will make&amp;hellip; that this isn&amp;rsquo;t what the DW is for. You can see these inserts are going to be gnarly, so why not use something like a &lt;a href=&#34;https://en.wikipedia.org/wiki/Slowly_changing_dimension&#34;&gt;slowly changing dimension&lt;/a&gt; that allows for faster writes, and control changes to facts in an OLTP DBMS better suited to transactional workloads?&lt;/p&gt;
&lt;h4 id=&#34;insert-with--wo-a-pk&#34;&gt;Insert with &amp;amp; w/o a PK&lt;/h4&gt;
&lt;p&gt;Anyway&amp;hellip; let&amp;rsquo;s do our little test:&lt;/p&gt;
&lt;p&gt;First we pull the docker image and run a container in detached mode:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker run --name postgres-test -e POSTGRES_PASSWORD&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;mysecretpassword -p 5432:5432 -d postgres
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can check the status of our new container with &lt;code&gt;docker ps -a&lt;/code&gt;, then run &lt;code&gt;docker exec -it postgres-test psql -U postgres&lt;/code&gt; to get a psql shell. We&amp;rsquo;ll create two tables, one with a PK and one without, and use &lt;code&gt;generate_series()&lt;/code&gt; to generate some fake data to write to them, something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; i, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;some payload data for row &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; i PAYLOAD &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; generate_series(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;) s(i);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Which will produce the following ID + PAYLOAD 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt; i  &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt;           payload          
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;8&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;9&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;data&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So let&amp;rsquo;s create our two tables, one with a PK and one without:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; no_pk_test (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    id  INTEGER,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    payload TEXT
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; with_pk_test (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    id  INTEGER &lt;span style=&#34;color:#66d9ef&#34;&gt;PRIMARY&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    payload TEXT
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;details class=&#34;spoiler&#34;&gt;
  &lt;summary&gt;Click to view a deep dive into Postgres internals&lt;/summary&gt;
  &lt;div class=&#34;spoiler-content&#34;&gt;
    &lt;p&gt;Now we have two tables, and &lt;code&gt;with_pk_test&lt;/code&gt; has a primary key. We can see that Postgres created an associated index for it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; oid, relname, relkind &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; pg_class &lt;span style=&#34;color:#66d9ef&#34;&gt;WHERE&lt;/span&gt; relname &lt;span style=&#34;color:#66d9ef&#34;&gt;IN&lt;/span&gt; (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;with_pk_test&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;with_pk_test_pkey&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Returns:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;  oid  &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt;      relname      &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; relkind 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-------+-------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;16393&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; with_pk_test      &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; r &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Our Relation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;16398&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; with_pk_test_pkey &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Our index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Each table and index is stored in a separate file. We can also check the initial size of this index: even with no rows, Postgres has already allocated one 8kB &lt;a href=&#34;https://www.postgresql.org/docs/current/storage-page-layout.html&#34;&gt;page&lt;/a&gt; (the metapage) for our &lt;a href=&#34;https://www.postgresql.org/docs/current/indexes-types.html&#34;&gt;B-tree&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- 8192 bytes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; pg_size_pretty(pg_relation_size(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;with_pk_test_pkey&amp;#39;&lt;/span&gt;)) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; index_size;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can get the OID of all databases with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; pg_database;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Returns (truncated Columns):
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt; oid &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt;  datname  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-----+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; template1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; template0
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- We can see our current database with: 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; oid &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; pg_database &lt;span style=&#34;color:#66d9ef&#34;&gt;WHERE&lt;/span&gt; datname &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; current_database();
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Returns:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt; oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can exit psql (&lt;code&gt;\q&lt;/code&gt;) and get a bash shell instead with &lt;code&gt;docker exec -it postgres-test bash&lt;/code&gt;. From there we can cd into the base directory and list its contents:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cd /var/lib/postgresql/data/base &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; ls &lt;span style=&#34;color:#75715e&#34;&gt;# shows 1  4  5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Nice&amp;hellip; 1, 4, and 5 match the OIDs of the databases we listed above. Now if we &lt;code&gt;cd 5&lt;/code&gt; and run &lt;code&gt;ls -lh&lt;/code&gt;, we&amp;rsquo;ll find:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-rw------- 1 postgres postgres    0 Jun 23 16:30 16388 &amp;lt;-- OID of our relation no_pk_test
-rw------- 1 postgres postgres    0 Jun 23 16:30 16391
-rw------- 1 postgres postgres 8.0K Jun 23 16:30 16392
-rw------- 1 postgres postgres    0 Jun 23 16:31 16393 &amp;lt;-- OID of our relation with_pk_test
-rw------- 1 postgres postgres    0 Jun 23 16:31 16396
-rw------- 1 postgres postgres 8.0K Jun 23 16:31 16397
-rw------- 1 postgres postgres 8.0K Jun 23 16:31 16398 &amp;lt;-- OID of our index with_pk_test_pkey
&lt;/code&gt;&lt;/pre&gt;
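&lt;p&gt;Rather than matching numbers by hand, we can ask Postgres where each relation lives on disk. This is a sketch that assumes the same database and the tables created above. &lt;code&gt;pg_relation_filepath()&lt;/code&gt; returns the path of a relation&amp;rsquo;s first segment file, relative to the data directory, so it should line up with the listing above. Strictly speaking, the file name is the relation&amp;rsquo;s &lt;code&gt;relfilenode&lt;/code&gt;. That matches the OID for a freshly created table but can drift after a rewrite such as &lt;code&gt;TRUNCATE&lt;/code&gt; or &lt;code&gt;VACUUM FULL&lt;/code&gt;.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;-- Path of each relation&#39;s first segment file, relative to the data directory
-- (assumes the tables and index created earlier in this post)
SELECT pg_relation_filepath(&#39;no_pk_test&#39;) AS no_pk_file,
       pg_relation_filepath(&#39;with_pk_test&#39;) AS with_pk_file,
       pg_relation_filepath(&#39;with_pk_test_pkey&#39;) AS pk_index_file;

-- Catalog OID next to the on-disk file name (relfilenode)
SELECT oid, relfilenode, relname
FROM pg_class
WHERE relname IN (&#39;no_pk_test&#39;, &#39;with_pk_test&#39;, &#39;with_pk_test_pkey&#39;);
&lt;/code&gt;&lt;/pre&gt;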
  &lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;Alright, with that deep dive out of the way, let&amp;rsquo;s insert 10MM records into these empty relations using &lt;code&gt;generate_series()&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;\&lt;/span&gt;timing &lt;span style=&#34;color:#66d9ef&#34;&gt;on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- INSERT 0 10000000
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Time: 3846.560 ms (00:03.847)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;INTO&lt;/span&gt; no_pk_test &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; i, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;some payload data for row &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; i PAYLOAD &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; generate_series(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;10000000&lt;/span&gt;) s(i);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- INSERT 0 10000000
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Time: 6038.888 ms (00:06.039)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;INTO&lt;/span&gt; with_pk_test &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; i, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;some payload data for row &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; i PAYLOAD &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; generate_series(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;10000000&lt;/span&gt;) s(i);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Yep, that&amp;rsquo;s roughly 1.6x as long (3846.560 ms vs 6038.888 ms), because every row also has to be written into the primary key index. And just for fun, we can check the size of our tables and index.&lt;/p&gt;
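&lt;p&gt;The sizes in the table below can be pulled with a query along these lines. This is a sketch rather than the exact query behind the table, so the name formatting will differ. The three functions split the total as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pg_table_size()&lt;/code&gt; counts the heap plus its TOAST, free space map and visibility map.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pg_indexes_size()&lt;/code&gt; counts every index on the table.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pg_total_relation_size()&lt;/code&gt; is the sum of the two.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;-- Heap vs index vs total size for the two test tables
SELECT
  relid::regclass AS table_name,
  pg_size_pretty(pg_table_size(relid)) AS table_size,
  pg_size_pretty(pg_indexes_size(relid)) AS indexes_size,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE relname IN (&#39;no_pk_test&#39;, &#39;with_pk_test&#39;)
ORDER BY pg_total_relation_size(relid) DESC;
&lt;/code&gt;&lt;/pre&gt;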
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;table_name&lt;/th&gt;
          &lt;th&gt;table_size&lt;/th&gt;
          &lt;th&gt;indexes_size&lt;/th&gt;
          &lt;th&gt;total_size&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&amp;ldquo;public&amp;rdquo;.&amp;ldquo;with_pk_test&amp;rdquo;&lt;/td&gt;
          &lt;td&gt;651 MB&lt;/td&gt;
          &lt;td&gt;214 MB&lt;/td&gt;
          &lt;td&gt;865 MB&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&amp;ldquo;public&amp;rdquo;.&amp;ldquo;no_pk_test&amp;rdquo;&lt;/td&gt;
          &lt;td&gt;651 MB&lt;/td&gt;
          &lt;td&gt;0 bytes&lt;/td&gt;
          &lt;td&gt;651 MB&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;And for even more fun, we can watch the insert time explode further after adding a secondary index.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; multi_index_test (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  id INTEGER &lt;span style=&#34;color:#66d9ef&#34;&gt;PRIMARY&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  name VARCHAR(&lt;span style=&#34;color:#ae81ff&#34;&gt;256&lt;/span&gt;) &lt;span style=&#34;color:#66d9ef&#34;&gt;NOT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;NULL&lt;/span&gt;, &lt;span style=&#34;color:#75715e&#34;&gt;-- Our new table has a new column (name)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;  payload TEXT
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;INDEX&lt;/span&gt; idx_name &lt;span style=&#34;color:#66d9ef&#34;&gt;ON&lt;/span&gt; multi_index_test (name); &lt;span style=&#34;color:#75715e&#34;&gt;-- Add a secondary B+Tree index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;INTO&lt;/span&gt; multi_index_test
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  i,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;user_name_&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; (i &lt;span style=&#34;color:#f92672&#34;&gt;%&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;100000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;some payload data for row &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; i
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; generate_series(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;10000000&lt;/span&gt;) s(i);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Time: &lt;span style=&#34;color:#ae81ff&#34;&gt;30859&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;168&lt;/span&gt; ms (&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;30&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;859&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- *cough*
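&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That&amp;rsquo;s about 31 seconds, versus about 6 seconds for the primary-key-only table. This table also carries the extra &lt;code&gt;name&lt;/code&gt; column, so it&amp;rsquo;s not a perfectly controlled comparison. There is NO free lunch: sure, you can make your reads faster, but at what cost?&lt;/p&gt;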
&lt;ul&gt;
&lt;li&gt;Are people even using the index?&lt;/li&gt;
&lt;li&gt;What&amp;rsquo;s the ratio of writes to reads on this table anyway?&lt;/li&gt;
&lt;li&gt;I wonder how many B+ Trees I can fit in my 2012 Chevy Impala LT?&lt;/li&gt;
&lt;li&gt;These are the questions we need to answer&lt;/li&gt;
&lt;/ul&gt;
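&lt;p&gt;Postgres keeps cumulative statistics that can help with the first two questions. This is a sketch against &lt;code&gt;multi_index_test&lt;/code&gt; from above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pg_stat_user_indexes&lt;/code&gt; counts scans per index.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pg_stat_user_tables&lt;/code&gt; counts scans and the rows they read, next to rows inserted, updated and deleted.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The counters accumulate since the last statistics reset and may lag slightly behind recent activity. Treat the result as a rough read/write ratio rather than a precise measurement.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;-- Is anyone actually using each index?
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = &#39;multi_index_test&#39;;

-- Rough rows read vs rows written for the table
SELECT relname,
       seq_scan + coalesce(idx_scan, 0) AS scans,
       seq_tup_read + coalesce(idx_tup_fetch, 0) AS rows_read,
       n_tup_ins + n_tup_upd + n_tup_del AS rows_written
FROM pg_stat_user_tables
WHERE relname = &#39;multi_index_test&#39;;
&lt;/code&gt;&lt;/pre&gt;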
&lt;h4 id=&#34;merge-upsert&#34;&gt;Merge Upsert&lt;/h4&gt;
&lt;p&gt;What happens if we insert additional records into tables that already have 10,000,000 records in them? We could compare two approaches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the data warehouse style insert, where we append everything;&lt;/li&gt;
&lt;li&gt;a merge upsert, where we update records that already exist and insert the ones that don&amp;rsquo;t.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&amp;rsquo;s generate IDs spanning 9,000,001 -&amp;gt; 11,000,000. That gives us a mix of already existing and new records (1MM of each) for our two writes.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Creating a new table with the records as discussed above 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; TEMP &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; new_batch &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; i &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; id, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;new payload for &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; i &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; payload &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; generate_series(&lt;span style=&#34;color:#ae81ff&#34;&gt;9000001&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;11000000&lt;/span&gt;) s(i);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;We can perform our basic insert into &lt;code&gt;no_pk_test&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;INTO&lt;/span&gt; no_pk_test
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; new_batch;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Time: &lt;span style=&#34;color:#ae81ff&#34;&gt;598&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;475&lt;/span&gt; ms 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- 12MM Records now, 10MM old and 2MM new
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;LEFT&lt;/span&gt;(payload, &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;), &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; no_pk_test &lt;span style=&#34;color:#66d9ef&#34;&gt;group&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;by&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;left&lt;/span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;count&lt;/span&gt;   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;new&lt;/span&gt; payloa &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;2000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; paylo &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now what about our upsert? We can use &lt;code&gt;ON CONFLICT&lt;/code&gt;, &lt;a href=&#34;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65&#34;&gt;courtesy&lt;/a&gt; of Peter Geoghegan, Heikki Linnakangas, Andres Freund and Jeff Janes.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;INTO&lt;/span&gt; with_pk_test
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; new_batch
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;ON&lt;/span&gt; CONFLICT (id) &lt;span style=&#34;color:#66d9ef&#34;&gt;DO&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;UPDATE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    payload &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; EXCLUDED.payload;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Time: &lt;span style=&#34;color:#ae81ff&#34;&gt;3287&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;132&lt;/span&gt; ms (&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;03&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;287&lt;/span&gt;) 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                QUERY PLAN                                
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;--------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;Insert&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;on&lt;/span&gt; with_pk_test  (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;33414&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;40&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   Conflict Resolution: &lt;span style=&#34;color:#66d9ef&#34;&gt;UPDATE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   Conflict Arbiter Indexes: with_pk_test_pkey
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;  Seq Scan &lt;span style=&#34;color:#66d9ef&#34;&gt;on&lt;/span&gt; new_batch  (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;33414&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;40&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1869440&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;36&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- 11MM Records now, 9MM old and 2MM new
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;left&lt;/span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;count&lt;/span&gt;  
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;new&lt;/span&gt; payloa &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;some&lt;/span&gt; paylo &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;9000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- MERGE works too, though it was slower here (4593 ms vs 3287 ms)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Time: 4593.521 ms (00:04.594)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;MERGE &lt;span style=&#34;color:#66d9ef&#34;&gt;INTO&lt;/span&gt; with_pk_test pk
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;USING&lt;/span&gt; new_batch n
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;ON&lt;/span&gt; n.id &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pk.id
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; MATCHED &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;UPDATE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;SET&lt;/span&gt; payload &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; n.payload
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;WHEN&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;NOT&lt;/span&gt; MATCHED &lt;span style=&#34;color:#66d9ef&#34;&gt;THEN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; (id, payload)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;VALUES&lt;/span&gt; (n.id, n.payload);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;MERGE &lt;span style=&#34;color:#ae81ff&#34;&gt;2000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Time: &lt;span style=&#34;color:#ae81ff&#34;&gt;4593&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;521&lt;/span&gt; ms (&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;:&lt;span style=&#34;color:#ae81ff&#34;&gt;04&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;594&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                          QUERY PLAN                                          
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;----------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt; Merge &lt;span style=&#34;color:#66d9ef&#34;&gt;on&lt;/span&gt; with_pk_test pk  (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;398687&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;23&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;523481&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;91&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;  Hash &lt;span style=&#34;color:#66d9ef&#34;&gt;Left&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;Join&lt;/span&gt;  (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;398687&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;23&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;523481&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;91&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1869440&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;48&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;         Hash Cond: (n.id &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pk.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;         &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;  Seq Scan &lt;span style=&#34;color:#66d9ef&#34;&gt;on&lt;/span&gt; new_batch n  (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;33414&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;40&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;1869440&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;42&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;         &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;  Hash  (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;207833&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;88&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;207833&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;88&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;10979388&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               &lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;  Seq Scan &lt;span style=&#34;color:#66d9ef&#34;&gt;on&lt;/span&gt; with_pk_test pk  (cost&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;00&lt;/span&gt;..&lt;span style=&#34;color:#ae81ff&#34;&gt;207833&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;88&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;rows&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;10979388&lt;/span&gt; width&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The upsert takes roughly 5.5x as long as the plain append (598.475 ms vs 3287.132 ms). Every incoming row has to be checked against the primary key index, and matching rows are updated rather than simply appended. These are two different kinds of writes, but the gap is generally indicative of what you might be trying to accomplish in an OLAP system vs an OLTP one. Hopefully this sheds some light on what people mean when they bring up the &amp;ldquo;performance&amp;rdquo; argument.&lt;/p&gt;
&lt;h3 id=&#34;these-constraints-should-be-enforced-upstream&#34;&gt;These constraints &lt;em&gt;should&lt;/em&gt; be enforced upstream&lt;/h3&gt;
&lt;p&gt;This is idea #2.&lt;/p&gt;
&lt;p&gt;And this is opening a can of opinions, hot takes and worms. I think in &lt;em&gt;theory&lt;/em&gt; maybe this works. In the real world, though, there are certainly people with relations sitting in their data warehouse that could benefit from the safety of PK enforcement, or at least from having that option. So on one hand we have quick writes and optimized plans. On the other hand, those performance gains are a wash because we have to write the deduping checks anyway, and people are confused when their &lt;code&gt;PRIMARY KEY&lt;/code&gt; has duplicates in it.&lt;/p&gt;
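&lt;p&gt;For reference, here is a sketch of the kind of deduping check this pushes onto everyone. It runs against &lt;code&gt;no_pk_test&lt;/code&gt; and assumes that table&amp;rsquo;s columns are named &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;payload&lt;/code&gt;, like &lt;code&gt;with_pk_test&lt;/code&gt;. After the plain append above, ids 9,000,001 through 10,000,000 each appear twice, so it should report 1,000,000 duplicated ids.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;-- Count ids that would violate a primary key on no_pk_test
-- (assumes the key column is named id)
SELECT count(*) AS duplicated_ids
FROM (
  SELECT id
  FROM no_pk_test
  GROUP BY id
  HAVING count(*) &amp;gt; 1
) AS dupes;
&lt;/code&gt;&lt;/pre&gt;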
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Opinion Time!&lt;/strong&gt; I don&amp;rsquo;t understand why the syntax in Redshift couldn&amp;rsquo;t have been changed to fully remove the confusion. If you read the documentation it&amp;rsquo;s clear, but why call it a &lt;code&gt;PRIMARY KEY&lt;/code&gt; when it could have been &lt;code&gt;PRIMARY KEY NOT ENFORCED&lt;/code&gt;? A little clarity over brevity. &lt;a href=&#34;https://www.databricks.com/blog/primary-key-and-foreign-key-constraints-are-ga-and-now-enable-faster-queries&#34;&gt;Databricks&lt;/a&gt; has the &lt;code&gt;RELY&lt;/code&gt; option. &lt;a href=&#34;https://docs.snowflake.com/en/sql-reference/constraints-overview&#34;&gt;Snowflake&lt;/a&gt; doesn&amp;rsquo;t enforce them in its standard table offering but does enforce them for hybrid tables. &lt;a href=&#34;https://cloud.google.com/blog/products/data-analytics/join-optimizations-with-bigquery-primary-and-foreign-keys&#34;&gt;BigQuery&lt;/a&gt; uses &lt;code&gt;NOT ENFORCED&lt;/code&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;(1048064189, Have fun out there!),
(1048064189, Hope you enjoyed the blog!),
(1048064189, Take Care!)&lt;/p&gt;
</content>
    </item>
    
    <item>
      <title>Query Banks</title>
      <link>/posts/query_bank/</link>
      <pubDate>Mon, 31 Mar 2025 00:00:00 +0000</pubDate>
      
      <guid>/posts/query_bank/</guid>
      <description>&lt;h2 id=&#34;that-query-bank-should-be-a-semantic-layer&#34;&gt;That query bank should be a semantic layer&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve done it myself:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;Oh we should just cache all these commonly used queries somewhere central, so we always have a nice definition for someone to use.&amp;rdquo;&lt;/p&gt;&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;You could even extend the functionality of this &amp;ldquo;Query Bank&amp;rdquo; and train an LLM, or perform code reviews, or implement search, etc&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Sweet, now anyone can grab some code and use it! They can paste it here and in this scheduled job there, science can use it and slam it into a notebook, and others can run ad hoc workloads against it.&lt;/p&gt;</description>
      <content>&lt;h2 id=&#34;that-query-bank-should-be-a-semantic-layer&#34;&gt;That query bank should be a semantic layer&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve done it myself:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;Oh we should just cache all these commonly used queries somewhere central, so we always have a nice definition for someone to use.&amp;rdquo;&lt;/p&gt;&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;You could even extend the functionality of this &amp;ldquo;Query Bank&amp;rdquo; and train an LLM, or perform code reviews, or implement search, etc&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Sweet, now anyone can grab some code and use it! They can paste it here and in this scheduled job there, science can use it and slam it into a notebook, and others can run ad hoc workloads against it.&lt;/p&gt;
&lt;h2 id=&#34;no-ahh-the-bank-is-closed&#34;&gt;No, ahh! The BANK IS CLOSED&lt;/h2&gt;
&lt;p&gt;A query bank misses the entire point of DRY (Don&amp;rsquo;t Repeat Yourself): you end up with a hairball of copy/pasted code that has no reference to its origin.&lt;/p&gt;
&lt;p&gt;Imagine we threw out package management and said, &amp;ldquo;You know what, instead of depending on our source code, you should just &lt;code&gt;copy and paste it into your project&lt;/code&gt; and reference it there!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;That is effectively what the &amp;ldquo;Query Bank&amp;rdquo; is doing, because &lt;code&gt;you can&#39;t inject the latest definition (query) at run time&lt;/code&gt;. If you &lt;strong&gt;CAN&lt;/strong&gt; do this, then you don&amp;rsquo;t have a query bank; you have an &lt;strong&gt;actual data model&lt;/strong&gt;, fleshed out with materialized relations that have an established dependency chain you can reference in your workflow.&lt;/p&gt;
&lt;p&gt;The query bank is just a band-aid for broken processes/data models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No lineage of where code has been copied&lt;/li&gt;
&lt;li&gt;No ability to update code across all copies&lt;/li&gt;
&lt;li&gt;Limited ability to notify users of changes to upstream code&lt;/li&gt;
&lt;li&gt;Most likely repeating ourselves, which is extra work for both people and the DBMS&lt;/li&gt;
&lt;li&gt;Will 100% cause discrepancies between &amp;ldquo;Sources of Truth&amp;rdquo;&lt;/li&gt;
&lt;li&gt;No ability to ensure code is being copy/pasted into the correct context&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Creating tables establishes consistency, durability and determinism that&amp;rsquo;s not possible with a &amp;ldquo;Query bank&amp;rdquo;.&lt;/p&gt;
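&lt;p&gt;To make the &amp;ldquo;inject the latest definition at run time&amp;rdquo; point concrete, here&amp;rsquo;s a minimal sketch using Python&amp;rsquo;s built-in &lt;code&gt;sqlite3&lt;/code&gt;. The &lt;code&gt;orders&lt;/code&gt; table, the &lt;code&gt;active_customers&lt;/code&gt; view, and both versions of the definition are hypothetical. One consumer pasted the query out of the bank; the other reads the shared relation. When the owner changes the definition, only the reader of the relation picks it up. A view keeps the sketch short; a scheduled, materialized table gives consumers the same property.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 0.0), (3, 25.0)])

V1 = "SELECT DISTINCT customer_id FROM orders WHERE order_total >= 0"
V2 = "SELECT DISTINCT customer_id FROM orders WHERE order_total > 0"

# The shared definition, published once as a relation (version 1).
conn.execute(f"CREATE VIEW active_customers AS {V1}")

# A "query bank" consumer copies the SQL text into their own job.
pasted_query = V1

# Later, the owner fixes the definition in one place (version 2).
conn.execute("DROP VIEW active_customers")
conn.execute(f"CREATE VIEW active_customers AS {V2}")


def customer_ids(sql):
    """Run a query and return the sorted customer_ids it produces."""
    return sorted(row[0] for row in conn.execute(sql))


from_copy = customer_ids(pasted_query)  # still running version 1
from_view = customer_ids("SELECT customer_id FROM active_customers")  # version 2
print(from_copy, from_view)  # [1, 2, 3] [1, 3]
```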
&lt;h2 id=&#34;the-bank-is-open&#34;&gt;The Bank is open&lt;/h2&gt;
&lt;p&gt;Make a table, document the table, expose the table, manage access, and share the table.&lt;/p&gt;
&lt;p&gt;Ask yourself a few questions before you act:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Should I really send this code to someone or should I just DEFINE this once and allow them to read from the definition?&lt;/li&gt;
&lt;li&gt;This job, model, notebook, whatever you want to call it, is already ~1,000 lines of SQL. Should I really pile more logic onto it, or is it time to break it down into a series of relations?&lt;/li&gt;
&lt;li&gt;I&amp;rsquo;ve seen this temporary table, CTE, etc&amp;hellip; in a bunch of people&amp;rsquo;s queries. Should I just paste it into mine, or is there something better I can do?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;its-your-problem-until-you-fix-it-or-find-the-owner&#34;&gt;It&amp;rsquo;s your problem until you fix it or find the owner&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s a challenge: if you see a long case statement being shared around (yes, literally just a case statement without any context&amp;hellip;), take the initiative and see if you can centralize it and store it as a relation for the people who need it.&lt;/p&gt;
&lt;p&gt;Most likely they don&amp;rsquo;t know any better and will keep doing this until they hit a nasty data quality issue/argument.&lt;/p&gt;
</content>
    </item>
    
    <item>
      <title>Table Creation</title>
      <link>/posts/creating_tables/</link>
      <pubDate>Mon, 17 Mar 2025 00:00:00 +0000</pubDate>
      
      <guid>/posts/creating_tables/</guid>
      <description>&lt;h3 id=&#34;collection-of-thoughts-on-table-creation&#34;&gt;Collection of thoughts on table creation.&lt;/h3&gt;
&lt;p&gt;This is specifically about tables for analytics (OLAP queries, Data Warehouse, etc&amp;hellip;). It&amp;rsquo;s also specific to a larger organization; I don&amp;rsquo;t necessarily apply these when working locally with DuckDB.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Get a good understanding of who&amp;rsquo;s going to use the table, what they need, and when they&amp;rsquo;ll need that data by. This will help inform your data model and manage dependencies. If you are working directly with another team, have them &lt;strong&gt;list out&lt;/strong&gt; all of the requirements and set a date when this list will be final.&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;
&lt;p&gt;When I started working on a request for our finance partners, every day they wanted to add this or change that&amp;hellip; nip that in the bud! Have them curate what they want, with the expectation that the list locks once you start building. This forces them to stop, think critically, and figure out exactly what they need. I&amp;rsquo;ve only had to make significant updates to this table once in the last 14 months because of this. (I guess I can&amp;rsquo;t prove causation, but I think it helped.)&lt;/p&gt;</description>
      <content>&lt;h3 id=&#34;collection-of-thoughts-on-table-creation&#34;&gt;Collection of thoughts on table creation.&lt;/h3&gt;
&lt;p&gt;This is specifically about tables for analytics (OLAP queries, Data Warehouse, etc&amp;hellip;). It&amp;rsquo;s also specific to a larger organization; I don&amp;rsquo;t necessarily apply these when working locally with DuckDB.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Get a good understanding of who&amp;rsquo;s going to use the table, what they need, and when they&amp;rsquo;ll need that data by. This will help inform your data model and manage dependencies. If you are working directly with another team, have them &lt;strong&gt;list out&lt;/strong&gt; all of the requirements and set a date when this list will be final.&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;
&lt;p&gt;When I started working on a request for our finance partners, every day they wanted to add this or change that&amp;hellip; nip that in the bud! Have them curate what they want, with the expectation that the list locks once you start building. This forces them to stop, think critically, and figure out exactly what they need. I&amp;rsquo;ve only had to make significant updates to this table once in the last 14 months because of this. (I guess I can&amp;rsquo;t prove causation, but I think it helped.)&lt;/p&gt;&lt;/blockquote&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;
&lt;p&gt;Don&amp;rsquo;t try to do too much in a single relation. You may think you are doing people favors by adding columns, nested data structures, and complexity, but there&amp;rsquo;s a delicate balance here. A classic mistake is building extremely &lt;a href=&#34;https://selectfromwhereand.com/posts/widetables/&#34;&gt;wide tables&lt;/a&gt;. These tables are for everybody and nobody, because no single aspect of the table will be &amp;ldquo;good enough&amp;rdquo;. The goal is usually to bring in hundreds of attributes at a specific granularity, but it turns into a dependency and data quality nightmare that can&amp;rsquo;t answer many questions without compromises nobody can accept.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Document your stuff. I&amp;rsquo;m a firm believer that what you build is only as good as its documentation. Don&amp;rsquo;t keep contributing to the sea of blank column descriptions and opaque DAGs that your stakeholders/peers will have to waste time figuring out. You will also waste your own time explaining things to your stakeholders over and over. Instead, create a thorough explanatory resource and point people to it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Publish it to a shared data catalog if possible&amp;hellip; I&amp;rsquo;ve made the mistake countless times of creating local tables that gain a solid user base, and then later having to share them and manage access across various people/software entities. One caveat: I would not do this if you&amp;rsquo;re still developing the table and making fast iterations&amp;hellip; but otherwise it makes it so much easier to manage access, control updates, configure dependencies, and cascade relevant notifications.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Write tests and perform code reviews. We are working on better testing frameworks to roll out.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;The last thing you want is people/downstream jobs consuming trash that could&amp;rsquo;ve been blocked from being written to the table in the first place.&lt;/li&gt;
&lt;li&gt;Having someone review your code/table is a no-brainer. Writing SQL against the table and using it as it would be used in production will uncover oddities/issues that would otherwise plague consumers.&lt;/li&gt;
&lt;/ul&gt;
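&lt;p&gt;Here&amp;rsquo;s a minimal sketch of the &amp;ldquo;block it before it&amp;rsquo;s written&amp;rdquo; idea, using Python&amp;rsquo;s built-in &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in warehouse. The table names, columns, and the three checks are hypothetical; the pattern is to load into a staging table, run the checks there, and only append to the real table when every check passes. Whatever testing framework you roll out would replace the hand-written &lt;code&gt;CHECKS&lt;/code&gt; dictionary.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("weekly_sales", "stage_weekly_sales"):
    conn.execute(f"CREATE TABLE {table} (product_id TEXT, week TEXT, units INTEGER)")

# Each check counts offending rows in staging; 0 means the check passes.
CHECKS = {
    "no_null_keys": "SELECT COUNT(*) FROM stage_weekly_sales "
                    "WHERE product_id IS NULL OR week IS NULL",
    "unique_product_week": "SELECT COUNT(*) FROM (SELECT product_id, week "
                           "FROM stage_weekly_sales GROUP BY product_id, week "
                           "HAVING COUNT(*) > 1)",
    "non_negative_units": "SELECT COUNT(*) FROM stage_weekly_sales WHERE 0 > units",
}


def publish(rows):
    """Stage rows, run every check, and append to weekly_sales only if all pass.

    Returns a dict of failed check names to offending row counts ({} on success).
    """
    conn.execute("DELETE FROM stage_weekly_sales")
    conn.executemany("INSERT INTO stage_weekly_sales VALUES (?, ?, ?)", rows)
    failures = {}
    for name, sql in CHECKS.items():
        bad_rows = conn.execute(sql).fetchone()[0]
        if bad_rows:
            failures[name] = bad_rows
    if failures:
        conn.rollback()  # discard the staged rows; weekly_sales is untouched
        return failures
    conn.execute("INSERT INTO weekly_sales SELECT * FROM stage_weekly_sales")
    conn.commit()
    return failures


first = publish([("A", "2025-W05", 10), ("B", "2025-W05", 4)])
second = publish([("A", "2025-W06", 3), ("A", "2025-W06", 5), (None, "2025-W06", 1)])
print(first)   # {}
print(second)  # {'no_null_keys': 1, 'unique_product_week': 1}
```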
</content>
    </item>
    
    <item>
      <title>Stop building on top of bad decisions</title>
      <link>/posts/stop/</link>
      <pubDate>Thu, 16 Jan 2025 00:00:00 +0000</pubDate>
      
      <guid>/posts/stop/</guid>
      <description>&lt;h1 id=&#34;one-step-back-two-steps-forward&#34;&gt;One step back, two steps forward:&lt;/h1&gt;
&lt;p&gt;Stop building on top of bad decisions and fix the bad decisions.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s say you work with a bunch of people who set weekly goals. Each year, goals are appended in a particular format to a table that everyone references to present goals throughout the organization.&lt;/p&gt;
&lt;p&gt;Long long ago in a galaxy far, far away, someone decided to set the goal date column as type &lt;code&gt;VARCHAR()&lt;/code&gt;, so now you have goals associated with strings that range from &amp;lsquo;2025-05&amp;rsquo; to &amp;lsquo;2025-02-05&amp;rsquo; to &amp;lsquo;02/05/2025&amp;rsquo;. In fact, for a single goal, the weeks may be recorded in wildly different formats each year depending on whoever uploads them.&lt;/p&gt;</description>
      <content>&lt;h1 id=&#34;one-step-back-two-steps-forward&#34;&gt;One step back, two steps forward:&lt;/h1&gt;
&lt;p&gt;Stop building on top of bad decisions and fix the bad decisions.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s say you work with a bunch of people who set weekly goals. Each year, goals are appended in a particular format to a table that everyone references to present goals throughout the organization.&lt;/p&gt;
&lt;p&gt;Long long ago in a galaxy far, far away, someone decided to set the goal date column as type &lt;code&gt;VARCHAR()&lt;/code&gt;, so now you have goals associated with strings that range from &amp;lsquo;2025-05&amp;rsquo; to &amp;lsquo;2025-02-05&amp;rsquo; to &amp;lsquo;02/05/2025&amp;rsquo;. In fact, for a single goal, the weeks may be recorded in wildly different formats each year depending on whoever uploads them.&lt;/p&gt;
&lt;p&gt;THIS IS A BAD PROCESS.&lt;/p&gt;
&lt;p&gt;This should be screaming bad process as you disentangle different week-to-week formatting changes each year, or &lt;strong&gt;worse&lt;/strong&gt;, what if someone changes the format in the middle of the year and your logic just suddenly breaks&amp;hellip; The best way to guard against that in the first place would have been to establish a proper schema and not use a &lt;code&gt;VARCHAR&lt;/code&gt; to store a date, but we&amp;rsquo;ve absolutely blown the rails off that one.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; TEMP &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; TEST123 &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;2024-12-05&amp;#39;&lt;/span&gt;::DATE &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; SHIP_DATE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- ERROR: invalid input syntax for type date: &amp;#34;2024-49&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Yes! This is what we want
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;INTO&lt;/span&gt; TEST123 &lt;span style=&#34;color:#66d9ef&#34;&gt;VALUES&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;2024-49&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Stop putting band-aids on and fix it&amp;hellip; that&amp;rsquo;s the only way this is going to get better. End of story. &lt;em&gt;&amp;ldquo;But there are a lot of important dependencies on this table, we&amp;rsquo;d have to change a lot of logic&amp;hellip; it&amp;rsquo;s a big risk.&amp;rdquo;&lt;/em&gt; There&amp;rsquo;s a &lt;code&gt;much bigger risk&lt;/code&gt; in parsing semi-random strings into dates than in making it right.&lt;/p&gt;
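&lt;p&gt;&amp;ldquo;Making it right&amp;rdquo; is usually a one-time migration, not parsing logic copied into every downstream job. Here&amp;rsquo;s a minimal Python sketch of that migration under some loud assumptions: &lt;code&gt;YYYY-WW&lt;/code&gt; strings are ISO year-weeks (like &lt;code&gt;2024-49&lt;/code&gt; above), &lt;code&gt;MM/DD/YYYY&lt;/code&gt; is US-style, and every goal is normalized to the Monday of its week. The function names are hypothetical. Anything that doesn&amp;rsquo;t match is set aside for a human instead of guessed at. Note that &lt;code&gt;2025-05&lt;/code&gt; could just as easily mean May 2025, which is exactly why these strings are dangerous.&lt;/p&gt;

```python
from datetime import date, datetime, timedelta


def parse_goal_week(raw):
    """Return the Monday of the week a raw goal string refers to, or None.

    Assumed formats: 'YYYY-MM-DD', US-style 'MM/DD/YYYY', and 'YYYY-WW'
    read as an ISO year-week (like '2024-49').
    """
    raw = raw.strip()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            day = datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
        return day - timedelta(days=day.weekday())  # roll back to Monday
    parts = raw.split("-")
    if len(parts) == 2 and all(p.isdigit() for p in parts):
        try:
            return date.fromisocalendar(int(parts[0]), int(parts[1]), 1)
        except ValueError:  # e.g. week 99 doesn't exist
            return None
    return None


def migrate(raw_values):
    """Split raw strings into (raw, monday) pairs and ones a human must review."""
    clean, needs_review = [], []
    for raw in raw_values:
        monday = parse_goal_week(raw)
        if monday is None:
            needs_review.append(raw)
        else:
            clean.append((raw, monday))
    return clean, needs_review


clean, needs_review = migrate(["2025-05", "2025-02-05", "02/05/2025", "week five"])
print(needs_review)  # ['week five']
```

&lt;p&gt;Once the backfill lands in a real &lt;code&gt;DATE&lt;/code&gt; column, the review list is the last place anyone should have to think about string formats.&lt;/p&gt;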
&lt;p&gt;Change your automatic response from &lt;del&gt;&amp;ldquo;What do I need to do to make this work?&amp;rdquo;&lt;/del&gt; to &lt;strong&gt;&amp;ldquo;How do I fix this and make it better for everyone moving forward?&amp;rdquo;&lt;/strong&gt; Everyone will benefit, and your organization will be more flexible and testable.&lt;/p&gt;
&lt;p&gt;Flexible and testable, good qualities to live by.&lt;/p&gt;
</content>
    </item>
    
    <item>
      <title>DRY Analytics</title>
      <link>/posts/dry_analytics/</link>
      <pubDate>Sun, 24 Nov 2024 00:00:00 +0000</pubDate>
      
      <guid>/posts/dry_analytics/</guid>
      <description>&lt;p&gt;I was talking to one of our partner teams in a developing market. They are a newer analytics department of savvy Business Analysts &amp;amp; SCMs (Supply Chain Managers).&lt;/p&gt;
&lt;p&gt;Long story short, they were working on a new metric and I was like, &amp;ldquo;How are you filtering for our specific products in your marketplace?&amp;rdquo; And they were like, &amp;ldquo;Oh, we have this case statement that we include in all of our jobs…&amp;rdquo;&lt;/p&gt;</description>
      <content>&lt;p&gt;I was talking to one of our partner teams in a developing market. They are a newer analytics department of savvy Business Analysts &amp;amp; SCMs (Supply Chain Managers).&lt;/p&gt;
&lt;p&gt;Long story short, they were working on a new metric and I was like, &amp;ldquo;How are you filtering for our specific products in your marketplace?&amp;rdquo; And they were like, &amp;ldquo;Oh, we have this case statement that we include in all of our jobs…&amp;rdquo;&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT COL1
     , COL2
     , COL3
     , ...
     , CASE WHEN PRODUCT_WEIGHT &amp;gt; 45 THEN &amp;#39;OUR_PRODUCT&amp;#39;
            WHEN PRODUCT_LENGTH &amp;gt;= 37 AND PRODUCT_WIDTH &amp;gt;= 33 THEN &amp;#39;OUR_PRODUCT&amp;#39;
            WHEN CATEGORY = &amp;#39;SOME_CATEGORY&amp;#39; OR EARTH_IS_ROUND_FLAG = &amp;#39;Y&amp;#39; THEN &amp;#39;OUR_PRODUCT&amp;#39;
            WHEN COLOR = &amp;#39;RED&amp;#39; THEN &amp;#39;OUR_PRODUCT&amp;#39;
            ELSE &amp;#39;NOT_OUR_PRODUCT&amp;#39; END AS PRODUCT_DEF
FROM BIG_TABLE
WHERE PRODUCT_DEF = &amp;#39;OUR_PRODUCT&amp;#39;
&lt;/code&gt;&lt;/pre&gt;&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Quick aside&lt;/strong&gt;&lt;/em&gt;: When I write Job, Profile, Model, or Asset, I’m talking about SQL that creates some result set and is most likely scheduled to run at a specified interval or once all dependencies have been satisfied.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I was like, ohhhh man, can I save you some pain. Maintaining this kind of logic in tens, or worse, hundreds of jobs is inevitably going to suck! At some point someone’s going to come along and say:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; Hey… color = &#39;BLUE&#39; products should also be in our list of products.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now when this day comes and you’ve got a bunch of different teams that all maintain product definitions… someone is going to have to coordinate this migration and get the timing right. Heat flashes across your forehead and your palms get a little sweaty. For so long we&amp;rsquo;ve been copying and pasting case statements but we&amp;rsquo;ve officially reached a crescendo of misaligned definitions and numbers not matching up. The CAT is way out of the bag on this one, it does NOT want to go back in the bag and you need it back in the bag if you ever want to make it to the &lt;a href=&#34;https://fifeweb.org/world-winner/&#34;&gt;FIFe World Show&lt;/a&gt;.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;This change will need to be cascaded across all the teams that maintain such a definition&lt;/p&gt;
&lt;p&gt;&lt;code&gt;a.&lt;/code&gt;	Some teams might disagree and say Blue products shouldn’t be included&lt;/p&gt;
&lt;p&gt;&lt;code&gt;b.&lt;/code&gt;	It may be difficult to locate everyone who is defining products so there will be some level of uncertainty that everyone has been notified&lt;/p&gt;
&lt;p&gt;&lt;code&gt;c.&lt;/code&gt;	And still others will ask, since when did we define OUR_PRODUCT with a color dimension????&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Somebody won’t have bandwidth to implement the change and will need to push the date back, others may need time to figure out where exactly they are defining product definitions in their pipeline, someone’s OOTO. This will bump around a bunch of teams and there’ll be several calls to organize.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Some teams may have different data models: one team only keeps the latest definition, another keeps a versioned definition over time, and another has a denormalized table and runs &lt;code&gt;qualify row_number() over(partition by product_id order by last_updated_date desc) = 1&lt;/code&gt; every single time to derive the most up-to-date definition.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Now that we’ve updated the definition, do we need to rewrite all historical tables that were derived using the previous definition? Those blue products are now OUR_PRODUCT, but at any point in time pre-change they would never have been OUR_PRODUCT and would not reside in any stored result in any relation. And don&amp;rsquo;t forget this is happening across multiple teams in parallel, and that change will need to be cascaded through several sequential jobs!&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;ok-so-how-do-we-avoid-pure-pain&#34;&gt;Ok so how do we avoid pure pain?&lt;/h3&gt;
&lt;p&gt;Maintain a centralized definition.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;DENORM_PRODUCT_DEFS&lt;/code&gt; [Captures all product changes over time] A denormalized table appending all changes to products over time. Our products may change color, or a vendor may adjust the packaging and change its dimensions; this can all impact the classification of a specific product. This table could hold when each record was first inserted, when it was most recently updated, and a flag denoting the active definition. This will save you a bunch of LEAD/LAG stuff if you’re reading from this table a lot.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;VERSIONED_PRODUCT_DEFS&lt;/code&gt; [Tracks historical versions for auditing and reproducibility] A versioned table that contains a snapshot of the product definition at a specific point in time: back in JUNE we used this definition, which would have resulted in this classification of these products, and now in OCTOBER we use this definition, which would have resulted in this classification of these products. This table would be nicely partitioned on version, and it would also be great to have columns denoting the start and end dates for each version.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;FACT_PRODUCT_DEFS&lt;/code&gt; [Current, streamlined view for active use cases] Only the most up-to-date definition for products, pre-filtered to include only your products.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
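&lt;p&gt;Here&amp;rsquo;s a minimal sketch of how the first and third tables could relate, using Python&amp;rsquo;s &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in warehouse. The columns, the sample products, and the two classification rules (a subset borrowed from the made-up &lt;code&gt;CASE&lt;/code&gt; statement above) are illustrative assumptions. The point is that the classification lives in exactly one place, and &lt;code&gt;FACT_PRODUCT_DEFS&lt;/code&gt; keeps only each product&amp;rsquo;s latest record. It&amp;rsquo;s a view here for brevity; in practice it could just as well be a materialized table.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DENORM_PRODUCT_DEFS: append-only history, one row per change to a product.
conn.execute("""
CREATE TABLE DENORM_PRODUCT_DEFS (
    PRODUCT_ID        TEXT,
    PRODUCT_WEIGHT    REAL,
    COLOR             TEXT,
    LAST_UPDATED_DATE TEXT
)""")
conn.executemany(
    "INSERT INTO DENORM_PRODUCT_DEFS VALUES (?, ?, ?, ?)",
    [
        ("P1", 50, "GREEN", "2024-06-01"),
        ("P2", 10, "GREEN", "2024-06-01"),
        ("P2", 10, "RED", "2024-10-01"),   # P2 changed color
        ("P3", 60, "BLUE", "2024-06-01"),
        ("P3", 20, "BLUE", "2024-10-01"),  # P3 repackaged, now lighter
    ],
)

# FACT_PRODUCT_DEFS: the one place the classification rules live.
# Keep each product's latest row, then apply the (illustrative) rules.
conn.execute("""
CREATE VIEW FACT_PRODUCT_DEFS AS
SELECT PRODUCT_ID, PRODUCT_WEIGHT, COLOR
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY PRODUCT_ID
                              ORDER BY LAST_UPDATED_DATE DESC) AS RN
    FROM DENORM_PRODUCT_DEFS
)
WHERE RN = 1
  AND (PRODUCT_WEIGHT > 45 OR COLOR = 'RED')
""")

our_products = [row[0] for row in conn.execute(
    "SELECT PRODUCT_ID FROM FACT_PRODUCT_DEFS ORDER BY PRODUCT_ID")]
print(our_products)  # ['P1', 'P2']
```

&lt;p&gt;Note that P3 would have counted as ours back in June; answering &amp;ldquo;what did we call ours at the time?&amp;rdquo; is the job of &lt;code&gt;VERSIONED_PRODUCT_DEFS&lt;/code&gt;, not the current view.&lt;/p&gt;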
&lt;p&gt;So now we have profiles that look like:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT B.COL1
     , B.COL2
     , B.COL3
     , B...
FROM BIG_TABLE B
    INNER JOIN FACT_PRODUCT_DEFS F ON B.PRODUCT_ID = F.PRODUCT_ID
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And even better, we can now update how this reference is configured in one place: if this query is written fifty times, every copy points back to that reference and picks up the update (&lt;code&gt;ref&lt;/code&gt; dynamically resolves table and schema names, so a change to a single reference propagates across all queries):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT B.COL1
     , B.COL2
     , B.COL3
     , B...
FROM {{ ref(&amp;#34;BIG_TABLE&amp;#34;) }} B
    INNER JOIN {{ ref(&amp;#34;FACT_PRODUCT_DEFS&amp;#34;) }} F ON B.PRODUCT_ID = F.PRODUCT_ID
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Distinct logic across teams is pain. Host a central definition. Argue about said definition, update said definition, cry about said definition, whatever… but allow everyone to read from a single definition.&lt;/p&gt;
&lt;p&gt;Any time you see the same code written over and over, alarm bells should be going off; there is a fire in the kitchen! Not only is this just MORE lines of code, but every instance it&amp;rsquo;s rewritten is another opportunity for logic to become stale or be slightly miswritten. DRY DRY DRY. Do not &lt;a href=&#34;https://youtu.be/ftSf-T9Mins?si=bV924-ee6LTPMUAQ&amp;amp;t=68&#34;&gt;pour water&lt;/a&gt; on the grease fire in the kitchen; suffocate it with a deluge of consistent data/reporting instead.&lt;/p&gt;
&lt;p&gt;This is such a common source of the “This number doesn’t match that number” problem. This is the “&lt;a href=&#34;https://news.ycombinator.com/item?id=36885598&#34;&gt;but it works on my machine&lt;/a&gt;” equivalent for day-to-day analytics.&lt;/p&gt;
&lt;p&gt;The numbers don’t match up because distributed, hard-coded definitions are hosted across teams using different dependencies running at different times… I would be FAR more perplexed if the numbers DID match up. In fact, I’d be completely dumbfounded. The numbers have absolutely no business matching up, because they don’t have a data model/governance to support them actually doing so. It&amp;rsquo;s not even automatically testable, because there’s no way to determine where the code diverges without, I guess, literally parsing jobs as text to rip out the dependencies and business logic.&lt;/p&gt;
&lt;p&gt;There are so many different “types” of what I described above. For instance, maybe you have a data migration campaign and need to prefix all of your jobs with a new database. Today, hundreds of profiles have FROM clauses like &lt;code&gt;FROM SCHEMA.TABLE&lt;/code&gt;, and every single one now needs to become &lt;code&gt;FROM DATABASE.SCHEMA.TABLE&lt;/code&gt;… hopefully this illustrates why the above idea of &lt;code&gt;ref&lt;/code&gt; is so helpful.&lt;/p&gt;
&lt;p&gt;I’m not here to praise dbt, but its Jinja templating &lt;a href=&#34;https://docs.getdbt.com/guides/using-jinja?step=1&#34;&gt;solution&lt;/a&gt; for making SQL more dynamic handles some of these problems nicely. I think a better data model can resolve some nasty issues, but if you’re repeating logic or referencing specific relations over and over across profiles, assets, whatever… be prepared for pain.&lt;/p&gt;
&lt;p&gt;If you’re working with an analytics team, one of the worst things that can happen is an erosion of trust in data quality from that team’s stakeholders. If you order crayons from Amazon and receive pencils, or you set up an AWS billing alarm to email you when your spend exceeds $100 and it doesn’t… you lose trust. One of the key pillars of a successful analytics team is building trust in their outputs (also, move quickly, I need this data yesterday, let&amp;rsquo;s go let&amp;rsquo;s go&amp;hellip; but that’s a given).&lt;/p&gt;
&lt;p&gt;This is the unglamorous plumbing that underpins a successful analytics organization. While you can never fully control the madness that occurs in a 30-tab Excel file authored by ten people off of a Windows shared drive, you can rest easy knowing that at least whatever was copy-pasted in there was reputable and “correct”.&lt;/p&gt;
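&lt;p&gt;dbt&amp;rsquo;s real &lt;code&gt;ref&lt;/code&gt; does a lot more (it also builds the dependency graph between models), but the core idea behind the database-prefix example can be sketched in a few lines of Python. Everything here is hypothetical: &lt;code&gt;RELATIONS&lt;/code&gt; stands in for project config and &lt;code&gt;render&lt;/code&gt; for the templating step; this is not dbt&amp;rsquo;s implementation. Every job names a model, one mapping decides where that model lives, and the migration becomes a one-line change instead of hundreds of edited &lt;code&gt;FROM&lt;/code&gt; clauses.&lt;/p&gt;

```python
import re

# Hypothetical single source of truth for where each model lives.
RELATIONS = {
    "BIG_TABLE": "SCHEMA.BIG_TABLE",
    "FACT_PRODUCT_DEFS": "SCHEMA.FACT_PRODUCT_DEFS",
}

# Matches {{ ref("NAME") }}, allowing whitespace inside the braces.
REF_PATTERN = re.compile(r'\{\{\s*ref\("(\w+)"\)\s*\}\}')


def render(template_sql):
    """Swap each {{ ref("NAME") }} for the relation RELATIONS maps it to."""
    return REF_PATTERN.sub(lambda m: RELATIONS[m.group(1)], template_sql)


JOB_SQL = """SELECT B.COL1
FROM {{ ref("BIG_TABLE") }} B
    INNER JOIN {{ ref("FACT_PRODUCT_DEFS") }} F ON B.PRODUCT_ID = F.PRODUCT_ID"""

before = render(JOB_SQL)

# The migration: add the database prefix once, in one place.
for model in RELATIONS:
    RELATIONS[model] = "DATABASE." + RELATIONS[model]

after = render(JOB_SQL)  # every job rendered from now on gets the new prefix
print(after)
```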
</content>
    </item>
    
    <item>
      <title>Wide Tables</title>
      <link>/posts/widetables/</link>
      <pubDate>Sun, 06 Oct 2024 00:00:00 +0000</pubDate>
      
      <guid>/posts/widetables/</guid>
      <description>&lt;h1 id=&#34;a-few-considerations-before-building-that-extremely-wide-table&#34;&gt;A few considerations before building that extremely wide table.&lt;/h1&gt;
&lt;p&gt;Every once in a while, an idea comes along to build a table with a wide array of columns at a particular granularity. This is usually associated with “self-serve analytics” and/or an attempt to define a “source of truth” table. Wide tables, self-serve analytics, and source of truth tables aren’t inherently good or bad… but there are some considerations to this type of table that I’m going to rant about.&lt;/p&gt;</description>
      <content>&lt;h1 id=&#34;a-few-considerations-before-building-that-extremely-wide-table&#34;&gt;A few considerations before building that extremely wide table.&lt;/h1&gt;
&lt;p&gt;Every once in a while, an idea comes along to build a table with a wide array of columns at a particular granularity. This is usually associated with “self-serve analytics” and/or an attempt to define a “source of truth” table. Wide tables, self-serve analytics, and source of truth tables aren’t inherently good or bad… but there are some considerations to this type of table that I’m going to rant about.&lt;/p&gt;
&lt;p&gt;When I talk about a single “wide table” I’m kind of talking about a &lt;a href=&#34;https://en.wikipedia.org/wiki/Denormalization&#34;&gt;denormalized table&lt;/a&gt;, just an especially wide one with many loosely related columns. The opposite of this would be a &lt;a href=&#34;https://en.wikipedia.org/wiki/Database_normalization&#34;&gt;normalized design&lt;/a&gt;, where one would “store different but related pieces of information in separate logical tables (called relations)”. I don’t want to linger on this topic too much, but typically a denormalized schema speeds up reads at the expense of writes, and vice versa for a normalized schema.&lt;/p&gt;
&lt;p&gt;I’ll also mostly be talking in the context of &lt;a href=&#34;https://en.wikipedia.org/wiki/Online_analytical_processing&#34;&gt;OLAP&lt;/a&gt;, not &lt;a href=&#34;https://en.wikipedia.org/wiki/Online_transaction_processing&#34;&gt;OLTP&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;lets-define-our-wide-table&#34;&gt;Let&amp;rsquo;s define our wide table&lt;/h2&gt;
&lt;p&gt;For the sake of the rest of this thing, let&amp;rsquo;s define a wide table.&lt;/p&gt;
&lt;p&gt;You work for a company that sells things. You record data on what was sold, how much was sold, what was projected to be sold, how much of that thing you had, how many distinct customers bought that thing, how quickly did you get that thing to your customer… I’m not going to hash out the full DDL for this table but you get the picture.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It should be apparent the sheer number of unique metrics that can be generated just from selling things!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You know what would be really nice? What if we decided on some granularity say… for a particular thing in a particular week we just record every metric about that thing for that week! So, a unique row in our table will be a combination of Thing + Week… and then we’ll store the name of the thing, the inventory, the shipments, the forecast, etc…&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Note&lt;/code&gt; that this is a BATCH solution where data is written periodically at specific intervals in time&amp;hellip; NOT a streaming solution where shipments are recorded in real time. Maybe the upstream sources this table is generated by are populated via some Kafka/Redpanda/Kinesis situation.&lt;/p&gt;
&lt;h2 id=&#34;why-would-you-want-to-do-this&#34;&gt;Why would you want to do this?&lt;/h2&gt;
&lt;p&gt;Maybe this table is for an LLM or ML based query generation solution, pulling data for non-technical users with ad-hoc business questions, maybe it’s to simplify the SQL associated with these ad-hoc queries to move joins/other logic upstream, speeding up reads (SELECT), or maybe the table needs to have a ton of columns for some other reason. That’s fine! &lt;code&gt;Standardizing definitions is generally a good thing and improving the speed at which ad-hoc questions are answered is also good!&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Dueling parties wielding Excel files generated by random queries stored and iterated on in Notepad, each producing different outputs, are always annoying. It’d be easier to troubleshoot if everyone was relying on the same source and using some form of version control (I know I’m only dreaming at this point). It’d be even better if you didn’t need to join any other relations onto that source: simply &lt;code&gt;SELECT &amp;amp; FILTER&lt;/code&gt; and you’re good to go. Thus, our wide table.&lt;/p&gt;
&lt;h1 id=&#34;what-are-some-issues-begin-rant&#34;&gt;What are some issues? [Begin Rant]&lt;/h1&gt;
&lt;h2 id=&#34;this-table-is-now-a-dependency-bomb&#34;&gt;This table is now a dependency bomb.&lt;/h2&gt;
&lt;p&gt;All of these columns are derived from upstream tables: your inventory data comes from scan events at your warehouse, outbound shipments come from another source, and data science produces the forecast in a local Python notebook triggered via CRON that writes to the Windows shared drive. You’ve asked if this could be moved to SageMaker, with the resulting dataset cataloged and written directly to the data warehouse, but that’s &amp;ldquo;below the line&amp;rdquo; for this quarter.&lt;/p&gt;
&lt;p&gt;You could very well now have a &lt;code&gt;ball of 70 dependencies&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If any of those dependencies are delayed every metric will be waiting… so instead of just having the guy who needs to report inventory levels to the director on Monday pissed, everyone is pissed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now you’re like, ok… if any dependency hasn’t been satisfied by Monday at 5:00 am, it doesn’t matter, I’m just going to force the job so that at least some people get their data. However, now you’ve just written dirty rows to this table… the inventory guy is still pissed, because inventory is half of what it should be since you forced the job before the latest scan events were written to the upstream table. You also KNOW that by forcing dependencies you’ve got incorrect data in your table, and you’re just going to have to re-run your transform job and actually let the dependencies satisfy this time. If you have a single transform job, that means running everything again on your compute, even what has already been written to the table successfully. You’ll also need some mechanism to notify the inventory guy that his numbers are going to be garbage Monday am, so he doesn’t accidentally report something ridiculous to the director.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Inventory guy does not like you&lt;/code&gt; he thinks you’re an ass, you think he’s an ass, inventory guy keys the right side of your 2022 Toyota Avalon, the Toyota Avalon is now discontinued and you’re sad because you should’ve waited and purchased the Toyota Crown, but you didn’t realize the Crown was coming out so you bought an Avalon, anyway… right hook to inventory guy! Now you’re being charged with assault! Now you’re in jail, your arraignment isn’t until Tuesday, but you need to get inventory numbers out by Monday! Who’s going to force those dependencies???&lt;/p&gt;
&lt;h2 id=&#34;what-about-storage&#34;&gt;What about storage?&lt;/h2&gt;
&lt;p&gt;With this wide table, we’ll have the name of the product, maybe the product belongs to a broader group of products and we’ll throw the description of the product in there as well.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;These facts about the product may change… but probably fairly infrequently.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In our table, the same values will be repeated for each product every week. Instead, they could be stored in a separate table that contains each product, its description, color, etc., and then joined to our fact table. Now we are again beginning to talk about &lt;a href=&#34;https://en.wikipedia.org/wiki/Database_normalization&#34;&gt;Normalization&lt;/a&gt;, which is critical to think about when setting up your relations.&lt;/p&gt;
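&lt;p&gt;To put rough numbers on that repetition, here is a back-of-the-envelope sketch, written in Rust purely for illustration. The inputs (1,000 products, 104 weeks of history, 200-byte descriptions, 4-byte product keys) and the function names are hypothetical. The estimate ignores compression and row overhead entirely.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;// Back-of-the-envelope storage estimate for ONE descriptive text column.
// Ignores compression, row headers, and page overhead.

// Wide (denormalized) table: every weekly row repeats the full description.
fn wide_table_bytes(products: u64, weeks: u64, description_len: u64) -> u64 {
    products * weeks * description_len
}

// Normalized: the description is stored once per product in a dimension
// table, and each weekly fact row carries only a small product key.
fn normalized_bytes(products: u64, weeks: u64, description_len: u64, key_len: u64) -> u64 {
    products * description_len + products * weeks * key_len
}

fn main() {
    // Hypothetical inputs: 1,000 products, two years of weekly rows,
    // 200-byte descriptions, 4-byte integer product keys.
    let (products, weeks, description_len, key_len) = (1_000, 104, 200, 4);
    println!("wide:       {} bytes", wide_table_bytes(products, weeks, description_len)); // 20800000
    println!("normalized: {} bytes", normalized_bytes(products, weeks, description_len, key_len)); // 616000
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these made-up inputs, the wide layout spends about 20.8 MB on descriptions and the normalized layout about 0.6 MB. That is a big ratio, but 20 MB is not much in absolute terms, which is where the rebuttals below come in.&lt;/p&gt;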
&lt;p&gt;A few solid rebuttals to the above, given that we are mostly talking about an OLAP workload:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Modern columnar storage formats like Parquet compress data efficiently, especially values that repeat within a column… similar in spirit to Teradata’s &lt;a href=&#34;https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Database-Design/Using-Data-Compression/Multivalue-Compression&#34;&gt;MVC&lt;/a&gt; (multivalue compression)&lt;/li&gt;
&lt;li&gt;Data storage is relatively cheap… if we are only talking about millions of rows, it&amp;rsquo;s just not going to be a major driver of cost&lt;/li&gt;
&lt;li&gt;Denormalized data often performs better for analytical queries because it avoids joins, and with columnar storage, a query only needs to scan the columns it actually references&lt;/li&gt;
&lt;/ol&gt;
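&lt;p&gt;Point 1 is easy to see with dictionary encoding, one of the encodings Parquet uses. Each distinct value is stored once, and every row keeps a small integer index instead of the full string. Below is a toy Rust sketch of the idea, not Parquet’s actual format. The function name and the generated descriptions are hypothetical, and each index is simply counted as 4 bytes.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;use std::collections::HashMap;

// Toy dictionary encoding of a text column: each distinct value is stored
// once, and every row keeps a u32 index into that dictionary.
// Returns (raw_bytes, dictionary_encoded_bytes).
fn dictionary_encoding_sizes(products: u32, weeks: u32) -> (usize, usize) {
    // Hypothetical column: each product's description repeats once per week.
    let mut column = Vec::new();
    for p in 0..products {
        for _ in 0..weeks {
            column.push(format!("Description of product {}", p));
        }
    }
    let raw_bytes: usize = column.iter().map(|s| s.len()).sum();

    // Map each distinct value to a small integer index, reusing the
    // existing index when the value has been seen before.
    let mut dictionary = HashMap::new();
    let mut indexes = Vec::new();
    for value in column {
        let next_index = dictionary.len() as u32;
        let index = *dictionary.entry(value).or_insert(next_index);
        indexes.push(index);
    }

    let dictionary_bytes: usize = dictionary.keys().map(|k| k.len()).sum();
    let index_bytes = indexes.len() * 4; // each u32 index counted as 4 bytes
    (raw_bytes, dictionary_bytes + index_bytes)
}

fn main() {
    // Hypothetical: 1,000 products with 104 weeks of history each.
    let (raw, encoded) = dictionary_encoding_sizes(1_000, 104);
    println!("raw: {} bytes, dictionary-encoded: {} bytes", raw, encoded);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For 2 products over 3 weeks, the raw column is 144 bytes. The encoded version is 72 bytes: 48 bytes of dictionary plus six 4-byte indexes. The saving grows with the number of repeats, and real formats also bit-pack and run-length encode the indexes.&lt;/p&gt;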
&lt;h2 id=&#34;a-few-points-on-insertsupdatesdeletes&#34;&gt;A few points on Inserts/Updates/Deletes&lt;/h2&gt;
&lt;p&gt;This is where denormalization can be slower and more complicated. Let’s say we want to change the product group that one of our products belongs to in the denormalized table. We’ll want all historical data to be attributed to the new group, so we’ll need to update the product group on every historical row for that product.&lt;/p&gt;
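&lt;p&gt;Here is a small Rust sketch of that cost, using a hypothetical &lt;code&gt;WideRow&lt;/code&gt; struct and made-up group names. Regrouping one product in the wide table rewrites one row per week of history.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;// Hypothetical row in the wide table: one row per product per week,
// with the product group copied onto every row.
#[allow(dead_code)]
struct WideRow {
    product_id: u32,
    week: u32,
    product_group: String,
}

// Re-assign one product to a new group in the denormalized table and
// return how many rows had to be rewritten.
fn regroup_in_wide_table(products: u32, weeks: u32, target_product: u32) -> usize {
    let mut rows = Vec::new();
    for product_id in 0..products {
        for week in 0..weeks {
            rows.push(WideRow { product_id, week, product_group: String::from("Kitchen") });
        }
    }

    // Every historical row for the product must change.
    let mut rewritten = 0;
    for row in rows.iter_mut() {
        if row.product_id == target_product {
            row.product_group = String::from("Outdoor");
            rewritten += 1;
        }
    }
    rewritten
}

fn main() {
    // Hypothetical: 1,000 products with 104 weekly rows each.
    println!("rows rewritten: {}", regroup_in_wide_table(1_000, 104, 42)); // 104
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In a normalized design, the same change is a single-row update to the product table, and historical facts pick up the new group through the join.&lt;/p&gt;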
&lt;p&gt;We can also launch a new product, but it’s possible data science hasn’t finished the production forecast for that product yet… now the forecast column is NULL because &lt;code&gt;we aren’t going to stall inserting the rest of the columns for that product just because the forecast is NULL.&lt;/code&gt; However, people who read from this table will need to understand that the forecast is NULL and choose how to deal with it accordingly. If an analyst is tasked with deriving forecast accuracy, notices some NULLs, and just applies &lt;code&gt;NVL(forecast_column, 0)&lt;/code&gt;… that’s just not true: science has a beta forecast for the product, and it’s certainly &amp;gt; 0. There’s a reason we decided to launch the new product after all; we predicted customers will want to buy it!&lt;/p&gt;
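&lt;p&gt;Here is a minimal Rust sketch of why that NVL matters. It uses hypothetical forecast numbers and Rust’s &lt;code&gt;None&lt;/code&gt; as a stand-in for SQL NULL. SQL’s &lt;code&gt;AVG()&lt;/code&gt; skips NULLs. Wrapping the column in &lt;code&gt;NVL(forecast_column, 0)&lt;/code&gt; first turns the missing forecast into a real zero and drags the result down.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;// Hypothetical weekly forecasts for three products. None stands in for
// SQL NULL: the new product's forecast has not been published yet.
// Returns (mean ignoring missing values, mean after treating missing as 0).
fn forecast_means() -> (f64, f64) {
    let forecasts = [Some(120.0), Some(80.0), None];

    // Like SQL AVG(forecast): NULLs are skipped, not counted as zero.
    let mut known_sum = 0.0;
    let mut known_count = 0.0;
    for f in forecasts.iter().copied().flatten() {
        known_sum += f;
        known_count += 1.0;
    }

    // Like AVG(NVL(forecast, 0)): the missing forecast becomes a real 0.
    let mut zero_filled_sum = 0.0;
    for f in forecasts.iter().copied() {
        zero_filled_sum += f.unwrap_or(0.0);
    }

    (known_sum / known_count, zero_filled_sum / forecasts.len() as f64)
}

fn main() {
    let (skip_nulls, zero_filled) = forecast_means();
    println!("AVG-style mean: {}", skip_nulls);      // 100
    println!("zero-filled mean: {:.2}", zero_filled); // 66.67
}
&lt;/code&gt;&lt;/pre&gt;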
&lt;p&gt;To be fair, if you launch something like the &lt;a href=&#34;https://en.wikipedia.org/wiki/Pontiac_Aztek&#34;&gt;Pontiac Aztek&lt;/a&gt;, maybe anticipating 0 demand is the correct choice when attempting to sell an angry kitchen appliance of a car.&lt;/p&gt;
&lt;h1 id=&#34;where-will-your-data-model-break-down&#34;&gt;Where will your data model break down?&lt;/h1&gt;
&lt;p&gt;Q: Hey, I know we have that inventory column; could we break it out into the different inventory dispositions (Sellable, In transit, and Damaged)?&lt;/p&gt;
&lt;p&gt;A: &lt;code&gt;Sure, now our table increases in... width...&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Q: Hey, I know we have shipments for all of last week, and it’s only Thursday, but I’d really like to see what we shipped out Mon/Tues/Wed of this week to see where we are at.&lt;/p&gt;
&lt;p&gt;A: &lt;code&gt;We can’t really answer that with the denormalized table at the current granularity… even if it were daily, we’d probably be waiting on dependencies and wouldn&#39;t be able to get back to this person until Friday. Instead, this ask simply becomes a query against the upstream shipments table for the latest data.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Q: Hey, I’m using that wide table you made, and I want to see how the descriptions for this product have evolved over time, but all the descriptions are the same in this table?&lt;/p&gt;
&lt;p&gt;A: &lt;code&gt;This question will need to be directed to another table that contains this info, as the denormalized table has been updated to only reflect the latest description.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;As with anything, there are tradeoffs. Above, I&amp;rsquo;m deliberately posing tricky questions to the table, but it would do extremely well with basic aggregations and with reducing the knowledge burden on end users, who would otherwise need to learn all the different upstream tables.&lt;/p&gt;
&lt;p&gt;I only bring up these points because I think they are worth considering when designing how you want to capture your data. Again, there is no right or wrong; it’s simply context/data dependent, and the right choice will be different across different teams. However, there is a clear right and wrong when it comes to purchasing a &lt;code&gt;Pontiac Aztek&lt;/code&gt;; I&amp;rsquo;ll leave that exercise to the reader.&lt;/p&gt;
</content>
    </item>
    
    <item>
      <title>[6 / 4 ~~= 1~~ = 1.5]: Type Coercion and Precision in SQL</title>
      <link>/posts/sql_types/</link>
      <pubDate>Sat, 06 Jul 2024 00:00:00 +0000</pubDate>
      
      <guid>/posts/sql_types/</guid>
      <description>&lt;h1 id=&#34;an-overview-of-whats-to-come&#34;&gt;An overview of what&amp;rsquo;s to come:&lt;/h1&gt;
&lt;p&gt;I was originally writing this as a high-level introduction to send people when they encounter things like &lt;code&gt;6/4&lt;/code&gt; returning &lt;code&gt;1&lt;/code&gt; instead of &lt;code&gt;1.5&lt;/code&gt; in some Database Management Systems (DBMS)&amp;hellip; over time it&amp;rsquo;s sprawled far beyond that. The points below serve as a TLDR, with the longer discussion afterward.&lt;/p&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Operations like division in SQL can be confusing for novices or anyone without a programming background&lt;/p&gt;</description>
      <content>&lt;h1 id=&#34;an-overview-of-whats-to-come&#34;&gt;An overview of what&amp;rsquo;s to come:&lt;/h1&gt;
&lt;p&gt;I was originally writing this as a high-level introduction to send people when they encounter things like &lt;code&gt;6/4&lt;/code&gt; returning &lt;code&gt;1&lt;/code&gt; instead of &lt;code&gt;1.5&lt;/code&gt; in some Database Management Systems (DBMS)&amp;hellip; over time it&amp;rsquo;s sprawled far beyond that. The points below serve as a TLDR, with the longer discussion afterward.&lt;/p&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Operations like division in SQL can be confusing for novices or anyone without a programming background&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In some DBMS, 6/4 may equal 1 because dividing two integers, a.k.a. &lt;a href=&#34;https://mathworld.wolfram.com/IntegerDivision.html&#34;&gt;integer division&lt;/a&gt;, returns an integer (the remainder is discarded)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; Casting one of the operands as a float or numeric/decimal will avoid integer
 division and keep the fractional part. Be careful with the precision needed in
 your calculation, as there are tradeoffs to using certain types.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Syntax and behavior will be &lt;code&gt;different&lt;/code&gt; across DBMS, but if you want 6 / 4 to equal 1.5 and not 1, you can use something like:&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;PostgreSQL:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;::FLOAT &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Type casting using &amp;#34;::&amp;#34; operator
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;::FLOAT &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Casting the divisor
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CAST&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Using the CAST function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;MySQL:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- MySQL&amp;#39;s / never does integer division (returns 1.5000); DIV does
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CAST&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; DECIMAL(&lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;)) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Explicit CAST to DECIMAL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CONVERT&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;, DECIMAL(&lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;)) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Using the CONVERT function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;SQLServer:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Implicit conversion using decimal point
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CAST&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; FLOAT) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Explicit CAST to FLOAT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CONVERT&lt;/span&gt;(FLOAT, &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Using the CONVERT function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Oracle:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; DUAL &lt;span style=&#34;color:#75715e&#34;&gt;-- Oracle numeric literals are NUMBER, so this already returns 1.5
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; TO_NUMBER(&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; DUAL &lt;span style=&#34;color:#75715e&#34;&gt;-- Using the TO_NUMBER function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;d &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; DUAL &lt;span style=&#34;color:#75715e&#34;&gt;-- Explicit type declaration
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;SQLite:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CAST&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; REAL) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- CAST to REAL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Implicit conversion using decimal point
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;   &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; ROUND(&lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;result&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- Throwing ROUND in there
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Methods may have trade-offs in terms of precision, performance, and portability.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- The choice depends on your specific requirements and the DBMS you&amp;#39;re using.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is NOT an exhaustive list of solutions/syntax for all DBMS, as &lt;a href=&#34;https://en.wikipedia.org/wiki/List_of_relational_database_management_systems&#34;&gt;there are many DBMS&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The idea here is that if you want a specific output, explicitly casting one or more operands is the safe bet. If you are the data producer, understanding the consumption patterns and applying the correct types will be critical for your downstream consumers&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It&amp;rsquo;s crucial to understand the database management system (DBMS) and environment you&amp;rsquo;re working in before curating datasets, creating models, or exporting data for reporting&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
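&lt;p&gt;The precision tradeoff mentioned above is not unique to SQL. Here is a hedged illustration in Rust, with hypothetical function names and no DBMS code. Binary floating point, which FLOAT/DOUBLE columns typically use, cannot store 0.1 or 0.2 exactly. Counting whole cents in an integer, by contrast, behaves like an exact decimal type.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;// Binary floating point (f64 here; FLOAT/DOUBLE PRECISION in most DBMS)
// cannot represent 0.1 or 0.2 exactly, so the sum carries a tiny error.
fn float_total() -> f64 {
    0.1 + 0.2
}

// Storing money as whole cents in an integer keeps the arithmetic exact,
// similar in spirit to a NUMERIC/DECIMAL column with two decimal places.
fn cents_total() -> i64 {
    10 + 20
}

fn main() {
    println!("{}", float_total());        // 0.30000000000000004
    println!("{}", float_total() == 0.3); // false
    println!("{} cents", cents_total());  // 30 cents
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is why money is often stored as NUMERIC/DECIMAL rather than FLOAT, usually at the cost of slower arithmetic.&lt;/p&gt;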
&lt;hr&gt;
&lt;p&gt;I think that&amp;rsquo;s a fair synopsis of what I&amp;rsquo;ve written below. If you&amp;rsquo;re interested, I&amp;rsquo;ll continue to cover division across a variety of DBMS, the behavior of data types/operations in SQL, Hotdogs, Docker, rug pulls, and &lt;a href=&#34;https://www.youtube.com/shorts/lsFplEkBZkE&#34;&gt;BIG DOGS!&lt;/a&gt;&lt;/p&gt;
&lt;h1 id=&#34;observations-about-sql-writers&#34;&gt;Observations about SQL Writers&lt;/h1&gt;
&lt;p&gt;Division is a frequent source of logical and syntactical mistakes for novice SQL users. SQL has some round edges for non-technical users, like &lt;a href=&#34;https://discourse.julialang.org/t/whats-the-big-deal-0-vs-1-based-indexing/1102/4&#34;&gt;1-based indexing&lt;/a&gt;. 1-based indexing is the SQL standard, and it&amp;rsquo;s more straightforward for people who would typically begin counting at 1 and have no familiarity with the 0-based indexing found in languages like Python, Java, or C.&lt;/p&gt;
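&lt;p&gt;Here are the two conventions side by side in a quick Rust sketch. &lt;code&gt;sql_substring&lt;/code&gt; is a hypothetical helper written for this post, not a library function. It only mimics SQL’s 1-based &lt;code&gt;SUBSTRING&lt;/code&gt; for start positions of 1 or more, and real DBMS differ in how they treat 0 or negative starts.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;// Rust (like Python, Java, and C) indexes from 0: position 0 is the
// first character. SQL's SUBSTRING(s, start, len) counts from 1.
// Hypothetical helper mimicking the 1-based convention; it assumes
// start is 1 or greater.
fn sql_substring(s: String, start: usize, len: usize) -> String {
    s.chars().skip(start - 1).take(len).collect()
}

fn main() {
    let word = String::from("hello");
    let zero_based_first = word.chars().nth(0).unwrap(); // 'h'
    let one_based_first = sql_substring(word, 1, 1);      // "h"
    println!("{} {}", zero_based_first, one_based_first); // h h
}
&lt;/code&gt;&lt;/pre&gt;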
&lt;p&gt;&lt;strong&gt;Division is more of a sharp edge.&lt;/strong&gt; Not because division is difficult, but because someone who has always typed &lt;code&gt;=6/4&lt;/code&gt; in Excel and received &lt;code&gt;1.5&lt;/code&gt; does not anticipate getting &lt;code&gt;1&lt;/code&gt; when they write SQL against a database&amp;hellip; nor would they have thought to check for it!&lt;/p&gt;
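&lt;p&gt;Many statically typed languages (C, Java, Rust) behave the same way, which is why people with a programming background tend to be less surprised. Here is a minimal Rust sketch with hypothetical function names. It shows Rust’s rules, not those of any particular DBMS: two integer operands give an integer result, and converting the operands first gives 1.5.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;// Dividing two integers performs integer division: the fractional part
// is discarded (Rust truncates toward zero).
fn integer_divide(a: i32, b: i32) -> i32 {
    a / b
}

// Converting the operands to f64 first gives ordinary division.
fn float_divide(a: i32, b: i32) -> f64 {
    a as f64 / b as f64
}

fn main() {
    println!("{}", integer_divide(6, 4)); // 1
    println!("{}", 6 % 4);                // 2, the discarded remainder
    println!("{}", float_divide(6, 4));   // 1.5
}
&lt;/code&gt;&lt;/pre&gt;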
&lt;h1 id=&#34;some-typing-and-coercion-background&#34;&gt;Some Typing and Coercion Background&lt;/h1&gt;
&lt;p&gt;Many users of SQL coming from a “business” background will have encountered different data types in Excel, but they will likely not have given any thought to statically vs. dynamically typed programming languages. I&amp;rsquo;m now going to &lt;del&gt;play with fire&lt;/del&gt; lay out some definitions to give context to the previous sentence; they are debatable, but I believe they provide a good mental model for SQL.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;SQL is typically considered to be &lt;a href=&#34;https://stackoverflow.com/questions/1517582/what-is-the-difference-between-statically-typed-and-dynamically-typed-languages&#34;&gt;Statically typed&lt;/a&gt;, meaning:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Every data item has an associated data type defined at compile-time&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;SQL does have predefined data types for columns in table definitions, which are determined at &amp;ldquo;compile-time&amp;rdquo; (more accurately, at table creation time). However, there are dynamic aspects&amp;hellip; SQL allows for implicit type conversions in many operations, and the actual type checking often occurs at runtime.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;SQL is also usually described as &lt;a href=&#34;https://en.wikipedia.org/wiki/Strong_and_weak_typing&#34;&gt;Strongly Typed&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Every data item has an associated data type, defining its behavior and allowed usage&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;SQL uses a mix of static and dynamic typing. Data types are associated with column definitions at table creation time, but many type-related decisions and checks occur at runtime. SQL&amp;rsquo;s type system allows for implicit conversions and exhibits some characteristics of both strong and weak typing, with the specific behavior often depending on the particular SQL implementation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Great, so when I think of SQL I&amp;rsquo;ll think Strong and Static, thanks! Well&amp;hellip; no,&lt;/strong&gt; like I mentioned, this behavior depends on the DBMS. &lt;a href=&#34;https://www.sqlite.org/datatype3.html&#34;&gt;SQLite&lt;/a&gt; uses a more general dynamic type system where the datatype of a value is associated with the value itself, not with the column it sits in. This behavior can be modified with the &amp;ldquo;&lt;a href=&#34;https://www.sqlite.org/stricttables.html&#34;&gt;STRICT&lt;/a&gt;&amp;rdquo; table option keyword.&lt;/p&gt;
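&lt;p&gt;Here is a toy Rust model of that idea. It is not SQLite’s implementation, and the enum and function names are hypothetical. The type travels with each value, so one &amp;ldquo;column&amp;rdquo; can hold an integer, a text value, and a real side by side. SQLite’s own &lt;code&gt;typeof()&lt;/code&gt; SQL function reports these per-value storage classes.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;// Toy model of SQLite-style dynamic typing: each value carries its own
// type (SQLite calls these storage classes; NULL and BLOB are omitted).
#[allow(dead_code)]
enum Value {
    Integer(i64),
    Real(f64),
    Text(String),
}

// Report the type of a single value, named like SQLite's typeof() output.
fn storage_class(v: Value) -> String {
    let name = match v {
        Value::Integer(_) => "integer",
        Value::Real(_) => "real",
        Value::Text(_) => "text",
    };
    name.to_string()
}

fn main() {
    // One "column" holding three differently typed values, which a
    // non-STRICT SQLite table also allows.
    let column = vec![
        Value::Integer(5),
        Value::Text(String::from("5")),
        Value::Real(5.0),
    ];
    for v in column {
        println!("{}", storage_class(v));
    }
}
&lt;/code&gt;&lt;/pre&gt;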
&lt;p&gt;An in-depth image for visual learners (I&amp;rsquo;m an aspiring artist):
&lt;img src=&#34;/images/SQLite_to_Postgres.png&#34; alt=&#34;DBMS_DIVISION&#34;&gt;&lt;/p&gt;
&lt;p&gt;Ok&amp;hellip; that&amp;rsquo;s cool, but in SQL what happens when I divide an &lt;a href=&#34;https://en.wikipedia.org/wiki/Integer_%28computer_science%29&#34;&gt;integer&lt;/a&gt; by a &lt;a href=&#34;https://computersciencewiki.org/index.php/Float&#34;&gt;float&lt;/a&gt; or what if I evaluate an integer and a &lt;a href=&#34;https://computersciencewiki.org/index.php?title=String&#34;&gt;string&lt;/a&gt; for equality? In a more general sense we are asking: How do DBMS handle &lt;a href=&#34;https://www.postgresql.org/docs/7.3/typeconv.html&#34;&gt;type coercion&lt;/a&gt;?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Type Coercion&lt;/strong&gt;: an implicit process where the language runtime automatically converts a value from one type to another. This is done to make operations between different data types possible without explicit instructions from the programmer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Type Conversion&lt;/strong&gt;: an explicit process where the programmer manually converts a value from one type to another using specific functions or methods. This requires a deliberate action and is often clearer in terms of code readability and intent.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
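&lt;p&gt;Rust sits at the strict end of this spectrum. It performs no implicit coercion between numbers and text, so the mixed-type operations many DBMS resolve for you have to be written as explicit type conversions. Here is a minimal sketch with hypothetical function names, mirroring &lt;code&gt;&#39;5&#39; + 4&lt;/code&gt; and &lt;code&gt;5.0 + 4&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;// Rust performs no implicit coercion between numbers and text, so each
// mixed-type operation needs an explicit type conversion.
fn add_text_to_int(text: String, n: i32) -> i32 {
    // parse() converts the text; unwrap() panics if it is not a number.
    let parsed: i32 = text.trim().parse().unwrap();
    parsed + n
}

fn add_int_to_float(f: f64, n: i32) -> f64 {
    // `as` converts the integer to f64 before adding.
    f + n as f64
}

fn main() {
    println!("{}", add_text_to_int(String::from("5"), 4)); // 9
    println!("{}", add_int_to_float(5.0, 4));              // 9
}
&lt;/code&gt;&lt;/pre&gt;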
&lt;p&gt;You’ll see that DBMS have extensive &lt;a href=&#34;https://www.postgresql.org/docs/7.3/typeconv.html&#34;&gt;facilities&lt;/a&gt; for evaluating mixed-type expressions. I’m linking the Postgres docs here because they are good, but other DBMS document their own rules as well.&lt;/p&gt;
&lt;h1 id=&#34;why-should-i-care-and-how-about-a-few-examples&#34;&gt;Why should I care, and how about a few examples?&lt;/h1&gt;
&lt;p&gt;To make this implicit conversion clear, let&amp;rsquo;s directly compare two values that are explicitly cast as different types:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;SELECT CAST(5 AS INT) = CAST(&#39;5&#39; AS VARCHAR(10))&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Postgres: ERROR: operator does not exist: integer = character varying&lt;/li&gt;
&lt;li&gt;SQLite: 1&lt;/li&gt;
&lt;li&gt;Redshift: true&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some programming languages are quite strict when doing comparisons or operations on multiple types. Here is an example in &lt;a href=&#34;https://www.rust-lang.org/&#34;&gt;Rust&lt;/a&gt; where someone has &lt;a href=&#34;https://stackoverflow.com/questions/39677410/why-do-i-get-an-error-when-adding-an-integer-to-a-floating-point&#34;&gt;attempted&lt;/a&gt; to add an integer and a float and received an error.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-RUST&#34; data-lang=&#34;RUST&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;fn&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;main&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;let&lt;/span&gt; float &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;5.0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;let&lt;/span&gt; integer &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;let&lt;/span&gt; result &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; float &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; integer;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a6e22e&#34;&gt;println!&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Result: &lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;{}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;, result);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;//error[E0277]: cannot add an integer to a float
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;//     let result = float + integer;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;//                          ^ no implementation for `{float} + {integer}`
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol&gt;
&lt;li&gt;Executing &lt;code&gt;SELECT 5.0 + 4&lt;/code&gt; or even &lt;code&gt;SELECT &#39;5&#39; + 4&lt;/code&gt; in most (maybe all?) DBMS returns 9.&lt;/li&gt;
&lt;li&gt;So then what about &lt;code&gt;SELECT CAST(5 AS INT) + CAST(&#39;4&#39; AS VARCHAR(10))&lt;/code&gt;? This still returns 9 in some DBMS but NOT in Postgres:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In Postgres, an undecorated string literal (e.g., &amp;lsquo;4&amp;rsquo;) without an explicit type is treated as an &lt;strong&gt;unknown&lt;/strong&gt; type. This is actually one of a few &lt;a href=&#34;https://www.postgresql.org/docs/current/datatype-pseudo.html&#34;&gt;Pseudo-Types&lt;/a&gt; that allows Postgres to delay type resolution until more context is available. Here is a nice &lt;a href=&#34;https://www.postgresql.org/message-id/183.1302200970%40sss.pgh.pa.us&#34;&gt;response&lt;/a&gt; from &lt;a href=&#34;https://en.wikipedia.org/wiki/Tom_Lane_%28computer_scientist%29&#34;&gt;Tom Lane&lt;/a&gt; posted 13 years ago.&lt;/p&gt;
&lt;p&gt;Using &lt;a href=&#34;https://www.postgresql.org/docs/9.3/functions-info.html&#34;&gt;pg_typeof&lt;/a&gt; we can see that &lt;code&gt;SELECT pg_typeof(CAST(&#39;4&#39; AS VARCHAR(10)));&lt;/code&gt; returns &lt;code&gt;character varying&lt;/code&gt; while &lt;code&gt;SELECT pg_typeof(&#39;4&#39;);&lt;/code&gt; returns unknown. Because &amp;lsquo;4&amp;rsquo; is explicitly cast to VARCHAR, the second example errors (&lt;strong&gt;integer + character varying&lt;/strong&gt;), while &lt;code&gt;SELECT &#39;5&#39; + 4&lt;/code&gt; evaluates &lt;strong&gt;unknown + integer&lt;/strong&gt;, which Postgres resolves by implicitly coercing the literal to an integer, returning 9.&lt;/p&gt;
&lt;p&gt;However, this is where you need to be careful. For instance, Redshift is a fork of Postgres, but&amp;hellip; guess what:
&lt;code&gt;SELECT CAST(5 AS INT) + CAST(&#39;4&#39; AS VARCHAR(10))&lt;/code&gt; returns &amp;lsquo;54&amp;rsquo;. A few other interesting anecdotes from Redshift:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Updates
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; TEMP &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; SOME_TUPLES (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    MY_NUMBER INT,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    MY_WORD VARCHAR(&lt;span style=&#34;color:#ae81ff&#34;&gt;100&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;INSERT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;INTO&lt;/span&gt; SOME_TUPLES(MY_NUMBER, MY_WORD) &lt;span style=&#34;color:#66d9ef&#34;&gt;VALUES&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Hello&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Howdy&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Redshift has no problem updating the VARCHAR(100) MY_WORD column with a value I explicitly cast as an INT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;UPDATE&lt;/span&gt; SOME_TUPLES &lt;span style=&#34;color:#66d9ef&#34;&gt;SET&lt;/span&gt; MY_WORD &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;::INT
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;WHERE&lt;/span&gt; MY_NUMBER &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- How about a decimal instead? No problem
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;UPDATE&lt;/span&gt; SOME_TUPLES &lt;span style=&#34;color:#66d9ef&#34;&gt;SET&lt;/span&gt; MY_WORD &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;::DECIMAL(&lt;span style=&#34;color:#ae81ff&#34;&gt;38&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;WHERE&lt;/span&gt; MY_NUMBER &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- What about AVG()?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- I see a lot of people use AVG in SQL that actually don&amp;#39;t want the arithmetic mean
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;WITH&lt;/span&gt; CHECK123 &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;6&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; NUM
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;UNION&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;ALL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; NUM
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;UNION&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;ALL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; NUM
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AVG&lt;/span&gt;(NUM) &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CHECK123 &lt;span style=&#34;color:#75715e&#34;&gt;-- Results in 3 in Redshift (AVG of an INT column returns a BIGINT, so 11/3 is truncated)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Oh, Also
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Returns 2 in Postgres/Redshift
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Returns 1 in SQL Server
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;CAST&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;99&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; INTEGER)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h1 id=&#34;ooooook-so-can-we-go-back-to-division-now&#34;&gt;Ooooook so can we go back to Division now?&lt;/h1&gt;
&lt;p&gt;In some DBMS, &lt;code&gt;SELECT 6 / 4;&lt;/code&gt; returns &lt;strong&gt;1&lt;/strong&gt; because of integer division. Other DBMS return &lt;strong&gt;1.5&lt;/strong&gt; even though they were handed two &amp;ldquo;integers&amp;rdquo; to operate on.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Integer division is the division of two integers resulting in another integer. It discards the fractional part of the quotient and returns only the whole number.&lt;/li&gt;
&lt;li&gt;In Python, &lt;code&gt;print(6/4)&lt;/code&gt; prints 1.5, while &lt;code&gt;print(6//4)&lt;/code&gt; prints 1, discarding the fractional part. We’ll see later on that DuckDB has similar syntax!&lt;/li&gt;
&lt;/ul&gt;
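&lt;p&gt;Here is a quick Python sketch of the two operators, with one caveat before you lean on the analogy. Python&amp;rsquo;s &lt;code&gt;//&lt;/code&gt; is floor division, so it rounds toward negative infinity. Integer division in Postgres, and in SQLite (used below as the SQL side), truncates toward zero instead. The two agree for positive operands and disagree for negative ones.&lt;/p&gt;

```python
import sqlite3

# Python: / is true division, // is floor division
true_div = 6 / 4        # 1.5
floor_div = 6 // 4      # 1
neg_floor = -7 // 2     # -4: floor rounds toward negative infinity

# SQLite, standing in for a DBMS that does integer division on two integers
conn = sqlite3.connect(":memory:")
sql_int_div = conn.execute("SELECT 6 / 4").fetchone()[0]    # 1
sql_neg_div = conn.execute("SELECT -7 / 2").fetchone()[0]   # -3: truncates toward zero
conn.close()

print(true_div, floor_div, neg_floor, sql_int_div, sql_neg_div)
```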
&lt;p&gt;However, &lt;code&gt;SELECT 6.0 / 4;&lt;/code&gt; or &lt;code&gt;SELECT CAST(6 AS FLOAT) / 4;&lt;/code&gt; results in &lt;strong&gt;1.5&lt;/strong&gt;. The DBMS implicitly coerces the integer operand to the type of the non-integer one and returns a non-integer result. That type is DECIMAL/NUMERIC for the literal &lt;code&gt;6.0&lt;/code&gt; in Postgres, and FLOAT for the cast.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Floating-point division is division in which at least one operand is a floating-point number, producing a floating-point result. Use it when the fractional part of the result is significant.&lt;/li&gt;
&lt;/ul&gt;
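&lt;p&gt;The coercion is easy to observe directly. This is a minimal sketch that again uses SQLite through Python&amp;rsquo;s &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in. SQLite&amp;rsquo;s &lt;code&gt;typeof()&lt;/code&gt; plays roughly the role of Postgres&amp;rsquo;s &lt;code&gt;pg_typeof()&lt;/code&gt;. As soon as one operand is non-integer, the result stops being an integer.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each query returns the quotient and the storage type SQLite assigns to it
queries = {
    "int / int": "SELECT 6 / 4, typeof(6 / 4)",
    "real literal / int": "SELECT 6.0 / 4, typeof(6.0 / 4)",
    "cast to FLOAT / int": "SELECT CAST(6 AS FLOAT) / 4, typeof(CAST(6 AS FLOAT) / 4)",
}
results = {label: conn.execute(sql).fetchone() for label, sql in queries.items()}
conn.close()

for label, (value, value_type) in results.items():
    print(f"{label:20}: {value} ({value_type})")
```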
&lt;p&gt;&lt;img src=&#34;/images/DBMS_DIV.png&#34; alt=&#34;DBMS_DIVISION&#34;&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Different DBMS have different defaults for this coercion behavior. Sometimes it&amp;rsquo;s: &amp;ldquo;Hey Big Dog, looks like you gave me two integers to divide. I&amp;rsquo;m going to assume that the remainder is important to you and not return an integer&amp;rdquo; OR &amp;ldquo;Big DOG! You just served me up two integers and guess what&amp;hellip; you are getting an integer back!&amp;rdquo;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;SQL Standard is for the DBMS to refer to their users as &lt;a href=&#34;https://www.youtube.com/shorts/lsFplEkBZkE&#34;&gt;Big Dog&lt;/a&gt;, just like Tiger.&lt;/p&gt;
&lt;p&gt;I’m only talking about type coercion, but keep in mind that there are A LOT of DBMS and they do A LOT of things &lt;a href=&#34;https://aws.amazon.com/compare/the-difference-between-mysql-vs-postgresql/#:~:text=MySQL%20has%20limited%20support%20of,stored%20procedures%20in%20multiple%20languages.&amp;amp;text=MySQL%20supports%20numeric%2C%20character%2C%20date,spatial%2C%20and%20JSON%20data%20types&#34;&gt;differently&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Quick side note: in DuckDB there are two division operators, &lt;code&gt;/&lt;/code&gt; and &lt;code&gt;//&lt;/code&gt;. They are equivalent when at least one of the operands is a FLOAT or a DOUBLE. When both operands are integers, &lt;code&gt;/&lt;/code&gt; performs floating-point division (&lt;code&gt;5 / 2 = 2.5&lt;/code&gt;) while &lt;code&gt;//&lt;/code&gt; performs integer division (&lt;code&gt;5 // 2 = 2&lt;/code&gt;).&lt;/p&gt;&lt;/blockquote&gt;
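&lt;p&gt;If you come to DuckDB from Python, the resemblance is only partial. Python&amp;rsquo;s &lt;code&gt;//&lt;/code&gt; still floors when one operand is a float. By contrast, according to the DuckDB note above, &lt;code&gt;/&lt;/code&gt; and &lt;code&gt;//&lt;/code&gt; give the same answer in that case. This small check covers Python&amp;rsquo;s side only; DuckDB itself is not run here.&lt;/p&gt;

```python
# Python's own division operators, applied to the operands from the DuckDB note (5 and 2)
int_true = 5 / 2        # 2.5
int_floor = 5 // 2      # 2
float_true = 5.0 / 2    # 2.5
float_floor = 5.0 // 2  # 2.0: Python still floors when an operand is a float

print(int_true, int_floor, float_true, float_floor)
```

&lt;p&gt;Per the DuckDB note, DuckDB&amp;rsquo;s &lt;code&gt;//&lt;/code&gt; with a FLOAT or DOUBLE operand would match &lt;code&gt;/&lt;/code&gt; instead. It is worth checking before carrying Python habits over.&lt;/p&gt;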
&lt;p&gt;On top of this, administrators can make &lt;a href=&#34;https://dev.mysql.com/doc/refman/8.4/en/sql-mode.html&#34;&gt;configuration changes&lt;/a&gt; to the DBMS you’re using, possibly making its behavior different from what I describe here! &lt;strong&gt;Developing a sound understanding of the environment you plan to write queries in, before you or some text-to-SQL engine starts spinning off queries, is important!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you’d like to compare for yourself, &lt;a href=&#34;https://sqlfiddle.com/&#34;&gt;SQL Fiddle&lt;/a&gt; is a nice online SQL playground. You can also use &lt;a href=&#34;https://docs.docker.com/get-docker/&#34;&gt;Docker&lt;/a&gt;. With Docker installed, you can pull the official &lt;a href=&#34;https://hub.docker.com/_/mysql&#34;&gt;MySQL image&lt;/a&gt; and try it out:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-BASH&#34; data-lang=&#34;BASH&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Start a new MySQL container named &amp;#39;testsql&amp;#39; with the root password set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# The -d flag runs the container in detached mode (in the background)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ docker run --name testsql -e MYSQL_ROOT_PASSWORD&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;ilovemysql -d mysql:latest
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# List all running Docker containers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# This should show the &amp;#39;testsql&amp;#39; container we just started&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ docker ps
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Start an interactive bash shell inside the running &amp;#39;testsql&amp;#39; container&lt;/span&gt;
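&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# The -it flags allow for an interactive terminal session&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# On first start MySQL needs a few seconds to initialize; if the client below cannot connect yet, wait and retry&lt;/span&gt;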
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ docker exec -it testsql bash
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Inside the container, start the MySQL client&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# The -u flag specifies the user (root in this case)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# The -p flag prompts for the password we set earlier&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ mysql -u root -p
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Now we&amp;#39;re in the MySQL prompt. We can run SQL commands here, for example:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;SHOW DATABASES;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;SELECT 6/4;  &lt;span style=&#34;color:#75715e&#34;&gt;# This will return 1.5000 in MySQL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# To exit the MySQL client&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;EXIT;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# To exit the container&amp;#39;s bash shell&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ exit
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Stop the &amp;#39;testsql&amp;#39; container&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ docker stop testsql
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# List running containers again to verify &amp;#39;testsql&amp;#39; has stopped&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ docker ps
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# List all containers, including stopped ones&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# This will still show &amp;#39;testsql&amp;#39;, but in a stopped state&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ docker ps -a
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h1 id=&#34;in-practice&#34;&gt;In Practice&lt;/h1&gt;
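&lt;p&gt;In my day job, 99% of the time people want floating-point division, so I advise them to explicitly cast one of the operands. For example, in Redshift I would write &lt;code&gt;SELECT 6/4::FLOAT&lt;/code&gt;. Because &lt;code&gt;::&lt;/code&gt; binds more tightly than &lt;code&gt;/&lt;/code&gt;, this casts the 4 before dividing. &lt;code&gt;SELECT (6/4)::FLOAT&lt;/code&gt;, by contrast, would cast the already-truncated result and return 1.&lt;/p&gt;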
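&lt;p&gt;Where the cast goes matters. Casting an operand changes the division itself. Casting the quotient only dresses up an integer result that has already lost its fractional part. Below is a minimal sketch using SQLite through Python. SQLite does not accept the &lt;code&gt;::&lt;/code&gt; shorthand, so the sketch uses &lt;code&gt;CAST&lt;/code&gt;; the two queries correspond to &lt;code&gt;6/4::FLOAT&lt;/code&gt; and &lt;code&gt;(6/4)::FLOAT&lt;/code&gt; in Redshift.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cast an operand: the division itself is done in floating point
cast_operand = conn.execute("SELECT 6 / CAST(4 AS FLOAT)").fetchone()[0]

# Cast the result: integer division runs first, then 1 becomes 1.0
cast_result = conn.execute("SELECT CAST(6 / 4 AS FLOAT)").fetchone()[0]

conn.close()
print(cast_operand, cast_result)
```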
&lt;ul&gt;
&lt;li&gt;A good question might be: “Why bother casting if one or both of my columns is already a Decimal/Float, isn’t it irrelevant?”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the current state you would be 100% correct, but are you 100% sure nobody will alter the table and change that one column that was a Decimal to an INT…&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Hey, remember that old column we created as a Decimal type? We only write integer values between +/- 100000, so we can save some space by storing it as an integer… nobody will have any problem with that! Sounds good: &lt;code&gt;ALTER TABLE TBLONE ALTER COLUMN COLUMNONE TYPE INTEGER USING COLUMNONE::INTEGER;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hey, that table we produce? Yeah, we are actually deleting that column you use, but it&amp;rsquo;s OK because you can recalculate it using a JOIN on this other table to bring in another column… make sure it doesn’t break your transform… thanks!
&lt;code&gt;(We sent the email about the migration 3 months ago but must have missed you.. haha sorry lol… well we are making the change on Friday, thanks so much!)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I typically prefer writing defensive SQL and checking what I’m being defensive against rather than having the data producer &lt;a href=&#34;https://rekt.news/swaprum-rekt/&#34;&gt;pull the rug&lt;/a&gt; on me. Which is why I prefer to be the data producer, and why I prefer not to buy Meme Coins.&lt;/p&gt;
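&lt;p&gt;Here is a small sketch of that failure mode and the defensive fix, again using SQLite through Python. The table names &lt;code&gt;orders_before&lt;/code&gt; and &lt;code&gt;orders_after&lt;/code&gt; and the column &lt;code&gt;amount&lt;/code&gt; are hypothetical. &lt;code&gt;orders_before&lt;/code&gt; stores &lt;code&gt;amount&lt;/code&gt; as a REAL. &lt;code&gt;orders_after&lt;/code&gt; holds the same data after someone &amp;ldquo;saved space&amp;rdquo; with an INTEGER column. The unguarded query changes its answer; the one with an explicit cast does not.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: same data, column type changed by an upstream "optimization"
conn.execute("CREATE TABLE orders_before (amount REAL)")
conn.execute("CREATE TABLE orders_after (amount INTEGER)")
conn.execute("INSERT INTO orders_before VALUES (6)")
conn.execute("INSERT INTO orders_after VALUES (6)")

def split_in_four(table, defensive):
    """Divide amount by 4, optionally casting the operand to FLOAT first."""
    expr = "CAST(amount AS FLOAT) / 4" if defensive else "amount / 4"
    return conn.execute(f"SELECT {expr} FROM {table}").fetchone()[0]

results = {
    (table, defensive): split_in_four(table, defensive)
    for table in ("orders_before", "orders_after")
    for defensive in (False, True)
}
conn.close()

for (table, defensive), value in results.items():
    print(table, "defensive" if defensive else "unguarded", value)
```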
&lt;p&gt;I’ll leave a final thought for anyone interested. I’ve gone ahead and stolen this little exercise from the &lt;a href=&#34;https://dev.mysql.com/doc/refman/8.4/en/precision-math-examples.html&#34;&gt;MySQL docs&lt;/a&gt; and rewritten it for Redshift. It should work the same way in Postgres 11 and later, which added &lt;code&gt;CREATE PROCEDURE&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Three variables: i (an integer), d (a decimal with 10 digits total and 4 digits after the decimal point), and f (a floating point).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- The WHILE loop runs 10,000 times, incrementing both d and f by 0.0001
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- After the loop, RAISE INFO prints the values of d and f.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;OR&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;REPLACE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;PROCEDURE&lt;/span&gt; test_precision()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;LANGUAGE&lt;/span&gt; plpgsql
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;DECLARE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    i INT :&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    d DECIMAL(&lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;) :&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    f FLOAT :&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    WHILE i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10000&lt;/span&gt; LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        d :&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; d &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;0001&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        f :&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; f &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;.&lt;span style=&#34;color:#ae81ff&#34;&gt;0001&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        i :&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; i &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;END&lt;/span&gt; LOOP;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    RAISE INFO &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;DECIMAL result: %, FLOAT result: %&amp;#39;&lt;/span&gt;, d, f;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;END&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;$$&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Call the stored procedure
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CALL&lt;/span&gt; test_precision();
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here&amp;rsquo;s a &lt;a href=&#34;https://en.wikipedia.org/wiki/IEEE_754&#34;&gt;hint&lt;/a&gt; if you&amp;rsquo;re curious about the output, or you can just smash this stored procedure into ChatGPT and it’ll rattle off a coherent answer. &lt;a href=&#34;https://www.youtube.com/watch?v=b7k0a5hYnSI&#34;&gt;The choice is yours&lt;/a&gt;.&lt;/p&gt;
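&lt;p&gt;No Redshift cluster handy? The same experiment can be approximated locally in Python. The standard library&amp;rsquo;s &lt;code&gt;decimal.Decimal&lt;/code&gt; stands in for &lt;code&gt;DECIMAL(10,4)&lt;/code&gt;, and Python&amp;rsquo;s &lt;code&gt;float&lt;/code&gt; (an IEEE 754 double) stands in for &lt;code&gt;FLOAT&lt;/code&gt;. This is an analogy, not Redshift&amp;rsquo;s own arithmetic, and the output is left for you to run.&lt;/p&gt;

```python
from decimal import Decimal

# Mirror the stored procedure: add 0.0001 ten thousand times to each accumulator
d = Decimal("0")   # exact base-10 arithmetic, like DECIMAL(10,4)
f = 0.0            # binary floating point, like FLOAT / DOUBLE PRECISION
step_d = Decimal("0.0001")
step_f = 0.0001

for _ in range(10000):
    d += step_d
    f += step_f

print(f"DECIMAL result: {d}, FLOAT result: {f!r}")
```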
</content>
    </item>
    
    <item>
      <title>Temporary Tables</title>
      <link>/posts/redshift_temp_tables/</link>
      <pubDate>Sun, 05 May 2024 00:00:00 +0000</pubDate>
      
      <guid>/posts/redshift_temp_tables/</guid>
      <description>&lt;p&gt;The database management system (DBMS) I’ll reference today is &lt;a href=&#34;https://aws.amazon.com/redshift/&#34;&gt;Redshift&lt;/a&gt; (AWS&amp;rsquo;s cloud data warehouse offering), which is based on &lt;a href=&#34;https://www.postgresql.org/&#34;&gt;Postgres&lt;/a&gt;. I’ll dive into some of the anti-patterns around &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html#r_CREATE_TABLE_NEW-parameters&#34;&gt;Temporary Tables&lt;/a&gt; I’ve seen abused at work, why they don’t make sense, and how to fix them.&lt;/p&gt;
&lt;p&gt;Let’s get it out of the way… Temporary Tables are not “better” than &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/r_WITH_clause.html&#34;&gt;Common Table Expressions&lt;/a&gt; (CTEs), and CTEs are not “better” than temporary tables. It is entirely dependent on the context, the data, the DBMS, the query, upstream tables, etc… &lt;strong&gt;because at the end of the day the stakeholder wants the 1MM row output in Excel so they can make a pivot table&lt;/strong&gt;. They don’t care if it was procured via CTEs, TEMP Tables, or that you put another hole in the home-office drywall.&lt;/p&gt;</description>
      <content>&lt;p&gt;The database management system (DBMS) I’ll reference today is &lt;a href=&#34;https://aws.amazon.com/redshift/&#34;&gt;Redshift&lt;/a&gt; (AWS&amp;rsquo;s cloud data warehouse offering), which is based on &lt;a href=&#34;https://www.postgresql.org/&#34;&gt;Postgres&lt;/a&gt;. I’ll dive into some of the anti-patterns around &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html#r_CREATE_TABLE_NEW-parameters&#34;&gt;Temporary Tables&lt;/a&gt; I’ve seen abused at work, why they don’t make sense, and how to fix them.&lt;/p&gt;
&lt;p&gt;Let’s get it out of the way… Temporary Tables are not “better” than &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/r_WITH_clause.html&#34;&gt;Common Table Expressions&lt;/a&gt; (CTEs), and CTEs are not “better” than temporary tables. It is entirely dependent on the context, the data, the DBMS, the query, upstream tables, etc… &lt;strong&gt;because at the end of the day the stakeholder wants the 1MM row output in Excel so they can make a pivot table&lt;/strong&gt;. They don’t care if it was procured via CTEs, TEMP Tables, or that you put another hole in the home-office drywall.&lt;/p&gt;
&lt;p&gt;A few other notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;In Redshift (and many other DBMS for that matter) there are plenty of excellent use cases for Temp Tables (I&amp;rsquo;ll provide a few at the end)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Be wary of Premature Optimization. If analytics workloads are running in an acceptable amount of time and not incurring unacceptable expenses, then effort is probably better spent elsewhere.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&#34;read-write-read-write&#34;&gt;Read, Write, Read, Write:&lt;/h1&gt;
&lt;p&gt;Internally, most analytics queries we run scan TBs of data (&lt;code&gt;Redshift Spectrum costs $5 per TB of data scanned from S3&lt;/code&gt;, but hey… screw it, SELECT *, it’s a business expense, right!? Right?). Most commonly, large Transform or Extract jobs are a series of temporary tables; there is nothing &lt;del&gt;necessarily&lt;/del&gt; wrong with this. The issues come about when large chunks of unfiltered data are passed through this series of temporary tables, being read and written over and over again.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;SELECT statements encased in a &lt;code&gt;CREATE TEMP TABLE AS&lt;/code&gt; will read data from the upstream source and write it to disk.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Within a session, these tables are visible in the STV_TBL_PERM system table. They are dropped automatically when the session ends (permanent tables, obviously, are not).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Repeatedly reading and writing data to and from temp tables throughout the course of a query can be costly, both in money and in time.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h5 id=&#34;issue-1-create-a-temp-table-just-to-select-everything-out-of-it&#34;&gt;Issue 1: Create a temp table just to SELECT everything out of it&lt;/h5&gt;
&lt;p&gt;This uselessly writes the entire result set to disk just to read it all back out. Just run the final SELECT; there is no need for the temp table. The extra statement also gives the job one more chance to get stuck in the queue behind other workloads.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- NOT GREAT, I SEE THIS WAY TOO OFTEN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; TEMP &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FINAL&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; EVERYTHING
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; SOME_SCHEMA.BIG_TABLE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FINAL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- BETTER
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; EVERYTHING
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; SOME_SCHEMA.BIG_TABLE
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h5 id=&#34;issue-2-unnecessary-data-and-losing-optimizations&#34;&gt;Issue 2: Unnecessary data and losing optimizations&lt;/h5&gt;
&lt;p&gt;Redshift is a columnar database. Each column is stored separately, meaning that if you don’t need a column, Redshift can skip scanning its data entirely. &lt;code&gt;Yes, if you&#39;re using Spectrum for your OLAP queries and your data is sitting in S3 in a bunch of .csv files, Redshift will have to scan the whole file... but it&#39;s probably time for&lt;/code&gt; &lt;strong&gt;&lt;a href=&#34;https://parquet.apache.org/&#34;&gt;Parquet&lt;/a&gt;?&lt;/strong&gt; In the below SQL:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Redshift is forced to read and store columns BT.COL4, BT.COL5, BT.COL6, and BT.COL7 in TEMP_1 even though they are never referenced again in the rest of the query.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The query optimizer can’t see further down the script that all that was wanted were a few fairly specific predicates; had it known, it could have skipped many blocks of data. Each SELECT statement is, in a way, its own island. Redshift stores data in 1 MB blocks, along with metadata about each block, including the minimum and maximum values it contains. This metadata (a.k.a. zone maps) is what allows the DBMS to avoid scanning irrelevant blocks.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; TEMP &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; TEMP_1 &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; BT.COL1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            , BT.COL2
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            , BT.COL3
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            , BT.COL4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            , BT.COL5
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            , BT.COL6
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            , BT.COL7
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; THIS_IS_A.BIG_TABLE BT
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-----------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- Now that we’ve read everything and written it to disk, let&amp;#39;s read it back out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-----------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; TEMP &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; TEMP_2 &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; BT.COL1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            , BT.COL2
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            , BT.COL3
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            , ST.COL9
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; TEMP_1 BT
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#66d9ef&#34;&gt;INNER&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;JOIN&lt;/span&gt; TINY.SMALL_TABLE ST &lt;span style=&#34;color:#66d9ef&#34;&gt;ON&lt;/span&gt; BT.COL1 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ST.COL1 &lt;span style=&#34;color:#66d9ef&#34;&gt;AND&lt;/span&gt; BT.COL2 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ST.COL2
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- FINAL SELECT: let&amp;#39;s read everything back out, after creating TEMP_2 for… no reason?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; TEMP_2
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;WHERE&lt;/span&gt; COL1 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;I Should Have&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;AND&lt;/span&gt; COL2 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Applied these predicates&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;AND&lt;/span&gt; COL3 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Earlier&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;SQL is a declarative language: you tell the DBMS what you want, and the DBMS works out the best way to retrieve it for you. The series of temp tables above breaks this process into isolated steps and confines the DBMS to rudimentary optimizations within each statement, without the broader context of the whole query.&lt;/p&gt;
&lt;p&gt;You can implement these optimizations by hand as the author of the query, but the DBMS would otherwise do them for you automatically. &lt;code&gt;It&#39;s like paying your chauffeur a bunch of money to drive you to the airport, but instead you wrestle him for the keys, tie him up in the back seat of the car, turn on Maroon 5 (you love Maroon 5), and drive to the airport yourself.&lt;/code&gt; Why do that?&lt;/p&gt;
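&lt;p&gt;For comparison, here is a minimal sketch of the same query written as a single statement. It reuses the placeholder schema, table, and column names and the stand-in literal values from the example above, so treat it as an illustration rather than a drop-in query. Only the columns that are actually needed are read, no intermediate tables are written, and the predicates live in the same statement as the scan, so the optimizer can use the zone maps on BIG_TABLE to skip blocks. How many blocks actually get skipped depends on how BIG_TABLE is sorted.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;-- Sketch only: placeholder names carried over from the example above
SELECT BT.COL1
     , BT.COL2
     , BT.COL3
     , ST.COL9
FROM THIS_IS_A.BIG_TABLE BT
    INNER JOIN TINY.SMALL_TABLE ST ON BT.COL1 = ST.COL1 AND BT.COL2 = ST.COL2
-- Predicates applied in the same statement as the scan: no temp tables,
-- and the optimizer sees the whole query at once
WHERE BT.COL1 = &#39;I Should Have&#39;
    AND BT.COL2 = &#39;Applied these predicates&#39;
    AND BT.COL3 = &#39;Earlier&#39;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;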
&lt;h5 id=&#34;issue-3-sort-keys-multiple-sort-keys&#34;&gt;Issue 3: Sort Keys, Multiple Sort Keys???&lt;/h5&gt;
&lt;p&gt;A fairly common pattern is defining a &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html&#34;&gt;SORT KEY&lt;/a&gt; in the CTAS statement on a column that should already have been filtered on when the temp table was created. This is essentially the same point as above about filtering after temp table creation. Defining multiple sort keys at this point only makes things worse.&lt;/p&gt;
&lt;p&gt;The kicker is that, when you don&amp;rsquo;t specify them, Redshift will attempt to determine the optimal SORT KEY and &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html&#34;&gt;DIST KEY&lt;/a&gt; for a temp table created with CTAS based on the query plan. Manually specifying them in the temp table, only to filter the data later, is painful to see and flies in the face of &amp;ldquo;aggregate early and often&amp;rdquo;. More info can be found in the &lt;a href=&#34;https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-distribution-styles-and-distribution-keys/&#34;&gt;advanced table design playbook&lt;/a&gt;. This is probably a good time to mention that it&amp;rsquo;s important to make sure &lt;a href=&#34;https://docs.aws.amazon.com/redshift/latest/dg/r_ANALYZE.html&#34;&gt;table statistics&lt;/a&gt; are up to date.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- THE ADDITIONAL SORTING MAKES THE WRITE MORE EXPENSIVE WITH NO DOWNSTREAM BENEFIT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;-- WE&amp;#39;VE DISTRIBUTED THE DATA ACROSS THE NODES IN THE CLUSTER USING COL4
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;CREATE&lt;/span&gt; TEMP &lt;span style=&#34;color:#66d9ef&#34;&gt;TABLE&lt;/span&gt; TEMP_1 SORTKEY(COL1, COL2, COL3) DISTKEY(COL4) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; BT.COL1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , BT.COL2
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , BT.COL3
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , BT.COL4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , BT.COL5
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , BT.COL6
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , BT.COL7
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;…&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; THIS_IS_A.BIG_TABLE BT
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; BTA.&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; TEMP_1 BTA
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#66d9ef&#34;&gt;INNER&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;JOIN&lt;/span&gt; ANOTHER_TABLE ATA &lt;span style=&#34;color:#66d9ef&#34;&gt;ON&lt;/span&gt; BTA.COL4 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ATA.COL4 &lt;span style=&#34;color:#75715e&#34;&gt;-- AT LEAST WE COLLOCATED THE JOIN COL...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;WHERE&lt;/span&gt; BTA.COL1 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;Yikes&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;-- SHOULD HAVE FILTERED ABOVE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Regarding distributing on COL4, someone may think:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;“I want to save data in a Temp Table and specify a distribution key, so that I can improve my JOIN later on”&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is certainly some thought going into this argument, but there is no free lunch.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The redistributed data may improve the later join by collocating data across the cluster&amp;rsquo;s nodes; however, all that shuffling across nodes still had to happen when the temp table was created!&lt;/li&gt;
&lt;li&gt;While this is a valid argument, it&amp;rsquo;s important to benchmark the run times to determine what is actually optimal here.&lt;/li&gt;
&lt;li&gt;If current query patterns don’t use the table’s sort keys and rarely join on the column it&amp;rsquo;s distributed on, it might be time to adjust the upstream table rather than having every downstream query cache it in a temp table just to sort and redistribute it before further processing.&lt;/li&gt;
&lt;/ul&gt;
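&lt;p&gt;Here is a minimal sketch of the alternative to the Issue 3 example, again using its placeholder names: filter while the temp table is being created and leave SORTKEY and DISTKEY out, so Redshift can attempt to choose them from the query plan. Which columns are kept is an assumption for illustration (the original column list is elided), and whether an explicit DISTKEY on COL4 would pay off for the later join is still something to benchmark rather than assume.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;-- Sketch only: placeholder names from the Issue 3 example;
-- keeping just COL1 and COL4 is an assumption for illustration
-- No SORTKEY or DISTKEY: for CTAS, Redshift attempts to pick them from the query plan
CREATE TEMP TABLE TEMP_1 AS
SELECT BT.COL1
     , BT.COL4
FROM THIS_IS_A.BIG_TABLE BT
WHERE BT.COL1 = &#39;Yikes&#39;; -- filter during creation, not afterwards

SELECT BTA.*
FROM TEMP_1 BTA
    INNER JOIN ANOTHER_TABLE ATA ON BTA.COL4 = ATA.COL4;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;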
&lt;h5 id=&#34;when-are-temp-tables-a-good-choice&#34;&gt;When are temp tables a good choice?&lt;/h5&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If the temporary table is going to be read numerous times, it may be best to filter and aggregate once, then read the result several times. This was especially handy before Redshift added support for &lt;a href=&#34;https://aws.amazon.com/about-aws/whats-new/2023/02/amazon-redshift-rollup-cube-grouping-sets-group-by-clause/&#34;&gt;ROLLUP, CUBE, and GROUPING SETS&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Using temp tables to chunk up pieces of a query can be very handy in development: because they persist for the duration of the session, tests, logical checks, and quick aggregations can be run against them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Using a series of CTEs within a single temp table can be a happy medium: more work gets done per statement, and data is written out less often.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Honestly, I&amp;rsquo;d take 100 temp tables over having to parse through some nested subquery mess.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;WHERE&lt;/span&gt; (COL1 &lt;span style=&#34;color:#66d9ef&#34;&gt;IN&lt;/span&gt; (&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content>
    </item>
    
  </channel>
</rss>
