<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Projects on Sam&#39;s Blog</title>
    <link>/projects/</link>
    <description>Recent content in Projects on Sam&#39;s Blog</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Fri, 22 Aug 2025 00:00:00 +0000</lastBuildDate><atom:link href="/projects/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Robinhood Gambling</title>
      <link>/projects/robinhood_gambling/</link>
      <pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate>
      
      <guid>/projects/robinhood_gambling/</guid>
      <description>&lt;h2 id=&#34;the-robinhood-gambling-factory&#34;&gt;The Robinhood gambling factory&lt;/h2&gt;
&lt;p&gt;Robinhood has a nice mobile experience, they have good products/deals, and they also have a &lt;a href=&#34;https://robinhood.com/creditcard/&#34;&gt;credit card&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There has been a lot of &lt;a href=&#34;https://buffett.cnbc.com/video/2021/05/03/robinhood-takes-advantage-of-gambling-instincts.html&#34;&gt;talk&lt;/a&gt; about Robinhood engaging with human &amp;ldquo;gambling instincts&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Just search &amp;ldquo;how has Robinhood gamified investing&amp;rdquo; and you can watch &lt;a href=&#34;https://www.cnbc.com/2021/03/31/robinhood-gets-rid-of-confetti-feature-amid-scrutiny-over-gamification.html&#34;&gt;confetti&lt;/a&gt; rain down as someone yolos their life savings into &lt;a href=&#34;https://www.reddit.com/r/options/comments/1dayi15/i_lost_695_on_gme_call_0dte/&#34;&gt;GME 0 DTE calls&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Again, I&amp;rsquo;m not saying the product is all bad, but there is one piece which is astonishingly ridiculous to me.&lt;/p&gt;</description>
      <content>&lt;h2 id=&#34;the-robinhood-gambling-factory&#34;&gt;The Robinhood gambling factory&lt;/h2&gt;
&lt;p&gt;Robinhood has a nice mobile experience, they have good products/deals, and they also have a &lt;a href=&#34;https://robinhood.com/creditcard/&#34;&gt;credit card&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There has been a lot of &lt;a href=&#34;https://buffett.cnbc.com/video/2021/05/03/robinhood-takes-advantage-of-gambling-instincts.html&#34;&gt;talk&lt;/a&gt; about Robinhood engaging with human &amp;ldquo;gambling instincts&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Just search &amp;ldquo;how has Robinhood gamified investing&amp;rdquo; and you can watch &lt;a href=&#34;https://www.cnbc.com/2021/03/31/robinhood-gets-rid-of-confetti-feature-amid-scrutiny-over-gamification.html&#34;&gt;confetti&lt;/a&gt; rain down as someone yolos their life savings into &lt;a href=&#34;https://www.reddit.com/r/options/comments/1dayi15/i_lost_695_on_gme_call_0dte/&#34;&gt;GME 0 DTE calls&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Again, I&amp;rsquo;m not saying the product is all bad, but there is one piece which is astonishingly ridiculous to me.&lt;/p&gt;
&lt;h2 id=&#34;the-credit-card&#34;&gt;The Credit Card&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m not going to write about the Casino-like aspects of the brokerage product&amp;hellip; but there is also the Robinhood CREDIT CARD!&lt;/p&gt;
&lt;p&gt;I opened the card a few months ago, paid the $50 Gold annual subscription, applied, and got the card. The card is kind of nice&amp;hellip; the gold is a little gimmicky, but the 3% back on everything and 5% on travel is good for someone who doesn&amp;rsquo;t want to swap cards around for each kind of purchase.&lt;/p&gt;
&lt;p&gt;I can just nail down 3%, convert those points to cash that lands straight back into Robinhood, and earns 4% APY. Am I leaving a few bps on the table&amp;hellip; sure, but everything is contained in the same ecosystem, transfers are fast, and 4% APY on my uninvested cash is fine.&lt;/p&gt;
&lt;h3 id=&#34;the-mystery-box&#34;&gt;The Mystery Box&lt;/h3&gt;
&lt;p&gt;For the first time, I go to exchange my points for cash, 1 point = 1 cent. I&amp;rsquo;m not going to fuss around with gift cards/shopping stuff, I want my cash, and I want to go make 4% on it.&lt;/p&gt;
&lt;p&gt;After opening up the rewards tab, I&amp;rsquo;m forced to scroll ALL THE WAY TO THE BOTTOM to get to the &amp;ldquo;Cash Back&amp;rdquo; section. I&amp;rsquo;m scrolling through shop with points, travel with points, buy Robinhood merch with points and &lt;strong&gt;Lo and Behold&lt;/strong&gt; I see &amp;ldquo;The Mystery Box&amp;rdquo;&amp;hellip; what the fuck is &amp;ldquo;The Mystery Box&amp;rdquo;?&lt;/p&gt;
&lt;p&gt;Well, I&amp;rsquo;m glad you asked, and you can retrieve the full details &lt;a href=&#34;https://api.robinhood.com/creditcard/legal/reward-terms&#34;&gt;here&lt;/a&gt;&amp;hellip; but, in short, the Mystery box is LITERALLY an online casino.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s right, you can shuttle 1,000 points (~$10) into this thing and come out with&amp;hellip; less than $10! Amazing, let&amp;rsquo;s break it down.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Robinhood.png&#34; alt=&#34;Gamble Gamble&#34;&gt;&lt;/p&gt;
&lt;p&gt;A few things to start: the odds do sum to 100%, so you&amp;rsquo;re guaranteed to win one of these things. Each win is independent and the odds don&amp;rsquo;t change as far as I can tell. So let&amp;rsquo;s look at expected value:&lt;/p&gt;
&lt;h4 id=&#34;cash-prizes&#34;&gt;&amp;ldquo;Cash&amp;rdquo; Prizes:&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;10,000 Points @ 0.50%: 10,000 Points = ~$100, $100 * 0.50% = 50 cents&lt;/li&gt;
&lt;li&gt;7,500 Points @ 0.75%: 7,500 Points = ~$75, $75 * 0.75% = 56.25 cents&lt;/li&gt;
&lt;li&gt;5,000 Points @ 1.50%: 5,000 Points = ~$50, $50 * 1.50% = 75 cents&lt;/li&gt;
&lt;li&gt;2,500 Points @ 2.50%: 2,500 Points = ~$25, $25 * 2.50% = 62.5 cents&lt;/li&gt;
&lt;li&gt;1,500 Points @ 4.00%: 1,500 Points = ~$15, $15 * 4.00% = 60 cents&lt;/li&gt;
&lt;li&gt;1,250 Points @ 5.00%: 1,250 Points = ~$12.50, $12.50 * 5.00% = 62.5 cents&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;
&lt;p&gt;Total EV: $.50 + $.5625 + $.75 + $.625 + $.60 + $.625 = $3.6625&lt;/p&gt;&lt;/blockquote&gt;
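&lt;p&gt;A minimal Python sketch of that sum, assuming 1 point = 1 cent and the six odds listed above. The names &lt;code&gt;CASH_PRIZES&lt;/code&gt; and &lt;code&gt;cash_ev&lt;/code&gt; are made up for illustration and are not taken from the simulation repo.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# (points, probability) for each cash prize, copied from the list above
CASH_PRIZES = [
    (10_000, 0.0050),
    (7_500, 0.0075),
    (5_000, 0.0150),
    (2_500, 0.0250),
    (1_500, 0.0400),
    (1_250, 0.0500),
]


def cash_ev(prizes, dollars_per_point=0.01):
    """Expected dollar value contributed by the cash prizes for one box."""
    return sum(points * dollars_per_point * prob for points, prob in prizes)


print(round(cash_ev(CASH_PRIZES), 4))  # 3.6625
&lt;/code&gt;&lt;/pre&gt;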
&lt;p&gt;Hmmm&amp;hellip; I hope we catch up a little bit on the cashback prizes.&lt;/p&gt;
&lt;h4 id=&#34;cashback-prizes&#34;&gt;Cashback Prizes:&lt;/h4&gt;
&lt;p&gt;A few notes on cashback:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We must factor in the fact that we would have earned 3% on these purchases anyway, thus, we have to take cashback% minus 3% to calculate the true net benefit&lt;/li&gt;
&lt;li&gt;To maximize the value you must spend up to the cap. This means you must WANT to purchase from where you win and be LIQUID/WILLING enough to make said purchase e.g. spend $3500 in the next 24 hours.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Assumptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I&amp;rsquo;m going to assume the opportunity cost of capital is $0, this is especially generous as the $500 you spend at Target could be used for investing in 0 DTE options on&amp;hellip; Robinhood!&amp;hellip;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For prizes with no Cap, I&amp;rsquo;m going to assume average cost (if I can find it) + 25%, otherwise just going to insert a reasonable guess&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For prizes with a Cap, I&amp;rsquo;m going to assume the reward is maximized&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I&amp;rsquo;m going to assume the above conditions hold for &lt;strong&gt;33% of the items below&lt;/strong&gt;. This is a big assumption, but it is fair given the variety of options and the previous conditions. This means we&amp;rsquo;ll cut our EV calculation for this section to 1/3 of its value when it&amp;rsquo;s all said and done.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;Disagree with my assumptions? I &lt;a href=&#34;https://github.com/sschnei8/mystery_box&#34;&gt;simulated the whole thing&lt;/a&gt; so you can test for yourself and see if it&amp;rsquo;s worth it for you. This is actually ripping the rewards straight from api.robinhood.com (and doing so poorly), so it will inevitably break at some point, but it&amp;rsquo;s fun for now!&lt;/p&gt;&lt;/blockquote&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;10% cashback on next grocery purchase ($500 cap) @ 5.00%: $500 * (10-3)% = $35 * 5% = &lt;strong&gt;$1.75&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;10% cashback on next automotive gas purchase ($300 cap) @ 6.00%: $300 * (10-3)% = $21 * 6% = &lt;strong&gt;$1.26&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;5% cashback for 24 hours ($3,500 cap) @ 4.00%: $3500 * (5-3)% = $70 * 4% = &lt;strong&gt;$2.80&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;10% cashback on next Uber or Lyft ($300 cap) @ 7.00%: $300 * (10-3)% = $21 * 7% = &lt;strong&gt;$1.47&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;15% cashback on next Starbucks purchase @ 8.00%: $25 * (15-3)% = $3 * 8% = &lt;strong&gt;$.24&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;4% cashback for 24 hours ($3,000 cap) @ 4.50%: $3000 * (4-3)% = $30 * 4.5% = &lt;strong&gt;$1.35&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;5% cashback on next Amazon.com purchase ($500 cap) @ 8.00%: $500 * (5-3)% = $10 * 8% = &lt;strong&gt;$.80&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;5% cashback on next restaurant order ($500 cap) @ 8.00%: $500 * (5-3)% = $10 * 8% = &lt;strong&gt;$.80&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;5% cashback on next DoorDash or Uber Eats ($300 cap) @ 8.00%: $300 * (5-3)% = $6 * 8% = &lt;strong&gt;$.48&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;5% cashback on next Target purchase ($500 cap) @ 6.00%: $500 * (5-3)% = $10 * 6% = &lt;strong&gt;$.60&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;5% cashback on your next phone bill @ 6.00%: (&lt;a href=&#34;https://www.t-mobile.com/dialed-in/wireless/average-phone-bill-per-month&#34;&gt;$141&lt;/a&gt; * 1.25) * (5-3)% = $3.525 * 6% = &lt;strong&gt;$.2115&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;5% cashback on your next Lululemon, Alo Yoga or Vuori purchase @ 8.00%: $300 * (5-3)% = $6 * 8% = &lt;strong&gt;$.48&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;4% cashback on next spa purchase @ 7.00%: $300 * (4-3)% = $3 * 7% = &lt;strong&gt;$.21&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;5% cashback for 30 days ($10,000 cap) @ 0.25%: $10000 * (5-3)% = $200 * .25% = &lt;strong&gt;$.50&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Starbucks, Lululemon, Alo Yoga or Vuori, and Spa purchases I think are fair estimates. The phone bill at least has a source. Feel free to throw some different estimates &lt;a href=&#34;https://github.com/sschnei8/mystery_box&#34;&gt;in there&lt;/a&gt; and test it out for yourself.&lt;/p&gt;
&lt;p&gt;To find the total EV, we will sum the individual values:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$1.75 (Grocery)
$1.26 (Gas)
$2.80 (5% for 24h)
$1.47 (Uber/Lyft)
$0.24 (Starbucks)
$1.35 (4% for 24h)
$0.80 (Amazon)
$0.80 (Restaurant)
$0.48 (DoorDash/Uber Eats)
$0.60 (Target)
$0.2115 (Phone Bill)
$0.48 (Lululemon etc.)
$0.21 (Spa)
$0.50 (5% for 30 days)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And get: $12.9515 which we can divide by 3 to get $4.3172.&lt;/p&gt;
&lt;p&gt;Adding this to our cash prizes, we expect an EV of $3.6625 + $4.3172 = $7.9797!&lt;/p&gt;
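&lt;p&gt;Here is a rough Python sketch of the same arithmetic, using the spend guesses and the one-third assumption from above. The variable and function names are illustrative only, and this is not the code from the linked repo.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;BASE_RATE = 0.03          # the card already pays 3% on everything
USABLE_FRACTION = 1 / 3   # assumption: only a third of the prizes get maximized
CASH_EV = 3.6625          # cash-prize EV from the previous section

# (label, cashback rate, assumed spend in dollars, probability)
CASHBACK_PRIZES = [
    ("Grocery", 0.10, 500, 0.05),
    ("Gas", 0.10, 300, 0.06),
    ("5% for 24h", 0.05, 3500, 0.04),
    ("Uber/Lyft", 0.10, 300, 0.07),
    ("Starbucks", 0.15, 25, 0.08),
    ("4% for 24h", 0.04, 3000, 0.045),
    ("Amazon", 0.05, 500, 0.08),
    ("Restaurant", 0.05, 500, 0.08),
    ("DoorDash/Uber Eats", 0.05, 300, 0.08),
    ("Target", 0.05, 500, 0.06),
    ("Phone bill", 0.05, 141 * 1.25, 0.06),
    ("Lululemon etc.", 0.05, 300, 0.08),
    ("Spa", 0.04, 300, 0.07),
    ("5% for 30 days", 0.05, 10000, 0.0025),
]


def cashback_ev(prizes, base_rate=BASE_RATE):
    """Sum of spend * (rate above the base 3%) * probability over all prizes."""
    return sum(spend * (rate - base_rate) * prob for _, rate, spend, prob in prizes)


raw = cashback_ev(CASHBACK_PRIZES)
total = CASH_EV + raw * USABLE_FRACTION
print(f"raw cashback EV: {raw:.4f}")             # 12.9515
print(f"total EV per 1,000 points: {total:.4f}")  # 7.9797
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Changing &lt;code&gt;USABLE_FRACTION&lt;/code&gt; or any of the spend guesses is the quickest way to see how sensitive the total is to these assumptions.&lt;/p&gt;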
&lt;p&gt;$10 in&amp;hellip; $7.9797 out. You can play around with the assumptions but unless you&amp;rsquo;re ready to maximize every single one of the Cashback prizes&amp;hellip; every single time you roll the dice on this Mystery Box you&amp;rsquo;re handing $2 back to Robinhood in the form of them not having to pay out the true value on their 3% rewards program.&lt;/p&gt;
&lt;p&gt;This is the definition of a Casino. Robinhood will argue that the program is more than fair as it&amp;rsquo;s positive EV for someone willing to maximize every single one of these rewards. We can go around in circles about this, but if you land on the &lt;strong&gt;5% cashback on next DoorDash or Uber Eats ($300 cap) @ 8.00%&lt;/strong&gt; the maximum value one can accrue is .02*300 = $6&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;funny-thoughts-this-is-not-legal-or-financial-advice&#34;&gt;Funny Thoughts (This is not legal or financial advice)&lt;/h2&gt;
&lt;p&gt;Ok so there are three uncapped offerings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;4% cashback on next spa purchase&lt;/li&gt;
&lt;li&gt;15% cashback on next Starbucks purchase&lt;/li&gt;
&lt;li&gt;5% cashback on your next Lululemon, Alo Yoga or Vuori purchase&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now if you haven&amp;rsquo;t heard of Konstantin Anikeev let me &lt;a href=&#34;https://nstp.org/memberarea/federaltaxalert/2021/konstantin-anikeev-v-commissioner-tc-memo-2021-23&#34;&gt;introduce you&lt;/a&gt;. Konstantin curated this beauty:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Purchasing Gift Cards: Using his American Express 5% &amp;lsquo;cash back&amp;rsquo; Rewards Program, he purchased Visa gift cards at grocery stores or pharmacies where the 5% Rewards applied.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Although Visa cards cannot be converted to cash, they can be used to purchase Money Orders. He would purchase the Money Orders with the Visa gift cards.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deposit the Money Orders to his Bank Account.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pay the Amex Balance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Repeat to the tune of $6.4MM in purchases and $320,000 in &amp;lsquo;profits&amp;rsquo; by converting his 5% cashback in excess of fees.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&amp;rsquo;s also an interesting tax ruling in Konstantin Anikeev v. Commissioner, but this should now draw your eyes to that uncapped 15% Starbucks purchase. The 5% and 4% uncapped offerings are interesting, but 15%&amp;hellip;&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s say we call up Starbucks and tell them we have a $20,000 limit on a card paying 15% on our next purchase. How could they turn down this opportunity? Better yet, what if Starbucks wanted to outfit a few stores in Lululemon and Lululemon wanted to buy Starbucks for its employees&amp;hellip; we could facilitate this trade, in which case I&amp;rsquo;d be hitting the mystery box like crazy.&lt;/p&gt;
</content>
    </item>
    
    <item>
      <title>Arbitrage</title>
      <link>/projects/arbitrage/</link>
      <pubDate>Sat, 14 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>/projects/arbitrage/</guid>
      <description>&lt;h2 id=&#34;winning-every-time&#34;&gt;Winning Every Time&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m going to summarize arbitrage sports betting. I think I’m going to add this to “projects” although the artifacts/results of my time spent in this space aren’t really relevant to this post. Also anytime I reference myself, me, or I in this post I&amp;rsquo;m actually talking about &lt;a href=&#34;https://en.wikipedia.org/wiki/Jonathan_%28tortoise%29&#34;&gt;Jonathan&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Is that click bait? I’d argue it’s not because it’s true, and I don’t make any money or care who reads this so… no I don’t think so.&lt;/p&gt;</description>
      <content>&lt;h2 id=&#34;winning-every-time&#34;&gt;Winning Every Time&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m going to summarize arbitrage sports betting. I think I’m going to add this to “projects” although the artifacts/results of my time spent in this space aren’t really relevant to this post. Also anytime I reference myself, me, or I in this post I&amp;rsquo;m actually talking about &lt;a href=&#34;https://en.wikipedia.org/wiki/Jonathan_%28tortoise%29&#34;&gt;Jonathan&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Is that click bait? I’d argue it’s not because it’s true, and I don’t make any money or care who reads this so… no I don’t think so.&lt;/p&gt;
&lt;h2 id=&#34;the-scene&#34;&gt;The scene&lt;/h2&gt;
&lt;p&gt;In 2021 or 2022 as sports books were rolling out across the U.S. they all had large promotions “Bet $5 get $200 in free bets” kind of thing. This was essentially just free money as there are many &lt;strong&gt;&lt;a href=&#34;https://oddsjam.com/betting-calculators/free-bet-conversion&#34;&gt;places&lt;/a&gt;&lt;/strong&gt; that will freely show you how to optimally liquidate your free bets into cash. If you read through that link you’ll see that what you use is a hedge against your free bet… so&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;if your free bet &lt;strong&gt;wins&lt;/strong&gt; you get a nice payout minus your hedge.&lt;/li&gt;
&lt;li&gt;if your free bet &lt;strong&gt;loses&lt;/strong&gt; you get a nice payout from your hedge minus nothing… the bet was free… or the cost of becoming eligible to receive the promotion.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is all fun and stuff, but I quickly liquidated all the cash out of these opportunities and was left with a crippling addiction… just kidding… but gambling addiction is no joke and extremely &lt;strong&gt;&lt;a href=&#34;https://www.youtube.com/watch?v=HEB0kla2AUM&#34;&gt;sad&lt;/a&gt;&lt;/strong&gt;. I was curious to know if sports books ever mis-priced lines, how lines were created, or if you could pit two sports books against each other…&lt;/p&gt;
&lt;h2 id=&#34;background-on-the-books&#34;&gt;Background on the books&lt;/h2&gt;
&lt;p&gt;Online Sports books are no better than playing roulette at a brick and mortar casino. They play on human biases and tendencies toward lottery like payoffs to extract value from players over time. Those who consistently win are banned or limited as value extraction is a one way street.&lt;/p&gt;
&lt;p&gt;Sportsbooks typically have well informed lines. A betting line refers to the odds that a sportsbook lists for a wager. There are different ways to display &lt;a href=&#34;https://www.investopedia.com/articles/investing/042115/betting-basics-fractional-decimal-american-moneyline-odds.asp&#34;&gt;odds&lt;/a&gt;. I’m going to specifically talk in American odds just because that’s what I’m used to, not because they are superior in any way (I think most professionals use decimal odds, but I could be wrong).&lt;/p&gt;
&lt;p&gt;In American odds, the favorite is shown with a (-) and denotes &lt;code&gt;the amount of $ you’d have to bet to win $100&lt;/code&gt;, if the odds are -400 you’d have to bet $400 just to win $100 meaning that “thing” you’re betting on is likely to happen. If the odds are +400 we are betting on an underdog, in this case, &lt;code&gt;if we bet $100 we’d win $400&lt;/code&gt;. We can also determine the implied odds underlying the odds we are being shown.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;+400 could be re-written to note that we are betting $100 to get back $500 (We get our $100 bet back + $400 in winnings) -&amp;gt; 100/500 or &lt;strong&gt;20% implied odds&lt;/strong&gt;. For our favorite at -400 we are betting $400 to get back $500 (We get our $400 bet back + $100 in winnings) -&amp;gt; 400/500 or &lt;strong&gt;80% implied odds&lt;/strong&gt;.&lt;/p&gt;&lt;/blockquote&gt;
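&lt;p&gt;The conversion above can be sketched in a few lines of Python. The function name &lt;code&gt;implied_probability&lt;/code&gt; is made up for this post, and it assumes the odds are passed as a signed number such as 400 or -400.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def implied_probability(american_odds):
    """Convert American odds to an implied probability between 0 and 1."""
    if american_odds == abs(american_odds):
        # Positive odds (underdog): risk 100 to win the listed amount
        risk, win = 100.0, float(american_odds)
    else:
        # Negative odds (favorite): risk the listed amount to win 100
        risk, win = float(abs(american_odds)), 100.0
    # Amount risked divided by the total returned on a winning bet
    return risk / (risk + win)


print(implied_probability(400))   # 0.2
print(implied_probability(-400))  # 0.8
&lt;/code&gt;&lt;/pre&gt;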
&lt;p&gt;These lines are constantly updated with the latest information and these books are also keenly aware of the action that is occurring on a particular line. I won’t go into detail here but this &lt;a href=&#34;https://www.boydsbets.com/sports-betting-handle/&#34;&gt;article&lt;/a&gt; has info on how books manage risk on specific lines.&lt;/p&gt;
&lt;p&gt;So are we just going to look for lines that are “off”: This team should only be a +200 underdog but the book has them at +500? Given this information above, we are unlikely to catch sports books utterly mispricing a particular line… but maybe we can catch them being ever so slightly off. We also have to remember that because these books are just extracting money from people there is a lot of competition to be an extractor. What I mean is that there are a lot of ways to gamble on sports, even Robinhood has begun adding basic prediction markets on sporting events. When all of these different places offer odds on an outcome, sometimes they begin to disagree with one another. This is what we can take advantage of.&lt;/p&gt;
&lt;h2 id=&#34;taking-both-sides&#34;&gt;Taking both sides&lt;/h2&gt;
&lt;p&gt;What do I mean by “pit books against each other”? To explain this I want to walk through how these books function.&lt;/p&gt;
&lt;p&gt;We are going to be a sports book for a second. Let’s say sports team A is playing sports team B. At our book we think this is going to be a close match so we offer team A at +110 (slight underdog) and team B at -110 (slight favorite). I play the scenarios out in the below drawing.&lt;/p&gt;
&lt;p&gt;Two players, one on either side of the line, place bets so that their total payout is equivalent ($210):&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Even_Odds.png&#34; alt=&#34;Even Odds&#34;&gt;&lt;/p&gt;
&lt;p&gt;You can see our sports book makes no money regardless of the outcome (its pot of $ generated by the players must be paid out in full to satisfy the bets either way). It’s just facilitating transactions, and it’ll be in the red after it has to pay its variable/fixed costs. This is why you’d never see these odds at a sports book outside of gimmicks like the Super Bowl coin toss, promos, etc… You’ll see lines like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Book_Wins.png&#34; alt=&#34;Book Wins&#34;&gt;&lt;/p&gt;
&lt;p&gt;What changed? The odds on Team B went from &lt;code&gt;-110 to -130&lt;/code&gt; (It became more of a favorite, a.k.a the sports book pays out less money on the same amount wagered). Previously, player 2 only needed to bet $110 to win $100 now they have to bet $130 to win $100. I added the implied odds of the specific outcome in the first diagram with a 47.62% chance our underdog wins and a 52.38% chance our favorite wins. These sum to 100%.&lt;/p&gt;
&lt;p&gt;Now let’s check out the implied odds in the +110 / -130 scenario… &lt;code&gt;47.62% and 56.52% respectively… these don’t sum to 100% anymore&lt;/code&gt;, now we get (104.14%) uh oh. NOW our book is making money:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;if team A wins the book makes $20&lt;/li&gt;
&lt;li&gt;if team B wins the book breaks even.&lt;/li&gt;
&lt;li&gt;We calc the EV here assuming the true odds are -110/+110 as (.4762 * $20) + (.5238 * $0) = &lt;code&gt;$9.52&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Wow so our book has an EV of $9.52 every time two people put on this bet! And looking a little further, if we divide our EV by the pot of money, $9.52/$230, we get ~4%. Remember our new total implied odds of 104.14%… 104% - 100% = 4%: this is the book’s “hold”. Hold in sports betting refers to the percentage of money that sportsbooks keep for every dollar wagered. The percentage will vary by wager type. Sportsbooks can’t just maximize hold because they have… competition. If I can get +130 at book A whereas book B offers +150 on the same bet… I’m going to bet with book B every time… sports books need volume, they need your bets to make money.&lt;/p&gt;
&lt;p&gt;And this is what these books want, they want as high a volume of these transactions as they can get. That’s why they put on promotions and offer free bets and create Ads with tigers and famous people etc…&lt;/p&gt;
&lt;p&gt;But do all books have the same odds? No. Just as books create lines that ensure they win, we can look for favorable lines as well. These books don’t exist in a vacuum, they compete against each other for a pool of bettors, many of whom are sensitive to how great of a hold a book has on a particular line. As sports books update their lines, mis-price things, or disagree with each other we can take advantage. Let’s look at this below.&lt;/p&gt;
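&lt;p&gt;As a rough check of the hold math, here is a small Python sketch. It uses this post’s working definition of hold (summed implied odds minus 100%), and the function names are illustrative assumptions rather than anything a sportsbook publishes.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def implied_probability(american_odds):
    """Amount risked divided by total returned, for signed American odds."""
    if american_odds == abs(american_odds):
        risk, win = 100.0, float(american_odds)       # underdog, e.g. +110
    else:
        risk, win = float(abs(american_odds)), 100.0  # favorite, e.g. -130
    return risk / (risk + win)


def overround(odds_a, odds_b):
    """How far the two implied probabilities add up beyond 100%."""
    return implied_probability(odds_a) + implied_probability(odds_b) - 1.0


print(f"{overround(110, -130):.2%}")  # 4.14%
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Feeding it the even +110/-110 line gives an overround of zero (up to floating point noise), which matches the first diagram where the book makes nothing.&lt;/p&gt;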
&lt;p&gt;In this scenario the same player is taking both sides of a line across two sports books. &lt;strong&gt;Sportsbook 1&lt;/strong&gt; offers +110 while &lt;strong&gt;sportsbook 2&lt;/strong&gt; offers the other side at -105. Let’s look at what happens.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/We_Win.png&#34; alt=&#34;Wins&#34;&gt;&lt;/p&gt;
&lt;p&gt;Bam and now we can see that no matter the outcome of this game, player 1 makes $5 or comes out even. &lt;code&gt;Summing the implied odds 47.62% + 51.22% = 98.84% we have engineered an inverse hold (1.16%)&lt;/code&gt;, where the player never comes out behind regardless of the outcome. Our EV on this bet, assuming true odds of +110/-110, is (47.62% * $5) + (52.38% * $0) = &lt;code&gt;$2.38&lt;/code&gt; and don’t forget $2.38/$205 = 1.16% (Our inverse hold).&lt;/p&gt;
&lt;p&gt;Wow so if I have $205 I can instantly make 1.16% on it if I can find these lines?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Yes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And if I take that same $205 and use it again and again every night for a month I could make even more $?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Yes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And if I increased my bet sizes to $10,000 could I scale up the magnitude of my profit?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Yes, in our contrived +110/-105 example if we bet $10,000 on +110 and $10,500 on -105 we’d have an EV of (47.62% * $500) + (52.38% * $0) = $238… I like $238 better than $2.38.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What if we found something even better like +110/+110?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Yes that would be nice.&lt;/li&gt;
&lt;/ul&gt;
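&lt;p&gt;The both-sides math above can be sketched in Python as follows. The helper names &lt;code&gt;payout&lt;/code&gt; and &lt;code&gt;arb_profits&lt;/code&gt; are hypothetical, and the sketch assumes both bets are accepted at the quoted odds with no fees, limits, or voids.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def payout(stake, american_odds):
    """Total returned (stake plus winnings) if the bet wins."""
    if american_odds == abs(american_odds):
        # Underdog: every 100 staked wins the listed amount
        return stake + stake * american_odds / 100.0
    # Favorite: the listed amount must be staked to win 100
    return stake + stake * 100.0 / abs(american_odds)


def arb_profits(stake_a, odds_a, stake_b, odds_b):
    """Profit in each outcome when one bettor takes both sides of a line."""
    outlay = stake_a + stake_b
    profit_if_a_wins = payout(stake_a, odds_a) - outlay
    profit_if_b_wins = payout(stake_b, odds_b) - outlay
    return profit_if_a_wins, profit_if_b_wins


print(arb_profits(100, 110, 105, -105))        # (5.0, 0.0)
print(arb_profits(10_000, 110, 10_500, -105))  # (500.0, 0.0)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Neither outcome is negative in either case, which is the whole point: the profit scales with the stakes, while the catch list below covers why it rarely stays this clean.&lt;/p&gt;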
&lt;h2 id=&#34;so-whats-the-catch&#34;&gt;So what’s the catch?&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Books aren’t dumb, they will see you are consistently hitting soft/slow lines.&lt;/code&gt; Lines with low hold or lines that weren’t updated fast enough to avoid mis-pricing across other books. They will ban you or limit you to only being allowed to bet a few dollars at a time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;It’s harder than you think to find these odds. Odds scraping is difficult. Scaling is hard.&lt;/code&gt; If you buy odds… that’s going to cost you $, and they are slightly delayed. You are going to struggle to place bets via an api, most books don’t have one, so you’re going to have to build something to place these bets via the UI or manually do it (Doesn’t scale well).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Odds can change as you take both sides&lt;/code&gt; as there will inherently be some latency between the two bets. If they change in the wrong direction and you already have exposure to one side… now you’re just trying to limit your losses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;I said before that books will limit you, they can also void your bet.&lt;/code&gt; That’s right, if the sportsbook realizes it gave you too good of odds it will just void the bet. &lt;strong&gt;Bad analogy alert:&lt;/strong&gt; Imagine you open a 12 month CD at 4.5%, sign the paper, add the money, done… you wake up the next day and the CD isn’t there anymore. You call the bank… “Yeah umm we actually don’t want to give you 4.5% so we closed the account… how about 3%”. Sports books can and do just that… remember you are on their platform and this stuff is relatively “new”, so they get to do what they want (to a point) with little recourse. It’s like if you build a YouTube channel and they strike down your video… you can try and appeal but at the end of the day you’re on their platform and play by their rules whether or not your video was justifiably taken down. And who is to determine justifiably? I don&amp;rsquo;t know, that’s a topic for another day.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Books will ban you.&lt;/code&gt; Internally it’ll be called the fraud detection team or money laundering blah blah blah, and I’m sure they do a good amount of that… but in reality it’s the smart money detection team “don’t let us lose money at all costs if someone wants to win they can do it elsewhere.”&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;In the United States there is this thing called death, and there’s this other thing called taxes…&lt;/code&gt; both tend to happen to people. Speaking of taxes, there is another thing called taking the standard deduction, and there’s other things about qualifying as a “professional gambler”. Now what do all these “things” mean? I don’t know, I am not an expert but you should &lt;strong&gt;READ UP ON THIS&lt;/strong&gt; before you start your lucrative arbing at scale.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Side note, Arbitrage exists in other markets as well&lt;/code&gt;, look at the &lt;a href=&#34;https://finance.yahoo.com/news/sam-bankman-fried-explains-arbitrage-132901181.html&#34;&gt;trades&lt;/a&gt; SBF was putting on back in the day.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Here SBF is taking advantage of price differences between markets (buy low, sell high), his major challenges were essentially bridging the two markets. If people were willing to sell BTC for $10,000 and others willing to buy it for $15,000 in a single &lt;strong&gt;&lt;a href=&#34;https://www.investopedia.com/terms/m/marketefficiency.asp&#34;&gt;efficient market&lt;/a&gt;&lt;/strong&gt; the prices would converge and the opportunity would be quickly eaten up. But because there was a lack of information, hurdles to cross in putting on the trade, and crypto being the wild west he was able to capitalize on this opportunity.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So if we could overcome all the issues above (I’m sure some people do) we could generate a nice profit. But it is NOT easy to do, NOT easy to remain undetected, and NOT something you could accomplish overnight. You WILL need $ to get started, you WILL need to adapt to changes and you WILL run into issues as you scale.&lt;/p&gt;
&lt;p&gt;Good luck.&lt;/p&gt;
</content>
    </item>
    
    <item>
      <title>NY Times Connections Solver PT. 1</title>
      <link>/projects/connections/</link>
      <pubDate>Sun, 16 Jun 2024 00:00:00 +0000</pubDate>
      
      <guid>/projects/connections/</guid>
      <description>&lt;h1 id=&#34;olive-oil&#34;&gt;Olive Oil:&lt;/h1&gt;
&lt;p&gt;During my daily morning ritual of waking up at 4:30 AM and &lt;code&gt;immediately ingesting&lt;/code&gt; a ½ cup of [Trader Joe’s Extra Virgin Spanish Olive Oil] (Sponsored link), I decided I’d like to try and solve the &lt;a href=&#34;https://www.nytimes.com/games/connections&#34;&gt;NY Times Connections&lt;/a&gt;. So, with greasy fingers, let’s begin.&lt;/p&gt;
&lt;p&gt;This is part one, where I:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Review the rules of Connections&lt;/li&gt;
&lt;li&gt;Gather historical Connections games into a structured dataset&lt;/li&gt;
&lt;li&gt;Come up with some strategy for a program that’s somewhat capable of playing the game well&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In part two, I’ll try and generate a solver for the game. I am not an expert on NLP, nor do I know much about word embeddings, nor do I know much about word embeddings given context, but I’m quite curious to learn. Here in part one, I’m going to nail out something semi-interesting and treat part two as a learning experience.&lt;/p&gt;</description>
      <content>&lt;h1 id=&#34;olive-oil&#34;&gt;Olive Oil:&lt;/h1&gt;
&lt;p&gt;During my daily morning ritual of waking up at 4:30 AM and &lt;code&gt;immediately ingesting&lt;/code&gt; a ½ cup of [Trader Joe’s Extra Virgin Spanish Olive Oil] (Sponsored link), I decided I’d like to try and solve the &lt;a href=&#34;https://www.nytimes.com/games/connections&#34;&gt;NY Times Connections&lt;/a&gt;. So, with greasy fingers, let’s begin.&lt;/p&gt;
&lt;p&gt;This is part one, where I:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Review the rules of Connections&lt;/li&gt;
&lt;li&gt;Gather historical Connections games into a structured dataset&lt;/li&gt;
&lt;li&gt;Come up with some strategy for a program that’s somewhat capable of playing the game well&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In part two, I’ll try and generate a solver for the game. I am not an expert on NLP, nor do I know much about word embeddings, nor do I know much about word embeddings given context, but I’m quite curious to learn. Here in part one, I’m going to nail out something semi-interesting and treat part two as a learning experience.&lt;/p&gt;
&lt;h1 id=&#34;overviewdefinitions&#34;&gt;Overview/Definitions:&lt;/h1&gt;
&lt;p&gt;Connections is one of many NY Times games. It began in June 2023 and releases a puzzle daily. Outside of drinking straight oil, it’s one of my favorite things to do in the morning… so why not ruin it by figuring out a way to convert all historical games into a table and write SQL against it!&lt;/p&gt;
&lt;p&gt;Connections is quite simple. You’re provided a 4x4 grid of word tiles. Words are grouped into sets of four based on a connecting category of varying difficulty. I’ll write about some patterns I discovered in the categories later.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Connections_min.png&#34; alt=&#34;Connections&#34;&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Above is a picture of the Connections game from 6/05/2024. Funnily enough, this particular game happens to contain the most common word in the history of Connections: &lt;code&gt;Ring&lt;/code&gt;, which has now been used 11 times.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;In this example, the four words “Sick”, “Nice”, “Cool”, and “Sweet” can be used to form the “Awesome!” category. This category was designated the “easiest” of all categories, although this definition differs from person to person, since not everyone would treat “Sick” as a synonym for “Awesome!”. Personally, I associate sick with what happens after I drink my ½ cup of oil which, in my opinion, is pretty awesome… so, for me, it makes sense, but you do you. Rinse and repeat with the remaining 12 words, and that’s Connections.&lt;/p&gt;
&lt;p&gt;Defining a few things to stay on the same page moving forward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Category: A grouping of 4 words in a game&lt;/li&gt;
&lt;li&gt;Word: A single tile in a game&lt;/li&gt;
&lt;li&gt;Difficulty: How “Straightforward (easy)” or “Tricky (hard)” a category was in a game.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&#34;how-does-chatgpt-4o-do&#34;&gt;How does ChatGPT-4o Do?&lt;/h1&gt;
&lt;p&gt;I’ll start with a single day’s example from June 8th, 2024. This is a fairly typical Connections game; you’ll see why I say this in the analysis section.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;**SPOILER ALERT**&lt;/code&gt; Below is the solution to the puzzle with the difficulty, category and words.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;[Easiest] Grab One’s Attention: ABSORB, ENGROSS, HOLD, RIVET&lt;/li&gt;
&lt;li&gt;[Easy] Document with Video: FILM, RECORD, SHOOT, TAPE&lt;/li&gt;
&lt;li&gt;[Medium] ANAGRAMS: PASTEL, PETALS, PLEATS, STAPLE&lt;/li&gt;
&lt;li&gt;[Hard] PRETTY ___: GOOD, PENNY, PLEASE, WOMAN&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I made sure that the model was familiar with the game; however, at this point, I didn’t offer it any previous Connections data or answers to go off of.&lt;/p&gt;
&lt;h6 id=&#34;prompt&#34;&gt;Prompt:&lt;/h6&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;Solve today’s game, providing the category defining each word grouping.
Words: Pleats, Please, Hold, Record, Woman, Film, Good, Pastel, Penny,
Staple, Tape, Rivet, Absorb, Petals, Engross, Shoot.&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;h6 id=&#34;response&#34;&gt;Response:&lt;/h6&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Words related to movies:&lt;/strong&gt; Shoot, Film, Tape, Record &lt;code&gt;[CORRECT]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Words related to types of fasteners:&lt;/strong&gt; Rivet, Staple, Pleats, Hold &lt;code&gt;[WRONG]&lt;/code&gt;, but I also considered this possibility&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Words related to artistic or aesthetic qualities:&lt;/strong&gt; Pastel, Penny, Woman, Good &lt;code&gt;[WRONG]&lt;/code&gt; but close&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Words related to engagement or involvement:&lt;/strong&gt; Absorb, Please, Petals, Engross &lt;code&gt;[WRONG]&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When provided limited context or historical data, the LLM typically gets the two easier categories right (some cluster of closely related words). However, without any additional context, the model tends to miss the harder categories, sometimes even letting mistakes spill back into the easier ones and causing additional misses. This isn’t news to anybody, but it illustrates that better prompts get better responses… if you’re going to go Oscar the Grouch mode (his birthday was June 1 BTW), expect a garbage in -&amp;gt; garbage out scenario.&lt;/p&gt;
&lt;p&gt;Let’s provide an additional prompt denoting that the possibility of certain categories exists.&lt;/p&gt;
&lt;h6 id=&#34;prompt-1&#34;&gt;Prompt:&lt;/h6&gt;
&lt;pre&gt;&lt;code&gt;“These are incorrect, take into account that answers can involve grouping words
 that have similar meaning with the addition of a given prefix or suffix, homophones,
 and anagrams can all be at play to create a category. Please try again.”
&lt;/code&gt;&lt;/pre&gt;
&lt;h6 id=&#34;response-1&#34;&gt;Response:&lt;/h6&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Recording-related terms:&lt;/strong&gt; Record, Tape, Film, Shoot &lt;code&gt;[CORRECT]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anagrams of &amp;lsquo;Staple&amp;rsquo;:&lt;/strong&gt; Pleats, Pastel, Staple, Petals &lt;code&gt;[CORRECT]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Words that can follow &amp;lsquo;Please&amp;rsquo;:&lt;/strong&gt; Please, Hold, Absorb, Engross &lt;code&gt;[WRONG]&lt;/code&gt;. The words almost form the easiest category, but it forces the prefix concept unnecessarily.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Terms associated with monetary or value:&lt;/strong&gt; Penny, Good, Woman, Rivet (stretch, but possibly in terms of securing value) &lt;code&gt;[WRONG]&lt;/code&gt; Switching “Rivet” and “Please” would yield a correct puzzle, but the categories are off significantly. It even notes that “Rivet” is a stretch.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While this attempt recognizes the anagram, the “Pretty ___” category seems to be fairly elusive for it, and there is no notion of “Attention” in any category.&lt;/p&gt;
&lt;p&gt;Final time… I provided a month’s worth of historical Connections games with the categories, their associated difficulty and the four words. &lt;code&gt;This, initially, caused it to regress significantly&lt;/code&gt;, providing an output that no longer followed the rules of the game. After re-orienting back to the task at hand, it butchered its previous progress and… umm… gave a response that added “Woman” to a category called “Objects”…&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Response.png&#34; alt=&#34;Connections&#34;&gt;&lt;/p&gt;
&lt;p&gt;After providing thirty Connections puzzles, I found it was possible for the model to get them correct but only on easier days and after several prompts. If the model receives an extra hint, such as “I’m sure these three words together are in category “X”, but I don’t know the fourth word,” it usually manages to provide a correct answer.&lt;/p&gt;
&lt;p&gt;There are probably better prompts, more prompts, or a curated LLM that could accurately solve most of these puzzles. There is certainly a good amount to explore in prompting an LLM and trying to get it to solve Connections.&lt;/p&gt;
&lt;h1 id=&#34;retrieving-historical-connections-games&#34;&gt;Retrieving historical Connections games:&lt;/h1&gt;
&lt;p&gt;I figured a good place to start would be to create a table containing all historical Connections games properly labeled with words, categories, difficulty, and date. This dataset would serve as the basis for understanding common patterns in Connections, as well as a clean set of labeled training data I could use for whatever solution I envisioned.&lt;/p&gt;
&lt;p&gt;While the NY Times has a Wordle Archive, there is no such Connections archive. Luckily, there are a myriad of websites that host all historical Connections games.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;So, I &lt;a href=&#34;https://www.instagram.com/davis.clarke/reel/C22PwvNO2kA/?hl=en&#34;&gt;LOCKED IN&lt;/a&gt; and spent ~5 days copying…&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Nah&amp;hellip; actually, I sent a couple cold emails to any address I could find on these sites asking if they would share a file containing the games and answers. No response. I did a quick search of GitHub and Reddit to see if I could uncover any leads, but no… ok sooooo web-scraping, Yay!&lt;/p&gt;
&lt;p&gt;I wrote a quick &lt;a href=&#34;https://github.com/sschnei8/connections_solver/blob/master/data_functions/scrape.py&#34;&gt;script&lt;/a&gt; to scrape one of the historical Connections sites into a text file. It quickly loops through each month of Connections data on the site, extracts the paragraphs with the games and writes them to a text file with a separator. I’m going to leave the URL out, but you can find the site quite easily if you look. After writing 335 Connections games to the text file, I converted it to a &lt;a href=&#34;https://github.com/sschnei8/connections_solver/blob/master/data_functions/transform.py&#34;&gt;CSV and removed duplicates&lt;/a&gt;. Now, nobody needs to scrape. They can just &lt;a href=&#34;https://github.com/sschnei8/connections_solver/blob/master/data/clean_data.csv&#34;&gt;steal from me&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is probably a good time to point out that this whole &lt;a href=&#34;https://github.com/sschnei8/connections_solver/blob/master/pipeline.py&#34;&gt;process&lt;/a&gt; should be a &lt;a href=&#34;https://dagster.io/&#34;&gt;DAG&lt;/a&gt;, but I don’t care about robustness or repeatability right now so&amp;hellip; shortcuts!&lt;/p&gt;
&lt;h1 id=&#34;analytics-cleaning&#34;&gt;Analytics (Cleaning):&lt;/h1&gt;
&lt;p&gt;Cool, so now that we have our &lt;code&gt;“Clean”&lt;/code&gt; CSV, we can copy it directly into a table using DuckDB and explore all these historical games. With DuckDB you can directly COPY a CSV from disk into a table in a database. Ten lines of code later, and we’ve got a table.&lt;/p&gt;
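&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-PYTHON&#34; data-lang=&#34;PYTHON&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; duckdb
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# duckdb.connect(dbname) creates a connection to a persistent database&lt;/span&gt;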
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;conn &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; duckdb&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;connect(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;connections.db&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Create DuckDB table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;conn&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sql(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;DROP TABLE IF EXISTS CONNECTIONS_DATA;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;CREATE TABLE CONNECTIONS_DATA (
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    GAME_DATE DATE,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    DIFFICULTY VARCHAR,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    CATEGORY VARCHAR,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    WORD VARCHAR
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# COPY CSV from disk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;conn&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sql(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;COPY CONNECTIONS_DATA FROM &amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;{&lt;/span&gt;clean_csv_location&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; (FORMAT CSV, HEADER);&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In our new table:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each game consists of 16 rows, one for each word&lt;/li&gt;
&lt;li&gt;A category, on a particular day, can be described with four rows&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I chose this layout because I didn’t want to deal with any unnesting logic for viewing individual words, which would be necessary if I collapsed a game into 4 rows and saved the words in an &lt;a href=&#34;https://duckdb.org/docs/sql/data_types/list.html&#34;&gt;ARRAY/LIST&lt;/a&gt; or just LISTAGG’ed the strings together.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;FYI, DuckDB &lt;a href=&#34;https://github.com/duckdb/duckdb/issues/2575&#34;&gt;adopted&lt;/a&gt; the SQL standard of 1-based indexing a &lt;a href=&#34;https://github.com/duckdb/duckdb/pull/3178&#34;&gt;couple years ago&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;You may have noticed I put “CLEAN” in quotes earlier. I’ve got some half-baked web-scraping and transform logic I cheffed up in ~ 1 hour, so I figured there was about a 1% chance I missed some stuff… and it turns out I did indeed miss some stuff.&lt;/p&gt;
&lt;p&gt;Date-category combinations should have four rows. A count of rows (CNT) divided by four should return 1, and then, 1 mod 1 should equal zero. However, the below query returned six instances where CNT equaled 1.25. I discovered hyphenated words like “Yo-Yo” were written as two rows.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; GAME_DATE
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , CATEGORY
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;::FLOAT &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; CNT &lt;span style=&#34;color:#75715e&#34;&gt;-- BECAUSE THERE WILL BE 4 ROWS FOR EACH INSTANCE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CONNECTIONS_DATA
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;GROUP&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;HAVING&lt;/span&gt; CNT &lt;span style=&#34;color:#f92672&#34;&gt;%&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;ORDER&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;BY&lt;/span&gt; CNT &lt;span style=&#34;color:#66d9ef&#34;&gt;DESC&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Alright, probably good but let’s run some summary stats to be sure:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;SELECT&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; TOTAL_WORDS
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;::FLOAT &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; TOTAL_DISTINCT_CONNECTIONS
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     , &lt;span style=&#34;color:#66d9ef&#34;&gt;COUNT&lt;/span&gt;(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;16&lt;/span&gt;::FLOAT &lt;span style=&#34;color:#66d9ef&#34;&gt;AS&lt;/span&gt; TOTAL_DISTINCT_DAYS
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;FROM&lt;/span&gt; CONNECTIONS_DATA
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Oops, why is TOTAL_DISTINCT_DAYS 334.5?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;I assumed every game was unique&lt;/li&gt;
&lt;li&gt;There was some boilerplate text in the text file from the web scrape that was duplicated&lt;/li&gt;
&lt;li&gt;So&amp;hellip; why not remove rows in the text file that were duplicates of a previously seen row to clean it up?&lt;/li&gt;
&lt;li&gt;Wrong! Connections reuses categories with identical words:
&lt;ul&gt;
&lt;li&gt;August 30th, 2023, and May 30th, 2024: “Influence” category.&lt;/li&gt;
&lt;li&gt;January 11th, 2024, July 25th, 2023, and August 15th, 2023: “States of matter” category.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
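&lt;p&gt;The pitfall above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the repo: the rows, the placeholder words, the boilerplate line, and the helper names &lt;code&gt;dedupe_lines&lt;/code&gt; and &lt;code&gt;dedupe_records&lt;/code&gt; are all made up, and I’m assuming real category lines can be told apart from boilerplate by a colon.&lt;/p&gt;

```python
# Hypothetical scraped rows: (game_date, raw_line). The words are placeholders.
scraped = [
    ("2023-08-30", "INFLUENCE: WORD_A, WORD_B, WORD_C, WORD_D"),
    ("2023-08-30", "Some boilerplate footer"),
    ("2024-05-30", "INFLUENCE: WORD_A, WORD_B, WORD_C, WORD_D"),
    ("2024-05-30", "Some boilerplate footer"),
]


def dedupe_lines(rows):
    """Naive approach: drop any line whose text has been seen before."""
    seen = set()
    kept = []
    for game_date, line in rows:
        if line not in seen:
            seen.add(line)
            kept.append((game_date, line))
    return kept


def dedupe_records(rows):
    """Safer approach: drop boilerplate explicitly, then only exact (date, line) repeats."""
    seen = set()
    kept = []
    for game_date, line in rows:
        if ":" not in line:  # assumption: category lines always contain a colon
            continue
        if (game_date, line) not in seen:
            seen.add((game_date, line))
            kept.append((game_date, line))
    return kept


naive = dedupe_lines(scraped)
safe = dedupe_records(scraped)

# Count how many category lines survive each approach
naive_categories = [row for row in naive if ":" in row[1]]
print(len(naive_categories))  # the repeated category from the later date is gone
print(len(safe))              # both dated copies of the category are kept
```

&lt;p&gt;Keying the check on the date as well as the text keeps a legitimately repeated category, at the cost of having to filter boilerplate some other way.&lt;/p&gt;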
&lt;p&gt;Instead of fixing this in the extract code, I just &lt;a href=&#34;https://github.com/sschnei8/connections_solver/blob/master/data_functions/create_db.py#L37&#34;&gt;stripped them out&lt;/a&gt; of the table and called the clean_table function at the end of pipeline.py. Sweet, clean CSV, no quotes this time.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I also removed 4/1/2024, April Fools’ Day, when all words were emojis &lt;strong&gt;¯\_(ツ)_/¯&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&#34;analytics-insights&#34;&gt;Analytics (insights):&lt;/h1&gt;
&lt;p&gt;I hacked together a few &lt;a href=&#34;https://github.com/sschnei8/connections_solver/blob/master/analytics.py&#34;&gt;queries&lt;/a&gt; in a Jupyter notebook style file, printing data frames to the console. Over the +/- 335 games in this dataset there were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~3500 distinct words, not many words have spaces (fewer than 30 occurrences), and most words are 4 to 5 letters.&lt;/li&gt;
&lt;li&gt;Word length is consistent across difficulties, averaging 5.1 letters&lt;/li&gt;
&lt;li&gt;Most commonly reused words: {RING:11, BALL:10, COPY:9, LEAD:9, WING:8, JACK:8, CUT:8, HEART:8}&lt;/li&gt;
&lt;li&gt;Shortest words are individual letters like &amp;ldquo;X&amp;rdquo;, occurring only twelve times&lt;/li&gt;
&lt;li&gt;Longest words include Peppermint Patty, Concentration, Mashed Potato, and Horsefeathers. Horsefeathers is synonymous with nonsense or rubbish (I could see myself yelling this in an argument)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Pivoting out difficulty by category type:
&lt;img src=&#34;/images/Pivot.png&#34; alt=&#34;Connections&#34;&gt;&lt;/p&gt;
&lt;p&gt;Unpacking a few things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;This pivot is biased towards the “Hard” categories, describing ~70% of them. That’s because hard categories are pattern-focused and built on challenging wordplay each day. The only way I’m going to be able to solve these is to reduce the degrees of freedom in selection or determine the pattern with high confidence.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Dunderscores are extremely common, describing ~40% of hard categories:
- ROCK ___: (Star, Candy, Bottom, Garden)
- ___ MOON: (Sailor, New, Blue, Harvest)

Homophones show up a few times, and I should probably just do a static check
for these against a database of English language homophones.

Anagrams should be checked similarly.
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Prefix_String, Suffix_String, Minus, Added_Plus all denote occasions where the words are similar due to having some character(s) appended or removed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;WITH categories are a significant challenge. I’m not sure how to approach them yet.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Things With Trunks: (Cars, Elephants, Swimmers, Trees)
Tarot Cards With “THE”: (Fool, Lovers, Magician, Tower)
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;THINGS overlaps WITH 11 times in the data, so I could have included the intersection of the two: THINGS WITH ___. Instead, I’ve just pulled the intersections into the WITH category.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Things People Shake: (Hands, Maraca, Polaroid, Snowglobe)
Things to crack: (Egg, Knuckle, Smile, Window)
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Medium is typically some niche knowledge like countries with red and white flags, weapons in the game Clue, or British cuisine. To some, this is akin to “Easy” if they are familiar with the terms. For the less familiar, these can be confounding and lead to misses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Easiest/Easy difficulties are generally tightly related words without requiring a layer of context. However, they aren’t always so straightforward.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Connect: (Couple, Link, Tie, Join)
Attach with adhesive: (Adhere, Glue, Paste, Stick)
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
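&lt;p&gt;The static anagram check mentioned above could look something like this. It’s a rough sketch only: &lt;code&gt;anagram_groups&lt;/code&gt; is a hypothetical helper, and it assumes an anagram category is exactly four tiles that share the same letters once case and spaces are ignored. The words are the June 8th, 2024 tiles from earlier.&lt;/p&gt;

```python
from collections import defaultdict


def anagram_groups(words, size=4):
    """Return every group of exactly `size` words built from the same letters."""
    buckets = defaultdict(list)
    for word in words:
        # Words that are anagrams of each other share the same sorted-letter key
        key = "".join(sorted(word.upper().replace(" ", "")))
        buckets[key].append(word)
    return [group for group in buckets.values() if len(group) == size]


tiles = [
    "Pleats", "Please", "Hold", "Record", "Woman", "Film", "Good", "Pastel",
    "Penny", "Staple", "Tape", "Rivet", "Absorb", "Petals", "Engross", "Shoot",
]

groups = anagram_groups(tiles)
print(groups)  # [['Pleats', 'Pastel', 'Staple', 'Petals']]
```

&lt;p&gt;A check like this is cheap enough to run before anything fancier, and any group it finds shrinks the board from 16 tiles to 12.&lt;/p&gt;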
&lt;p&gt;Filtering for the triple dunderscore categories was a bit tricky, since &lt;code&gt;_&lt;/code&gt; is a single-character wildcard in &lt;code&gt;LIKE&lt;/code&gt; patterns. In Redshift, I was able to do &lt;code&gt;LIKE ‘%\\\\\_\_\_%’&lt;/code&gt;, but this doesn’t seem to work in DuckDB. Instead, I went with some length-based logic, checking for a difference of at least 2 characters after stripping the underscores out:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-SQL&#34; data-lang=&#34;SQL&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;LENGTH&lt;/span&gt;(CATEGORY) &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;LENGTH&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;REPLACE&lt;/span&gt;(CATEGORY, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\_&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;)) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h1 id=&#34;final-thoughts-on-a-solution&#34;&gt;Final thoughts on a solution:&lt;/h1&gt;
&lt;p&gt;One advantage of this problem is that its search space shrinks quickly as the set of remaining words gets smaller. Something I never explicitly provided ChatGPT was information about which categories it had gotten right and which were wrong, which would have allowed it to build on top of past decisions and reduced the size of the problem.&lt;/p&gt;
&lt;p&gt;For combinations, the formula: C(n,r) = n! / (r! (n-r)!)&lt;/p&gt;
&lt;p&gt;Evaluates the number of ways a sample of “r” elements can be obtained from a larger set of “n” distinguishable objects where order does not matter (unlike permutations) and repetitions are not allowed.&lt;/p&gt;
&lt;p&gt;At the beginning of any Connections game, a player will choose 4 words from a set of 16 yielding 1820 possible combinations.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;C(16,4) = 16! / (4! (16-4)!) = 1820&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As categories are found and labeled as correct the set is reduced: 16-&amp;gt;12-&amp;gt;8-&amp;gt;4&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;C(12,4) = 12! / (4! (12-4)!) = 495&lt;/li&gt;
&lt;li&gt;C(8,4) = 8! / (4! (8-4)!) = 70&lt;/li&gt;
&lt;li&gt;C(4,4) = 4! / (4! (4-4)!) = 1&lt;/li&gt;
&lt;/ul&gt;
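&lt;p&gt;The counts above are easy to double-check with Python’s standard library (&lt;code&gt;math.comb&lt;/code&gt; is available from Python 3.8 onward). A quick sketch, where &lt;code&gt;GROUP_SIZE&lt;/code&gt; and &lt;code&gt;remaining_guesses&lt;/code&gt; are just names I picked for illustration:&lt;/p&gt;

```python
import math

GROUP_SIZE = 4  # tiles per category

# Number of distinct four-word guesses for each number of tiles left on the board
remaining_guesses = {tiles: math.comb(tiles, GROUP_SIZE) for tiles in (16, 12, 8, 4)}

for tiles, guesses in remaining_guesses.items():
    print(f"{tiles} tiles left: {guesses} possible groups")
```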
&lt;p&gt;&lt;img src=&#34;/images/Problem_Space_min.png&#34; alt=&#34;Connections&#34;&gt;&lt;/p&gt;
&lt;p&gt;Like I mentioned in the intro, I find NLP interesting, but I am by no means an expert. After my manual checks to catch some of the tricky categories in Connections, I’m not entirely sure what an optimal framework looks like for grouping four words and iterating over a solution:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/images/Solution_min.png&#34; alt=&#34;Connections&#34;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maybe I don’t even need to perform these manual checks if I provide the right context to the correct algorithm?&lt;/li&gt;
&lt;li&gt;I’ve noted “confidence” several times in the diagram, but I’m not sure how to objectively arrive at this or if I’ll need to normalize “confidence” across different categories/difficulties?&lt;/li&gt;
&lt;li&gt;I’ll want to operate on a reduced set of words given that a category has correctly been identified, but I’m not sure if I should throw out previous work to avoid any noise created by the larger set of words?&lt;/li&gt;
&lt;li&gt;I&amp;rsquo;ll need to figure out how to handle Nouns, single letters, and words with spaces (Maybe this requires injecting a word classification layer between the raw set of words and the cached check for specific categories)&lt;/li&gt;
&lt;li&gt;Not sure I’ll be able to get this category: Reads the Same Rotated 180 degrees: 96, MOW, NOON, SIS&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Anyway, that’s part one. Who knows what part two will be, stay greasy!&lt;/p&gt;
</content>
    </item>
    
  </channel>
</rss>
