<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>The Living Deadline</title>
<link>https://alexhans.github.io/</link>
<atom:link href="https://alexhans.github.io/index.xml" rel="self" type="application/rss+xml"/>
<description>Alex Guglielmone Nemi&#39;s blog</description>
<generator>quarto-1.8.27</generator>
<lastBuildDate>Mon, 16 Mar 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>From scribble to searchable: building a sketch-to-text Agent Skill</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/series/evals/sketch-to-text-skill.html</link>
  <description><![CDATA[ 






<p>I like sketching on an <a href="https://euroshop.boox.com/products/boox-note-air3-c?variant=42967495770312">Onyx Boox</a>. Diagrams, flowcharts and rough system designs can go here as freehand ink.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://alexhans.github.io/posts/series/evals/fake-diagram-boox.jpg" class="img-fluid figure-img"></p>
<figcaption>Onyx Boox Air 3 C With a Fake Diagram</figcaption>
</figure>
</div>
<p>What I do not like is transcribing or re-drawing in a diagramming tool. It’s wasteful and, as I often say, if something is annoying and you do it often, there’s probably a better way.</p>
<p>Text is something agents and humans can both read, and is easy to store in <a href="https://en.wikipedia.org/wiki/Git">source control</a>. So I built an <a href="https://alexhans.github.io/posts/series/evals/building-agent-skills-incrementally.html">Agent Skill</a> to do the conversion.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>I show you this, not so you necessarily use this, but so you get a sense of how easy it is to apply this same methodology to your own pain points. Michael Kennedy from Talk Python to me called it <a href="https://mkennedy.codes/posts/what-hyper-personal-software-looks-like/">hyper-personal software</a>.</p>
</div>
</div>
<section id="the-skill" class="level1">
<h1>The Skill</h1>
<p><a href="https://github.com/Alexhans/blog-samples/tree/main/skills/sketch-to-text"><code>sketch-to-text</code></a> takes a handwritten PDF or image and converts it into a <a href="https://quarto.org/">Quarto</a> <code>.qmd</code> file with <a href="https://mermaid.js.org/">Mermaid</a> diagrams. The output renders, links, and lives with the rest of my writing.</p>
<p>Here’s one of my sketches converted (<a href="https://github.com/Alexhans/blog-samples/blob/main/skills/sketch-to-text/evals/diagram-1.pdf">original PDF</a>):</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">flowchart TD
    start("I want to share a game to teach")
    describe{"Can I describe the game? Intent"}
    tell_ai("Tell the AI to create game")
    explain("Explain it to the AI and play up scenarios that you had")
    iterate["Iterate asking for changes"]
    client_only{"Only client side?"}
    firebase[/Firebase/]
    publish("Publish to easy app")

    start --&gt; describe
    describe -- yes --&gt; tell_ai
    describe -- no --&gt; explain
    explain --&gt; describe
    tell_ai --&gt; iterate
    iterate --&gt; iterate
    iterate --&gt; client_only
    client_only -- yes --&gt; publish
    client_only -- no --&gt; firebase
    firebase --&gt; publish
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<p>That came from telling Claude: “convert diagram-1.pdf to quarto”.</p>
</section>
<section id="how-i-built-it" class="level1">
<h1>How I built it</h1>
<section id="rubber-duck-first" class="level2">
<h2 class="anchored" data-anchor-id="rubber-duck-first">Rubber duck first</h2>
<p>I didn’t start by building anything, I started by complaining about what was annoying to an LLM and thinking through what a good solution would look like.<sup>1</sup></p>
<p>I went through the Boox sync options: BOOXDrop, WebDAV, Obsidian, the various export formats (vector PDF, bitmap PDF, .note) that appear in the UI. Most had friction or lock-in I didn’t want. Then, as a test, I drew a quick <a href="https://github.com/Alexhans/blog-samples/blob/main/skills/sketch-to-text/evals/diagram-2.pdf">diagram</a> and asked the LLM to convert it to Mermaid.</p>
<p>It worked well enough to make the skill idea feel viable. I decided Quarto as a destination and any agent CLI as a runner were both fine, and that my only real work would be building good enough ground truth to test against.<sup>2</sup></p>
</section>
<section id="talk-then-build" class="level2">
<h2 class="anchored" data-anchor-id="talk-then-build">Talk, then build</h2>
<p>Once I knew what I wanted, I described it to my agent and let it write the skill. The skill’s flow is: read → classify → extract structure → generate Mermaid → self-check → write. I didn’t have to think about it that much. I was already drawing a few more diagrams to build the ground truth.</p>
</section>
<section id="polish-with-evals" class="level2">
<h2 class="anchored" data-anchor-id="polish-with-evals">Polish with evals</h2>
<p>After I had 6 diagrams (with rushed handwritten that proved hard to read, even for myself) I converted them into the baseline quarto files and manually compared actual vs expected respectively.</p>
<p>This exposed issues fast and forced me to decide details that were hard to foresee upfront. Cloud shapes, for example, aren’t implemented in Quarto’s bundled Mermaid renderer. When I hit that in <a href="https://github.com/Alexhans/blog-samples/blob/main/skills/sketch-to-text/evals/diagram-4.pdf">diagram 4</a>, I had to decide what to do with <code>@{ shape: cloud }</code> which is newer Mermaid and unsupported, and opted for <code>((("text")))</code> since it’s supported, and the double circle is visually impactful and also represents a “Stop Point”.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">flowchart TD
    bus(("Bus data"))
    tram(("Tram data"))
    prices(("Prices"))
    pt["Public Transport"]
    sim["Simulation"]
    qt["Quality Table"]
    dash((("exposure: Dashboard")))
    result((("exposure: result_table")))

    bus --&gt; pt
    tram --&gt; pt
    prices --&gt; sim
    pt --&gt; sim
    sim --&gt; qt
    sim --&gt; dash
    sim --&gt; result
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<p>For each issue I found, I iterated with the agent and it updated the skill. The loop is simple: try on real input, see what breaks, make the decision explicit, validate.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>When building ground truth you’re allowed to change the data, the expected output, and/or the skill logic. Whatever works to get a baseline that’s honest and easy to iterate against.</p>
</div>
</div>
<p>With ground truth in place, I ran <a href="https://promptfoo.dev">promptfoo</a><sup>3</sup> evals: each diagram PDF went through the skill, the output was checked against the reference via deterministic checks whenever I could (<code>icontains</code> assertions) and LLM as a Judge with local <code>deepseek-r1:14b</code> elsewhere. The exact tool used is really not the important part.</p>
<p>One run I got 5/6 since <a href="https://github.com/Alexhans/blog-samples/blob/main/skills/sketch-to-text/evals/diagram-5.pdf">Diagram 5</a> had failed. The skill dropped the edge labels on the three branches.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">flowchart TD
    banner["Design, De-Risk &amp; Jumpstart"]
    intent["Intent"]
    goals["`Reduce effort
      Increase ROI
      Fail fast`"]
    guardrails["Guardrails"]

    banner -- Design --&gt; intent
    banner -- De-Risk --&gt; goals
    banner -- Jumpstart --&gt; guardrails
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<p>The fix was one line in <code>SKILL.md</code> to preserve every label written on or beside an arrow; a labelled arrow must use <code>-- label --&gt;</code> not bare <code>--&gt;</code>. I re run for a 6/6 outcome. I didn’t edit the skill manually. It was all conversation.</p>
<p>A failing eval tells you exactly what the skill missed. One quick instruction to the agent, a re-run, and you know it holds.</p>
</section>
</section>
<section id="why-the-evals-matter-more-than-the-skill" class="level1">
<h1>Why the evals matter more than the skill</h1>
<p>The skill is useful immediatly but the evals are what make it safe and easy to change weeks later when you don’t remember any of the details.</p>
<p><code>sketch-to-text</code> handles flowcharts well for what I tested and is harmless enough to use whenever. If I notice inaccuracies, I can figure out whether the problem is in the source file, the skill logic, or the model. Fix it, add the new case to the evals, and catch future regressions. If I want to expand it, I’ll know I haven’t broken anything for my existing cases.</p>
<p>The investment is a bit of ground truth up front, in exchange for confidence on every future change.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>The eval inputs and ground truth files are co-located with the skill for now while the right distribution format for Agent Skill evals is still being worked out. See the <a href="https://github.com/alexhans/blog-samples/tree/main/skills/sketch-to-text/evals">evals README</a>.</p>
</div>
</div>
</section>
<section id="links" class="level1">
<h1>Links</h1>
<ul>
<li>Skill: <a href="https://github.com/Alexhans/blog-samples/tree/main/skills/sketch-to-text">blog-samples/skills/sketch-to-text</a></li>
<li>Evals: <a href="https://github.com/Alexhans/github-eval-ception/tree/main/exams/sketch-to-text">github-eval-ception/exams/sketch-to-text</a></li>
</ul>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Rubber duck debugging is the practice of explaining a problem out loud (originally to a rubber duck) to clarify your own thinking before asking for help. Popularised by Hunt and Thomas in <a href="https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/">The Pragmatic Programmer</a>.↩︎</p></li>
<li id="fn2"><p>Ground truth is a manually verified reference output used to measure whether a system is working correctly. In evals, it’s the answer you’d accept if a human did the task well.↩︎</p></li>
<li id="fn3"><p>Promptfoo is one of several open-source eval frameworks. See <a href="https://ai-evals.io/tools/compare">ai-evals.io/tools/compare</a> for a comparison of the main options.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>genai</category>
  <category>productivity</category>
  <category>workflows</category>
  <guid>https://alexhans.github.io/posts/series/evals/sketch-to-text-skill.html</guid>
  <pubDate>Mon, 16 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Building Agent Skills: Intent, Determinism, and Stability</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/series/evals/building-agent-skills-incrementally.html</link>
  <description><![CDATA[ 






<p>I want to offer a mental model and decision tree for building <a href="https://agentskills.io/home">Agent Skills</a> incrementally. It’s meant for anyone experimenting with them - not just for software developers - and focuses on staying in control as complexity grows or you start thinking about sharing or collaborating with others.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>It’s awesome to see increased adoption of <a href="https://agentskills.io/home">Agent Skills</a> to package workflows <sup>1</sup> and I attribute a lot of their success to standardized contracts for central use and managing context through progressive disclosure <sup>2</sup> (More mature tools/MCPS and models definitely lowered friction as well).</p>
</div>
</div>
<section id="mental-model" class="level1">
<h1>Mental Model</h1>
<p>You can think of <strong>Agents</strong> as <strong>assistants</strong> that can take the load from you and <strong>Agent Skills</strong> as the <strong>high level instructions</strong> you might leave to them, <strong>in a standardized format</strong>.</p>
<p>You need to know what you want from them, and the tradeoffs between micromanagement and agency - <strong>Intent</strong>. You want to offload mechanical work so they’re not reasoning about things that a tool like a calculator or a spreadsheet could handle - <strong>Determinism</strong> and you want the whole thing to hold up even if you swap one assistant for another - <strong>Stability</strong>.</p>
<p>Looking at the shape of an Agent Skill, you can mentally map it as:</p>
<ul>
<li>Intent -&gt; Markdown instructions</li>
<li>Determinism -&gt; tools and scripts</li>
<li>Stability -&gt; Tests and <a href="https://ai-evals.io">AI Evals</a></li>
</ul>
<p>If your intent is clear enough, the built-in <a href="https://github.com/metaskills/skill-builder">skill-builder</a> in many agent CLIs may be sufficient especially if you still validate outputs manually or stay in the loop for approvals. The more you want to change the skill without manually validating every run, or the more you worry about undesired/rogue behavior, the more determinism helps: scripts reduce <a href="../../../posts/series/evals/error-compounding-genai-systems-approach.html">error compounding</a>, and unit tests plus AI evals reduce drift risk and make iteration safer and faster, especially when sharing or collaborating.</p>
</section>
<section id="decision-tree" class="level1">
<h1>Decision Tree</h1>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">flowchart TD
  A([Start: I have or am developing an Agent Skill]) --&gt; Q1{Only for you&lt;br/&gt;and you're happy manually reviewing outputs?}

  Q1 -- Yes --&gt; L0["Level 0: Intent (Markdown only might be enough for you)&lt;br/&gt;- Clear inputs/outputs help&lt;br/&gt;- Examples help&lt;br/&gt;- Define good enough"]
  Q1 -- No --&gt; Q2{Need repeatable structure&lt;br/&gt;or mechanical consistency?}

  Q2 -- No --&gt; L0
  Q2 -- Yes --&gt; L1["Level 1: Determinism (Tools) &lt;br/&gt;- Move mechanical steps into tools/scripts&lt;br/&gt;- Use structured output&lt;br/&gt;- If possible, log tool inputs/outputs"]

  L1 --&gt; Q3{Will others use it&lt;br/&gt;or will you modify it often&lt;br/&gt;without manual re-checking?}

  Q3 -- No --&gt; L1
  Q3 -- Yes --&gt; L2["Level 2: Stability (Tests + Evals)&lt;br/&gt;- Unit tests for tools/scripts&lt;br/&gt;- AI evals for behavior/user stories&lt;br/&gt;- Minimal Golden cases + some edge cases"]

  L2 --&gt; Q4{Can it access sensitive data&lt;br/&gt;or take impactful actions&lt;br/&gt;or run unattended?}

  Q4 -- No --&gt; L2
  Q4 -- Yes --&gt; L3["Level 3: Safety/Scale&lt;br/&gt;- Guardrails + least privilege&lt;br/&gt;- Human approval for high impact&lt;br/&gt;- Security-focused evals"]
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Observability</strong>: This is a key point I omitted from the levels here. It lets you monitor cost (tokens spent), latency, tool selection, and more. You should add it as soon as you feel that you are missing that information. Investing in this space is important to answer questions about what happened in a particular “agent instruction following loop”.</p>
</div>
</div>
<p><strong>Go deeper on each level:</strong></p>
<ul>
<li><strong>Level 0 - Intent:</strong> <a href="https://agentskills.io/what-are-skills">Skills Spec</a></li>
<li><strong>Level 1 - Determinism:</strong> <a href="../../../posts/series/evals/error-compounding-genai-systems-approach.html">Error compounding + determinism</a></li>
<li><strong>Level 2 - Stability:</strong> <a href="https://ai-evals.io">Evals primer</a></li>
<li><strong>Level 3 - Security:</strong> If you’re here, least privilege and human approval for high-impact actions are usually a good baseline. For each integration point, ask what the worst-case outcome is. It’s also worth understanding <a href="https://snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub/">Prompt Injection</a>.</li>
</ul>
<section id="a-note-on-experience" class="level2">
<h2 class="anchored" data-anchor-id="a-note-on-experience">A note on experience</h2>
<p>The levels are illustrative and exist to prevent overwhelming yourself and avoid paralysis - either from fear of breaking things or from too many choices at the start.</p>
<p>After you build a few skills, recognizing patterns becomes easier and you can decide where to invest based on your own <a href="../../../posts/series/evals/measure-first-optimize-last.html">pain points</a>. You may start thinking about intent, determinism, stability, and safety from the beginning.</p>
<p>That does not mean implementing everything at once. It means being aware of more tradeoffs earlier.</p>
<p>Build only what you need, and keep it as simple as possible.</p>
</section>
<section id="practical-takeaway" class="level2">
<h2 class="anchored" data-anchor-id="practical-takeaway">Practical Takeaway</h2>
<p>Use skills to clarify intent. When a step stabilizes, move it into code or tools. Not because the model can’t do it, but because you don’t want to rediscover the same approach on every run. That lowers cost, reduces drift risk, and keeps room for directed experiments.</p>
<p>You can build skills with your agent CLI of choice (Claude/Codex/OpenCode), or use frameworks that support the pattern, like Doug Trajano’s <a href="https://github.com/DougTrajano/pydantic-ai-skills">Agent Skills implementation for PydanticAI</a> (<a href="https://dougtrajano.github.io/pydantic-ai-skills/">docs</a>).</p>
<hr>
</section>
<section id="call-to-action" class="level2">
<h2 class="anchored" data-anchor-id="call-to-action">Call to action</h2>
<p>If you have a skill you want to take beyond Markdown with determinism or <a href="https://ai-evals.io/">AI Evals</a>, share it. We can discuss which steps are missing to move from one level to the next. It may be simpler than it looks, and we could use it as a public-facing example to help others see specific ways to improve.</p>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Turning a sequence of steps you’d otherwise repeat manually into a single, reusable instruction an agent can follow.↩︎</p></li>
<li id="fn2"><p>Progressive disclosure means your agent doesn’t need every instruction in context at once. It can load what’s relevant when needed. See Doug Trajano’s <a href="https://github.com/DougTrajano/pydantic-ai-skills">PydanticAI Agent Skills implementation</a> and <a href="https://dougtrajano.github.io/pydantic-ai-skills/">docs</a>.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>genai</category>
  <category>workflows</category>
  <category>reliability</category>
  <guid>https://alexhans.github.io/posts/series/evals/building-agent-skills-incrementally.html</guid>
  <pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Measure First, Optimize Last: My Approach to AI Evals</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/series/evals/measure-first-optimize-last.html</link>
  <description><![CDATA[ 






<p>If you can’t measure it, you’re guessing. Here’s how I think about evals, practical examples at <a href="https://ai-evals.io/">ai-evals.io</a>.</p>
<section id="start-with-pain-not-tooling" class="level1">
<h1>Start with pain, not tooling</h1>
<p>My eval approach is pain-point driven:</p>
<ul>
<li>I can’t compare what I can’t measure.</li>
<li>I can’t trust an AI system to run on its own if I can’t quantify failure.</li>
</ul>
<p>If those two are unresolved, I am not “doing evals.” I am guessing.</p>
</section>
<section id="treat-it-like-automation-engineering" class="level1">
<h1>Treat it like automation engineering</h1>
<p>I frame agent work the same way as any automation effort. Unlike traditional ML experimentation, automation engineering demands you define acceptable behaviour upfront. Not after deployment:</p>
<ul>
<li>Can I describe exactly what I want? (intentionality)</li>
<li>What is the worst-case blast radius<sup>1</sup> if this fails?</li>
</ul>
<p>That framing forces clarity early and keeps risk visible.</p>
</section>
<section id="build-the-smallest-useful-test-loop" class="level1">
<h1>Build the smallest useful test loop</h1>
<p>I treat early eval work like integration testing plus TDD<sup>2</sup> habits:</p>
<ul>
<li>Skip big infra at the beginning.</li>
<li>Put the agent where users already work so it behaves like an extra set of hands.</li>
<li>Recreate real user stories and questions.</li>
<li>Use <a href="../../../posts/series/evals/error-compounding-genai-systems-approach.html">deterministic checks</a> wherever possible; don’t default to LLM-as-a-judge for everything.</li>
</ul>
<p>The goal is not “more tests.” The goal is tests that maximize iteration speed and control.</p>
</section>
<section id="optimize-late" class="level1">
<h1>Optimize late</h1>
<p>This space moves fast. Over-optimizing too early is usually waste.</p>
<p>I prefer to keep things:</p>
<ul>
<li>Minimal</li>
<li>Easy to Change <a href="https://www.youtube.com/watch?v=c8AzqMr87gQ">(ETC)</a></li>
</ul>
<p>In practice that means:</p>
<ul>
<li>Parameterized experiments</li>
<li>Easy comparison across runs, configs, and components</li>
</ul>
<p>Once benchmarks are stable, then optimize cost and latency:</p>
<ul>
<li>Pick the cheapest model/config that still meets the bar.</li>
</ul>
<p>That’s it: measure first, constrain risk, iterate fast, optimize last.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Blast Radius: refers to the extent of system, data, or user impact caused by a component failure, security breach, or faulty code deployment. See an <a href="https://www.morphisec.com/blog/blast-radius-fallout-strengthening-cyber-resilience-after-the-largest-it-crash/">extreme example in the Falcon Sensor 2024 crash</a>.↩︎</p></li>
<li id="fn2"><p>TDD: Test Driven Development: The act of driving your code thinking intentionally about testability and what tests expose the behaviour you want. Dogmatically it can seem slow and unhelpful but thinking about testability and adding tests to catch prevent bugs from recurring are very useful practices to have. Otherwise, overbuilding is very easy.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>engineering</category>
  <category>genai</category>
  <category>evals</category>
  <guid>https://alexhans.github.io/posts/series/evals/measure-first-optimize-last.html</guid>
  <pubDate>Tue, 10 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Get the Value of a High-Quality Audit, All the Time</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/series/evals/automate-audits.html</link>
  <description><![CDATA[ 






<p>We keep asking questions we should already know the answer to. And we usually ask them when a decision depends on it.</p>
<p>Sometimes we guess. Sometimes we do a one-off investigation. Sometimes we shrug and move on.</p>
<p>What if you could get the value of a high-quality audit <strong>all the time</strong>?</p>
<hr>
<section id="the-idea" class="level2">
<h2 class="anchored" data-anchor-id="the-idea">The idea</h2>
<p>Instead of running audits occasionally, automate the audit itself.</p>
<p>An audit is just a set of questions.</p>
<p>If you make those questions explicit, and make them answerable repeatedly, the audit stops being a one-off activity and becomes something you can run continuously.</p>
<hr>
</section>
<section id="the-method" class="level2">
<h2 class="anchored" data-anchor-id="the-method">The method</h2>
<p>Start from pain points you already feel, or decisions you struggle to make.</p>
<p>From each pain point, write the questions that would help you address it.</p>
<p>Those questions imply dimensions (what you want coverage over) and, over time, a set of entities that describe your world.</p>
<p>Don’t try to be complete. Just describe enough of your world to support the questions you care about.</p>
<p>For example, “we don’t know if we can reproduce the data of our projects” is a pain point that may prompt the questions:</p>
<ul>
<li>Which projects have code packages?</li>
<li>Which code packages have a README with reproducibility steps?</li>
<li>Which README file instructions actually work?</li>
</ul>
<p>The first two should be relatively trivial to check (in this GenAI world) and you can decide how much value the third one gets but the important point is that you’ve introduced what you know and surfaced what you currently can answer in case it’s worth it later.</p>
<p>The dimensions in our example are Code Packages and Projects.</p>
<p>These are actionable and describe different states of knowledge and reproducibility readiness.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Other questions will likely arise from these such as “Which projects are ongoing? Which code packages belong to which projects?”</p>
</div>
</div>
<hr>
</section>
<section id="the-entities-in-your-world" class="level2">
<h2 class="anchored" data-anchor-id="the-entities-in-your-world">The entities in your world</h2>
<p>After writing a few questions, you can jump to a whiteboard and try to describe a lot of the entities in your world to brainstorm about what you own, what you interact with and how things cluster together. The entities are the targets of the questions, and dimensions allow you to define the coverage. Example: <em>“We have a total of 12 projects: 5 have READMEs with reproducibility steps, 2 don’t have READMEs, 5 have no packages.”</em></p>
<p>It’s up to you whether you want to immediately define actions to take or treat it as a helpful data point for others to decide. The point is you stop spending time re-answering ad hoc questions.</p>
</section>
<section id="the-forcing-function" class="level2">
<h2 class="anchored" data-anchor-id="the-forcing-function">The forcing function</h2>
<p>Once you have questions, you write evals, checks that verify you can answer them. (See <a href="https://www.promptfoo.dev/docs/configuration/expected-outputs/">promptfoo’s documentation</a> for one way to implement them.)</p>
<p>An eval is simply:</p>
<blockquote class="blockquote">
<p><em>If I ask this question, I expect to get this kind of answer.</em></p>
</blockquote>
<p>Writing the eval is where the value appears.</p>
<p>The moment you write it, you’re forced to confront whether you can actually answer the question at all.</p>
<ul>
<li>If you can, the eval is straightforward.</li>
<li>If you can’t, you’ve discovered something you thought you knew but didn’t.</li>
<li>If the answer is subjective, that gap becomes explicit.</li>
</ul>
<p>You don’t need perfect answers. You need to know which evals pass and which ones reveal that you never had a real answer.</p>
<p>That alone is valuable.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>While writing evals in a form that doesn’t break as things change requires some practice, it’s not your primary concern when starting. It’s fine to be explicit and baseline.</p>
</div>
</div>
<hr>
</section>
<section id="what-you-get" class="level2">
<h2 class="anchored" data-anchor-id="what-you-get">What you get</h2>
<p>Once questions have evals, answering them becomes an implementation detail. With MCPs (Model Context Protocol servers), <a href="https://playwright.dev/">Playwright</a>, etc. Programmatic access is easier than ever, the hard part isn’t answering questions, it’s knowing which ones to ask and systematizing what “good” looks like.</p>
<p>Over time, you start to see where your questions apply, which are too vague, which are easier than expected, and where your description of the world is still thin.</p>
<p>You don’t need to go all in. Start with a few questions and a small report that grows over time.</p>
<p>At that point, you’re no longer redoing one-off audits.</p>
<p>You’re running a <strong>continuous audit</strong> whose scope is defined by the questions you care about.</p>
<hr>
</section>
<section id="why-this-is-low-risk" class="level2">
<h2 class="anchored" data-anchor-id="why-this-is-low-risk">Why this is low-risk</h2>
<ul>
<li>You don’t need to know everything upfront.</li>
<li>You don’t need to define “good” everywhere.</li>
<li>You don’t need a complete model of your world.</li>
</ul>
<p>You just need to acknowledge you may not know the answer for questions you’ve not explicitly tried to answer (quite similar to feeling pain when doing unit testing while being unsure of what you want)</p>
<p>Even unanswered questions are useful. They tell you what’s unclear, subjective, or not worth investing in.</p>
<hr>
</section>
<section id="closing" class="level2">
<h2 class="anchored" data-anchor-id="closing">Closing</h2>
<p>You don’t need to audit everything.</p>
<p>You just need to stop rediscovering the same answers, and the same unknowns, over and over again.</p>
<p>Start with a few questions. Let the audit grow.</p>
<p>If it doesn’t help, stop.</p>
<p>The cost is small. The insight compounds.</p>


</section>

 ]]></description>
  <category>engineering</category>
  <category>productivity</category>
  <category>genai</category>
  <category>ops</category>
  <guid>https://alexhans.github.io/posts/series/evals/automate-audits.html</guid>
  <pubDate>Thu, 29 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Reducing Error Compounding in GenAI Systems</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/series/evals/error-compounding-genai-systems-approach.html</link>
  <description><![CDATA[ 






<p>GenAI is non-deterministic and can fail or produce different results for the same input.</p>
<p>A typical prompt-to-action flow involves many LLM calls. Each call is a chance for the model to misinterpret, hallucinate, or produce an unusable output.</p>
<p>The question isn’t if errors happen. It’s what happens when they do, and how many opportunities you give them to cascade.</p>
<hr>
<section id="how-bad-does-it-get" class="level2">
<h2 class="anchored" data-anchor-id="how-bad-does-it-get">How bad does it get?</h2>
<p>Consider a simple case with just 4 steps (note: this is illustrative, consider how your system might have 10, 20, or more LLM calls):</p>
<ul>
<li>Step 1: 95% chance of correct</li>
<li>Step 2: 95% chance of correct<br>
</li>
<li>Step 3: 95% chance of correct</li>
<li>Step 4: 95% chance of correct</li>
</ul>
<p>End-to-end: ~81% chance everything is correct.</p>
<p>Now compare:</p>
<ul>
<li>Step 1 (LLM): user intent → structured call (95%)</li>
<li>Step 2 (deterministic tool): execute (98%)</li>
<li>Step 3 (deterministic validation): parse + check (97%)</li>
<li>Step 4 (LLM): result → response (96%)</li>
</ul>
<p>End-to-end: ~87%</p>
<!--

::::::{.cell layout-align="default"}

:::::{.cell-output-display}

::::{}
`<figure class=''>`{=html}

:::{}

<pre class="mermaid mermaid-js">flowchart TB
    subgraph A[&quot;All-LLM → 81%&quot;]
        direction LR
        A1[&quot;LLM 95%&quot;] --&gt; A2[&quot;LLM 95%&quot;] --&gt; A3[&quot;LLM 95%&quot;] --&gt; A4[&quot;LLM 95%&quot;]
    end

    subgraph B[&quot;Mixed → 87%&quot;]
        direction LR
        B1[&quot;LLM 95%&quot;] --&gt; B2[&quot;Tool 98%&quot;] --&gt; B3[&quot;Validate 97%&quot;] --&gt; B4[&quot;LLM 96%&quot;]
    end
</pre>
:::
`</figure>`{=html}
::::
:::::
::::::
-->
<p><strong>Same model. Different architecture. 6+ point improvement.</strong></p>
<hr>
</section>
<section id="two-high-leverage-approaches" class="level2">
<h2 class="anchored" data-anchor-id="two-high-leverage-approaches">Two high-leverage approaches</h2>
<ol type="1">
<li><strong>Remove 1 or many GenAI steps entirely</strong>, fewer chances to fail</li>
<li><strong>Replace GenAI steps with deterministic ones</strong>, lower error rate per step</li>
</ol>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>These aren’t the only ways to reduce error (e.g.&nbsp;consensus systems, retries, etc) but the fundamentals expressed would apply everywhere, no matter whether it’s one agent or a swarm.</p>
</div>
</div>
<hr>
</section>
<section id="what-makes-deterministic-steps-different" class="level2">
<h2 class="anchored" data-anchor-id="what-makes-deterministic-steps-different">What makes deterministic steps different</h2>
<p>Deterministic steps still fail, but the failure characteristics differ from LLM failures:</p>
<ul>
<li><strong>Bounded</strong>: failures come from a finite set of causes (parse error, timeout, missing field), not open-ended misinterpretation</li>
<li><strong>Repeatable</strong>: same input, same failure: you can reproduce and fix it</li>
<li><strong>Non-semantic</strong>: a crashed process doesn’t convince the next step that “actually the user meant X”</li>
</ul>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>This doesn’t mean deterministic = reliable. It means when things break, they break in less subtle ways and there’s a lot of software engineering history behind their robustness.</p>
</div>
</div>
<hr>
</section>
<section id="the-design-pattern" class="level2">
<h2 class="anchored" data-anchor-id="the-design-pattern">The design pattern</h2>
<p>LLMs do many hard parts (interpreting intent, choosing tools, dealing with syntax, reasoning through results, deciding what comes next).</p>
<p>In a simplified flow, the model might:</p>
<ol type="1">
<li>Receive user intent (natural language)</li>
<li>Decide which tool to call and with what parameters</li>
<li>Receive structured output from the tool</li>
<li>Decide: done, or call another tool?</li>
<li>Repeat until ready to respond</li>
<li>Translate the final result back to the user</li>
</ol>
<p>A lot of reasoning and orchestration happens there and the point isn’t to limit that but to give it good building blocks.</p>
<p>A human user is more effective with better building blocks (e.g.&nbsp;well designed libraries or cli tools) and so is an LLM.</p>
<p>What you want are building blocks that are:</p>
<ul>
<li>reusable</li>
<li>composable</li>
<li>well tested</li>
<li>easy to change and maintain</li>
<li>cost effective</li>
</ul>
<p>And a model that acts as a translation layer, not the tool running all the logic.</p>
<hr>
</section>
<section id="practical-recommendations" class="level2">
<h2 class="anchored" data-anchor-id="practical-recommendations">Practical recommendations</h2>
<section id="identify-the-deterministic-core" class="level3">
<h3 class="anchored" data-anchor-id="identify-the-deterministic-core">Identify the deterministic core</h3>
<p>If you are writing a Claude skill and a step can be expressed as code, ask yourself why you’re not expressing it as code.</p>
<p>The tradeoff is real:</p>
<p><strong>Leaving logic in prose means:</strong></p>
<ul>
<li>Higher error rate at runtime</li>
<li>Relying on <a href="https://www.promptfoo.dev/docs/configuration/expected-outputs/">evals</a> instead of <a href="https://docs.pytest.org/en/stable/how-to/unittest.html">unit tests</a> (if you don’t know what one or either of these are, then you’re definitely safer in the frozen code world)</li>
<li>Paying the cost and error on every execution</li>
<li>Yes, it <em>might</em> improve as models improve, but you’re paying for that uncertainty every time</li>
</ul>
<p><strong>Moving logic to code means:</strong></p>
<ul>
<li>Lower error rate (deterministic execution)</li>
<li>Unit testable</li>
<li>Cheaper to run</li>
<li>Still easy to write with LLMs, have the model generate the code once instead of regenerating the logic from prose every time</li>
<li>You can still ask LLMs to review or improve the code later if you want</li>
</ul>
<p>The second option gives you confidence that things actually work. The first option defers that confidence in exchange for alleged convenience.</p>
<p>If the LLM can write code for you, why have it translate markdown to logic on every run? Make the translation once, freeze it as code, and test it properly.</p>
</section>
<section id="force-structure-at-boundaries" class="level3">
<h3 class="anchored" data-anchor-id="force-structure-at-boundaries">Force structure at boundaries</h3>
<p>Don’t pass prose between steps. Use formats that are easy to serialize and deserialize, like JSON/YAML with schemas you can validate against.</p>
<p>Structure lets you validate, detect errors and attempt course correction or fail fast, diff, log, and evaluate deterministically.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>This will even save you money, time and compute resources <a href="https://www.promptfoo.dev/docs/configuration/expected-outputs/deterministic/">by not having to use LLM as a Judge for assertion in your evals</a>.</p>
</div>
</div>
</section>
<section id="test-the-building-blocks" class="level3">
<h3 class="anchored" data-anchor-id="test-the-building-blocks">Test the building blocks</h3>
<p>Write unit tests and integration tests for the core building blocks — same as you would’ve done before LLMs.</p>
<hr>
</section>
</section>
<section id="closing" class="level2">
<h2 class="anchored" data-anchor-id="closing">Closing</h2>
<p>This isn’t about distrusting models, it’s about giving them good building blocks to use.</p>
<p>Use GenAI to translate intent. Use the building blocks to execute. Keep errors where you can measure them.</p>
<p>That is how automation becomes something you can trust instead of manually testing once and being hopeful.</p>


</section>

 ]]></description>
  <category>engineering</category>
  <category>genai</category>
  <category>evals</category>
  <guid>https://alexhans.github.io/posts/series/evals/error-compounding-genai-systems-approach.html</guid>
  <pubDate>Wed, 28 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Stop Reformatting Markdown When Pasting into Slack</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/slack/stop-reformatting-markdown-when-pasting-into-slack.html</link>
  <description><![CDATA[ 






<section id="pain-point" class="level1">
<h1>Pain Point</h1>
<p>Slack only pastes rich formatting when the clipboard advertises <code>text/html</code>, otherwise it treats everything as plain text.</p>
<p>If my file <code>sample.md</code> looks like this:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb1-1"><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:robot_face:</span> Tech updates <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:robot_face:</span></span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;"># Some Title</span></span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>**Project**: Did x in <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">google</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">](https://google.com)</span>.</span>
<span id="cb1-6"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>aaa</span>
<span id="cb1-7"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>bbb</span>
<span id="cb1-8"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">      - </span>ccc </span>
<span id="cb1-9"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>ddd</span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;"># Another title</span></span>
<span id="cb1-12"></span>
<span id="cb1-13"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Another launch</span>
<span id="cb1-14"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">  - </span>details</span></code></pre></div></div>
<p>Compare pasting directly on the left and what we want on the right.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://alexhans.github.io/posts/slack/markdown-vs-slack-rich-paste.png" class="img-fluid figure-img"></p>
<figcaption>Side by Side comparison between markdown in plain text and the slack rich formatted version</figcaption>
</figure>
</div>
</section>
<section id="solution" class="level1">
<h1>Solution</h1>
<p>Put HTML onto the clipboard the same way a browser would, so Slack pastes it as rich content instead of plain text.</p>
<p>This requires a recent xclip build that supports advertising <code>text/html</code> correctly.</p>
<ol type="1">
<li>Build <a href="https://github.com/astrand/xclip">xclip from source</a> to get the latest features around html</li>
<li><code>pip install beautifulsoup4 lxml</code></li>
<li>Run <code>pandoc sample.md -t html</code></li>
<li>Optionally modify the HTML to fix things like lists.</li>
<li>and pipe it to <code>xclip -selection clipboard -t text/html</code></li>
</ol>
<p>What it looks like:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1">pandoc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>f gfm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>t html sample.md <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb2-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> python <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>c <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span></span>
<span id="cb2-3"><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">import sys</span></span>
<span id="cb2-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> bs4 <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> BeautifulSoup <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> BS, Tag</span>
<span id="cb2-5"></span>
<span id="cb2-6">s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BS(sys.stdin.read(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lxml"</span>)</span>
<span id="cb2-7"></span>
<span id="cb2-8"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> inline_html(tag: Tag) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>:</span>
<span id="cb2-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>.join(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tag.contents).strip()</span>
<span id="cb2-10"></span>
<span id="cb2-11">out_lines <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb2-12"></span>
<span id="cb2-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> emit_block(tag: Tag):</span>
<span id="cb2-14">    name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tag.name.lower()</span>
<span id="cb2-15"></span>
<span id="cb2-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> name <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"h1"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"h2"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"h3"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"h4"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"h5"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"h6"</span>):</span>
<span id="cb2-17">        txt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tag.get_text(strip<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb2-18">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> txt:</span>
<span id="cb2-19">            out_lines.append(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"&lt;strong&gt;</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>txt<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&lt;/strong&gt;"</span>)</span>
<span id="cb2-20">            out_lines.append(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;br/&gt;"</span>)</span>
<span id="cb2-21">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span></span>
<span id="cb2-22"></span>
<span id="cb2-23">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> name <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"p"</span>,):</span>
<span id="cb2-24">        txt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> inline_html(tag)</span>
<span id="cb2-25">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> txt:</span>
<span id="cb2-26">            out_lines.append(txt)</span>
<span id="cb2-27">            out_lines.append(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;br/&gt;"</span>)</span>
<span id="cb2-28">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span></span>
<span id="cb2-29"></span>
<span id="cb2-30">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> name <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ul"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ol"</span>):</span>
<span id="cb2-31">        walk_list(tag, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb2-32">        out_lines.append(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;br/&gt;"</span>)</span>
<span id="cb2-33">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span></span>
<span id="cb2-34"></span>
<span id="cb2-35"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> walk_list(lst: Tag, level: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>):</span>
<span id="cb2-36">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> li <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> lst.find_all(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"li"</span>, recursive<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb2-37">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># separate nested lists</span></span>
<span id="cb2-38">        nested <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [c <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> li.find_all([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ul"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ol"</span>], recursive<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)]</span>
<span id="cb2-39">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> n <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> nested:</span>
<span id="cb2-40">            n.extract()</span>
<span id="cb2-41"></span>
<span id="cb2-42">        text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> inline_html(li)</span>
<span id="cb2-43">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> text:</span>
<span id="cb2-44">            indent <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&amp;nbsp;"</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> level)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 4 = indent width per level</span></span>
<span id="cb2-45">            bullet <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&amp;#8226;"</span>               <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># •</span></span>
<span id="cb2-46">            out_lines.append(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>indent<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}{</span>bullet<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>text<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&lt;br/&gt;"</span>)</span>
<span id="cb2-47"></span>
<span id="cb2-48">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> n <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> nested:</span>
<span id="cb2-49">            walk_list(n, level <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb2-50"></span>
<span id="cb2-51">body <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> s.body <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> s.body <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> s</span>
<span id="cb2-52"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> child <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(body.children):</span>
<span id="cb2-53">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(child, Tag):</span>
<span id="cb2-54">        emit_block(child)</span>
<span id="cb2-55"></span>
<span id="cb2-56"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>.join(out_lines), end<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span>
<span id="cb2-57"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">' </span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb2-58"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">| xclip -sel clipboard -t text/html -alt-text "Updates"</span></span></code></pre></div></div>
<p><strong>Note:</strong> The Python step is only needed if you want to fix Slack’s broken handling of nested lists. For simple formatting (bold, links, headings), Pandoc -&gt; xclip alone is enough.</p>
</section>
<section id="inspiration" class="level1">
<h1>Inspiration</h1>
<p><a href="https://www.jvt.me/posts/2025/04/19/slack-external-markdown/">Authoring Markdown externally and pasting the ‘pretty’ output into Slack (on Linux)</a> does the same thing without the extra formatting to fix the nested lists.</p>
</section>
<section id="annex" class="level1">
<h1>Annex</h1>
<section id="how-to-build-latest-xclip-from-source-in-ubuntu" class="level2">
<h2 class="anchored" data-anchor-id="how-to-build-latest-xclip-from-source-in-ubuntu">How to build latest xclip from source in Ubuntu</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt install autoconf automake libtool libxmu-dev</span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> clone https://github.com/astrand/xclip</span>
<span id="cb3-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> xclip</span>
<span id="cb3-4"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">autoreconf</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-fi</span></span>
<span id="cb3-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">./configure</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--prefix</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>/usr/local</span>
<span id="cb3-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make</span></span>
<span id="cb3-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> make install</span></code></pre></div></div>
</section>
<section id="wsl-and-powershell-are-different-beasts" class="level2">
<h2 class="anchored" data-anchor-id="wsl-and-powershell-are-different-beasts">WSL and powershell are different beasts</h2>
<p>This won’t work on WSL and slack as-is. You likely need to do it from powershell using a third-party program</p>
<p>WSL cannot directly populate the Windows clipboard with rich HTML in a way Slack accepts; an intermediate Windows application re-copies the content with additional clipboard formats.</p>
<ol type="1">
<li>Powershell 5</li>
</ol>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode powershell code-with-copy"><code class="sourceCode powershell"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Get-Content</span> out<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">html</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>Raw <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Set-Clipboard</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>AsHtml</span></code></pre></div></div>
<ol start="2" type="1">
<li>Open LibreOffice Writer or any GUI and paste.</li>
<li>Select that and copy</li>
<li>Paste into slack</li>
</ol>
<ul>
<li><a href="https://github.com/PowerShell/PowerShell/issues/18196">Format HTML won’t be added to Powershell 7</a></li>
<li><a href="https://www.reddit.com/r/PowerShell/comments/dq8mr3/setclipboard_for_powershell_7/">People are not happy</a></li>
</ul>


</section>
</section>

 ]]></description>
  <category>productivity</category>
  <guid>https://alexhans.github.io/posts/slack/stop-reformatting-markdown-when-pasting-into-slack.html</guid>
  <pubDate>Fri, 16 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fix: pip hangs in WSL (IPv6 / gai.conf)</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/wsl-pip-hangs-ipv6.html</link>
  <description><![CDATA[ 






<section id="pain-point" class="level2">
<h2 class="anchored" data-anchor-id="pain-point">Pain Point</h2>
<p><code>pip install</code> hangs in WSL with no useful error, often after it starts fetching from <code>files.pythonhosted.org</code>.</p>
</section>
<section id="the-rule" class="level2">
<h2 class="anchored" data-anchor-id="the-rule">The Rule</h2>
<p>If DNS/connection to <code>files.pythonhosted.org</code> hangs but <code>pypi.org</code> works, suspect IPv6 preference + broken IPv6 routing.</p>
</section>
<section id="minimal-diagnosis" class="level2">
<h2 class="anchored" data-anchor-id="minimal-diagnosis">Minimal Diagnosis</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">python</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-c</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"import urllib.request; print(urllib.request.urlopen('https://pypi.org/simple/').status)"</span></span>
<span id="cb1-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># expected: 200</span></span></code></pre></div></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">getent</span> hosts pypi.org</span>
<span id="cb2-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># returns quickly</span></span></code></pre></div></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">getent</span> hosts files.pythonhosted.org</span>
<span id="cb3-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># may hang</span></span></code></pre></div></div>
<p>If <code>files.pythonhosted.org</code> hangs, <code>pip</code> will hang. That host is where wheels and sdists are served from.</p>
</section>
<section id="fix" class="level2">
<h2 class="anchored" data-anchor-id="fix">Fix</h2>
<p>Prefer IPv4 for address selection using <code>gai.conf</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> tee /etc/gai.conf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>/dev/null <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;'EOF'</span></span>
<span id="cb4-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">precedence ::ffff:0:0/96  100</span></span>
<span id="cb4-3"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">EOF</span></span></code></pre></div></div>
<p>This does not disable IPv6. It changes the precedence so IPv4 is tried first.</p>
</section>
<section id="verify" class="level2">
<h2 class="anchored" data-anchor-id="verify">Verify</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">getent</span> hosts files.pythonhosted.org</span>
<span id="cb5-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># should return immediately</span></span></code></pre></div></div>
<p>Then retry:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb6-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pip</span> install ipython</span></code></pre></div></div>
</section>
<section id="revert" class="level2">
<h2 class="anchored" data-anchor-id="revert">Revert</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> tee /etc/gai.conf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>/dev/null <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;'EOF'</span></span>
<span id="cb7-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"># empty override: use glibc defaults</span></span>
<span id="cb7-3"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">EOF</span></span></code></pre></div></div>
</section>
<section id="notes" class="level2">
<h2 class="anchored" data-anchor-id="notes">Notes</h2>
<p>If you see multiple stuck installs, clear them before retrying:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb8-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">pkill</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-f</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"python -u -m pip install"</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">||</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">true</span></span></code></pre></div></div>
</section>
<section id="other-sources" class="level1">
<h1>Other sources</h1>
<ul>
<li><a href="https://askubuntu.com/questions/958876/how-to-disable-ipv6-on-windows-subsystem-for-linux">https://askubuntu.com/questions/958876/how-to-disable-ipv6-on-windows-subsystem-for-linux</a></li>
<li><a href="https://man7.org/linux/man-pages/man5/gai.conf.5.html">gai.conf(5) — Linux manual page</a></li>
</ul>


</section>

 ]]></description>
  <category>troubleshooting</category>
  <guid>https://alexhans.github.io/posts/wsl-pip-hangs-ipv6.html</guid>
  <pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Talks: Toward a Shared Vision for LLM Evaluation in the Airflow Ecosystem</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/talk.toward-a-shared-vision-of-llm-evals-in-airflow-ecosystem.html</link>
  <description><![CDATA[ 






<section id="abstract" class="level1">
<h1>Abstract</h1>
<p>As LLM tools and agents emerge in the Airflow community, whether as plugins, MCP servers, or embedded agents, we lack a consistent way to benchmark across implementations and across versions of the same solution. This lightning talk highlights the need of an agreed-upon evaluation mechanism that enables us to measure, compare, and reproduce results when working with GenAI solutions in relation to Airflow. I’ll share what such mechanism could look like in practice. If you care about building trustworthy, testable GenAI systems (that could eventually fit into CI/CD workflows) and want to able to have grounded discussions when developing in this space, let’s lay the groundwork to test and compare our tools meaningfully.</p>
</section>
<section id="slides-and-transcript" class="level1">
<h1>Slides and Transcript</h1>
<ul>
<li><a href="../talks/airflow-summit/toward-a-shared-vision-of-llm-evals-in-airflow-ecosystem.html">Toward a Shared Vision for LLM Evaluation in the Airflow Ecosystem</a></li>
<li><a href="../talks/airflow-summit/toward-a-shared-vision-of-llm-evals-in-airflow-ecosystem.vtt">Transcript (WebVTT)</a></li>
</ul>


</section>

 ]]></description>
  <category>airflow</category>
  <category>talks</category>
  <guid>https://alexhans.github.io/posts/talk.toward-a-shared-vision-of-llm-evals-in-airflow-ecosystem.html</guid>
  <pubDate>Mon, 08 Sep 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Using Data Build Tool (dbt) to Accelerate &amp; Scale Science</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/using-dbt-to-accelerate-science.html</link>
  <description><![CDATA[ 






<p>This post is part of a series: “Factory of Domain Experts”:</p>
<section id="what-problem-are-we-solving" class="level2">
<h2 class="anchored" data-anchor-id="what-problem-are-we-solving">What problem are we solving?</h2>
<p>“When can we launch this?” is a recurring question in cross-functional teams, and the answer is often “ask the engineers”. But is that really necessary? Do scientists need to build and then hand off to engineers to rewrite code for scalability or reliability?</p>
<p>I challenged this pattern since I wanted to scale without having to grow engineering headcount, and empower our scientists to deliver more impact independently. The original handover approach introduced delays, estimation misses, integration surprises, and iteration overhead.</p>
<section id="what-we-wanted-to-achieve" class="level3">
<h3 class="anchored" data-anchor-id="what-we-wanted-to-achieve">What we wanted to achieve:</h3>
<ol type="1">
<li><strong>Scale our team</strong>: Grow scientific output independent of engineering capacity.</li>
<li><strong>Iterate in parallel, not sequentially</strong>: Team members collaborate building and integrating simultaneously, without waiting periods or handovers.</li>
<li><strong>Share easily reproducible code</strong>: Produce reproducible code and data that makes cross-team collaboration easy and transparent.</li>
</ol>
<p>Instead of building custom solutions, we used a small set of industry-standard tools, <a href="https://github.com/dbt-labs/dbt-core?tab=readme-ov-file#understanding-dbt">dbt (Data Build Tool)</a> with SQL, <a href="https://airflow.apache.org/">Apache Airflow</a> (using <a href="https://github.com/astronomer/astronomer-cosmos">astronomer-cosmos</a>), and <a href="https://git-scm.com/">Git</a>, to create a simple system. Scientists now develop close to the domain, and their work is automatically orchestrated and deployed without engineers needing to rewrite or manage the code. There’s no custom Graphical User Interface (GUI) or platform, just clear conventions, smart defaults, and infrastructure-as-code. Engineers focus on building reusable capabilities while scientists focus on science and business logic.</p>
</section>
</section>
<section id="how-dbt-solves-these-problems" class="level2">
<h2 class="anchored" data-anchor-id="how-dbt-solves-these-problems">How dbt solves these problems</h2>
<p>Data Build Tool (dbt) enables engineers and scientists alike to transform data using software engineering best practices. <strong>Crucially, there are no tradeoffs between scrappy exploration and production-ready code</strong>, the same code serves both purposes:</p>
<ul>
<li><strong>Production-ready from day one</strong>: The code scientists write IS the production code. No handovers, no rewrites, no “let me translate this for production.” Your development SQL becomes the scheduled pipeline automatically.</li>
<li><strong>Collaboration and early integration</strong>: Since both engineers and scientists can run the same dbt code, collaboration happens naturally from day one, fostering cross-domain learning and surfacing integration or reproducibility issues early, reducing project risk.</li>
<li><strong>Simple workflows that scale</strong>: A simple <code>dbt run -s "model_name+"</code> runs your model and all dependencies. The same code that works for individual data exploration works for production scheduling.</li>
<li><strong>Modularity without orchestration headaches</strong>: dbt forces you to break apart monolithic SQL into focused models, but handles all the dependency management automatically, so you get the benefits of clean, debuggable code without the cognitive overhead of managing execution order.</li>
<li><strong>Automatic lineage and documentation</strong>: dbt generates interactive dependency graphs showing how your models connect. <a href="https://docs.getdbt.com/reference/commands/cmd-docs">Schema documentation</a> automatically appears in the warehouse tables.</li>
<li><strong>Built-in quality controls</strong>: Define data tests that run automatically.</li>
<li><strong>Built for integration and extensibility</strong>: dbt integrates seamlessly with our existing AWS stack (Athena, Glue, Iceberg), internal services and datalakes and industry standard tools.</li>
<li><strong>Compliance and governance</strong>: Data policies can be built into packages, ensuring compliance and empowering your users to make the right tradeoffs around data handling.</li>
</ul>
</section>
<section id="impact" class="level2">
<h2 class="anchored" data-anchor-id="impact">Impact</h2>
<p>Our approach enabled delivery of multiple high-impact scientist-led projects that would have otherwise been delayed or blocked due to engineering constraints. Peer teams adopted or expressed desires to adopt it, when they had a chance to work with us and experiment the productivity speed ups, in different dimensions.</p>


</section>

 ]]></description>
  <category>data-science</category>
  <category>engineering</category>
  <guid>https://alexhans.github.io/posts/using-dbt-to-accelerate-science.html</guid>
  <pubDate>Sun, 31 Aug 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Use aider for free with your local LLMs or cheaply with OpenRouter</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/aider-with-open-router.html</link>
  <description><![CDATA[ 






<p>Many people use LLM (Large Language Models) services to code at work but don’t necessarily see a path to use them at home on a budget.</p>
<p>Here are two quick recipes: one for a fully local, privacy-focused setup, and another using OpenRouter.</p>
<section id="local-llms" class="level2">
<h2 class="anchored" data-anchor-id="local-llms">Local LLMs</h2>
<ol type="1">
<li>Make sure you have <a href="https://github.com/ollama/ollama">ollama</a> installed and running.</li>
<li>Note down a wich model(s) you have installed and plan to use. We’ll use <a href="https://ollama.com/library/deepseek-r1">deepseek-r1</a> and <a href="https://ollama.com/library/qwen2.5-coder">qwen2.5-coder</a> as example models. <code>Deepseek</code> is general purpose and a good candidate for reasoning while <code>qwen2.5-coder</code> is specialized for coding tasks.</li>
</ol>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">$</span> ollama list</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">NAME</span>                                        ID              SIZE      MODIFIED</span>
<span id="cb1-4"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">deepseek-r1:14b</span>                             ea35dfe18182    9.0 GB    2 hours ago</span>
<span id="cb1-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">qwen2.5-coder:14b</span>                           9ec8897f747e    9.0 GB    2 hours ago</span></code></pre></div></div>
<p>I’m using the 14-B distilled models based on my hardware. You can experiment with different ones and find what speed vs quality tradeoff you’re comfortable with. The <a href="https://ollama.com/search">Ollama models site</a> is very handy to get information about models and their distilled versions.</p>
<ol start="3" type="1">
<li><a href="https://aider.chat/docs/llms/ollama.html">follow the guide</a> which tells you to run:</li>
</ol>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aider</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--model</span> ollama_chat/<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>model<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span></span></code></pre></div></div>
<p>So in our case, that becomes:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aider</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--model</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ollama_chat/deepseek-r1:14b"</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--editor-model</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ollama_chat/qwen2.5-coder:14b"</span></span></code></pre></div></div>
<p>We could simply use one model for everything but this “plan vs execution” pattern works really well both locally and remotely.</p>
<p>Use <code>aider --help</code> or visit <a href="https://aider.chat/docs/config/options.html">the options page on aider’s site</a> to understand the differences between <code>--model</code> (main model), <code>--editor-model</code> (editor tasks), and <code>--weak-model</code> (commit messages and history summarization).</p>
</section>
<section id="cheaply-with-openrouter" class="level2">
<h2 class="anchored" data-anchor-id="cheaply-with-openrouter">Cheaply with OpenRouter</h2>
<p>If you’re not satisfied with using your hardware for everything and are ok with sending data to an LLM in the cloud, you can use OpenRouter.</p>
<p>The advantage of using OpenRouter over a specific LLM service like <a href="https://www.anthropic.com/api">Claude</a>, <a href="https://openai.com/index/openai-api/">ChatGPT API</a> or others is that you can have a cloud independent approach and mix and match APIs paying in only one place, while also setting specific budgets that you can’t go over.</p>
<p><a href="https://www.reddit.com/r/LocalLLaMA/comments/1jhjbgj/best_llm_for_code_through_api_with_aider/">user u/Baldur-Norddahl Reddit LocalLLama</a> shared a snippet of what it looks like. You’ll notice it’s very similar to our local example with the addition of the OpenRouter API Key as an environment variable and that we use Claude 3.7 and the full version of Deepseek r1:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">export</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">OPENROUTER_API_KEY</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>sk-or-v1-xxxx</span>
<span id="cb4-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">aider</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--architect</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--model</span> openrouter/deepseek/deepseek-r1 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--editor-model</span> openrouter/anthropic/claude-3.7-sonnet <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--watch-files</span></span></code></pre></div></div>
<p>You can easily monitor your <a href="https://openrouter.ai/activity">activity</a> an estimate what your coding sessions are actually like. This may lead you to switch from Claude 3.7 to something cheaper. Again, it’s all about personal experience and quality tradeoffs.</p>
</section>
<section id="in-closing" class="level2">
<h2 class="anchored" data-anchor-id="in-closing">In Closing</h2>
<p>Both patterns are very useful and allow you a great degree of flexibility. There’s a lot of power in customization and avoiding vendor lock-in. You’ll be able to experiement with cline/aider or whatever the next tool is. As hardware becomes more powerful, you could have a very productive experience on a plane, even without internet access.</p>
<p>Shoutout to Georgi Gerganov’s <a href="https://picovoice.ai/blog/local-llms-llamacpp-ollama/">llama.cpp</a> which is the core that allows ollama to work.</p>


</section>

 ]]></description>
  <category>genai</category>
  <category>code</category>
  <guid>https://alexhans.github.io/posts/aider-with-open-router.html</guid>
  <pubDate>Sat, 05 Jul 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Merge and Forget</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/series/zeroops/merge-and-forget.html</link>
  <description><![CDATA[ 






<!-- Merge and forget: Engineers don't have to follow deployments through to feel assured and can simply expect things to work, and context switch right after code merges happen. -->
<section id="the-rule" class="level2">
<h2 class="anchored" data-anchor-id="the-rule">The Rule</h2>
<p>After your change is approved and enters the delivery pipeline, you should be able to <strong>forget it</strong>.</p>
<p>No following it through pipelines. No watching for when it lands. No manual checking for failure states.</p>
</section>
<section id="pain-point" class="level2">
<h2 class="anchored" data-anchor-id="pain-point">Pain Point</h2>
<p>Tracking deployments “just in case” creates unnecessary <strong>cognitive load</strong>.</p>
<p>It turns delivery into a background worry: tabs left open, dashboards checked, attention fragmented.</p>
<p>If something requires attention, you should be told.</p>
</section>
<section id="do" class="level2">
<h2 class="anchored" data-anchor-id="do">Do</h2>
<p>Treat delivery as a <strong>system property</strong>:</p>
<ul>
<li>Push “surprises” left: run fast, automated cross-system checks (e.g.&nbsp;integration tests) <strong>before</strong> merge, at code-review time, to minimise post-merge failures.</li>
<li>You should get a signal if something is blocked or broken.</li>
<li>You should not need manual reassurance.</li>
<li>When the system is healthy, silence is expected.</li>
</ul>
<p>(see <a href="../../../posts/series/zeroops/no-news-is-good-news">No News Is Good News</a>).</p>
</section>
<section id="do-not" class="level2">
<h2 class="anchored" data-anchor-id="do-not">Do Not</h2>
<ul>
<li>Follow a deploy through the pipeline to feel safe.</li>
<li>Keep checking “did it land yet?”</li>
<li>Watch logs/dashboards after merge for reassurance.</li>
</ul>
</section>
<section id="scope" class="level2">
<h2 class="anchored" data-anchor-id="scope">Scope</h2>
<p>This is for routine, continuous delivery of typical changes.</p>
<p>This does <strong>not</strong> cover:</p>
<ul>
<li>major migrations</li>
<li>one-way / high-blast-radius changes</li>
<li>communicating expected delivery times (ETAs)</li>
</ul>


</section>

 ]]></description>
  <category>engineering</category>
  <guid>https://alexhans.github.io/posts/series/zeroops/merge-and-forget.html</guid>
  <pubDate>Fri, 17 May 2024 23:00:00 GMT</pubDate>
</item>
<item>
  <title>No News Is Good News</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/series/zeroops/no-news-is-good-news.html</link>
  <description><![CDATA[ 






<section id="the-rule" class="level2">
<h2 class="anchored" data-anchor-id="the-rule">The Rule</h2>
<p>Do <strong>not</strong> check whether things are fine.</p>
</section>
<section id="pain-point" class="level2">
<h2 class="anchored" data-anchor-id="pain-point">Pain Point</h2>
<p>Manually checking systems to confirm they are “okay” creates unnecessary <strong>cognitive load</strong>.</p>
<p>Dashboards and queues become reassurance rituals: they consume time and attention without changing outcomes.</p>
<p>If something is broken, you should be told.</p>
</section>
<section id="do" class="level2">
<h2 class="anchored" data-anchor-id="do">Do</h2>
<p>For anything that might require action, ensure there is an automated <strong>signal</strong>.</p>
<ul>
<li>The signal should reach the people who can meaningfully act on it.</li>
<li>It does not need to prescribe the action.</li>
<li>Over time, actions may be formalised (runbooks, automation), but that is secondary.</li>
</ul>
<p>If something requires attention, it should create noise. If it does not, it should remain silent.</p>
</section>
<section id="do-not" class="level2">
<h2 class="anchored" data-anchor-id="do-not">Do Not</h2>
<ul>
<li>Create mechanisms to confirm system health.</li>
<li>Regularly inspect dashboards “just to be sure”.</li>
<li>Rely on manual checks for reassurance.</li>
</ul>
<p>Silence is expected when coverage is adequate.</p>
</section>
<section id="scope" class="level2">
<h2 class="anchored" data-anchor-id="scope">Scope</h2>
<p>This applies to <strong>operational, actionable failures</strong>: things that are broken <em>now</em> and require attention.</p>
<p>This does <strong>not</strong> cover:</p>
<ul>
<li>slow degradation</li>
<li>trend monitoring</li>
<li>preemptive or exploratory analysis</li>
</ul>
</section>
<section id="analogy" class="level2">
<h2 class="anchored" data-anchor-id="analogy">Analogy</h2>
<p>Think of a perfect assistant: they interrupt you only when there is something you can act on. If they do not interrupt you, you can assume everything is fine.</p>


</section>

 ]]></description>
  <category>engineering</category>
  <guid>https://alexhans.github.io/posts/series/zeroops/no-news-is-good-news.html</guid>
  <pubDate>Thu, 16 May 2024 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Set a Meeting Budget</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/meeting-budget.html</link>
  <description><![CDATA[ 






<section id="pain-point" class="level1">
<h1>Pain Point</h1>
<!-- A huge productivity damager is having an excessive amount of meetings disrupting your week -->
<p>Too many recurring meetings drain your week’s productivity.</p>
</section>
<section id="the-rule" class="level1">
<h1>The Rule</h1>
<ul>
<li>Set a <strong>hard budget</strong> for fixed meetings.</li>
<li>Example: 40 h week, 6 h meeting budget.</li>
</ul>
<pre><code>Total - Budget = Free -&gt; 40 - 6 = 34</code></pre>
<ul>
<li>If you go over budget, <strong>cut or shrink</strong> the least important meetings.</li>
<li>You <em>can</em> adjust the budget, but do so rarely, otherwise it loses meaning.</li>
<li>Ad-hoc syncs are fine. It’s the <strong>recurring</strong> ones that eat up your time.
<ul>
<li>Consider doing a similar thing for ad-hoc meetings, if they become a problem.</li>
</ul></li>
<li>Like code, less is better. Always look for ways to reduce, even if you’re under budget.</li>
</ul>
</section>
<section id="analogy" class="level1">
<h1>Analogy</h1>
<p>This is similar to the U.S. Senate’s <a href="https://www.congress.gov/crs-product/RL31943">PAYGO</a> rule:</p>
<blockquote class="blockquote">
<p>if Congress wants to add $N to a program, they must “find room” by reducing $N somewhere else or by increasing taxes to cover it.</p>
</blockquote>


</section>

 ]]></description>
  <category>productivity</category>
  <guid>https://alexhans.github.io/posts/meeting-budget.html</guid>
  <pubDate>Wed, 01 Mar 2023 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Get notifications in ubuntu when command line tasks end</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/notifications-for-command-line-tasks.html</link>
  <description><![CDATA[ 






<!-- Title: Get notifications in ubuntu when command line tasks end
Date: 2019-04-16 00:00
Modified: 2019-04-16 00:00
Category: Devops
Tags: notification, producitivity, alert, ubuntu, linux, desktop, mail
slug: notifications_in_ubuntu_linux
Authors: Alex Hans
Summary: Notifications in Ubuntu Linux for command line tasks
Lang: en -->
<section id="intro" class="level2">
<h2 class="anchored" data-anchor-id="intro">Intro</h2>
<p>Often, when working in the terminal, you’ll find yourself running a command that takes a non-trivial amount of time and you don’t want to just stare at the screen until it finishes.</p>
<p>So you switch tabs/windows and do something else in the meantime. Problem is, when is the other task finished? You don’t want to waste time checking too often nor too late…</p>
<p>So what you want is a notification. One that lets you carry on merrily until the original command is actually finished.</p>
<p>It turns out that many <a href="https://askubuntu.com/questions/17536/how-do-i-create-a-permanent-bash-alias">.bashrc</a> files come with an alias called <code>alert</code> and, some SO answers even improve upon it.</p>
</section>
<section id="desktop-notifications-with-notify-send" class="level2">
<h2 class="anchored" data-anchor-id="desktop-notifications-with-notify-send">Desktop notifications with notify-send</h2>
<p>Here’s the one I’m using lately and has served me well:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add an "alert" alias for long running commands.  Use like so:</span></span>
<span id="cb1-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   sleep 10; alert</span></span>
<span id="cb1-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">alias</span> alert=<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'notify-send --urgency=low -i "$([ $? = 0 ] &amp;&amp; echo terminal || echo error)"  </span></span>
<span id="cb1-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"$(history|tail -n1|sed -e '</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\'</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'s/^\s*[0-9]\+\s*//;s/[;&amp;|]\s*alert$//'</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\'</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">')"'</span></span></code></pre></div></div>
<p>As the comment says, using it is just a matter of writing the command you want, a semi-colon and the alias <code>alert</code> (Remember that semi-colon <code>;</code> means execute after the previous command is finished, no matter the return code, unlike <code>&amp;&amp;</code> which only executes the next command if return code is 0 (success).</p>
<p>So if you’re compiling and running tests in a project you could just do:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make</span> test<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">alert</span></span></code></pre></div></div>
<p>and you’ll get notified whenever <code>make test</code> ends.</p>
<p>But what if you decided running that lengthy task is a good moment to step away from your computer and take a coffee break or talk with a coworker? How will you know when it’s done if you’re not in front of the computer to see the desktop notification?</p>
</section>
<section id="email-notifications" class="level2">
<h2 class="anchored" data-anchor-id="email-notifications">Email notifications</h2>
<p>That’s when email comes in handy. You just gotta take your phone with you and have access to an SMTP server.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">$</span> sudo apt install mailutils</span></code></pre></div></div>
<p>The logic is the same as before, once the command is done, execute the “alert”.</p>
<p>If you want to do it in python, here’s a <a href="https://unix.stackexchange.com/a/55437/7937">simple way</a> to go about it.</p>
<p>Just make sure it doesn’t go to SPAM.</p>
<p>Cheers</p>
<hr>
<p><strong>Was this helpful? Do you do it another way? All comments are welcome!</strong></p>


</section>

 ]]></description>
  <category>devops</category>
  <guid>https://alexhans.github.io/posts/notifications-for-command-line-tasks.html</guid>
  <pubDate>Mon, 15 Apr 2019 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Accept a self-signed certificate with git</title>
  <dc:creator>Alex Guglielmone Nemi</dc:creator>
  <link>https://alexhans.github.io/posts/accept-self-signed-cert-git-https.html</link>
  <description><![CDATA[ 






<!-- Title: 
Date: 2018-02-11 20:54
Modified: 2018-02-27 00:00
Category: Devops
Tags: https, git, networking, sysadmin, svn, devops
slug: accept_self_signed_cert_git
Authors: Alex Hans
Summary: How To accept a self-signed cert in git over HTTPS
Lang: en -->
<section id="intro" class="level2">
<h2 class="anchored" data-anchor-id="intro">Intro</h2>
<p>Some time ago I came into an issue where people served git repositories in a local network using apache but used a self-signed certificate for the server.</p>
<p>Everyone was already trained to add the exception in their browsers to access HTML content but what happened when it came to source code control?</p>
</section>
<section id="the-problem" class="level2">
<h2 class="anchored" data-anchor-id="the-problem">The Problem</h2>
<p>It turns out Subversion (SVN) presented no issue since it prompted the user to accept the new server key just once and then didn’t pester them again but git was another story. Git tried to verify that the cert was signed by a proper authority and couldn’t.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">user@user-linux:git$</span> git clone https://user@dev-server-01/git/repo_name.git </span>
<span id="cb1-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Cloning</span> into <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'repo_name'</span>...</span>
<span id="cb1-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">fatal:</span> unable to access <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://user@dev-server-01/git/repo_name.git/'</span>: server certificate verification failed. </span>
<span id="cb1-4"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">CAfile:</span> /etc/ssl/certs/ca-certificates.crt CRLfile: none</span></code></pre></div></div>
</section>
<section id="the-solution" class="level2">
<h2 class="anchored" data-anchor-id="the-solution">The Solution</h2>
<p>After some googling I came across suggestions to disable SSL verification with <code>git config http.sslVerify "false"</code> but that looked like it could induce some bad habits and it actually wouldn’t prevent tampering if, for instance, the user was pointed elsewhere instead of the proper original server.</p>
<p>That’s when <a href="https://stackoverflow.com/questions/11621768/how-can-i-make-git-accept-a-self-signed-certificate/26785963#26785963">Stack</a> <a href="https://stackoverflow.com/questions/23807313/adding-self-signed-ssl-certificate-without-disabling-authority-signed-ones">Overflow</a> came into play and I found about this neat solution where you can associate a hostname with a given certificate that you store locally.</p>
<p>Steps:</p>
<p>1- Download the self signed certificate from the server and store it somewhere like /etc/ssl/certs</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">/etc/ssl/certs/ssl-cert-dev-01.pem</span></span>
<span id="cb2-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">/etc/ssl/certs/ssl-cert-dev-02.pem</span></span></code></pre></div></div>
<p>2- Modify your git config (globally or per-repository) to associate hosts with certs:</p>
<pre><code>(From git config --help)

http.sslCAInfo
    File containing the certificates to verify the peer with when fetching or pushing over HTTPS. 
    Can be overridden by the GIT_SSL_CAINFO environment variable.</code></pre>
<p>In this case we’re going to do it globally by modifying <code>~/.gitconfig</code></p>
<pre><code>[http "https://dev-server-01:/"]
    sslCAInfo = /etc/ssl/certs/ssl-cert-dev-01.pem

[http "https://dev-server-02"]
    sslCAInfo = /etc/ssl/certs/ssl-cert-dev-02.pem</code></pre>
<p>Or you can do it with the command line:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb5-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">$</span> git config <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--global</span> http.<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://dev-server-01/"</span>.sslCAInfo /etc/ssl/certs/ssl-cert-dev-01.pem</span>
<span id="cb5-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">$</span> git config <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--global</span> http.<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://dev-server-02/"</span>.sslCAInfo /etc/ssl/certs/ssl-cert-dev-02.pem</span></code></pre></div></div>
<p>Of course, this breaks the flow of those who were using HTTP and the IP address directly since <a href="https://stackoverflow.com/questions/35604640/why-does-validation-fail-when-connecting-to-a-server-by-ip-address-instead-of-ho">you need the same name that appears in the certificate</a>. That’s the one con I can think of and, if your users where not in the habit of doing so, you’ll better start getting them used to it.</p>
<p>Cheers</p>
<hr>
<p><strong>Was this helpful? Do you do it another way? All comments are welcome!</strong></p>


</section>

 ]]></description>
  <category>devops</category>
  <guid>https://alexhans.github.io/posts/accept-self-signed-cert-git-https.html</guid>
  <pubDate>Sun, 11 Feb 2018 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
