<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Blog on Pavel Anni</title>
    <link>https://pavelanni.dev/blog/</link>
    <description>Recent content in Blog on Pavel Anni</description>
    <generator>Hugo -- 0.146.0</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 04 Nov 2025 17:47:54 -0500</lastBuildDate>
    <atom:link href="https://pavelanni.dev/blog/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>From hours to minutes: Automating cluster deployments with AIStor MCP server</title>
      <link>https://pavelanni.dev/blog/mcp_server_automating_deployments/</link>
      <pubDate>Tue, 04 Nov 2025 17:47:54 -0500</pubDate>
      <guid>https://pavelanni.dev/blog/mcp_server_automating_deployments/</guid>
      <description>&lt;p&gt;As a MinIO Curriculum Engineer, I spend a lot of time creating demos and learning environments for customers. These environments need to be realistic, comprehensive, and ready to showcase MinIO AIStor&amp;rsquo;s capabilities in real-world scenarios. But here&amp;rsquo;s the thing - setting up a proper demo environment manually can take hours of low-value, repetitive work.&lt;/p&gt;
&lt;p&gt;Let me show you what happened when I decided to automate this process using Claude Code and our AIStor MCP server. If you’ve been following this series, you know that the AIStor MCP server allows us to use natural human language to perform &lt;a href=&#34;https://pavelanni.dev/blog/mcp_server_admin_tools/&#34;&gt;administrative&lt;/a&gt; tasks on our object stores. Instead of running standard command-line tools (and remembering all the arguments and flags), I can simply ask the MCP server, “create a policy” or “create a user” and explain what I want in human language.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>As a MinIO Curriculum Engineer, I spend a lot of time creating demos and learning environments for customers. These environments need to be realistic, comprehensive, and ready to showcase MinIO AIStor&rsquo;s capabilities in real-world scenarios. But here&rsquo;s the thing - setting up a proper demo environment manually can take hours of low-value, repetitive work.</p>
<p>Let me show you what happened when I decided to automate this process using Claude Code and our AIStor MCP server. If you’ve been following this series, you know that the AIStor MCP server allows us to use natural human language to perform <a href="https://pavelanni.dev/blog/mcp_server_admin_tools/">administrative</a> tasks on our object stores. Instead of running standard command-line tools (and remembering all the arguments and flags), I can simply ask the MCP server, “create a policy” or “create a user” and explain what I want in human language.</p>
<p>The same approach works for debugging complex production issues, reproducing environments for testing, or quickly spinning up development environments that mirror your production setup. Let me show you how I used it for demo creation—but keep these other use cases in mind as you read. We’ll discuss them at the end of the post.</p>
<h2 id="the-challenge-building-a-realistic-business-and-technical-scenario">The challenge: Building a realistic business and technical scenario</h2>
<p>I wanted to create a technical demo that showcases AIStor Tables (our implementation of Apache Iceberg Tables) in a way that resonates with enterprise customers. The scenario I had in mind was straightforward:</p>
<p>A typical company with customers, suppliers, and partners, where:</p>
<ul>
<li>Each relationship type gets its own bucket</li>
<li>Documents are organized by type (contracts, meeting notes, reports, analytics)</li>
<li>Different teams have different access levels</li>
<li>Sales teams can write to their customers&rsquo; buckets and read from others</li>
<li>Purchasing has full access to supplier buckets</li>
<li>Partner management controls partner buckets</li>
<li>C-level executives have full access to everything</li>
</ul>
<p>For the demo, I wanted:</p>
<ul>
<li>3 customer companies with 3 sales teams</li>
<li>2 suppliers with a purchasing department</li>
<li>2 partners with a partner management team</li>
<li>C-level team with CEO and CFO</li>
<li>Realistic company names and documents</li>
<li>Proper IAM policies for access control</li>
</ul>
<h2 id="the-traditional-approach-5-hours-of-manual-work">The traditional approach: 5 hours of manual work</h2>
<p>If I were to set this up manually using <code>mc</code> commands, here&rsquo;s what it would involve:</p>
<p><strong>Creating buckets</strong>: about 4 minutes. Simple enough:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mc mb tables/universal-products
</span></span><span style="display:flex;"><span>mc mb tables/global-services
</span></span><span style="display:flex;"><span><span style="color:#75715e"># ... and so on</span>
</span></span></code></pre></div><p><strong>Creating documents</strong>—this is where it gets painful. Around 1.5-2.5 hours:</p>
<ul>
<li>Create each file locally in a text editor or find and download from the internet</li>
<li>Write realistic content (contracts, meeting notes, analytics reports)</li>
<li>Save files with proper paths</li>
<li>Upload each one with <code>mc cp</code></li>
</ul>
<p>For 28 documents ranging from simple CSV files to detailed analytics reports, you&rsquo;re looking at 2-8 minutes per document.</p>
<p><strong>Creating users</strong>: about 12-15 minutes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mc admin user add tables sales-team1-user SalesTeam1SecurePassword2024!
</span></span><span style="display:flex;"><span><span style="color:#75715e"># ... repeat for each user</span>
</span></span></code></pre></div><p><strong>Creating groups and adding members</strong>: another 12-15 minutes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>mc admin group add tables sales-team-1
</span></span><span style="display:flex;"><span>mc admin group add tables sales-team-1 sales-team1-user
</span></span><span style="display:flex;"><span><span style="color:#75715e"># ... and so on</span>
</span></span></code></pre></div><p><strong>Creating IAM policies</strong>—the really tedious part. 45-90 minutes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>cat &gt; sales-team-1-rw-policy.json <span style="color:#e6db74">&lt;&lt;EOF
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">{
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  &#34;Version&#34;: &#34;2012-10-17&#34;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  &#34;Statement&#34;: [{
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    &#34;Effect&#34;: &#34;Allow&#34;,
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    &#34;Action&#34;: [&#34;s3:GetObject&#34;, &#34;s3:PutObject&#34;, &#34;s3:DeleteObject&#34;],
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    &#34;Resource&#34;: [&#34;arn:aws:s3:::universal-products/*&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">  }]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">EOF</span>
</span></span><span style="display:flex;"><span>mc admin policy create tables sales-team-1-rw-policy sales-team-1-rw-policy.json
</span></span></code></pre></div><p>Multiply that by the number of required policies, factor in JSON syntax errors, wrong ARNs, debugging time&hellip;</p>
<p><strong>Attaching policies</strong>: 10 minutes of copy-paste-fix-repeat.</p>
<p><strong>Verification</strong>: 10-15 minutes to make sure everything is set up correctly.</p>
<p><strong>Context switching and errors</strong>: another 30-60 minutes for typos, wrong paths, and JSON debugging.</p>
<p><strong>Total time: 4.5-6 hours</strong> for an experienced user. First-timers? Easily 6-8 hours.</p>
<p>What if I told you this entire process could be done in 7 minutes?</p>
<h2 id="the-automated-approach-7-minutes">The automated approach: 7 minutes</h2>
<p>Instead of spending half a day on manual setup, I decided to use Claude Code and the AIStor MCP server for that. Following the concept of “<a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/">spec-driven development</a>,” I did it in two steps. First, I asked Claude to create a specification file in Markdown based on my explanation of what I want to achieve. Second, I passed that Markdown file to Claude Code for execution.</p>
<p>Here&rsquo;s what I asked Claude to do using the AIStor MCP Server:</p>
<p>I want to create a demo that showcases Iceberg Tables&rsquo; capabilities using a typical company with customers, suppliers, and partners. Create separate buckets for our customers, suppliers, and partners. In each bucket, collect documents such as contracts, meeting notes, quarterly reports, and analytics. Create sales teams with read-write access to their customer bucket and read-only access to other customer buckets. The purchasing department has full access to all suppliers&rsquo; buckets. The partner department has access to all partners&rsquo; buckets.</p>
<p>Create three customers, three sales teams, two partners, two suppliers, and purchasing and partner teams. Use fake company names, such as Universal Products and Global Services. Create individual buckets for each company. Create sample documents for each bucket in Markdown or CSV formats. Create one user in each team. Create a C-level team with full access to all buckets with the CEO and CFO users.</p>
<p>I added that I want to create a Markdown document to pass to Claude Code for execution via the MCP server. The MCP server was already connected to this instance of Claude Desktop, so it had all the information about available tools and their parameters. As a result, Claude created a comprehensive setup guide document that specified exactly which MCP tools to use for each step.</p>
<p>Claude Code then proceeded to:</p>
<ul>
<li>Connect to the cluster</li>
<li>Create the buckets</li>
<li>Create realistic documents</li>
<li>Create users</li>
<li>Create groups</li>
<li>Create IAM policies</li>
<li>Attach all policies</li>
<li>Verify everything</li>
</ul>
<p>Total: 68 MCP tool calls executed in about 7 minutes.</p>
<h2 id="what-made-this-possible">What made this possible?</h2>
<p>The AIStor MCP server I&rsquo;ve been writing about in this blog series provides Claude Code with direct access to MinIO operations through the Model Context Protocol. Instead of generating instructions for me to execute, Claude could directly:</p>
<ul>
<li>Create buckets</li>
<li>Upload objects with full content</li>
<li>Manage users and groups</li>
<li>Generate proper IAM policies with correct JSON syntax</li>
<li>Attach policies to groups</li>
<li>Verify the setup</li>
</ul>
<p>All through natural language interaction.</p>
<p>The key difference from manual setup:</p>
<ul>
<li><strong>No file creation overhead</strong> - Documents were created directly in buckets using the <code>text_to_object</code> tool, not saved locally first</li>
<li><strong>No JSON debugging</strong> - Claude generated syntactically correct IAM policies on the first try</li>
<li><strong>Parallel execution</strong> - Multiple operations run simultaneously</li>
<li><strong>Error prevention</strong> - Consistent naming, no typos, correct paths</li>
<li><strong>Built-in verification</strong> - Each step was validated automatically</li>
<li><strong>Progress tracking</strong> - The todo list managed the entire process</li>
</ul>
<h2 id="the-results">The results</h2>
<p>Here&rsquo;s what got created:</p>
<p><strong>7 buckets</strong> organized by business relationship:</p>
<ul>
<li>universal-products, global-services, premier-solutions (customers)</li>
<li>quality-components, reliable-materials (suppliers)</li>
<li>strategic-alliance, innovation-collaborative (partners)</li>
</ul>
<p><strong>28 documents</strong> with realistic content:</p>
<ul>
<li>Service agreements with specific terms ($500K-$1.2M contracts)</li>
<li>Quarterly business review notes</li>
<li>Performance metrics in CSV format</li>
<li>ROI analyses with detailed financial projections</li>
</ul>
<p><strong>7 users</strong> properly configured:</p>
<ul>
<li>3 sales team users (one per customer)</li>
<li>1 purchasing user</li>
<li>1 partner management user</li>
<li>2 C-level users (CEO and CFO)</li>
</ul>
<p><strong>6 groups</strong> with correct membership and policies</p>
<p><strong>9 IAM policies</strong> defining precise access control:</p>
<ul>
<li>Read-write policies for primary buckets</li>
<li>Read-only policies for cross-team visibility</li>
<li>Department-specific full access policies</li>
<li>C-level full access policy</li>
</ul>
<p>The access control matrix works exactly as designed:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Group</th>
          <th style="text-align: left">Own Buckets</th>
          <th style="text-align: left">Read-Only Access</th>
          <th style="text-align: left">Full Access</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left">sales-team-1</td>
          <td style="text-align: left">universal-products</td>
          <td style="text-align: left">global-services, premier-solutions</td>
          <td style="text-align: left">No</td>
      </tr>
      <tr>
          <td style="text-align: left">sales-team-2</td>
          <td style="text-align: left">global-services</td>
          <td style="text-align: left">universal-products, premier-solutions</td>
          <td style="text-align: left">No</td>
      </tr>
      <tr>
          <td style="text-align: left">sales-team-3</td>
          <td style="text-align: left">premier-solutions</td>
          <td style="text-align: left">universal-products, global-services</td>
          <td style="text-align: left">No</td>
      </tr>
      <tr>
          <td style="text-align: left">purchasing-dept</td>
          <td style="text-align: left">quality-components, reliable-materials</td>
          <td style="text-align: left">None</td>
          <td style="text-align: left">No</td>
      </tr>
      <tr>
          <td style="text-align: left">partner-mgmt</td>
          <td style="text-align: left">strategic-alliance, innovation-collaborative</td>
          <td style="text-align: left">None</td>
          <td style="text-align: left">No</td>
      </tr>
      <tr>
          <td style="text-align: left">c-level</td>
          <td style="text-align: left">All 7 buckets</td>
          <td style="text-align: left">N/A</td>
          <td style="text-align: left">Yes</td>
      </tr>
  </tbody>
</table>
<h2 id="the-time-comparison">The time comparison</h2>
<p>Let&rsquo;s be precise about the time savings:</p>
<p><strong>Manual setup</strong>: 4.5-6 hours (270-360 minutes)
<strong>Claude Code + AIStor MCP</strong>: 7 minutes
<strong>Time saved</strong>: roughly 4.5-6 hours
<strong>Speed improvement</strong>: 38-51x faster</p>
<p>But it&rsquo;s not just about speed. The automated approach eliminates:</p>
<ul>
<li>Repetitive typing</li>
<li>JSON syntax debugging</li>
<li>Context switching between the terminal and the text editor</li>
<li>Mental load of tracking what&rsquo;s done</li>
<li>Risk of typos and inconsistencies</li>
<li>Frustration from small errors</li>
</ul>
<p>This means you can iterate on environment designs quickly, respond to customer requests same-day, or rebuild environments for troubleshooting without blocking your day.</p>
<h2 id="next-step-using-it-with-aistor-tables">Next step: using it with AIStor Tables</h2>
<p>With this environment set up, I can create an AIStor Tables catalog of these buckets that contains all documents’ metadata, summaries, and keywords. With such a setup, I can build Agentic AI workflows to run cross-bucket queries:</p>
<ul>
<li>&ldquo;Find all contracts with value &gt; $500K&rdquo;</li>
<li>&ldquo;Which suppliers have defect rates below 0.5%?&rdquo;</li>
<li>&ldquo;Show me all partnerships with ROI &gt; 100%&rdquo;</li>
</ul>
<p>I also can:</p>
<ul>
<li>Build AI agents that navigate across organizational boundaries</li>
<li>Demonstrate how LLMs can work with properly structured data</li>
<li>Show the power of combining object storage with AI capabilities</li>
<li>Use <code>analyze_object</code> to generate summaries</li>
<li>Use <code>ask_object</code> to query specific documents</li>
<li>Search across multiple buckets for insights</li>
</ul>
<p>Stay tuned for the next blog post covering these demos.</p>
<h2 id="other-scenarios-of-using-mcp-servers-and-ai-capabilities">Other scenarios of using MCP servers and AI capabilities</h2>
<p>This automation capability matters for anyone who needs to create reproducible environments.</p>
<p>For example, imagine you want to debug a complex issue involving access policies, replication rules, and information lifecycle configurations (e.g., when to expire or migrate certain versions of objects to another tier). Using the MCP server, you can quickly collect all the configuration information from your production environment and reproduce it in your test environment. Or you want to test a new version of the AIStor server and ensure your applications work correctly. The closer you can reproduce your production environment, the better.</p>
<p>This applies to many scenarios:</p>
<ul>
<li>Customer proof-of-concepts</li>
<li>Training workshops</li>
<li>Development and testing environments</li>
<li>Compliance and audit scenarios</li>
<li>Disaster recovery testing</li>
</ul>
<p>When you can go from &ldquo;here&rsquo;s what I need&rdquo; to &ldquo;it&rsquo;s ready&rdquo; in 7 minutes instead of 5 hours, you fundamentally change how you approach environment creation. And when the setup is deterministic and repeatable, you eliminate the &ldquo;works on my machine&rdquo; problems.</p>
<p>The combination of Claude Code&rsquo;s natural language understanding and the AIStor MCP server&rsquo;s direct access to MinIO operations creates something that feels almost magical—but it&rsquo;s just good engineering and thoughtful integration of AI capabilities.</p>
<h2 id="try-it-yourself">Try it yourself</h2>
<p>The AIStor MCP server is now available as a tech preview. Please <a href="https://www.min.io/contact-us">reach out to us</a> for installation instructions so you can try it yourself.</p>
<p>The future of infrastructure automation isn&rsquo;t just about scripting—it&rsquo;s about describing what you want in natural language and letting AI handle the implementation details. We&rsquo;re building that future, one MCP tool at a time.</p>
<hr>
<p><em>This is the fourth post in our series on the AIStor MCP server. Previous posts covered <a href="https://pavelanni.dev/blog/mcp_server_user_tools/">basic object operations</a>, <a href="https://pavelanni.dev/blog/mcp_server_admin_tools/">administration functions</a>, and <a href="https://pavelanni.dev/blog/mcp_server_how/">how it works internally</a>.</em></p>
<p><em>Pavel Anni is a training specialist at MinIO, focused on making object storage accessible and powerful for developers and enterprises.</em></p>
<p><em>Originally posted at <a href="https://blog.min.io/automating-cluster-deployments-with-mcp/">https://blog.min.io/automating-cluster-deployments-with-mcp/</a></em></p>
]]></content:encoded>
    </item>
    <item>
      <title>Model Context Protocol (MCP) Server for AIStor: How it works</title>
      <link>https://pavelanni.dev/blog/mcp_server_how/</link>
      <pubDate>Wed, 30 Apr 2025 00:00:00 +0000</pubDate>
      <guid>https://pavelanni.dev/blog/mcp_server_how/</guid>
      <description>&lt;p&gt;In the previous blog posts of this series, we discussed the &lt;a href=&#34;https://pavelanni.dev/blog/mcp_server_user_tools/&#34;&gt;user-level&lt;/a&gt; and &lt;a href=&#34;https://pavelanni.dev/blog/mcp_server_admin_tools/&#34;&gt;admin-level&lt;/a&gt; functions of the Model Context Protocol (MCP) server for MinIO AIStor. In the first blog, we learned how to review the bucket&amp;rsquo;s contents, analyze objects, and tag them for future processing. In the second blog, we also learned how to use admin commands and get information about the cluster. All that was done using human-language commands and simply chatting with the cluster.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>In the previous blog posts of this series, we discussed the <a href="https://pavelanni.dev/blog/mcp_server_user_tools/">user-level</a> and <a href="https://pavelanni.dev/blog/mcp_server_admin_tools/">admin-level</a> functions of the Model Context Protocol (MCP) server for MinIO AIStor. In the first blog, we learned how to review the bucket&rsquo;s contents, analyze objects, and tag them for future processing. In the second blog, we also learned how to use admin commands and get information about the cluster. All that was done using human-language commands and simply chatting with the cluster.</p>
<p>Now, let&rsquo;s discuss the internals: how this works and how we developed this MCP server.</p>
<h2 id="high-level-mcp-architecture">High-level MCP architecture</h2>
<p>The purpose of the Model Context Protocol is to be a universal mediator between the Large Language Model (LLM) and various resources and tools. The protocol is used with a language model to translate human language into the language accepted by the resource or tool. In the case of the AIStor server, we translate human-language requests, such as &ldquo;list objects in the bucket &lsquo;demo&rsquo;,&rdquo; into MCP tool calls, which are then translated into SDK calls to the AIStor object store.</p>
<p>Here is how the language transformation is happening in the MCP workflow:</p>
<ul>
<li>The user asks the LLM client (e.g., Claude Desktop) to perform an action using natural language (e.g., &ldquo;list objects in bucket demo&rdquo;).</li>
<li>The LLM translates the request from human language to a JSON-RPC 2.0 request with parameters, such as the bucket name.</li>
<li>This request is sent to the MCP server, which translates it into a function call from the resource SDK (e.g., the MinIO SDK for Go).</li>
<li>The resource sends the response back to the MCP server (e.g., a list of objects in this bucket as a Go slice).</li>
<li>The MCP server converts this list to plain text or JSON and sends it back to the client.</li>
<li>The client sends this response to the LLM to translate it into human language.</li>
</ul>
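<p>Concretely, the message the client sends to the MCP server at step three is a JSON-RPC 2.0 <code>tools/call</code> request, as defined by the MCP specification. The tool name and argument below are illustrative (the actual AIStor tool names may differ), but the envelope has this shape:</p>
<pre><code class="language-json">{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "list_objects",
    "arguments": { "bucket": "demo" }
  }
}
</code></pre>
<p>The MCP server replies with a <code>result</code> whose <code>content</code> array carries the tool output as plain text or JSON, which the client then hands back to the LLM for translation into human language.</p>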
<p>And here is the flowchart diagram.</p>
<figure>
    <img loading="lazy" src="./image1.png"
         alt="MCP Architecture Flowchart"/> 
</figure>

<h2 id="tools-and-languages">Tools and languages</h2>
<p>Several SDKs for MCP are available for different languages. Some SDKs are provided by Anthropic (Python, TypeScript, Kotlin, and Java), while others are created by the community (Go, Rust, etc.).</p>
<p>The prototype of this server was created in Python using the SDK provided by Anthropic.</p>
<p>For the production version, we switched to Go. Most of the MinIO object storage software is written in Go, so it was a natural choice for us. We implement MCP server functions, such as listing buckets, retrieving object information, and setting object tags, using the MinIO SDK for Go. With Go, it&rsquo;s easy to create and distribute executables for various architectures and operating systems.</p>
<p>We use the <a href="https://github.com/mark3labs/mcp-go?ref=blog.min.io">mcp-go</a> library created by <a href="https://mark3labs.com/?ref=blog.min.io">Mark III Labs</a>, which is widely accepted by the community.</p>
<h2 id="server-architecture">Server architecture</h2>
<p>This server consists of 25+ tools that work with the object storage cluster, including core commands such as &lsquo;list buckets&rsquo;, &lsquo;get object tags&rsquo;, &lsquo;get bucket transition rules&rsquo;, etc. Each function is a gateway between the object storage cluster and the MCP client. It translates the response it receives from the cluster to plain text or JSON and sends it to the client, in this case, Claude for Desktop.</p>
<p>Functions are grouped by their level of access to the cluster resources:</p>
<ul>
<li>Read-only tools (enabled by default)</li>
<li>Tools that can write to the cluster (enabled by the <code>--allow-write</code> flag)</li>
<li>Tools that can delete objects and buckets (enabled by the <code>--allow-delete</code> flag)</li>
<li>Admin tools (enabled by the <code>--allow-admin</code> flag)</li>
</ul>
<p>Depending on the flags used to start the server, it enables certain groups of tools. For example, adding the <code>--allow-write</code> flag to the server command in the configuration file will enable functions that can create objects and buckets.</p>
<h2 id="distribution">Distribution</h2>
<p>This server runs as a Docker container on your host OS. We decided to distribute it as a standard OCI container image because it is more common for our users to have a container platform (such as Docker or Podman) installed on their desktop or laptop than Python or Node.js tools, which are typically used by other servers.</p>
<p>To run this server, you don&rsquo;t have to clone a repository or install the <code>uv</code> tool for Python. You configure the <code>claude_desktop_config.json</code> file by adding the appropriate <code>docker</code> or <code>podman</code> command and setting the necessary flags and environment variables.</p>
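<p>As a rough sketch, the entry in <code>claude_desktop_config.json</code> might look like the following. The container image name and environment variable names here are placeholders, not the actual published ones; they only illustrate the shape of the configuration:</p>
<pre><code class="language-json">{
  "mcpServers": {
    "aistor": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "MINIO_ENDPOINT",
        "-e", "MINIO_ACCESS_KEY",
        "-e", "MINIO_SECRET_KEY",
        "example.com/aistor/mcp-server:latest",
        "--allow-write"
      ],
      "env": {
        "MINIO_ENDPOINT": "https://aistor.example.com",
        "MINIO_ACCESS_KEY": "mcp-user",
        "MINIO_SECRET_KEY": "change-me"
      }
    }
  }
}
</code></pre>
<p>Omitting the <code>--allow-write</code> flag leaves the server in its read-only default mode.</p>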
<h2 id="mcp-clients">MCP clients</h2>
<p>This server can be used with multiple clients that support the MCP protocol. It has been tested with Claude for Desktop, Cursor IDE, the text-based <code>mcp-cli</code> client, and others.</p>
<h2 id="security">Security</h2>
<p>We previously mentioned security considerations in our blog. Here are some implementation details.</p>
<p>The server is configured via the <code>claude_desktop_config.json</code> file. To access the object store cluster, you must provide the credentials, including access and secret keys, via environment variables. You can use your personal credentials or create a special user account for the MCP server. We recommend the latter. When creating a special user account, you can create and attach fine-grained policies that give that account access only to specific buckets and functions. Please consult the MinIO documentation (<a href="https://min.io/docs/minio/linux/administration/identity-access-management/policy-based-access-control.html?ref=blog.min.io">Access Management — MinIO Object Storage for Linux</a>) on creating such a policy.</p>
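<p>For example, a minimal policy for a dedicated MCP server account that only needs read access to a single bucket could look like this (the bucket name is illustrative):</p>
<pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": [
      "arn:aws:s3:::demo",
      "arn:aws:s3:::demo/*"
    ]
  }]
}
</code></pre>
<p>With a policy like this attached, even a misbehaving MCP session cannot write, delete, or touch other buckets.</p>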
<h2 id="sequence-diagram">Sequence diagram</h2>
<p>Let&rsquo;s look at a more detailed diagram of the sequence of events. It shows the flow between the user, the LLM, the MCP server, and the AIStor object store.</p>
<figure>
    <img loading="lazy" src="./image2.png"
         alt="Sequence Diagram"/> 
</figure>

<p>A picture is worth a thousand words and needs little explanation, but it&rsquo;s important to note that the objects never leave the object store. Even when you ask a question about an image&rsquo;s content, the analysis happens inside the object store cluster, not in the LLM you use.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The MCP server for MinIO AIStor is an excellent addition to traditional methods of interacting with object storage via the command line and/or the UI. It&rsquo;s like having a universal translator between what you want to do and what the machines need to hear. No more memorizing complex commands or syntax – just tell it what you need in plain language.</p>
<p>In addition to translating from natural language to the server language, the MCP server can use the LLM to summarize cluster information, including bucket contents, storage distribution, and cluster status. This is something that wasn&rsquo;t possible with just CLI or UI tools.</p>
<p>As language models become more sophisticated, the way we interact with our storage systems will feel increasingly natural.</p>
<p>The security of MCP servers has been a hot topic recently. We&rsquo;ve built in flexible yet strong security, so admins still control who has access to what. This server implements several layers of protection, designed with the principle of least privilege in mind.</p>
<p>With this MCP server, MinIO isn&rsquo;t just democratizing object storage; it&rsquo;s rethinking how we work with technology in a world where machines are becoming increasingly adept at understanding us. We&rsquo;re excited to see what cool things you&rsquo;ll do with this technology and how it&rsquo;ll change your relationship with your data.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Model Context Protocol (MCP) Server for AIStor: administration functions</title>
      <link>https://pavelanni.dev/blog/mcp_server_admin_tools/</link>
      <pubDate>Wed, 09 Apr 2025 00:00:00 +0000</pubDate>
      <guid>https://pavelanni.dev/blog/mcp_server_admin_tools/</guid>
      <description>&lt;p&gt;In the &lt;a href=&#34;https://pavelanni.dev/blog/mcp_server_user_tools/&#34;&gt;previous blog&lt;/a&gt; of this series, we discussed the basic user-level functions of the Model Context Protocol (MCP) server for MinIO AIStor. We learned how to review a bucket&amp;rsquo;s contents, analyze objects, and tag them for future processing using human-language commands and simply chatting with the cluster via an LLM such as Anthropic Claude.&lt;/p&gt;
&lt;p&gt;In this blog, we&amp;rsquo;ll learn how to use the same approach to supplement AIStor administrative functions, such as getting information about the cluster&amp;rsquo;s status and health, analyzing data distribution, and other tasks.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>In the <a href="https://pavelanni.dev/blog/mcp_server_user_tools/">previous blog</a> of this series, we discussed the basic user-level functions of the Model Context Protocol (MCP) server for MinIO AIStor. We learned how to review a bucket&rsquo;s contents, analyze objects, and tag them for future processing using human-language commands and simply chatting with the cluster via an LLM such as Anthropic Claude.</p>
<p>In this blog, we&rsquo;ll learn how to use the same approach to supplement AIStor administrative functions, such as getting information about the cluster&rsquo;s status and health, analyzing data distribution, and other tasks.</p>
<p>Experienced MinIO system administrators are not afraid of the command line. They can easily use the <code>mc admin</code> commands to manage their clusters. What could an LLM and an MCP server add to their experience? LLMs are great at summarizing large amounts of information and can format the information we get from the cluster in a human-readable way.</p>
<p>Less advanced users who also need information about the cluster status can use LLM capabilities to get a simple answer to their questions instead of calling their admin.</p>
<h2 id="aistor-cluster-admin-info">AIStor cluster admin info</h2>
<p>When we want admin information about the cluster, we run <code>mc admin</code> commands. They typically give you a list of nodes and drives with information about their health and utilization. What if we ask the LLM the same question?</p>
<figure>
    <img loading="lazy" src="image5-1.png"
         alt="Cluster health command output"/> 
</figure>

<p>Here is the summary from the MCP server working with Claude.</p>
<figure>
    <img loading="lazy" src="image10-1.png"
         alt="MCP summary from Claude"/> 
</figure>

<p>Now, imagine you are the administrator of this cluster, and your boss is asking you about its state. Would you be more comfortable presenting this report instead of a command-line output?</p>
<p>Even if this is too much information for your boss, you can ask for a shorter report. It&rsquo;s just one prompt away.</p>
<figure>
    <img loading="lazy" src="image13-1.png"
         alt="Shorter report prompt"/> 
</figure>

<p>And the answer was:</p>
<figure>
    <img loading="lazy" src="image8-1.png"
         alt="Shorter report answer"/> 
</figure>

<p>This is unimaginable with the current command-line or UI tools. Having such a concise summary of your cluster health is extremely valuable. We even got a recommendation from Claude.</p>
<p>Besides the health status, let&rsquo;s ask about available space and overall data distribution. Of course, you can get this information by analyzing the output of the <code>mc admin info --json</code> command, but maybe the LLM can do it for us?</p>
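Under the hood, that kind of analysis is mostly JSON crunching. A minimal Python sketch of summing drive usage might look like the following; note that the embedded payload is a made-up simplification, since the actual <code>mc admin info --json</code> schema is richer and may use different field names.

```python
import json

# Hypothetical, simplified sample of `mc admin info --json` output;
# the real schema is richer and field names may differ.
sample = json.loads("""
{
  "info": {
    "servers": [
      {"endpoint": "node1:9000",
       "drives": [{"usedspace": 250, "totalspace": 1000},
                  {"usedspace": 300, "totalspace": 1000}]},
      {"endpoint": "node2:9000",
       "drives": [{"usedspace": 150, "totalspace": 1000}]}
    ]
  }
}
""")

def capacity_summary(info: dict) -> dict:
    """Sum used and total space across all drives on all servers."""
    used = total = 0
    for server in info["info"]["servers"]:
        for drive in server["drives"]:
            used += drive["usedspace"]
            total += drive["totalspace"]
    return {"used": used, "total": total, "free": total - used}

print(capacity_summary(sample))  # -> {'used': 700, 'total': 3000, 'free': 2300}
```

The LLM does exactly this kind of aggregation for us, without the script.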
<figure>
    <img loading="lazy" src="image21.png"
         alt="Prompt for available space and data distribution"/> 
</figure>

<figure>
    <img loading="lazy" src="image11-1.png"
         alt="Data distribution table"/> 
</figure>

<p>Wow, this is helpful! Claude provided each bucket&rsquo;s total size, number of objects, and their characteristics. Did you notice the beautiful table it created? You would spend a good portion of a day building it by hand.</p>
<figure>
    <img loading="lazy" src="image19.png"
         alt="Table created by Claude"/> 
</figure>

<figure>
    <img loading="lazy" src="image23.png"
         alt="Detailed data distribution"/> 
</figure>

<p>What if we ask to create a pie chart? Can Claude do that?</p>
<figure>
    <img loading="lazy" src="image3-1.png"
         alt="Pie chart prompt"/> 
</figure>

<p>And it did it!</p>
<figure>
    <img loading="lazy" src="image9-1.png"
         alt="Pie chart result"/> 
</figure>

<p>Let&rsquo;s ask for the cluster diagram.</p>
<figure>
    <img loading="lazy" src="image24.png"
         alt="Cluster diagram prompt"/> 
</figure>

<figure>
    <img loading="lazy" src="image17-2.png"
         alt="Cluster diagram result"/> 
</figure>

<figure>
    <img loading="lazy" src="image12-1.png"
         alt="Another cluster diagram"/> 
</figure>

<h2 id="object-versioning">Object versioning</h2>
<p>One of the essential features of object storage lifecycle management is <a href="https://min.io/product/object-versioning-bucket-versioning?ref=blog.min.io">versioning</a>, in which each write of an object is stored as a separate version. Versioning is not enabled by default. Imagine you are a data scientist and don&rsquo;t remember the exact command to check if versioning is enabled on a bucket. Use human language and ask:</p>
<figure>
    <img loading="lazy" src="image6-1.png"
         alt="Prompt to check versioning"/> 
</figure>

<p>Claude not only answers but also gives you a good explanation of the capabilities.</p>
<figure>
    <img loading="lazy" src="image2-1.png"
         alt="Claude&#39;s explanation of versioning"/> 
</figure>

<p>This is an excellent reminder of the importance of versioning, especially for inexperienced cluster administrators and less privileged users. What about the other bucket?</p>
<figure>
    <img loading="lazy" src="image16-1.png"
         alt="Prompt for other bucket"/> 
</figure>

<p>Note that we used normal human language; we didn&rsquo;t specify exactly what we needed, and still, Claude understood us perfectly.</p>
<figure>
    <img loading="lazy" src="image18-1.png"
         alt="Claude&#39;s explanation for other bucket"/> 
</figure>

<p>This is another excellent explanation of why versioning is important and why it might be disabled for this bucket. Instead of asking our cluster admin, let&rsquo;s enable versioning for this bucket, too. We don&rsquo;t remember the exact command, but it&rsquo;s not a problem.</p>
<figure>
    <img loading="lazy" src="image1-2.png"
         alt="Prompt to enable versioning"/> 
</figure>

<p>We didn&rsquo;t specify which bucket we wanted to enable versioning on, but Claude remembered the conversation&rsquo;s context and took the right action.</p>
<figure>
    <img loading="lazy" src="image4-1.png"
         alt="Claude enables versioning"/> 
</figure>

<p>Note the suggestion at the end. Let&rsquo;s follow it. Trust but verify, right?</p>
<figure>
    <img loading="lazy" src="image22.png"
         alt="Verification prompt"/> 
</figure>

<p>And it was confirmed.</p>
<figure>
    <img loading="lazy" src="image15-1.png"
         alt="Confirmation result"/> 
</figure>

<h2 id="information-lifecycle-management-ilm">Information lifecycle management (ILM)</h2>
<p>You can also use human language requests to get information about the lifecycle management configuration, such as rules for keeping versions of an object and deletion restrictions.</p>
<figure>
    <img loading="lazy" src="image14-1.png"
         alt="ILM prompt"/> 
</figure>

<p>And here is the summary.</p>
<figure>
    <img loading="lazy" src="image20.png"
         alt="ILM summary"/> 
</figure>

<p>Again, Claude not only gave us basic information (what we would expect from a command line or UI tools) but also explained its meaning. This is where an MCP server shines as a supplement to traditional command-line administration - it provides context and guidance to simplify and accelerate tasks.</p>
<h2 id="security-considerations">Security considerations</h2>
<p>Security is one of the first questions usually asked about MCP servers. How do we control what they can do if we allow them to work with our file systems, databases, and object stores?</p>
<p>We address this on multiple levels with the AIStor MCP server. First, the server starts in read-only mode by default. Second, we added special flags that grant the server additional capabilities: <code>--allow-write</code>, <code>--allow-delete</code>, and <code>--allow-admin</code>. Without these flags, the corresponding functions are not even visible to the LLM; it won&rsquo;t know that the server can delete objects or retrieve cluster information.</p>
<p>In addition, all the object store&rsquo;s privileges and access rights still apply. We suggest creating a separate account for this server with a security profile that allows only the necessary operations. We recommend using the server in read-only mode first, mainly to summarize buckets and objects. If you need admin-level information, use the <code>diagnostics</code> profile that exists by default in each cluster.</p>
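As an illustration, MCP clients that use the common <code>mcpServers</code> configuration format would pass these flags as command arguments. This is only a sketch; the command path is a placeholder, and the exact invocation is whatever the server&rsquo;s install instructions specify.

```json
{
  "mcpServers": {
    "aistor": {
      "command": "/path/to/mcp-server-aistor",
      "args": ["--allow-write", "--allow-admin"]
    }
  }
}
```

Leaving a flag out of <code>args</code> keeps the corresponding tools invisible to the LLM.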
<h2 id="conclusion">Conclusion</h2>
<p>Combined with an LLM, the MCP server adds a new dimension to AIStor&rsquo;s administration and user experience. It transforms complex command-line outputs into human-readable summaries, provides concise answers to cluster status queries, and generates visual aids like pie charts and diagrams. This approach empowers both administrators and regular users to gain quick, understandable insights into their cluster&rsquo;s health, data distribution, and object management.</p>
<p>Furthermore, the LLM integration serves as a valuable learning tool, offering explanations and context alongside the information it retrieves.</p>
<p>Security is addressed through multiple layers, including a default read-only mode and specific flags for granting additional capabilities, ensuring controlled and secure access to administrative functions. The MCP server, therefore, offers a powerful and intuitive way to manage and understand AIStor clusters, making it a valuable asset for any organization.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Introducing Model Context Protocol (MCP) Server for MinIO AIStor</title>
      <link>https://pavelanni.dev/blog/mcp_server_user_tools/</link>
      <pubDate>Fri, 28 Mar 2025 00:00:00 +0000</pubDate>
      <guid>https://pavelanni.dev/blog/mcp_server_user_tools/</guid>
      <description>&lt;p&gt;GenAI is entering the agentic phase, with software agents collaborating with humans and other agents to reason and achieve complex goals. Agents are already demonstrating incredible intelligence and are very helpful with question answering, but as with humans, they need the ability to discover and access software applications and other services to actually perform useful work. The creators of such services, for example, Snowflake for data warehousing and Salesforce for CRM, are in the best position to expose their capabilities for agentic use. These agents also need direct access to the large and fast-growing volumes of multi-modal enterprise data, data which is increasingly aggregated in high-scale and &lt;a href=&#34;https://min.io/solutions/object-storage-for-ai?ref=blog.min.io&#34;&gt;high-performance AI storage&lt;/a&gt; such as AIStor. But the GenAI industry needs standards to make all of this feasible.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>GenAI is entering the agentic phase, with software agents collaborating with humans and other agents to reason and achieve complex goals. Agents are already demonstrating incredible intelligence and are very helpful with question answering, but as with humans, they need the ability to discover and access software applications and other services to actually perform useful work. The creators of such services, for example, Snowflake for data warehousing and Salesforce for CRM, are in the best position to expose their capabilities for agentic use. These agents also need direct access to the large and fast-growing volumes of multi-modal enterprise data, data which is increasingly aggregated in high-scale and <a href="https://min.io/solutions/object-storage-for-ai?ref=blog.min.io">high-performance AI storage</a> such as AIStor. But the GenAI industry needs standards to make all of this feasible.</p>
<p>Model Context Protocol (MCP), recently announced by Anthropic, is a standard for how applications and services can be discovered and interacted with in a universal way. MCP also leverages language models to summarize the rich output of these services and can present crucial information in a human-readable form. MCP is therefore a crucial step for unlocking the potential of human-in-the-loop as well as fully autonomous agentic workflows. MCP is analogous to USB-C, a standard for connecting devices and peripherals. MCP connects agents to software applications and services. Until now, every agentic developer has had to write their own custom plumbing and glue code to do this. Without a standard like MCP, building real-world agentic workflows is essentially impossible.</p>
<p>MinIO, the world&rsquo;s leading AI storage provider, has a rich history of innovation, and we have been pushing boundaries at the intersection of storage, data, and AI. Today, we are thrilled to deliver another industry-first innovation—Model Context Protocol (MCP) for enterprise AI storage. We are previewing an MCP server that enables interaction with and management of MinIO AIStor, our commercial object-store offering, simply by chatting with an LLM such as Anthropic Claude or OpenAI ChatGPT. This preview release includes more than 25 commonly used actions, making exploring and using data in an AIStor object store easier than ever.</p>
<p>This blog post is the first in a series covering various features of the MinIO MCP server. The series will show you user-level and administrative functions performed with the MCP server. We will also tell you how it works inside and how it&rsquo;s being developed. We will describe using the MCP server as part of a typical Agentic AI workflow.</p>
<p>Let&rsquo;s explore the exciting new capabilities of the MinIO MCP server. Watch the video or read the text below it to learn about user-level operations with the MCP server for AIStor.</p>
<h2 id="bucket-and-object-operations">Bucket and object operations</h2>
<p>Let&rsquo;s see how easy it is to explore data in an AIStor object store using MCP. In the following screenshots and dialogues, I&rsquo;ll show you some operations with the AIStor object store using the Anthropic Claude for Desktop client. Plenty of other MCP clients are available, and you can configure any of them to use the AIStor MCP server.</p>
<p>I started with a simple request.</p>
<figure>
    <img loading="lazy" src="./image4.png"
         alt="List buckets prompt"/> 
</figure>

<p>Claude asked me if it was allowed to use the tool to list buckets.</p>
<figure>
    <img loading="lazy" src="./image17-1.png"
         alt="Allow list buckets"/> 
</figure>

<p>Claude asks for permission every time you want to use a tool. As you can see, you can either &ldquo;Allow Once&rdquo; or &ldquo;Allow for This Chat.&rdquo; I allowed each new tool for this chat as it appeared in my dialogue. From now on, I will skip these screenshots.</p>
<p>Claude displayed the list of buckets (I truncated the screenshot for brevity).</p>
<figure>
    <img loading="lazy" src="image1-1.png"
         alt="List buckets"/> 
</figure>

<p>Let&rsquo;s see what&rsquo;s inside one of the buckets:</p>
<figure>
    <img loading="lazy" src="image16.png"
         alt="List objects in aistor-mcp prompt"/> 
</figure>

<p>Claude displayed a list of objects grouped by categories: white papers, blog Markdown files, and images. At the end of the list, it gave a nice summary of the bucket contents.</p>
<figure>
    <img loading="lazy" src="image11.png"
         alt="List objects in aistor-mcp"/> 
</figure>

<figure>
    <img loading="lazy" src="image2.png"
         alt="List objects in aistor-mcp"/> 
</figure>

<p>This is the first advantage of using LLMs. The command-line tool or web UI would give us a list of objects, as requested. The LLM summarizes the bucket&rsquo;s content and provides an insightful narrative of its composition. Imagine if I had thousands of objects here. A typical command-line query would give us a long list of objects that could be hard to consume. Here, it gives us a human-readable overview of the bucket&rsquo;s contents. It is similar to summarizing an article with your favorite LLM client.</p>
<p>You can be more specific and ask to list a subset of objects. For example, I&rsquo;d like to see what benchmark reports we have in the bucket.</p>
<figure>
    <img loading="lazy" src="image15.png"
         alt="List benchmark reports prompt"/> 
</figure>

<p>Claude gave me the list I expected.</p>
<figure>
    <img loading="lazy" src="image6.png"
         alt="List benchmark reports"/> 
</figure>

<p>Of course, it listed the PDF files with &ldquo;benchmark&rdquo; in their names! But wait… it also listed the papers called &ldquo;Performance comparison …&rdquo; and &ldquo;&hellip; support perf.pdf.&rdquo; Interesting! That means it doesn&rsquo;t simply search for the word but understands my request&rsquo;s meaning. This is another advantage of using LLMs.</p>
<h2 id="object-analysis-operations">Object analysis operations</h2>
<p>Now, I want to get details about one of the benchmark reports. I don&rsquo;t have to copy and paste the full file name. Just a little detail will be enough.</p>
<figure>
    <img loading="lazy" src="image9.png"
         alt="Get Sapphire benchmark report prompt"/> 
</figure>

<p>Claude understood me perfectly. It tried to read the file contents, but because that function was not implemented for PDF files, it resorted to using the ask-object function. With this new AIStor function, you can ask questions about an object directly, and AIStor will use the AI model embedded in the cluster to answer the question. This is what it did here, giving us a good summary of this paper.</p>
<p>Note that in this case, Claude was smart enough to invoke AIStor&rsquo;s internal model via the prompt_object API to accomplish its goal. The document didn&rsquo;t even leave the cluster. Many organizations are reluctant to use AI mainly because of security concerns. In the case of AIStor, all objects stay within your security perimeter.</p>
<figure>
    <img loading="lazy" src="image8.png"
         alt="Get Sapphire benchmark report"/> 
</figure>

<p>I have a bucket on this server with 200 receipt images uploaded from a public dataset. We&rsquo;ll start with analyzing the bucket. Then, we&rsquo;ll analyze individual receipts and add tags to the objects.</p>
<figure>
    <img loading="lazy" src="image13.png"
         alt="List receipts bucket prompt"/> 
</figure>

<p>Claude gave me a comprehensive overview of the bucket.</p>
<figure>
    <img loading="lazy" src="image7-1.png"
         alt="Receipts bucket"/> 
</figure>

<p>Did you notice that Claude thought about the most logical next step and suggested it? Sure, I want to get the details about one of the receipts!</p>
<figure>
    <img loading="lazy" src="image5.png"
         alt="Details about receipt 1000 prompt"/> 
</figure>

<p>And I got the complete analysis of the receipt. Again, the MCP server knew about the ask-object function and used it to analyze the object. The object itself didn&rsquo;t leave the AIStor cluster.</p>
<figure>
    <img loading="lazy" src="image10.png"
         alt="Details about receipt 1000"/> 
</figure>

<p>To make future analysis faster and more effective, it would be great to add tags to each object, such as vendor, transaction date, amount, and currency. Let&rsquo;s try it.</p>
<figure>
    <img loading="lazy" src="image14.png"
         alt="Add tags to receipt 1000"/> 
</figure>

<p>And the server worked perfectly!</p>
<p>Why don&rsquo;t we tag the other receipts? To save time, let&rsquo;s tag the first five.</p>
<figure>
    <img loading="lazy" src="image12.png"
         alt="Add tags to receipts"/> 
</figure>

<p>The server analyzed the images, assigned tags, and checked them with get-object-tags. For the last object, the vendor&rsquo;s name contained an apostrophe, which is not allowed in tags. The server diagnosed the problem and fixed it by removing the apostrophe.</p>
<figure>
    <img loading="lazy" src="image18.png"
         alt="Add tags to receipts result"/> 
</figure>
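The apostrophe fix Claude applied corresponds to the character restrictions on S3-style tag values (letters, digits, spaces, and <code>+ - = . _ : / @</code>, per AWS S3 tagging rules; MinIO enforces similar restrictions). A small sanitizer like this sketch captures the rule:

```python
import re

# Characters allowed in S3-style tag values: letters, digits, spaces,
# and + - = . _ : / @. Apostrophes and other punctuation are rejected.
_DISALLOWED = re.compile(r"[^A-Za-z0-9 +\-=._:/@]")

def sanitize_tag_value(value: str, max_len: int = 256) -> str:
    """Drop disallowed characters and truncate to the tag value length limit."""
    return _DISALLOWED.sub("", value)[:max_len]

print(sanitize_tag_value("McDonald's #42"))  # -> "McDonalds 42"
```

Claude worked out the same fix on its own, from nothing more than the server&rsquo;s error message.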

<p>Imagine doing the same operation without MCP servers. You would have to write a Python script to pull images from the bucket, send them to an AI model for analysis, get the information back, decode it, find the correct fields, apply tags to objects… You could easily spend half a day creating and debugging such a script. We just did it simply using human language in a matter of seconds.</p>
<h2 id="conclusion">Conclusion</h2>
<p>LLMs and MCP servers simplify everyday user operations: they summarize bucket contents, analyze objects, and add tags and other metadata with ease.</p>
<p>In the next blog in this series, we will explore how the MinIO AIStor MCP server can supplement core admin tasks, such as checking cluster health and available storage and listing and configuring versioning, replication rules, and other bucket parameters. The MCP server and an LLM can give you the data distribution by bucket and even draw a pie chart. Stay tuned.</p>
<h2 id="next-steps">Next steps</h2>
<p>This MCP server for AIStor is currently under active development. If you want to try it, find all the instructions in our GitHub repository <a href="https://github.com/minio/mcp-server-aistor?ref=blog.min.io">https://github.com/minio/mcp-server-aistor</a>.</p>
<p>The MCP server works with both MinIO AIStor and MinIO Object Store (community edition). The community edition doesn&rsquo;t support some AIStor-specific functions, such as ask-object.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Curl vs. requests</title>
      <link>https://pavelanni.dev/blog/curl_vs_requests/</link>
      <pubDate>Mon, 30 Dec 2024 18:20:06 -0500</pubDate>
      <guid>https://pavelanni.dev/blog/curl_vs_requests/</guid>
      <description>&lt;p&gt;Here is an interesting development of the &lt;a href=&#34;https://pavelanni.dev/projects/geekhouse&#34;&gt;Geekhouse&lt;/a&gt; project.
I wanted to add a simple 16x2 LCD display to the server.
I connected the display to the I2C pins and added the code to serve the appropriate URL (&lt;code&gt;/lcd?text=...&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;I started testing it with cURL and realized that I had to use the &lt;code&gt;--url-query&lt;/code&gt; flag to pass any text with spaces, like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-none&#34; data-lang=&#34;none&#34;&gt;curl -X POST --url-query &amp;#39;text=&amp;#34;Hello world&amp;#34;&amp;#39; http://192.168.1.130/lcd
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It worked fine with the &lt;code&gt;curl&lt;/code&gt; command, and the text was displayed on the LCD.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>Here is an interesting development of the <a href="/projects/geekhouse">Geekhouse</a> project.
I wanted to add a simple 16x2 LCD display to the server.
I connected the display to the I2C pins and added the code to serve the appropriate URL (<code>/lcd?text=...</code>).</p>
<p>I started testing it with cURL and realized that I had to use the <code>--url-query</code> flag to pass any text with spaces, like this:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">curl -X POST --url-query &#39;text=&#34;Hello world&#34;&#39; http://192.168.1.130/lcd
</code></pre><p>It worked fine with the <code>curl</code> command, and the text was displayed on the LCD.</p>
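What <code>--url-query</code> does is percent-encode the value before appending it to the URL, which is also what the Python <code>requests</code> library does when you pass a <code>params</code> dictionary. A stdlib-only sketch of the same encoding:

```python
from urllib.parse import urlencode

# curl's --url-query percent-encodes the value before appending it to the URL;
# requests does the same thing when given params={...}.
query = urlencode({"text": "Hello world"})
print(query)  # -> text=Hello+world

url = f"http://192.168.1.130/lcd?{query}"
print(url)
```

The space becomes <code>+</code> (or <code>%20</code>, depending on the encoder), which is why passing the raw string without <code>--url-query</code> fails.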
<p>Then I tested it through the LLM, asking &ldquo;Please display &lsquo;Hello world&rsquo; on the LCD,&rdquo; and, of course, got errors.</p>
<p>In this project, I am sending the following prompt to the LLM:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">You are an IoT API assistant. Convert user commands to curl commands for the IoT server.

Respond only with the curl command, no explanations.
</code></pre><p>Usually, the LLM responds with a command like:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">curl -X POST http://192.168.1.130/leds/1/on
</code></pre><p>and my dialogue program parses it and converts to a call to one of the <code>requests</code> functions.
The parsing function figures out if it&rsquo;s GET or POST and pulls the URL from the <code>curl</code> command.</p>
<p>But with LCD it was different. The LLM responded with this:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">curl -X POST http://192.168.1.130/lcd -H &#34;Content-Type: application/json&#34; -d &#39;{&#34;text&#34;:&#34;Hello world&#34;}&#39;
</code></pre><p>and, of course, my parsing function wasn&rsquo;t ready to work with this string.</p>
<p>After trying for half an hour to parse it properly, I had a brilliant idea:
&ldquo;Let&rsquo;s ask the LLM to make this work for us!&rdquo;</p>
<p>I changed the prompt to:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">You are an IoT API assistant. Convert user commands to function calls of the Python requests library to send requests to the IoT server.

Respond only with the function call, no explanations.
</code></pre><p>And the whole function parsing the <code>curl</code> command and converting it to a <code>requests</code> call was reduced from 20 lines of code to a simple <code>eval</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>response <span style="color:#f92672">=</span> eval(function_call)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> response<span style="color:#f92672">.</span>json()
</span></span></code></pre></div><p>Isn&rsquo;t it cool?</p>
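A word of caution: <code>eval()</code> on LLM output executes arbitrary code. One mitigation is to restrict the evaluation namespace so only the <code>requests</code> object is visible. Here is a sketch of that idea, demonstrated with a stand-in for <code>requests</code> so it runs without a live server:

```python
# A stand-in for the requests module, so the sketch runs without an IoT server.
class FakeRequests:
    def post(self, url):
        return {"ok": True, "url": url}

def run_call(function_call: str, requests_obj):
    # No builtins, and only `requests` is visible to the evaluated expression,
    # so the LLM's output can call requests.* and nothing else.
    return eval(function_call, {"__builtins__": {}}, {"requests": requests_obj})

resp = run_call('requests.post("http://192.168.1.130/leds/1/on")', FakeRequests())
print(resp)  # -> {'ok': True, 'url': 'http://192.168.1.130/leds/1/on'}
```

With the empty builtins, an expression like <code>open(&#39;/etc/passwd&#39;)</code> fails with a <code>NameError</code> instead of executing. For a hobby project on a private network this may be enough; anything exposed more widely deserves a real parser.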
<p>Well, I have to admit that the LLM made a mistake once.
After asking it to turn the motor on, I asked it to turn the motor off.
The LLM responded with <code>requests.get()</code> instead of <code>requests.post()</code>.
The fix? I added this to the prompt:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">Please make sure to use the correct method and endpoint.
Requests for actions like turning on/off devices should be POST requests.
</code></pre><p>and it hasn&rsquo;t made that mistake since.</p>
]]></content:encoded>
    </item>
    <item>
      <title>From Alexa to LLM: The Geekhouse Story</title>
      <link>https://pavelanni.dev/blog/geekhouse_story/</link>
      <pubDate>Mon, 23 Dec 2024 00:00:00 +0000</pubDate>
      <guid>https://pavelanni.dev/blog/geekhouse_story/</guid>
      <description>How a toy project helped me learn about IoT, REST APIs, and large language models</description>
      <content:encoded><![CDATA[<h2 id="previously">Previously</h2>
<p>It all started several years ago when I wondered, &ldquo;Can I use Alexa to manage physical devices?&rdquo;
I created a simple demo project to show it to my amateur radio friends.
You would say, &ldquo;Alexa, turn my antenna 90 degrees,&rdquo; and my model antenna would magically turn.
It was a fun weekend project that used a Raspberry Pi, a stepper motor, and a miniature model antenna.
I published it on YouTube, if you&rsquo;re curious.</p>
<p><img alt="Alexa Antenna application" loading="lazy" src="/blog/geekhouse_story/alexa_antenna.png"></p>
<h2 id="smart-dollhouse">Smart dollhouse</h2>
<p>The Alexa experiment was great but had some issues.
With Alexa, you should be very precise with your phrases.
In my application, Alexa expected you to say, &ldquo;Alexa, turn my antenna XX degrees,&rdquo; and only this exact phrase would work.
With the growing availability of large language models (LLMs), I started thinking about using them to manage motors, devices, and sensors.</p>
<p>&ldquo;What if I get a dollhouse,&rdquo; I thought, &ldquo;and add a bunch of sensors and a microcontroller and write a REST API server for them?&rdquo;
Then, I&rsquo;ll ask an LLM to convert human language to REST API requests and send them to the server.
Also, I wanted something that looks realistic and not just a bunch of sensors scattered on my desk.</p>
<p>Luckily, I found <em>exactly</em> what I needed on Amazon: a wooden house with many sensors attached to it and a microcontroller.
I have to praise the great quality of the kit I bought from Keyestudio: it was very easy to build and attach the sensors, and the
software worked on the first run!
If you&rsquo;ve tried similar kits before, you know that&rsquo;s not very common.</p>
<p><img alt="Keyestudio house" loading="lazy" src="/blog/geekhouse_story/keyestudio_house_esp32.png"></p>
<p>The kit included a simple Arduino controller with a sample program that implemented some business logic: for example,
when there was a signal on the rain sensor on the roof, the servo motor would close the window.</p>
<p>It worked fine, but I needed a server to respond to remote requests.</p>
<h2 id="iot-server">IoT server</h2>
<p>I decided to switch to Raspberry Pi Pico W and MicroPython.
The &ldquo;W&rdquo; version of Pico gave me the connectivity I needed, and I was going to use MicroPython to implement a simple REST API server.</p>
<h3 id="using-ai-helpers">Using AI helpers</h3>
<p>As it&rsquo;s becoming more common these days, I asked Claude.ai to help me with this project.</p>
<blockquote>
<p>I want to write a program in MicroPython for Raspberry Pi Pico. It should run a web server connected to WiFi.
The web server should accept REST requests and perform actions on the Pico, such as: getting information from sensors, turning LEDs on and off, turning motors (servos and steppers) on and off.
Please help me create an API for this application.</p></blockquote>
<p>Claude answered with a list of <code>curl</code> commands that can be used against our planned API.
I liked this approach: you design the expected user experience first (in our case, it was REST API) and then start implementing it.</p>
<p>It created a very simple web server without using any frameworks at all, just sockets.
I asked if we could use a framework like Flask, and Claude told me that with MicroPython it&rsquo;s more common to use Microdot
because it is lighter on resources.
I learned that Microdot is developed by Miguel Grinberg, the author of one of the most popular books about Flask.
That was a good recommendation, and I asked Claude to rewrite the web server with Microdot.</p>
<p>After that was done, I tested the server on my Pico and it worked just fine!
But I had another idea, and I asked:</p>
<blockquote>
<p>Please help me write the GET function for Microdot that will return a list of sensors when called as <code>/sensors</code> and the sensor&rsquo;s value when called with /sensors/id.
I think this should help with discoverability.</p></blockquote>
<p>And here was another discovery!
Claude told me about the concept with a strange name &ldquo;HATEOAS,&rdquo; which stands for Hypermedia as the Engine of Application State.
It explained the idea behind it, gave examples, and finally, wrote the server code that would respond to routes like <code>/sensors</code> or <code>/leds</code> with
the information about sensors and LEDs locations (roof, garden, door), parameters (LED&rsquo;s color, sensor&rsquo;s calibration),
and, most importantly, the links I should use to work with them.</p>
<p>That was very close to what I needed.</p>
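The pattern behind those HATEOAS responses is simple: each resource embeds the links a client can follow next. A minimal sketch of such a builder, in plain Python and independent of Microdot (the helper name is mine, not from the project):

```python
# Build a HATEOAS-style resource for one LED: its data plus the links
# a client needs to inspect or act on it.
def led_resource(led_id: str, color: str, location: str, state: int = 0) -> dict:
    base = f"/leds/{led_id}"
    return {
        "color": color,
        "location": location,
        "state": state,
        "_links": {
            "self": {"href": base},
            "on": {"href": f"{base}/on"},
            "off": {"href": f"{base}/off"},
            "toggle": {"href": f"{base}/toggle"},
        },
    }

print(led_resource("1", "yellow", "roof")["_links"]["toggle"])  # -> {'href': '/leds/1/toggle'}
```

A <code>/leds</code> handler only needs to collect such entries into one response, and a client (human or LLM) can discover every available action without reading any documentation.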
<h2 id="using-llm-to-generate-rest-api-calls">Using LLM to generate REST API calls</h2>
<p>Now, thanks to Claude&rsquo;s help, I had a server that would respond to requests like <code>curl http://server/leds</code> with the following JSON:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;_links&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;filter_by_color&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/filter?color={color}&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;templated&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;self&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;filter_by_location&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/filter?location={location}&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;templated&#34;</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;data&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;1&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;color&#34;</span>: <span style="color:#e6db74">&#34;yellow&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;location&#34;</span>: <span style="color:#e6db74">&#34;roof&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;state&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;_links&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;on&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/1/on&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;self&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/1&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;off&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/1/off&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;toggle&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/1/toggle&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;2&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;color&#34;</span>: <span style="color:#e6db74">&#34;white&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;location&#34;</span>: <span style="color:#e6db74">&#34;garden&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;state&#34;</span>: <span style="color:#ae81ff">0</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;_links&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;on&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/2/on&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;self&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/2&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;off&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/2/off&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">&#34;toggle&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/leds/2/toggle&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>What if I get similar JSONs from the other routes, such as <code>/sensors</code> and <code>/motors</code>, send them to an LLM, and ask it to convert human requests to <code>curl</code> commands?
Then, I can send those commands to the server via the <code>requests</code> Python library and work with the devices!</p>
<p>The next step was to write a simple Python script that would get the initial links from the root (<code>/</code>) route of the server and explore all links to get the same JSONs from them.</p>
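<p>The crawler itself is only a few lines. Here is a minimal sketch of the idea (the function names and structure are my illustration here, not the exact project code):</p>

```python
def extract_hrefs(doc):
    """Collect follow-able hrefs from a HATEOAS '_links' section.

    Templated links like '/leds/filter?color={color}' are skipped:
    they need parameters, so they can't be crawled blindly.
    """
    links = doc.get("_links", {})
    return [l["href"] for l in links.values() if not l.get("templated")]

def explore(get_json):
    """Breadth-first walk of the API, starting at the root route.

    `get_json` is any callable that fetches a path and returns parsed JSON,
    e.g. lambda path: requests.get(server_url + path).json()
    """
    structure = {}
    to_visit = ["/"]
    while to_visit:
        path = to_visit.pop(0)
        if path in structure:
            continue  # already collected
        doc = get_json(path)
        structure[path] = doc
        to_visit.extend(h for h in extract_hrefs(doc) if h not in structure)
    return structure
```

<p>The resulting dictionary maps every route to the JSON it returned, which is exactly the API description the system prompt needs.</p>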
<p>After the script collected the API description of the server this way, I created a system prompt that included the whole JSON with all the links from the server and added the request:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>You are an IoT API assistant.
</span></span><span style="display:flex;"><span>Convert user commands to curl commands for the IoT server.
</span></span><span style="display:flex;"><span>API structure: {json.dumps(self.api_structure, indent=2)}
</span></span><span style="display:flex;"><span>Server URL: {self.server_url}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Respond only with the curl command, no explanations.
</span></span></code></pre></div><p>Then I created a simple dialog that would get the command from the user in human language, attach it to the prompt, and send it to the LLM.</p>
<p>For this application, I used Claude 3.5 Haiku from Anthropic because it is faster and cheaper than the more capable models.</p>
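<p>The loop around the model is equally small. A sketch of the two pieces I needed: building the system prompt and turning the model&rsquo;s answer into something <code>requests</code> can execute (the model name and helper names below are illustrative, not the exact project code):</p>

```python
import json
import shlex

def build_system_prompt(api_structure, server_url):
    """Assemble the system prompt sent with every user request."""
    return (
        "You are an IoT API assistant.\n"
        "Convert user commands to curl commands for the IoT server.\n"
        f"API structure: {json.dumps(api_structure, indent=2)}\n"
        f"Server URL: {server_url}\n\n"
        "Respond only with the curl command, no explanations.\n"
    )

def curl_to_request(curl_cmd):
    """Translate the model's curl command into (method, url) for `requests`."""
    parts = shlex.split(curl_cmd)
    method, url = "GET", None
    for i, p in enumerate(parts):
        if p == "-X":
            method = parts[i + 1]
        elif p.startswith("http"):
            url = p
    return method, url

def ask_llm(client, system_prompt, user_text):
    """One round trip to the model (assumes the `anthropic` Python SDK)."""
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",  # an example model name
        max_tokens=200,
        system=system_prompt,
        messages=[{"role": "user", "content": user_text}],
    )
    return msg.content[0].text.strip()
```

<p>With those pieces, the dialog loop just reads a line from the user, calls <code>ask_llm</code>, and hands the parsed method and URL to <code>requests.request()</code>.</p>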
<h2 id="translate-json-responses-to-human-language">Translate JSON responses to human language</h2>
<p>Now the system started working.
I would send a message like, &ldquo;Turn the yellow LED on,&rdquo; and it turned on.
Or, &ldquo;Turn the garden LED on,&rdquo; and the white LED in the garden would turn on.
I could ask in normal language, &ldquo;What&rsquo;s the light conditions on the roof?&rdquo; and it responded with a JSON that included a lot of information:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;_links&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;config&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/sensors/2/config&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;self&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/sensors/2/value&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;all_sensors&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/sensors&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;sensor&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">&#34;href&#34;</span>: <span style="color:#e6db74">&#34;/sensors/2&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">&#34;data&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;unit&#34;</span>: <span style="color:#e6db74">&#34;lux&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;light&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;calibrated_value&#34;</span>: <span style="color:#ae81ff">4114.6</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;location&#34;</span>: <span style="color:#e6db74">&#34;roof&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;id&#34;</span>: <span style="color:#e6db74">&#34;2&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">&#34;raw_value&#34;</span>: <span style="color:#ae81ff">41146</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>That was good, but I wanted more reasonable human language instead of JSON!
The LLM came to the rescue again!
Here is another prompt for Haiku:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>Convert this IoT API JSON response to concise human language, focusing on the most important information.
</span></span><span style="display:flex;"><span>Don&#39;t print &#39;Here is the concise summary&#39; or other explanations; print just the content:
</span></span></code></pre></div><p>And now the answers from my server looked like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>Light sensor #2 located on the roof is measuring 4,109.8 lux, indicating bright illumination conditions.
</span></span></code></pre></div><h2 id="first-public-demo">First public demo</h2>
<p>Recently, I went to the local Python meetup in Atlanta.
I was going to give a presentation about this project and demonstrate it.
Needless to say, I was nervous.</p>
<p>I started a hotspot on my cell phone, connected my laptop to it, and started the Pico.
It indicated that it had connected to the hotspot and got an IP address.
Now I had to change my dialogue program to point to that address.</p>
<p>The dialogue began:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">IoT Controller started. Type &#39;quit&#39; to exit.

&gt;&gt;&gt; turn the yellow led on

LED 1 is currently on, colored yellow, and located on the roof.

&gt;&gt;&gt; turn the motor on

Motor 1 (DC type) is currently ON and located on the roof.

&gt;&gt;&gt; turn the motor off

Motor #1, a DC motor located on the roof, is currently turned off.
</code></pre><p>So far, so good.
The audience loved it when the propeller started spinning and gave me a round of applause.</p>
<p>I switched my keyboard to Cyrillic and typed this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>&gt;&gt;&gt; выключи желтый led
</span></span></code></pre></div><p>&ldquo;Wow, is it Russian?&rdquo; I heard the murmur in the audience.</p>
<p>And magic happened. The LED went off, and I got this message:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>LED 1 located on the roof is yellow and currently off (state 0).
</span></span></code></pre></div><p>I asked the meetup organizer, &ldquo;Pablo, can you tell it in Spanish to turn the LED on?&rdquo;</p>
<p>Pablo took the keyboard and typed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>&gt;&gt;&gt; Por favor prender el LED amarillo
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>LED 1 is currently on (state 1), colored yellow, and located on the roof.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>&gt;&gt;&gt; Apaga la maldita led amarilla
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>LED 1 is currently off, located on the roof, and colored yellow.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>&gt;&gt;&gt;
</span></span></code></pre></div><p>&ldquo;Wow!&rdquo; said the audience.</p>
<p>I explained, &ldquo;This magic is done purely by the LLM. I didn&rsquo;t do anything in the code.
It understands many languages and translates between them and REST and JSON easily.
Note that when I spoke to it in Russian, I used the English word &lsquo;LED&rsquo;, and it understood me perfectly.&rdquo;</p>
<h2 id="lessons-learned">Lessons Learned</h2>
<p>This project turned out to be an unexpected learning journey.
I started with a simple idea of controlling a dollhouse but ended up learning:</p>
<ul>
<li>MicroPython and its differences from regular Python</li>
<li>Microdot - a lightweight web framework perfect for microcontrollers</li>
<li>HATEOAS - a REST API design principle with a name that sounds like a sneeze</li>
<li>The incredible versatility of LLMs in handling natural language processing</li>
</ul>
<p>Most importantly, I learned that AI assistants like Claude can be excellent coding partners.
They not only help with code but also introduce you to best practices and concepts you might not have discovered otherwise.</p>
<h2 id="future-plans">Future plans</h2>
<p>Of course, I started looking for ways to expand the project.
And, of course, I found this: a huge 42-in-1 kit from Keyestudio with a lot of sensors and actuators.</p>
<p>Wouldn&rsquo;t it be cool to say something and have it displayed on the LCD screen?
Gas sensors, accelerometers, buzzers, and motors are waiting for me to connect them.</p>
<h2 id="getting-more-serious">Getting more serious</h2>
<p>Of course, it&rsquo;s just a toy project created for learning.
If you want to use IoT in your real home, you should take a look at <a href="https://www.home-assistant.io/">Home Assistant</a>.
That&rsquo;s what I did, thinking, &ldquo;Can I apply the same approach to creating Home Assistant&rsquo;s API requests?&rdquo;
Needless to say, the project has already implemented the <a href="https://developers.home-assistant.io/docs/core/llm/">API for LLMs</a>.
&ldquo;Great minds think alike,&rdquo; I said to myself proudly. 😉</p>
<h2 id="project-code">Project code</h2>
<p>If you&rsquo;re interested in building your own multilingual smart dollhouse (or just want to see how it all works), check out the project on GitHub: <a href="https://github.com/pavelanni/geekhouse">Geekhouse Repository</a>. Feel free to contribute, ask questions, or share your own IoT adventures.</p>
<p>Try to use any language you know and see if the LLM can translate it into REST API calls!
Try other LLMs, try a local LLM running in your homelab.
Most likely, using a local LLM will make turning LEDs on and off faster.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Control panel: A TinyGo Adventure</title>
      <link>https://pavelanni.dev/blog/control_panel/</link>
      <pubDate>Mon, 07 Oct 2024 00:00:00 +0000</pubDate>
      <guid>https://pavelanni.dev/blog/control_panel/</guid>
      <description>A TinyGo project implementing a multi-functional control panel for time travel</description>
      <content:encoded><![CDATA[<p>You know how it all started? I was browsing Amazon one day (as you do) and came across this set of color 7-segment LED displays.</p>
<p><img alt="Seven-segment LEDs" loading="lazy" src="/blog/control_panel/seven-segment-leds.webp"></p>
<p>Instantly, I thought of building colorful time displays with knobs to control them.</p>
<p>This project combines two of my favorite things — tech tinkering and fun. I used two Raspberry Pi Picos, some 7-segment LED displays, rotary encoders, and (of course) Go! And before you ask — yes, it does something cool, and no, I haven’t explored all its possibilities just yet.</p>
<p>This isn’t my first attempt at making this thing. I originally started with an Arduino, writing the project in C++. I even got as far as setting time with rotary encoders before life happened (probably work). It stayed unfinished for years.</p>
<p><img alt="Version 1.0" loading="lazy" src="/blog/control_panel/version1.webp">
Control Panel v1.0 (Arduino)</p>
<p>Version 2.0, though? That’s where things got interesting. I picked up TinyGo — a Go compiler for microcontrollers — and decided it was the perfect excuse to learn both TinyGo and Go while having fun. TinyGo works like a charm on Raspberry Pi Pico, so that became my hardware of choice.</p>
<p><img alt="Version 2.0" loading="lazy" src="/blog/control_panel/front_panel.jpg">
Control Panel v2.0 (TinyGo and Raspberry Pi Pico)</p>
<h2 id="the-hardware">The Hardware</h2>
<p>Here’s the gear I used:</p>
<ul>
<li>Two <strong>Raspberry Pi Picos</strong></li>
<li>4-digit <strong>7-segment LED</strong> displays with the TM1637 chip</li>
<li><strong>Rotary encoders</strong> for setting the time (way more satisfying than buttons)</li>
<li><strong>DFPlayer Mini</strong> to play sound effects. Because what’s a time-travel toy without some audio drama?</li>
</ul>
<p>The plan was simple: one Pico would set the “preset time,” and the other would handle the “current time” and “memory time.” When you press a button imitating the actual time travel, the “preset time” becomes your “current time,” and the fun begins. Sounds cool, right?</p>
<p>I decided I&rsquo;d make a version that adheres to international standards, specifically ISO 8601,
where dates are presented in the form <code>2024-12-30 15:04:05</code>.
(Well, the real reason was that I couldn&rsquo;t find alphanumeric LEDs that could display month names 😉.)</p>
<p><img alt="Front panel label" loading="lazy" src="/blog/control_panel/title_version_label.jpg"></p>
<h2 id="why-tinygo">Why TinyGo?</h2>
<p>Let’s be honest: when working with microcontrollers, the usual suspects are Arduino, MicroPython, or maybe CircuitPython. But with TinyGo, I got to work in Go — one of my favorite languages. It just made everything more enjoyable (and less tedious).</p>
<p>Here’s why TinyGo rocks:</p>
<ul>
<li><strong>Goroutines</strong>: No more messing around with loops and managing every state yourself. With goroutines, the heavy lifting is done.</li>
<li><strong>Channels</strong>: These made it super easy to pass messages around without worrying about blocking the main thread. (Looking at you, Arduino.)</li>
</ul>
<h2 id="how-it-all-works">How it all works</h2>
<p>This setup has two brains (aka Raspberry Pi Picos) that talk to each other via UART. Here’s a rough idea of what’s going on:</p>
<ol>
<li>
<p><strong>Left Pico</strong>: You set the “preset time” using rotary encoders and 7-segment displays.
You click the encoder and now you are in the &ldquo;setting&rdquo; mode.
You turn the encoder to find the desired value.
Then you click again and the value is set.
For the date and hour:min displays you have to do it twice. For example, to set the date:</p>
<ul>
<li>click</li>
<li>turn, find the month</li>
<li>click (the month is set)</li>
<li>turn, find the day</li>
<li>click (the day and the whole date is set)</li>
</ul>
</li>
</ol>
<p>Of course, when you turn the month encoder past 12, it wraps around and starts at 1 again.
And the system knows how many days are in each month, so it does the right thing.</p>
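<p>That wrap-around and days-in-month logic is tiny. A sketch of the idea (in Python for brevity; the project itself is written in Go):</p>

```python
import calendar

def turn_month(month, delta):
    """Wrap the month around 1..12 as the encoder keeps turning."""
    return (month - 1 + delta) % 12 + 1

def clamp_day(year, month, day):
    """Keep the day valid for the selected month (handles February and leap years)."""
    _, last_day = calendar.monthrange(year, month)
    return min(day, last_day)
```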
<p><img alt="Click - Turn - Click" loading="lazy" src="/blog/control_panel/click_turn_click.jpg"></p>
<ol start="2">
<li><strong>Right Pico</strong>: It handles the “current time” and “memory time” and then updates the times when you hit the button to start the journey.</li>
</ol>
<p><img alt="The button" loading="lazy" src="/blog/control_panel/the_button.jpg"></p>
<p>When you press the “time travel” button, the preset time is sent from the left Pico to the right Pico as an RFC3339 string (fancy, huh?). Then, the current time becomes the new “memory time” — you know, just in case you want to go back to the exact moment you left.</p>
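<p>The swap itself is just two assignments plus parsing the RFC3339 string. A sketch (in Python for brevity; the project itself is written in Go):</p>

```python
from datetime import datetime

def time_travel(preset_rfc3339, current):
    """Preset becomes the new current time; the old current time is remembered."""
    memory = current                               # so you can come back later
    current = datetime.fromisoformat(preset_rfc3339)
    return current, memory
```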
<p><img alt="Hardware diagram" loading="lazy" src="/blog/control_panel/breadboard.webp">
Hardware connections</p>
<h2 id="software-shenanigans">Software Shenanigans</h2>
<p>I kept the code simple but flexible. I used goroutines to handle multiple tasks at once — like updating the display while you’re setting the time — and channels to manage button presses and other events without creating a tangled mess of code.</p>
<p>Here’s the basic rundown:</p>
<ul>
<li>You turn the rotary encoder to set the year, month, and day for the preset time.</li>
<li>The left Pico sends that data over UART to the right Pico.</li>
<li>When you press the time-travel button, the current time is saved as the “memory time,” and the preset time becomes your current time.</li>
<li>The current time then starts ticking like a typical wall clock.</li>
</ul>
<h2 id="extra-fun-stuff-sound-and-memory">Extra fun stuff: sound and memory</h2>
<p>Because time travel needs sound effects (obviously), I hooked up a DFPlayer Mini. It plays MP3s from an SD card, and the Picos send the player commands over UART.</p>
<p>Oh, and the toy remembers your settings. Every second, the times are saved to flash memory. When you power it back on, the Picos pick up right where they left off. Handy, right?</p>
<p><img alt="DFPlayer Mini" loading="lazy" src="/blog/control_panel/dfplayer.webp">
DFPlayer Mini</p>
<h2 id="lessons-i-didnt-expect-to-learn">Lessons I didn’t expect to learn</h2>
<p>This project was a great learning experience but it wasn’t all smooth sailing. Here are some of the surprises I encountered:</p>
<ol>
<li>
<p><strong>Goroutines</strong>: In TinyGo, goroutines will keep going until they hit a blocking operation (like I/O). I had to add tiny sleep calls here and there to make everything work smoothly.</p>
</li>
<li>
<p><strong>Error handling matters</strong>: Error handling is always important, but it’s really important here. Things will go haywire if an extra character sneaks into your UART transmission. You must detect it rather than letting the whole time machine hang.</p>
</li>
<li>
<p><strong>Rotary encoders are not so simple</strong>: I learned about Gray code (have you heard about it?) to implement the rotary encoders, which saved me from a world of trouble. In short, it’s an alternative binary code where two consecutive numbers differ only by one bit. It turns out to be very useful in error-prone environments like electronic circuits. Read more: <a href="https://en.wikipedia.org/wiki/Gray_code">https://en.wikipedia.org/wiki/Gray_code</a>.</p>
</li>
<li>
<p><strong>Debouncing switches</strong>: That’s a whole other world of engineering. When you pressed a button, did it send just one pulse or more? If you’re going down that rabbit hole, here are some excellent reads on switch debouncing: <a href="https://www.eejournal.com/article/ultimate-guide-to-switch-debounce-part-1/">https://www.eejournal.com/article/ultimate-guide-to-switch-debounce-part-1/</a> and <a href="http://www.ganssle.com/debouncing.htm">http://www.ganssle.com/debouncing.htm</a>.</p>
</li>
</ol>
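<p>The Gray-code property mentioned above (consecutive values differ in exactly one bit) is easy to verify in a few lines. A quick sketch in Python, just to show the idea (the project itself uses TinyGo):</p>

```python
def to_gray(n):
    """Binary-reflected Gray code: consecutive values differ in exactly one bit."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert the encoding by folding the bits back down."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Because neighbors differ in a single bit, a glitchy transition on a
# rotary encoder can never be misread as a jump of more than one step.
```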
<h2 id="whats-next">What’s next?</h2>
<p>There are so many ways I can build on this project. For starters, I could turn this into a Pomodoro timer, an alarm clock, or a chess clock — time-related toys for all occasions.</p>
<p>Or, who knows? Maybe I’ll create a DIY kit so you can build your own time-travel toy at home.</p>
<h2 id="want-to-try-tinygo">Want to try TinyGo?</h2>
<p>If you’re curious about TinyGo and want to dip your toes in, here’s how to start:</p>
<ol>
<li>Get a Raspberry Pi Pico Starter Kit — it usually includes sensors, motors, LEDs, and more.</li>
<li>Clone the TinyGo repo: <a href="https://github.com/tinygo-org/tinygo">https://github.com/tinygo-org/tinygo</a></li>
<li>Follow the steps here: <a href="https://tinygo.org/getting-started/">https://tinygo.org/getting-started/</a></li>
</ol>
<p>Trust me, it’s a fun way to learn Go, and you’ll get something you can actually hold and show off to your friends.</p>
<p>Please let me know if you need more details about this project. Or if you have ideas about improving it — I’ll be happy to continue working on it!</p>
]]></content:encoded>
    </item>
    <item>
      <title>How to monitor SSH logins?</title>
      <link>https://pavelanni.dev/blog/ssh_login_monitor/</link>
      <pubDate>Thu, 27 Apr 2023 00:00:00 +0000</pubDate>
      <guid>https://pavelanni.dev/blog/ssh_login_monitor/</guid>
      <description>How a random chat at work brought me to writing an SSH login monitor program using LLM</description>
      <content:encoded><![CDATA[<p>I noticed that a friend of mine routinely logs into his lab servers via SSH using the <code>root</code> username and password.</p>
<p>&ldquo;Why do you do that?&rdquo; I asked.</p>
<p>&ldquo;What’s wrong with that?&rdquo; he said. &ldquo;I know, I know, it’s not a good security practice, but I’m used to it.
It’s just a lab server; what can go wrong? And also, all other ways are not that easy.&rdquo;</p>
<p>&ldquo;&lsquo;What can go wrong?&rsquo;&rdquo; I said, &ldquo;The famous last words!&rdquo;
&ldquo;I’m not going to tell you horror stories. I just think that what you consider &rsquo;the most convenient way&rsquo; is not
that convenient. There are other ways.&rdquo;</p>
<p>&ldquo;Yeah, I know, I know,&rdquo; he sighed. &ldquo;Create a normal user, give them <code>sudo</code> access, and all that.&rdquo;</p>
<p>&ldquo;Yes, that’s the <em>right way</em>. You are correct.
Even better, that user shouldn’t use a password too. Using SSH keys is much better.
But if you insist on going <em>directly</em> as <code>root</code>, you can do it with SSH keys too.
The good thing about this approach is that you can always check who’s logged in as <code>root</code> with which key.&rdquo;</p>
<p>&ldquo;Really? Can you show me?&rdquo; he asked.</p>
<p>Challenge accepted.</p>
<h2 id="create-users">Create users</h2>
<p>I used one of my Red Hat servers as a target host.
I decided to start a simple Fedora Linux VM for the client host and create three normal users on it.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>[pavel@fedora ~]$ sudo useradd -m alice
</span></span><span style="display:flex;"><span>[pavel@fedora ~]$ sudo useradd -m bob
</span></span><span style="display:flex;"><span>[pavel@fedora ~]$ sudo useradd -m charlie
</span></span></code></pre></div><h2 id="create-ssh-keys">Create SSH keys</h2>
<p>On behalf of each user, I created their SSH keys.
I decided to use the Ed25519 algorithm as it’s shorter and more secure than the default RSA.
(To learn more about this, just google &rsquo;ed25519 vs. rsa&rsquo;.)</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>[pavel@fedora ~] $ sudo su - alice
</span></span><span style="display:flex;"><span>[alice@fedora ~] $ ssh-keygen -t ed25519
</span></span><span style="display:flex;"><span>Generating public/private ed25519 key pair.
</span></span><span style="display:flex;"><span>Enter file in which to save the key (/home/alice/.ssh/id_ed25519):
</span></span><span style="display:flex;"><span>Created directory &#39;/home/alice/.ssh&#39;.
</span></span><span style="display:flex;"><span>Enter passphrase (empty for no passphrase):
</span></span><span style="display:flex;"><span>Enter same passphrase again:
</span></span><span style="display:flex;"><span>Your identification has been saved in /home/alice/.ssh/id_ed25519
</span></span><span style="display:flex;"><span>Your public key has been saved in /home/alice/.ssh/id_ed25519.pub
</span></span><span style="display:flex;"><span>The key fingerprint is:
</span></span><span style="display:flex;"><span>SHA256:5xuxPx8QnPv19/6IZ5frmQj1N0hRCP9J364ddE6avL8 alice@fedora
</span></span><span style="display:flex;"><span>The key&#39;s randomart image is:
</span></span><span style="display:flex;"><span>+--[ED25519 256]--+
</span></span><span style="display:flex;"><span>|           .. .. |
</span></span><span style="display:flex;"><span>|           ..o.  |
</span></span><span style="display:flex;"><span>|            +o . |
</span></span><span style="display:flex;"><span>|             o+ +|
</span></span><span style="display:flex;"><span>|        S o oo +*|
</span></span><span style="display:flex;"><span>|         o oo++Bo|
</span></span><span style="display:flex;"><span>|          +. .*+B|
</span></span><span style="display:flex;"><span>|           +o.+BX|
</span></span><span style="display:flex;"><span>|          . o**EX|
</span></span><span style="display:flex;"><span>+----[SHA256]-----+
</span></span><span style="display:flex;"><span>[alice@fedora ~]$ cat .ssh/id_ed25519.pub
</span></span><span style="display:flex;"><span>ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIG8Obx1FsUu1jlYDtzfEDHYSDjG82xE7ysxZVzhgpGC5 alice@fedora
</span></span><span style="display:flex;"><span>[alice@fedora ~] $ exit
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>[pavel@fedora ~] $ sudo su - bob
</span></span><span style="display:flex;"><span>[bob@fedora ~] $ ssh-keygen -t ed25519
</span></span><span style="display:flex;"><span>. . . . Same dialogue . . . .
</span></span><span style="display:flex;"><span>[bob@fedora ~] $ exit
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>[pavel@fedora ~] $ sudo su - charlie
</span></span><span style="display:flex;"><span>[charlie@fedora ~] $ ssh-keygen -t ed25519
</span></span><span style="display:flex;"><span>. . . . Same dialogue . . . .
</span></span><span style="display:flex;"><span>[charlie@fedora ~] $ exit
</span></span></code></pre></div><h2 id="create-fingerprints">Create fingerprints</h2>
<p>I wore my sysadmin hat and told my users: &ldquo;I trust you. I want to give you root access to my server.
But I need your public keys.&rdquo;</p>
<p>&ldquo;Great!&rdquo; Alice, Bob, and Charlie answered. &ldquo;How can we do it?&rdquo;</p>
<p>&ldquo;Login to your accounts.
Your public key is this file: <code>~/.ssh/id_ed25519.pub</code>.
It’s just a one-line text file.
You can include it in the mail body or attach it as a file.
Remember: don’t share your <em>private</em> key&ndash;the one without <code>.pub</code>&ndash;with anybody!
Keep it private!&rdquo;</p>
<p>My users started working, and in several minutes, I received an email from each of them containing the following information:</p>
<pre tabindex="0"><code class="language-none" data-lang="none">From: alice
To: sysadmin
Subject: my public key

Hi Sysadmin,

Here is my public key (I copied it from id_ed25519.pub, as you told us):

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIG8Obx1FsUu1jlYDtzfEDHYSDjG82xE7ysxZVzhgpGC5 alice@fedora

I hope this works.

Thanks,
Alice
</code></pre><h2 id="add-the-public-keys-to-the-host">Add the public keys to the host</h2>
<p>The easiest way to give somebody access to any account, including <code>root</code>, is to add that user’s public key to the file <code>.ssh/authorized_keys</code> in that account’s home directory.
This is exactly what I did for the <code>root</code> user on my lab server.
I opened (with Vim, of course) the file <code>/root/.ssh/authorized_keys</code> and entered these three entries (the public keys from my users):</p>
<pre tabindex="0"><code class="language-none" data-lang="none">ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIG8Obx1FsUu1jlYDtzfEDHYSDjG82xE7ysxZVzhgpGC5 alice@fedora
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJgclT4eQ5RlYabZfkdjFV5wGrroXxmd5n2X7okmiaN8 bob@fedora
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJWcjljox2NKwDFllZ5KQc4LSVrBEKoaOE/t/up1XbyD charlie@fedora
</code></pre><p>Now the system is ready for a test.</p>
<h2 id="test-access">Test access</h2>
<p>I went to my users and told them: &ldquo;The system is ready. Feel free to test your access!
The first time you log in, the system will ask whether you trust the host you are connecting to.
Answer <code>yes</code>. The host will be added to the list of known hosts&ndash;check it later in <code>~/.ssh/known_hosts</code>&ndash;
and next time, you won’t be asked for confirmation.&rdquo;</p>
<p>Alice, Bob, and Charlie opened their terminals on the Fedora machine and tried:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>[bob@fedora ~] $ ssh -l root 192.168.1.234
</span></span><span style="display:flex;"><span>The authenticity of host &#39;192.168.1.234 (192.168.1.234)&#39; can&#39;t be established.
</span></span><span style="display:flex;"><span>ED25519 key fingerprint is SHA256:mhS0bPdGrEIwwMKJdKxpkxLdtYKNp0+FSgwqybeugd8.
</span></span><span style="display:flex;"><span>This key is not known by any other names
</span></span><span style="display:flex;"><span>Are you sure you want to continue connecting (yes/no/[fingerprint])? *(Bob typed &#39;yes&#39;)*
</span></span><span style="display:flex;"><span>Warning: Permanently added &#39;192.168.1.234&#39; (ED25519) to the list of known hosts.
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Last login: Wed Apr 26 09:06:21 2023 from 192.168.1.24
</span></span><span style="display:flex;"><span>[root@rhel-lab ~]#
</span></span></code></pre></div><p>&ldquo;Wow! That was easy!&rdquo; Bob said. &ldquo;Look, no password!&rdquo;</p>
<p>&ldquo;I told you!&rdquo; I said.
&ldquo;But keep in mind: each of you comes to the server with your own key.
That means the server’s admin will always know who logged in as root: Alice, Bob, or Charlie.
So please be considerate when working as root on this host.&rdquo;</p>
<p>I told my users this, but I wasn’t actually ready to track their logins yet.
It was time to prepare.</p>
<h2 id="check-the-logs">Check the logs</h2>
<p>&ldquo;They just logged in and out recently,&rdquo; I thought. &ldquo;It should be at the end of the log.&rdquo;</p>
<p>In Red Hat Enterprise Linux, the log file where all security-related events are stored is called <code>/var/log/secure</code>.
Let’s check its last 30 lines.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># tail -30  /var/log/secure
</span></span><span style="display:flex;"><span>Apr 27 10:21:19 deep-rh sshd[1337250]: Accepted publickey for root from 192.168.1.24 port 49090 ssh2: ED25519 SHA256:5xuxPx8QnPv19/6IZ5frmQj1N0hRCP9J364ddE6avL8
</span></span><span style="display:flex;"><span>Apr 27 10:21:19 deep-rh systemd[1337257]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
</span></span><span style="display:flex;"><span>Apr 27 10:21:19 deep-rh sshd[1337250]: pam_unix(sshd:session): session opened for user root by (uid=0)
</span></span><span style="display:flex;"><span>Apr 27 10:21:22 deep-rh sshd[1337282]: Received disconnect from 192.168.1.24 port 49090:11: disconnected by user
</span></span><span style="display:flex;"><span>Apr 27 10:21:22 deep-rh sshd[1337282]: Disconnected from user root 192.168.1.24 port 49090
</span></span><span style="display:flex;"><span>Apr 27 10:21:22 deep-rh sshd[1337250]: pam_unix(sshd:session): session closed for user root
</span></span><span style="display:flex;"><span>Apr 27 10:21:32 deep-rh systemd[1337261]: pam_unix(systemd-user:session): session closed for user root
</span></span><span style="display:flex;"><span>Apr 27 10:21:34 deep-rh sshd[1337458]: Accepted publickey for root from 192.168.1.24 port 41254 ssh2: ED25519 SHA256:is6l6bRqCCBVKunT+zVGHoUF0A06p8lt/04EoRbyCUY
</span></span><span style="display:flex;"><span>Apr 27 10:21:34 deep-rh systemd[1337467]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
</span></span><span style="display:flex;"><span>Apr 27 10:21:34 deep-rh sshd[1337458]: pam_unix(sshd:session): session opened for user root by (uid=0)
</span></span><span style="display:flex;"><span>Apr 27 10:21:37 deep-rh sshd[1337493]: Received disconnect from 192.168.1.24 port 41254:11: disconnected by user
</span></span><span style="display:flex;"><span>Apr 27 10:21:37 deep-rh sshd[1337493]: Disconnected from user root 192.168.1.24 port 41254
</span></span><span style="display:flex;"><span>Apr 27 10:21:37 deep-rh sshd[1337458]: pam_unix(sshd:session): session closed for user root
</span></span><span style="display:flex;"><span>Apr 27 10:21:47 deep-rh systemd[1337472]: pam_unix(systemd-user:session): session closed for user root
</span></span><span style="display:flex;"><span>Apr 27 10:21:55 deep-rh sshd[1337680]: Accepted publickey for root from 192.168.1.24 port 42552 ssh2: ED25519 SHA256:QgAov0UZI25hWxnbLiHa00j64/zD1m80UMsSIZtxr2s
</span></span><span style="display:flex;"><span>Apr 27 10:21:55 deep-rh systemd[1337706]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
</span></span><span style="display:flex;"><span>Apr 27 10:21:55 deep-rh sshd[1337680]: pam_unix(sshd:session): session opened for user root by (uid=0)
</span></span><span style="display:flex;"><span>Apr 27 10:21:58 deep-rh sshd[1337730]: Received disconnect from 192.168.1.24 port 42552:11: disconnected by user
</span></span><span style="display:flex;"><span>Apr 27 10:21:58 deep-rh sshd[1337730]: Disconnected from user root 192.168.1.24 port 42552
</span></span><span style="display:flex;"><span>Apr 27 10:21:58 deep-rh sshd[1337680]: pam_unix(sshd:session): session closed for user root
</span></span><span style="display:flex;"><span>Apr 27 10:22:08 deep-rh systemd[1337710]: pam_unix(systemd-user:session): session closed for user root
</span></span></code></pre></div><p>&ldquo;Good,&rdquo; I thought. &ldquo;I can see their logins and logouts. I can see the IPs from which they logged in.
But how can I figure out who logged in and when?&rdquo;</p>
<p>After a bit of googling, I found out that the string that goes after <code>ED25519 SHA256:</code> is a fingerprint of the user’s public key.
&ldquo;I just have to connect the fingerprints with the public keys,&rdquo; I thought.</p>
<h2 id="create-a-fingerprint-database">Create a fingerprint database</h2>
<p>Fingerprints are only useful if you have collected a good database of them.
This is what I did after receiving the emails from my users.</p>
<p>On the lab host (<code>rhel-lab</code>) I saved the users&rsquo; public keys in a separate directory under <code>/root</code>.
Of course, I made it readable only by root.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># mkdir ~/ssh-keys
</span></span><span style="display:flex;"><span># chmod <span style="color:#ae81ff">0700</span> ~/ssh-keys
</span></span><span style="display:flex;"><span># cd ~/ssh-keys
</span></span></code></pre></div><p>I copied the public keys the users had sent me into this directory and added each owner’s name to the file name.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># echo <span style="color:#e6db74">&#34;ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIG8Obx1FsUu1jlYDtzfEDHYSDjG82xE7ysxZVzhgpGC5 alice@fedora&#34;</span> &gt; alice_id_ed25519.pub
</span></span><span style="display:flex;"><span># echo <span style="color:#e6db74">&#34;ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJgclT4eQ5RlYabZfkdjFV5wGrroXxmd5n2X7okmiaN8 bob@fedora&#34;</span> &gt; bob_id_ed25519.pub
</span></span><span style="display:flex;"><span># echo <span style="color:#e6db74">&#34;ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJWcjljox2NKwDFllZ5KQc4LSVrBEKoaOE/t/up1XbyD charlie@fedora&#34;</span> &gt; charlie_id_ed25519.pub
</span></span><span style="display:flex;"><span># ls -l *pub
</span></span><span style="display:flex;"><span>-rw-r--r--. 1 root root 94 Apr 27 09:53 alice_id_ed25519.pub
</span></span><span style="display:flex;"><span>-rw-r--r--. 1 root root 92 Apr 27 09:54 bob_id_ed25519.pub
</span></span><span style="display:flex;"><span>-rw-r--r--. 1 root root 96 Apr 27 09:54 charlie_id_ed25519.pub
</span></span></code></pre></div><p>Then I ran the following command against each public key file to create its fingerprint.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># ssh-keygen -lf alice_id_ed25519.pub
</span></span><span style="display:flex;"><span>256 SHA256:5xuxPx8QnPv19/6IZ5frmQj1N0hRCP9J364ddE6avL8 alice@fedora (ED25519)
</span></span><span style="display:flex;"><span># ssh-keygen -lf bob_id_ed25519.pub
</span></span><span style="display:flex;"><span>256 SHA256:is6l6bRqCCBVKunT+zVGHoUF0A06p8lt/04EoRbyCUY bob@fedora (ED25519)
</span></span><span style="display:flex;"><span># ssh-keygen -lf charlie_id_ed25519.pub
</span></span><span style="display:flex;"><span>256 SHA256:QgAov0UZI25hWxnbLiHa00j64/zD1m80UMsSIZtxr2s charlie@fedora (ED25519)
</span></span></code></pre></div><p>In the same directory, I created a file called <code>users.csv</code> and added three records in the form <code>username,fingerprint</code>, like this:</p>
<p><strong>users.csv</strong></p>
<pre tabindex="0"><code class="language-none" data-lang="none">alice,5xuxPx8QnPv19/6IZ5frmQj1N0hRCP9J364ddE6avL8
bob,is6l6bRqCCBVKunT+zVGHoUF0A06p8lt/04EoRbyCUY
charlie,QgAov0UZI25hWxnbLiHa00j64/zD1m80UMsSIZtxr2s
</code></pre><p>Now I needed a program to scan the <code>/var/log/secure</code> file, find login and logout messages,
parse them to extract the fingerprints, and look each user up in the database by fingerprint.</p>
<h2 id="create-a-log-monitoring-application">Create a log-monitoring application</h2>
<p>I started learning Go recently, so for each new idea I try to use Go to practice.
This problem looked like a good exercise.</p>
<p>The program’s logic is pretty simple:</p>
<ul>
<li>Scan the log file and create a list of login/logout events.</li>
<li>For each login event, find the user based on their fingerprint.</li>
<li>Create a list of sessions and add login events to it.</li>
<li>For each logout event, find the corresponding login event based on the source IP and the port and update the end time of the session.</li>
<li>Output all sessions with user names, source IPs, start/end times, and duration.</li>
</ul>
<p>The most challenging part was to parse the log file and collect all necessary fields.
That’s why the regular expressions might look scary.</p>
<p>I created a simple Go program consisting of a single <code>main.go</code> file and tested it on
a short fragment of <code>/var/log/secure</code> file.
It printed out this:</p>
<pre tabindex="0"><code class="language-none" data-lang="none"># go run main.go
alice   192.168.1.24    2023-04-27 10:21:19     2023-04-27 10:21:22     3s
bob     192.168.1.24    2023-04-27 10:21:34     2023-04-27 10:21:37     3s
charlie 192.168.1.24    2023-04-27 10:21:55     2023-04-27 10:21:58     3s
</code></pre><h2 id="use-ai-to-improve-the-application">Use AI to improve the application</h2>
<p>The first version of this app was a simple <code>main.go</code> file with hard-coded file names.
I was playing around and needed a simple demo.
My first improvement was adding command-line arguments.
I added the <code>pflag</code> package (<a href="https://pkg.go.dev/github.com/spf13/pflag">https://pkg.go.dev/github.com/spf13/pflag</a>) and turned on Codeium (<a href="https://codeium.com/">https://codeium.com/</a>) in my VS Code.
And here, AI began to help me.</p>
<p>AI coding assistants are very impressive, no doubt.
But it’s one thing when you see it helping <em>somebody</em> in a video, or when you try it yourself on some <em>example</em> programs.
And it’s another thing when you write something yourself, you work on your own project, and it starts really helping <em>you</em>.
Then you can clearly see how much time you saved by not typing a lot of things (just press [Tab] to accept!),
by not looking around your own code (what should be included in this <code>struct</code>, I forgot?), and by not googling function library definitions and arguments.
AI remembers all this for you.</p>
<p>Back to my code. I just started typing <code>userDB := flag.</code> and Codeium already knew that it should be <code>StringP</code> and the argument
should be named <code>users</code> (short form is <code>u</code>) and the reasonable default should be <code>users.csv</code>.
I didn’t argue and accepted.
The next argument was the same: I added the <code>log</code> argument almost without typing anything.</p>
<p>So far, so good. Let’s try another tool.
I opened ChatGPT and asked:</p>
<p><strong>Me</strong>: Act as a Go programming mentor. I will give you a program I wrote. Please suggest possible tests to add to this program.
Here is my program:</p>
<p>&hellip;and I pasted my simple <code>main.go</code> in the chat window.</p>
<p>In its answer, it suggested several cases I should test for each function: valid input,
empty input, invalid input, duplicate fingerprints, etc.
At the end, ChatGPT gave me an example of how it can be done and added:</p>
<p><strong>AI</strong>: You can follow a similar pattern to write tests for the other functions as well.</p>
<p>Wow, it acted like a real mentor! It didn’t write the code <em>for me</em>, but it helped me to move in the right direction.</p>
<p>I wanted to write my tests the right way and played the role of a good student:</p>
<p><strong>Me</strong>: I read an article that suggested keeping the <code>main.go</code> file small and let the main function only call the application function.
They suggested having other functions in separate files and argued that it helps in testing.
Can you help me to apply these suggestions to my code?</p>
<p>&ldquo;Sure!&rdquo; the AI answered and suggested a good plan of moving all my functions to a
separate <code>pkg/sshloginmonitor</code> directory and creating files <code>user.go</code>, <code>session.go</code>, and <code>util.go</code>.</p>
<p>I followed the suggestion, and our discussion continued.</p>
<p><strong>Me</strong>: My program should log a fatal error under certain conditions. How should I test that?</p>
<p>In its answer, it explained that it’s possible, but I should keep in mind that the call to <code>log.Fatal()</code> will terminate my test.</p>
<p><strong>Me</strong>: Right! I shouldn’t call <code>log.Fatal()</code> from the function. I should return an error instead. How should I check if the error is returned?</p>
<p>The AI gave me the full explanation with an example of how it should be done.</p>
<p><strong>Me</strong>: How should I specify the expected error in the lists of tests?</p>
<p>Another great example with a slice of test cases showing how to specify the expected error.</p>
<p><strong>Me</strong>: How should I test reading from a file? Can it be done by reading from a string constant?</p>
<p>Another great suggestion from AI: you probably should pass <code>io.Reader</code> to your function, not a file name.
That way, it will be much easier to test.
Accepted; I rewrote my functions to use <code>io.Reader</code> instead of file names.</p>
<p>And so on, and so forth. Step by step, with the help of ChatGPT and Codeium, my little program
got the tests it needed, docstrings for functions, and test cases for different conditions.
In other words, in just a couple of hours, it looked much more professional.</p>
<p>I don’t know if AI can fully replace programmers.
But I’m sure it can help us write better code.
Just don’t be afraid and ask questions.</p>
<p>Find the code in this repo: <a href="https://github.com/pavelanni/ssh-login-monitor">https://github.com/pavelanni/ssh-login-monitor</a></p>
<hr>
<p>&ldquo;Wait,&rdquo; I thought. &ldquo;What if I give the AI the full description of my problem?
Will it be able to write it from scratch?&rdquo;</p>
<p>To be honest, I was a bit skeptical. Sure, ChatGPT had already impressed me by helping with my code here and there.
But to solve this problem from scratch, just from the problem description?
Probably not. But let’s give it a try.</p>
<h2 id="chatgpt-solves-the-problem">ChatGPT solves the problem</h2>
<p>I opened the ChatGPT window and typed the problem description.</p>
<p><img alt="The problem description" loading="lazy" src="/blog/ssh_login_monitor/problem.png"></p>
<p>I added the log (here is just a fragment).</p>
<p><img alt="Log fragment" loading="lazy" src="/blog/ssh_login_monitor/log.png"></p>
<p>And finally I added the <code>authorized_keys</code> file.</p>
<p><img alt="Authorized keys" loading="lazy" src="/blog/ssh_login_monitor/keys.png"></p>
<p>Let’s see what it can do with such a problem!</p>
<p>I didn’t wait for too long.
Almost immediately, ChatGPT started printing.
(The GPT-4 version prints a bit slower than GPT-3.5, which creates an effect of &ldquo;thinking&rdquo;.
It also reminds me of those old teletype machines used with <em>really</em> old computers.)</p>
<p><img alt="Fingerprint calculation" loading="lazy" src="/blog/ssh_login_monitor/fingerprint_calc.png"></p>
<p>Wait, what?? It’s just a language model!
How could it calculate the fingerprints??</p>
<p>But I didn’t have time to answer my own question because ChatGPT continued printing.</p>
<p><img alt="Login events" loading="lazy" src="/blog/ssh_login_monitor/login_events.png"></p>
<p>Well, it found the login events based on the string I gave it (him? her?) and connected the fingerprints to those it just calculated.
Impressive. It even found the timestamps and correctly presented them as timestamps.
Good job, but that’s easy.</p>
<p>Let’s continue and ask about logout events.</p>
<p><img alt="Logout events problem" loading="lazy" src="/blog/ssh_login_monitor/logout_events_problem.png"></p>
<p>Again, almost without a pause:</p>
<p><img alt="Logout events" loading="lazy" src="/blog/ssh_login_monitor/logout_events.png"></p>
<p>Good logic, great explanation! Find the ports and connect them to the login events.
That means it remembers the login events from the previous task somehow!
Mind blowing&hellip; But let’s continue.</p>
<p>If it remembers login and logout events, it should be able to calculate session durations.
Or not? Again, it’s just a language model; it shouldn’t know how to do arithmetic.
Let’s ask and see&hellip;</p>
<p><img alt="Session durations" loading="lazy" src="/blog/ssh_login_monitor/session_durations.png"></p>
<p>Wow! It even explained how it did the calculations! &ldquo;Show your work,&rdquo; as we were told in school.</p>
<p>I found myself sitting with a dropped jaw a few seconds later.
(No, it’s not just a figure of speech. Literally.)</p>
<p>For a few minutes I couldn’t collect my thoughts.
Yes, it’s a language model. Yes, it can find certain phrases and connect them together because
it has seen those phrases and words many times during training.
I understand that.</p>
<p>But how can it find numbers (like port numbers) and connect them together?
And how can it calculate?
Not only something simple like &ldquo;37 - 34&rdquo;, but an <em>SSH public key fingerprint</em>??
I can’t imagine that the model was trained on all possible public keys and their fingerprints, can you?</p>
<p>After several minutes of shock I got another great idea.
I had to close the loop.</p>
<p><img alt="Write a program" loading="lazy" src="/blog/ssh_login_monitor/write_program.png"></p>
<p>It wrote a piece of Go code, gave me instructions on how to run it, and how to pass
the input files to it.</p>
<p>Needless to say, I copied the code into my editor and ran it!</p>
<pre tabindex="0"><code class="language-none" data-lang="none">$ go run main.go ../test/secure.log ../test/authorized_keys
Login: alice - 0000-04-27 10:21:19 - 192.168.1.24:49090
Logout: alice - 0000-04-27 10:21:22 - 192.168.1.24:49090
Login: bob - 0000-04-27 10:21:34 - 192.168.1.24:41254
Logout: bob - 0000-04-27 10:21:37 - 192.168.1.24:41254
Login: charlie - 0000-04-27 10:21:55 - 192.168.1.24:42552
Logout: charlie - 0000-04-27 10:21:58 - 192.168.1.24:42552
</code></pre><p>One minor thing &ndash; it didn’t get the current year.
But it wasn’t in the log, so this is fine.
Now I’m pretty sure I could tell it to use the current year if it’s missing and it would do it perfectly.
No doubt.</p>
<p>The code written by ChatGPT is here: <a href="https://github.com/pavelanni/ssh-login-monitor/tree/main/chatgpt-version">https://github.com/pavelanni/ssh-login-monitor/tree/main/chatgpt-version</a></p>
]]></content:encoded>
    </item>
  </channel>
</rss>
