Diffstat (limited to 'posts')
-rw-r--r-- posts/awk-with-kakoune.njk | 151
1 file changed, 151 insertions, 0 deletions
diff --git a/posts/awk-with-kakoune.njk b/posts/awk-with-kakoune.njk
new file mode 100644
index 0000000..ccf4b1a
--- /dev/null
+++ b/posts/awk-with-kakoune.njk
@@ -0,0 +1,151 @@
+---
+layout: post.njk
+title: Using Awk to Enable Kakoune to Generate Types for OCaml
+tags: post
+date: 2024-03-03
+---
+<p>Lately I have been using the Kakoune text editor for my editing. What
+absolutely blows my mind is how interoperable it is with Unix scripts
+and how easy it is to create extensions.</p>
+<p>I am working on a web server in OCaml. In my work, I leverage a
+library called <a
+href="https://github.com/roddyyaga/ppx_rapper">ppx_rapper</a>, which
+lets you write SQL queries that are then converted to typed OCaml functions. A select query looks something like
+this:</p>
+<div class="sourceCode" id="cb1"><pre
+class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> my_query =</span>
+<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> [%rapper</span>
+<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> get_opt</span>
+<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> {sql|</span>
+<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}</span>
+<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> FROM users</span>
+<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> WHERE username &lt;&gt; %<span class="dt">string</span>{wrong_user} AND id &gt; %<span class="dt">int</span>{min_id}</span>
+<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> |sql}]</span></code></pre></div>
+<p><code>my_query</code> becomes a function that takes the query
+parameters (<code>wrong_user</code> and <code>min_id</code>) and a database connection and returns the selected row as a tuple,
+<code>(int * string * bool * string option)</code>. This can be hard to work with, especially since a larger query can return many values of the same type, which makes them easy to mix up. If you have a record type defined
+that looks exactly the same as the return type of the SQL statement, you
+can add <code>record_out</code>, and ppx_rapper will
+return a record that matches its structure instead of a tuple:</p>
+<div class="sourceCode" id="cb2"><pre
+class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> my_query_result = </span>
+<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> { id : <span class="dt">int</span></span>
+<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> ; username : <span class="dt">string</span></span>
+<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> ; following : <span class="dt">bool</span></span>
+<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> ; bio : <span class="dt">string</span> <span class="dt">option</span></span>
+<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> }</span>
+<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> my_query =</span>
+<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> [%rapper</span>
+<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a> get_opt</span>
+<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> {sql|</span>
+<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a> SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}</span>
+<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a> FROM users</span>
+<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a> WHERE username &lt;&gt; %<span class="dt">string</span>{wrong_user} AND id &gt; %<span class="dt">int</span>{min_id}</span>
+<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a> |sql} record_out] <span class="co">(* notice record out *)</span></span></code></pre></div>
+<p>Now <code>my_query</code> will return a typed record
+<code>my_query_result</code> instead of a tuple. Records are easier to work with; they
+help you avoid mixing up the values and work as self-documentation.</p>
+<p>Of course, writing these types out by hand quickly becomes tedious, so yesterday I conjured up an Awk script
+that can be used to generate the types for you.</p>
+<div class="sourceCode" id="cb3"><pre
+class="sourceCode awk"><code class="sourceCode awk"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co">#!/usr/bin/env -S gawk -f</span></span>
+<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="cf">BEGIN</span> <span class="op">{</span></span>
+<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> inside_rapper <span class="op">=</span> <span class="dv">0</span></span>
+<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> statement_name <span class="op">=</span> <span class="st">&quot;&quot;</span></span>
+<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> add_bracket <span class="op">=</span> <span class="dv">0</span></span>
+<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
+<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Extract function name so that we can use that</span></span>
+<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a><span class="co"># for the type name</span></span>
+<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a><span class="ot">/</span><span class="ss">let </span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">zA</span><span class="ot">-</span><span class="ss">Z0</span><span class="ot">-</span><span class="ss">9_</span><span class="ot">]+</span><span class="ss"> =</span><span class="ot">/</span> <span class="op">&amp;&amp;</span> inside_rapper <span class="op">==</span> <span class="dv">0</span> <span class="op">{</span></span>
+<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> statement_name <span class="op">=</span> <span class="dt">$2</span></span>
+<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
+<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a><span class="co"># If [%rapper, begin extraction</span></span>
+<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a><span class="ot">/\</span><span class="sc">[</span><span class="ss">%rapper</span><span class="ot">/</span> <span class="op">&amp;&amp;</span> inside_rapper <span class="op">==</span> <span class="dv">0</span> <span class="op">{</span></span>
+<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> inside_rapper <span class="op">=</span> <span class="dv">1</span></span>
+<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> add_bracket <span class="op">=</span> <span class="dv">1</span></span>
+<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">printf</span> <span class="st">&quot;type &quot;</span> statement_name <span class="st">&quot;_result = </span><span class="sc">\n</span><span class="st">&quot;</span></span>
+<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
+<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="co"># Match patterns like @int{string}</span></span>
+<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a>inside_rapper <span class="op">==</span> <span class="dv">1</span> <span class="op">&amp;&amp;</span> <span class="op">/@</span>[a<span class="op">-</span>z<span class="op">?</span>]<span class="op">+{</span>[a<span class="op">-</span>z_0<span class="op">-</span><span class="dv">9</span>]<span class="op">+}/</span> <span class="op">{</span></span>
+<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a> from <span class="op">=</span> <span class="dv">0</span></span>
+<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a> <span class="co"># Remove everything before &quot;@&quot; and after &quot;}&quot;</span></span>
+<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a> pos <span class="op">=</span> <span class="fu">match</span> <span class="op">(</span><span class="dt">$0</span><span class="op">,</span> <span class="ot">/</span><span class="ss">@</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z?</span><span class="ot">]+</span><span class="ss">{</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z_0</span><span class="ot">-</span><span class="ss">9</span><span class="ot">]+</span><span class="ss">}</span><span class="ot">/</span><span class="op">,</span> val<span class="op">)</span></span>
+<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> <span class="op">(</span><span class="dv">0</span> <span class="op">&lt;</span> pos<span class="op">)</span> <span class="op">{</span></span>
+<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a> <span class="co"># Split the line based on curly braces</span></span>
+<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a> <span class="fu">split</span><span class="op">(</span>val[<span class="dv">0</span>]<span class="op">,</span> parts<span class="op">,</span> <span class="ot">/</span><span class="ss">{</span><span class="ot">/</span><span class="op">)</span></span>
+<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a> <span class="co"># Extract @type, located in parts[1]</span></span>
+<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a> <span class="co"># Use substr to remove the @</span></span>
+<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a> type <span class="op">=</span> <span class="fu">substr</span><span class="op">(</span>parts[<span class="dv">1</span>]<span class="op">,</span> <span class="dv">2</span><span class="op">)</span></span>
+<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a> <span class="co"># Remove the last character (}) of key</span></span>
+<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a> key <span class="op">=</span> <span class="fu">substr</span><span class="op">(</span>parts[<span class="dv">2</span>]<span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="fu">length</span><span class="op">(</span>parts[<span class="dv">2</span>]<span class="op">)-</span><span class="dv">1</span><span class="op">)</span></span>
+<span id="cb3-38"><a href="#cb3-38" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-39"><a href="#cb3-39" aria-hidden="true" tabindex="-1"></a> <span class="co"># ? = optional, so convert that to option</span></span>
+<span id="cb3-40"><a href="#cb3-40" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span><span class="fu">substr</span><span class="op">(</span>type<span class="op">,</span> <span class="fu">length</span><span class="op">(</span>type<span class="op">))</span> <span class="op">==</span> <span class="st">&quot;?&quot;</span><span class="op">)</span> <span class="op">{</span></span>
+<span id="cb3-41"><a href="#cb3-41" aria-hidden="true" tabindex="-1"></a> type <span class="op">=</span> <span class="fu">substr</span><span class="op">(</span>type<span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="fu">length</span><span class="op">(</span>type<span class="op">)</span> <span class="op">-</span> <span class="dv">1</span><span class="op">)</span> <span class="st">&quot; option&quot;</span></span>
+<span id="cb3-42"><a href="#cb3-42" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
+<span id="cb3-43"><a href="#cb3-43" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-44"><a href="#cb3-44" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-45"><a href="#cb3-45" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>add_bracket <span class="op">==</span> <span class="dv">1</span><span class="op">)</span> <span class="op">{</span></span>
+<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a> <span class="kw">print</span> <span class="st">&quot; { &quot;</span> key <span class="st">&quot; : &quot;</span> type</span>
+<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a> add_bracket <span class="op">=</span> <span class="dv">0</span></span>
+<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
+<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a> <span class="cf">else</span> <span class="op">{</span></span>
+<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a> <span class="co"># Print the key</span></span>
+<span id="cb3-51"><a href="#cb3-51" aria-hidden="true" tabindex="-1"></a> <span class="kw">print</span> <span class="st">&quot; ; &quot;</span> key <span class="st">&quot; : &quot;</span> type</span>
+<span id="cb3-52"><a href="#cb3-52" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
+<span id="cb3-53"><a href="#cb3-53" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-54"><a href="#cb3-54" aria-hidden="true" tabindex="-1"></a> <span class="co"># In case there is more than 1 key per line, make sure we get all of them.</span></span>
+<span id="cb3-55"><a href="#cb3-55" aria-hidden="true" tabindex="-1"></a> <span class="co"># Awk matches on the first occurrence though, so we need to remove the</span></span>
+<span id="cb3-56"><a href="#cb3-56" aria-hidden="true" tabindex="-1"></a> <span class="co"># previous match from the string and run again.</span></span>
+<span id="cb3-57"><a href="#cb3-57" aria-hidden="true" tabindex="-1"></a> from <span class="op">+=</span> pos <span class="op">+</span> val[<span class="dv">0</span><span class="op">,</span> <span class="st">&quot;length&quot;</span>]</span>
+<span id="cb3-58"><a href="#cb3-58" aria-hidden="true" tabindex="-1"></a> pos <span class="op">=</span> <span class="fu">match</span><span class="op">(</span> <span class="fu">substr</span><span class="op">(</span> <span class="dt">$0</span><span class="op">,</span> from <span class="op">),</span> <span class="ot">/</span><span class="ss">@</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z?</span><span class="ot">]+</span><span class="ss">{</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z_0</span><span class="ot">-</span><span class="ss">9</span><span class="ot">]+</span><span class="ss">}</span><span class="ot">/</span><span class="op">,</span> val <span class="op">)</span></span>
+<span id="cb3-59"><a href="#cb3-59" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
+<span id="cb3-60"><a href="#cb3-60" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
+<span id="cb3-61"><a href="#cb3-61" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb3-62"><a href="#cb3-62" aria-hidden="true" tabindex="-1"></a><span class="co"># If we see |sql}, that means we are done.</span></span>
+<span id="cb3-63"><a href="#cb3-63" aria-hidden="true" tabindex="-1"></a><span class="ot">/\</span><span class="sc">|</span><span class="ss">sql}</span><span class="ot">/</span> <span class="op">&amp;&amp;</span> inside_rapper <span class="op">==</span> <span class="dv">1</span> <span class="op">{</span></span>
+<span id="cb3-64"><a href="#cb3-64" aria-hidden="true" tabindex="-1"></a> <span class="kw">printf</span> <span class="st">&quot; }</span><span class="sc">\n</span><span class="st">&quot;</span></span>
+<span id="cb3-65"><a href="#cb3-65" aria-hidden="true" tabindex="-1"></a> inside_rapper <span class="op">=</span> <span class="dv">0</span></span>
+<span id="cb3-66"><a href="#cb3-66" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
+<p>The <code>my_query_result</code> type you see above was generated with this script. I will not go into detail about how the script works, but I have added comments explaining the steps. I chose Awk because it is built for simple line-by-line text
+manipulation, a perfect fit for this problem.</p>
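+<p>One caveat: the three-argument <code>match</code> and the <code>val[0, "length"]</code> lookup used in the script are gawk extensions, so the script may not run under other Awk implementations. Below is a rough sketch of the same idea using only <code>RSTART</code> and <code>RLENGTH</code>, which standard Awk provides. The function name <code>gen_record_type</code> and the variable names are made up for this illustration, and the output spacing is only approximately that of the script above.</p>

```shell
# Sketch only: a portable take on the generator, wrapped in a shell function.
# It avoids gawk-only features by using RSTART/RLENGTH after match().
gen_record_type() {
  awk '
    # Remember the name bound by the most recent "let NAME ="
    !inside && /let [a-zA-Z0-9_]+ =/ { name = $2 }

    # Entering a [%rapper block: print the type header
    !inside && /\[%rapper/ {
      inside = 1
      sep = "{"
      printf "type %s_result =\n", name
    }

    inside {
      rest = $0
      # Take one @type{field} at a time from the front of the line
      while (match(rest, /@[a-z?]+[{][a-z_0-9]+[}]/)) {
        field = substr(rest, RSTART + 1, RLENGTH - 2)  # drop "@" and "}"
        rest = substr(rest, RSTART + RLENGTH)          # keep what follows
        split(field, parts, "[{]")
        type = parts[1]
        # A trailing "?" marks a nullable column
        if (sub(/[?]$/, "", type)) type = type " option"
        printf "  %s %s : %s\n", sep, parts[2], type
        sep = ";"
      }
    }

    # The closing |sql} ends the query, so close the record
    inside && /[|]sql[}]/ { print "  }"; inside = 0 }
  '
}

# Usage: feed it the query from the beginning of the post on stdin.
gen_record_type <<'EOF'
let my_query =
  [%rapper
    get_opt
      {sql|
      SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}
      FROM users
      WHERE username <> %string{wrong_user} AND id > %int{min_id}
      |sql}]
EOF
```

+<p>Cutting the matched part off the front of <code>rest</code> on every iteration avoids having to track an offset into the original line.</p>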
+<p>You can run the script above with</p>
+<pre><code>chmod +x my-awk-script
+./my-awk-script file_with_my_query.ml</code></pre>
+<p>I used the <a href="https://ferd.ca/awk-in-20-minutes.html">Awk in 20
+Minutes tutorial</a> to learn how to write Awk scripts, and leveraged
+that to write the code above. The syntax feels foreign at first, but the language is super easy to use.</p>
+<h2 id="using-it-in-kakoune">Using it in Kakoune</h2>
+<p>We already have a CLI tool, but ideally I would like to use it within
+my editor. This is where Kakoune makes it so easy.</p>
+<p>Kakoune allows you to run any POSIX script, pipe the selected input,
+and have the output printed in the editor. So I do not even have to
+write a plugin to use it with the editor (!).</p>
+<p>Allow me to demonstrate: <a href="https://asciinema.org/a/644993">https://asciinema.org/a/644993</a>.</p>
+<p>In this video, I show how, without even creating any plugin, I could
+use my script in my editing flow to generate the OCaml types. All I had
+to do was:</p>
+<ol type="1">
+<li>Select the let statement</li>
+<li>Press &lt;!&gt;</li>
+<li>Run the awk script above, which I have named
+ocaml-rapper-helper.</li>
+</ol>
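+<p>The steps above work because the script is a plain filter: run without a file argument, an Awk program reads standard input and writes to standard output. Here is a rough shell picture of that round trip, using a hypothetical stand-in called <code>helper</code> instead of the real script, and a shell variable in place of the editor selection:</p>

```shell
# Hypothetical stand-in for ocaml-rapper-helper: it only prints a type name.
# With no file operand, awk reads whatever arrives on standard input.
helper() { awk '/let [a-zA-Z0-9_]+ =/ { print "type " $2 "_result" }'; }

# The selected text goes in on stdin; whatever the command writes to
# stdout is what ends up back in the buffer.
selection='let my_query ='
printf '%s\n' "$selection" | helper
```

+<p>Any program that follows this stdin-to-stdout convention can be used the same way.</p>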
+<p>You can easily leverage this with many other tools. For example, you
+can use <a href="https://github.com/verdverm/chatgpt">ChatGPT CLI</a> or <a href="https://ollama.com/">Ollama</a> to
+have an LLM autocomplete or explain code for you, similar to Copilot but free.</p>
+<p>I hope this demonstrates why I am so in love with this editor. It
+comes with a bunch of sane defaults out of the box and great LSP integration, but
+it is also so easy to extend with your own functionality. Even if it
+does not have many plugins of its own due to a smaller community, it
+interoperates so well with other Unix tools that it makes up for it. Had I
+wanted to write some more complex text manipulation, I could have used a language
+like Rust, Go or OCaml that compiles down to a single binary and used
+that instead.</p>