-rw-r--r-- | posts/awk-with-kakoune.njk | 151 |
1 files changed, 151 insertions, 0 deletions
diff --git a/posts/awk-with-kakoune.njk b/posts/awk-with-kakoune.njk new file mode 100644 index 0000000..ccf4b1a --- /dev/null +++ b/posts/awk-with-kakoune.njk @@ -0,0 +1,151 @@ +--- +layout: post.njk +title: Using Awk to Enable Kakoune to Generate Types for OCaml +tags: post +date: 2024-03-03 +--- +<p>Lately I have been using the Kakoune text editor for my editing. What +absolutely blows my mind is how interoperable it is with Unix scripts +and how easy it is to create extensions.</p> +<p>I am working on a Web server in OCaml. In my work, I leverage a +library called <a +href="https://github.com/roddyyaga/ppx_rapper">ppx_rapper</a>, which +allows you to write SQL queries that are then converted to OCaml types. So a select query looks something like +this:</p> +<div class="sourceCode" id="cb1"><pre +class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> my_query =</span> +<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> [%rapper</span> +<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> get_opt</span> +<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> {sql|</span> +<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}</span> +<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> FROM users</span> +<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> WHERE username <> %<span class="dt">string</span>{wrong_user} AND id > %<span class="dt">int</span>{min_id}</span> +<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> |sql}]</span></code></pre></div> +<p><code>my_query</code> becomes a function that takes a +<code>string</code>, an <code>int</code> and a <code>sql_connection</code> and returns an n-tuple, +<code>(int,string,bool,string option)</code>. 
This can be hard to work with, especially since a larger query can return many values of the same type, which makes them easy to mix up. If you have a record type defined +whose fields match the return type of the SQL statement, you +can add a <code>record_out</code> statement, and ppx_rapper will +automatically return a record with matching fields instead of a tuple:</p> +<div class="sourceCode" id="cb2"><pre +class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> my_query_result = </span> +<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> { id : <span class="dt">int</span></span> +<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> ; username : <span class="dt">string</span></span> +<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> ; following : <span class="dt">bool</span></span> +<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> ; bio : <span class="dt">string</span> <span class="dt">option</span></span> +<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> }</span> +<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> my_query =</span> +<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> [%rapper</span> +<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a> get_opt</span> +<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> {sql|</span> +<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a> SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}</span> +<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a> FROM users</span> +<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a> WHERE username <> %<span class="dt">string</span>{wrong_user} AND id > 
%<span class="dt">int</span>{min_id}</span> +<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a> |sql} record_out] <span class="co">(* notice record out *)</span></span></code></pre></div> +<p>Now <code>my_query</code> will return a typed record +<code>my_query_result</code> instead of a tuple. Records are easier to work with; they +help you avoid mixing up the values and work as self-documentation.</p> +<p>Of course, writing these types out by hand quickly becomes tedious, so yesterday I conjured up an Awk script +that can be used to generate the types for you.</p> +<div class="sourceCode" id="cb3"><pre +class="sourceCode awk"><code class="sourceCode awk"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co">#!/usr/bin/env awk -f</span></span> +<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="cf">BEGIN</span> <span class="op">{</span></span> +<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> inside_rapper <span class="op">=</span> <span class="dv">0</span></span> +<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> statement_name <span class="op">=</span> <span class="st">""</span></span> +<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> add_bracket <span class="op">=</span> <span class="dv">0</span></span> +<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span> +<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Extract function name so that we can use that</span></span> +<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a><span class="co"># for the type name</span></span> +<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" 
tabindex="-1"></a><span class="ot">/</span><span class="ss">let </span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">zA</span><span class="ot">-</span><span class="ss">Z0</span><span class="ot">-</span><span class="ss">9_</span><span class="ot">]+</span><span class="ss"> =</span><span class="ot">/</span> <span class="op">&&</span> inside_rapper <span class="op">==</span> <span class="dv">0</span> <span class="op">{</span></span> +<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> statement_name <span class="op">=</span> <span class="dt">$2</span></span> +<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span> +<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a><span class="co"># If [%rapper, begin extraction</span></span> +<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a><span class="ot">/\</span><span class="sc">[</span><span class="ss">%rapper</span><span class="ot">/</span> <span class="op">&&</span> inside_rapper <span class="op">==</span> <span class="dv">0</span> <span class="op">{</span></span> +<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> inside_rapper <span class="op">=</span> <span class="dv">1</span></span> +<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> add_bracket <span class="op">=</span> <span class="dv">1</span></span> +<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">printf</span> <span class="st">"type "</span> statement_name <span class="st">"_result = </span><span class="sc">\n</span><span class="st">"</span></span> +<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span> +<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a></span> 
+<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="co"># Match patterns like @int{string}</span></span> +<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a>inside_rapper <span class="op">==</span> <span class="dv">1</span> <span class="op">&&</span> <span class="op">/@</span>[a<span class="op">-</span>z<span class="op">?</span>]<span class="op">+{</span>[a<span class="op">-</span>z_0<span class="op">-</span><span class="dv">9</span>]<span class="op">+}/</span> <span class="op">{</span></span> +<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a> from <span class="op">=</span> <span class="dv">0</span></span> +<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a> <span class="co"># Remove everything before "@" and after "}"</span></span> +<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a> pos <span class="op">=</span> <span class="fu">match</span> <span class="op">(</span><span class="dt">$0</span><span class="op">,</span> <span class="ot">/</span><span class="ss">@</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z?</span><span class="ot">]+</span><span class="ss">{</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z_0</span><span class="ot">-</span><span class="ss">9</span><span class="ot">]+</span><span class="ss">}</span><span class="ot">/</span><span class="op">,</span> val<span class="op">)</span></span> +<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> <span class="op">(</span><span class="dv">0</span> <span class="op"><</span> pos<span class="op">)</span> <span class="op">{</span></span> +<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a> <span class="co"># Split the line based on 
curly braces</span></span> +<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a> <span class="fu">split</span><span class="op">(</span>val[<span class="dv">0</span>]<span class="op">,</span> parts<span class="op">,</span> <span class="ot">/</span><span class="ss">{</span><span class="ot">/</span><span class="op">)</span></span> +<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a> <span class="co"># Extract @type, located in parts[1]</span></span> +<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a> <span class="co"># Use substr to remove the @</span></span> +<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a> type <span class="op">=</span> <span class="fu">substr</span><span class="op">(</span>parts[<span class="dv">1</span>]<span class="op">,</span> <span class="dv">2</span><span class="op">)</span></span> +<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a> <span class="co"># Remove the last character (}) of key</span></span> +<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a> key <span class="op">=</span> <span class="fu">substr</span><span class="op">(</span>parts[<span class="dv">2</span>]<span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="fu">length</span><span class="op">(</span>parts[<span class="dv">2</span>]<span class="op">)-</span><span class="dv">1</span><span class="op">)</span></span> +<span id="cb3-38"><a href="#cb3-38" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-39"><a href="#cb3-39" aria-hidden="true" tabindex="-1"></a> <span class="co"># ? 
= optional, so convert that to option</span></span> +<span id="cb3-40"><a href="#cb3-40" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span><span class="fu">substr</span><span class="op">(</span>type<span class="op">,</span> <span class="fu">length</span><span class="op">(</span>type<span class="op">))</span> <span class="op">==</span> <span class="st">"?"</span><span class="op">)</span> <span class="op">{</span></span> +<span id="cb3-41"><a href="#cb3-41" aria-hidden="true" tabindex="-1"></a> type <span class="op">=</span> <span class="fu">substr</span><span class="op">(</span>type<span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="fu">length</span><span class="op">(</span>type<span class="op">)</span> <span class="op">-</span> <span class="dv">1</span><span class="op">)</span> <span class="st">" option"</span></span> +<span id="cb3-42"><a href="#cb3-42" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span> +<span id="cb3-43"><a href="#cb3-43" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-44"><a href="#cb3-44" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-45"><a href="#cb3-45" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>add_bracket <span class="op">==</span> <span class="dv">1</span><span class="op">)</span> <span class="op">{</span></span> +<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a> <span class="kw">print</span> <span class="st">" { "</span> key <span class="st">" : "</span> type</span> +<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a> add_bracket <span class="op">=</span> <span class="dv">0</span></span> +<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span> +<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a> <span class="cf">else</span> <span 
class="op">{</span></span> +<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a> <span class="co"># Print the key</span></span> +<span id="cb3-51"><a href="#cb3-51" aria-hidden="true" tabindex="-1"></a> <span class="kw">print</span> <span class="st">" ; "</span> key <span class="st">" : "</span> type</span> +<span id="cb3-52"><a href="#cb3-52" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span> +<span id="cb3-53"><a href="#cb3-53" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-54"><a href="#cb3-54" aria-hidden="true" tabindex="-1"></a> <span class="co"># In case there is more than 1 key per line, make sure we get all of them.</span></span> +<span id="cb3-55"><a href="#cb3-55" aria-hidden="true" tabindex="-1"></a> <span class="co"># Awk matches on the first occurrence though, so we need to remove the</span></span> +<span id="cb3-56"><a href="#cb3-56" aria-hidden="true" tabindex="-1"></a> <span class="co"># previous match from the string and run again.</span></span> +<span id="cb3-57"><a href="#cb3-57" aria-hidden="true" tabindex="-1"></a> from <span class="op">+=</span> pos <span class="op">+</span> val[<span class="dv">0</span><span class="op">,</span> <span class="st">"length"</span>]</span> +<span id="cb3-58"><a href="#cb3-58" aria-hidden="true" tabindex="-1"></a> pos <span class="op">=</span> <span class="fu">match</span><span class="op">(</span> <span class="fu">substr</span><span class="op">(</span> <span class="dt">$0</span><span class="op">,</span> from <span class="op">),</span> <span class="ot">/</span><span class="ss">@</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z?</span><span class="ot">]+</span><span class="ss">{</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z_0</span><span class="ot">-</span><span class="ss">9</span><span class="ot">]+</span><span class="ss">}</span><span class="ot">/</span><span 
class="op">,</span> val <span class="op">)</span></span> +<span id="cb3-59"><a href="#cb3-59" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span> +<span id="cb3-60"><a href="#cb3-60" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span> +<span id="cb3-61"><a href="#cb3-61" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb3-62"><a href="#cb3-62" aria-hidden="true" tabindex="-1"></a><span class="co"># If we see |sql}, that means we are done. </span></span> +<span id="cb3-63"><a href="#cb3-63" aria-hidden="true" tabindex="-1"></a><span class="ot">/\</span><span class="sc">|</span><span class="ss">sql}</span><span class="ot">/</span> <span class="op">&&</span> inside_rapper <span class="op">==</span> <span class="dv">1</span> <span class="op">{</span></span> +<span id="cb3-64"><a href="#cb3-64" aria-hidden="true" tabindex="-1"></a> <span class="kw">printf</span> <span class="st">" }</span><span class="sc">\n</span><span class="st">"</span></span> +<span id="cb3-65"><a href="#cb3-65" aria-hidden="true" tabindex="-1"></a> inside_rapper <span class="op">=</span> <span class="dv">0</span></span> +<span id="cb3-66"><a href="#cb3-66" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div> +<p>The <code>my_query_result</code> type you see above is generated with this script. I will not go into detail about how the script works, but I have added comments explaining the steps. I chose Awk because it is specifically built for simple line +manipulation, a perfect fit for this problem.</p> +<p>You can run the script above with</p> +<pre><code>chmod +x my-awk-script +./my-awk-script file_with_my_query.ml</code></pre> +<p>I used the <a href="https://ferd.ca/awk-in-20-minutes.html">Awk in 20 +Minutes tutorial</a> to learn how to write Awk scripts, and leveraged +that to write the code above. 
The syntax feels foreign at first, but the language is super easy to use.</p> +<h2 id="using-it-in-kakoune">Using it in Kakoune</h2> +<p>We already have a CLI tool, but ideally I would like to use it within +my editor. This is where Kakoune makes it so easy.</p> +<p>Kakoune allows you to run any POSIX script, pipe the selection to it, +and have the output printed in the editor. So I do not even have to +write a plugin to use it with the editor (!).</p> +<p>Allow me to demonstrate: <a href="https://asciinema.org/a/644993">https://asciinema.org/a/644993</a>.</p> +<p>In this video, I show how, without even creating any plugin, I could +use my script in my editing flow to generate the OCaml types. All I had +to do was:</p> +<ol type="1"> +<li>Select the let statement</li> +<li>Press <!></li> +<li>Run the awk script above, which I have named +ocaml-rapper-helper.</li> +</ol> +<p>You can easily leverage this with many other tools. For example, you +can use <a href="https://github.com/verdverm/chatgpt">ChatGPT CLI</a> or <a href="https://ollama.com/">Ollama</a> to +have an LLM autocomplete or explain code for you, similar to Copilot but free.</p> +<p>I hope this demonstrates why I am so in love with this editor. It +comes with a bunch of sane defaults out of the box and a great LSP integration, but +it is also so easy to extend and add your own functionality. Even if it +does not have many plugins of its own due to a smaller community, it +interops so well with other UNIX tools that it makes up for it. Had I +wanted to write some more complex text manipulation, I could have used a language +like Rust, Go or OCaml that compiles down to a single binary and used +that instead.</p>