---
layout: post.njk
title: Using Awk with Kakoune to Generate SQL Types for OCaml
tags: post 
date: 2024-03-03
---
<p>Lately I have been using the Kakoune text editor for my editing. What
blows my mind is how interoperable the editor is with Unix scripts
and how easy it is to create extensions.</p>
<p>I am working on a web server in OCaml. In my work, I leverage a
library called <a
href="https://github.com/roddyyaga/ppx_rapper">ppx_rapper</a>, which
lets you write SQL queries that are then converted to typed OCaml functions. A select query looks something like
this:</p>
<div class="sourceCode" id="cb1"><pre
class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> my_query =</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>  [%rapper</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>    get_opt</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>      {sql|</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>      SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>      FROM users</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>      WHERE username &lt;&gt; %<span class="dt">string</span>{wrong_user} AND id &gt; %<span class="dt">int</span>{min_id}</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>      |sql}]</span></code></pre></div>
<p><code>my_query</code> becomes a function that takes the query parameters
(<code>wrong_user</code> and <code>min_id</code>) and a database connection, and returns an n-tuple,
<code>(int, string, bool, string option)</code>. This can be hard to work with, especially since a larger query can return many values of the same type and it becomes easy to mix them up. If you have a record type defined
whose fields exactly match the return type of the SQL statement, you
can add a <code>record_out</code> statement, and ppx_rapper will
automatically return a record of that type instead of a tuple:</p>
<div class="sourceCode" id="cb2"><pre
class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> my_query_result = </span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> { id : <span class="dt">int</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> ; username : <span class="dt">string</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> ; following : <span class="dt">bool</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> ; bio : <span class="dt">string</span> <span class="dt">option</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> my_query =</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>  [%rapper</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>    get_opt</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>      {sql|</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a>      SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>      FROM users</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a>      WHERE username &lt;&gt; %<span class="dt">string</span>{wrong_user} AND id &gt; %<span class="dt">int</span>{min_id}</span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>      |sql} record_out] <span class="co">(* notice record out *)</span></span></code></pre></div>
<p>Now <code>my_query</code> will return a typed record
<code>my_query_result</code> instead of a tuple. Records are easier to work with; they
help you avoid mixing up the values and work as self-documentation.</p>
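<p>To make the difference concrete, here is a minimal sketch of how the two return shapes are consumed. The values are made up and stand in for what <code>my_query</code> would return, and <code>describe_tuple</code> and <code>describe_record</code> are hypothetical helpers, so this runs without a database or ppx_rapper:</p>
<div class="sourceCode"><pre
class="sourceCode ocaml"><code class="sourceCode ocaml">(* Same record type as above; repeated so the sketch is self-contained. *)
type my_query_result =
  { id : int
  ; username : string
  ; following : bool
  ; bio : string option
  }

(* Tuple version: the caller has to remember the position of every value. *)
let describe_tuple (_id, username, following, bio) =
  Printf.sprintf "%s (following: %b): %s" username following
    (Option.value bio ~default:"no bio")

(* Record version: values are picked by name, so their order no longer matters. *)
let describe_record (user : my_query_result) =
  Printf.sprintf "%s (following: %b): %s" user.username user.following
    (Option.value user.bio ~default:"no bio")

let () =
  (* Hypothetical rows, not real query results. *)
  print_endline (describe_tuple (1, "alice", true, None));
  print_endline
    (describe_record
       { id = 1; username = "alice"; following = true; bio = Some "hi" })</code></pre></div>
<p>With the tuple, swapping two <code>string</code> values still type checks. With the record, the field name says which value you are reading.</p>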
<p>Of course, writing these types out by hand quickly becomes tedious, so yesterday I conjured up an Awk script
that can be used to generate the types for you.</p>
<div class="sourceCode" id="cb3"><pre
class="sourceCode awk"><code class="sourceCode awk"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co">#!/usr/bin/env -S gawk -f</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="cf">BEGIN</span> <span class="op">{</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>  inside_rapper <span class="op">=</span> <span class="dv">0</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>  statement_name <span class="op">=</span> <span class="st">&quot;&quot;</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>  add_bracket <span class="op">=</span> <span class="dv">0</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Extract function name so that we can use that</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a><span class="co"># for the type name</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a><span class="ot">/</span><span class="ss">let </span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">zA</span><span class="ot">-</span><span class="ss">Z0</span><span class="ot">-</span><span class="ss">9_</span><span class="ot">]+</span><span class="ss"> =</span><span class="ot">/</span> <span class="op">&amp;&amp;</span> inside_rapper <span class="op">==</span> <span class="dv">0</span> <span class="op">{</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a>  statement_name <span class="op">=</span> <span class="dt">$2</span></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a><span class="co"># If [%rapper, begin extraction</span></span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a><span class="ot">/\</span><span class="sc">[</span><span class="ss">%rapper</span><span class="ot">/</span> <span class="op">&amp;&amp;</span> inside_rapper <span class="op">==</span> <span class="dv">0</span> <span class="op">{</span></span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a>  inside_rapper <span class="op">=</span> <span class="dv">1</span></span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a>  add_bracket <span class="op">=</span> <span class="dv">1</span></span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a>  <span class="kw">printf</span> <span class="st">&quot;type &quot;</span> statement_name <span class="st">&quot;_result = </span><span class="sc">\n</span><span class="st">&quot;</span></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="co"># Match patterns like @int{string}</span></span>
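<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a>inside_rapper <span class="op">==</span> <span class="dv">1</span> <span class="op">&amp;&amp;</span> <span class="ot">/</span><span class="ss">@</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z?</span><span class="ot">]+</span><span class="ss">{</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z_0</span><span class="ot">-</span><span class="ss">9</span><span class="ot">]+</span><span class="ss">}</span><span class="ot">/</span> <span class="op">{</span></span>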
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a>    from <span class="op">=</span> <span class="dv">0</span></span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a>    <span class="co"># Remove everything before &quot;@&quot; and after &quot;}&quot;</span></span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a>    pos <span class="op">=</span> <span class="fu">match</span> <span class="op">(</span><span class="dt">$0</span><span class="op">,</span> <span class="ot">/</span><span class="ss">@</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z?</span><span class="ot">]+</span><span class="ss">{</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z_0</span><span class="ot">-</span><span class="ss">9</span><span class="ot">]+</span><span class="ss">}</span><span class="ot">/</span><span class="op">,</span> val<span class="op">)</span></span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a>    <span class="cf">while</span> <span class="op">(</span><span class="dv">0</span> <span class="op">&lt;</span> pos<span class="op">)</span> <span class="op">{</span></span>
<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a>        <span class="co"># Split the line based on curly braces</span></span>
<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a>        <span class="fu">split</span><span class="op">(</span>val[<span class="dv">0</span>]<span class="op">,</span> parts<span class="op">,</span> <span class="ot">/</span><span class="ss">{</span><span class="ot">/</span><span class="op">)</span></span>
<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a>        <span class="co"># Extract @type, located in parts[1]</span></span>
<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a>        <span class="co"># Use substr to remove the @</span></span>
<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a>        type <span class="op">=</span> <span class="fu">substr</span><span class="op">(</span>parts[<span class="dv">1</span>]<span class="op">,</span> <span class="dv">2</span><span class="op">)</span></span>
<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a>        <span class="co"># Remove the last character (}) of key</span></span>
<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a>        key <span class="op">=</span> <span class="fu">substr</span><span class="op">(</span>parts[<span class="dv">2</span>]<span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="fu">length</span><span class="op">(</span>parts[<span class="dv">2</span>]<span class="op">)-</span><span class="dv">1</span><span class="op">)</span></span>
<span id="cb3-38"><a href="#cb3-38" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-39"><a href="#cb3-39" aria-hidden="true" tabindex="-1"></a>        <span class="co"># ? = optional, so convert that to option</span></span>
<span id="cb3-40"><a href="#cb3-40" aria-hidden="true" tabindex="-1"></a>        <span class="cf">if</span> <span class="op">(</span><span class="fu">substr</span><span class="op">(</span>type<span class="op">,</span> <span class="fu">length</span><span class="op">(</span>type<span class="op">))</span> <span class="op">==</span> <span class="st">&quot;?&quot;</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb3-41"><a href="#cb3-41" aria-hidden="true" tabindex="-1"></a>            type <span class="op">=</span> <span class="fu">substr</span><span class="op">(</span>type<span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="fu">length</span><span class="op">(</span>type<span class="op">)</span> <span class="op">-</span> <span class="dv">1</span><span class="op">)</span> <span class="st">&quot; option&quot;</span></span>
<span id="cb3-42"><a href="#cb3-42" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span></span>
<span id="cb3-43"><a href="#cb3-43" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-44"><a href="#cb3-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-45"><a href="#cb3-45" aria-hidden="true" tabindex="-1"></a>        <span class="cf">if</span> <span class="op">(</span>add_bracket <span class="op">==</span> <span class="dv">1</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a>          <span class="kw">print</span> <span class="st">&quot; { &quot;</span> key <span class="st">&quot; : &quot;</span> type</span>
<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a>          add_bracket <span class="op">=</span> <span class="dv">0</span></span>
<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span></span>
<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a>        <span class="cf">else</span> <span class="op">{</span></span>
<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a>          <span class="co"># Print the key</span></span>
<span id="cb3-51"><a href="#cb3-51" aria-hidden="true" tabindex="-1"></a>          <span class="kw">print</span> <span class="st">&quot; ; &quot;</span> key <span class="st">&quot; : &quot;</span> type</span>
<span id="cb3-52"><a href="#cb3-52" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span></span>
<span id="cb3-53"><a href="#cb3-53" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-54"><a href="#cb3-54" aria-hidden="true" tabindex="-1"></a>        <span class="co"># In case there is more than 1 key per line, make sure we get all of them.</span></span>
<span id="cb3-55"><a href="#cb3-55" aria-hidden="true" tabindex="-1"></a>        <span class="co"># Awk matches on the first occurrence though, so we need to skip past the</span></span>
<span id="cb3-56"><a href="#cb3-56" aria-hidden="true" tabindex="-1"></a>        <span class="co"># previous match and run again. After the first pass, pos is relative to substr($0, from), hence the -1.</span></span>
<span id="cb3-57"><a href="#cb3-57" aria-hidden="true" tabindex="-1"></a>        from <span class="op">+=</span> pos <span class="op">+</span> val[<span class="dv">0</span><span class="op">,</span> <span class="st">&quot;length&quot;</span>] <span class="op">-</span> <span class="op">(</span>from <span class="op">&gt;</span> <span class="dv">0</span><span class="op">)</span></span>
<span id="cb3-58"><a href="#cb3-58" aria-hidden="true" tabindex="-1"></a>        pos <span class="op">=</span> <span class="fu">match</span><span class="op">(</span> <span class="fu">substr</span><span class="op">(</span> <span class="dt">$0</span><span class="op">,</span> from <span class="op">),</span> <span class="ot">/</span><span class="ss">@</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z?</span><span class="ot">]+</span><span class="ss">{</span><span class="ot">[</span><span class="ss">a</span><span class="ot">-</span><span class="ss">z_0</span><span class="ot">-</span><span class="ss">9</span><span class="ot">]+</span><span class="ss">}</span><span class="ot">/</span><span class="op">,</span> val <span class="op">)</span></span>
<span id="cb3-59"><a href="#cb3-59" aria-hidden="true" tabindex="-1"></a>    <span class="op">}</span></span>
<span id="cb3-60"><a href="#cb3-60" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb3-61"><a href="#cb3-61" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-62"><a href="#cb3-62" aria-hidden="true" tabindex="-1"></a><span class="co"># If we see |sql}, that means we are done.</span></span>
<span id="cb3-63"><a href="#cb3-63" aria-hidden="true" tabindex="-1"></a><span class="ot">/\</span><span class="sc">|</span><span class="ss">sql}</span><span class="ot">/</span> <span class="op">&amp;&amp;</span> inside_rapper <span class="op">==</span> <span class="dv">1</span> <span class="op">{</span></span>
<span id="cb3-64"><a href="#cb3-64" aria-hidden="true" tabindex="-1"></a>   <span class="kw">printf</span> <span class="st">&quot; }</span><span class="sc">\n</span><span class="st">&quot;</span></span>
<span id="cb3-65"><a href="#cb3-65" aria-hidden="true" tabindex="-1"></a>   inside_rapper <span class="op">=</span> <span class="dv">0</span></span>
<span id="cb3-66"><a href="#cb3-66" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div>
<p>The <code>my_query_result</code> type you see above was generated with this script. I will not go into detail about how the script works, but I have added comments explaining the steps. Note that it relies on the three-argument form of <code>match</code>, which is a gawk extension, so it needs GNU Awk. I chose Awk because it is built for line-by-line text
manipulation, a perfect fit for this problem.</p>
<p>You can run the script above with</p>
<pre><code>chmod +x my-awk-script
./my-awk-script file_with_my_query.ml</code></pre>
<p>I used the <a href="https://ferd.ca/awk-in-20-minutes.html">Awk in 20
Minutes tutorial</a> to learn how to write Awk scripts, and leveraged
that to write the code above. The syntax feels foreign at first, but the language is super easy to use.</p>
<h2 id="using-it-in-kakoune">Using it in Kakoune</h2>
<p>We already have a CLI tool, but ideally I would like to use it within
my editor. This is where Kakoune makes it so easy.</p>
<p>Kakoune allows you to run any POSIX script, pipe the current selection
to it as input, and have the output inserted into the buffer. So I do not even have to
write a plugin to use it with the editor (!).</p>
<p>Allow me to demonstrate: <a href="https://asciinema.org/a/644993">https://asciinema.org/a/644993</a>.</p>
<p>In this video, I show how, without even creating any plugin, I could
use my script in my editing flow to generate the OCaml types. All I had
to do was:</p>
<ol type="1">
<li>Select the let statement</li>
<li>Press &lt;!&gt;</li>
<li>Run the awk script above, which I have named
ocaml-rapper-helper.</li>
</ol>
<p>You can easily leverage this with many other tools. For example, you
can use <a href="https://github.com/verdverm/chatgpt">ChatGPT CLI</a> or <a href="https://ollama.com/">Ollama</a> to
have an LLM autocomplete or explain code for you, similar to Copilot but free.</p>
<p>I hope this demonstrates why I am so in love with this editor. It
comes with a bunch of sane defaults out of the box and a great LSP integration, but
it is also so easy to extend and add your own functionality. Even if it
does not have many plugins of its own due to a smaller community, it
interops so well with other Unix tools that it makes up for it. Had I
wanted to write some more complex text manipulation, I could have used a language
like Rust, Go or OCaml that compiles down to a single binary and used
that instead.</p>
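<p>As a rough idea of what that could look like, here is a minimal sketch of the field extraction step in plain OCaml, using only the standard library. The names <code>fields_of_line</code> and <code>ocaml_type</code> are made up for this example, and it only handles a single hard-coded line rather than reading a whole file like the Awk script does:</p>
<div class="sourceCode"><pre
class="sourceCode ocaml"><code class="sourceCode ocaml">(* Characters allowed in the @type part and in the {key} part,
   mirroring the character classes used in the Awk script. *)
let is_type_char c = (c &gt;= 'a' &amp;&amp; c &lt;= 'z') || c = '?'
let is_key_char c = (c &gt;= 'a' &amp;&amp; c &lt;= 'z') || (c &gt;= '0' &amp;&amp; c &lt;= '9') || c = '_'

(* Return the first index at or after i where pred no longer holds. *)
let rec skip_while pred s i =
  if i &lt; String.length s &amp;&amp; pred s.[i] then skip_while pred s (i + 1) else i

(* Collect every @type{key} occurrence in a line as (key, type) pairs. *)
let fields_of_line line =
  let n = String.length line in
  let rec go i acc =
    match String.index_from_opt line i '@' with
    | None -&gt; List.rev acc
    | Some at -&gt;
      let ty_end = skip_while is_type_char line (at + 1) in
      if ty_end &gt; at + 1 &amp;&amp; ty_end &lt; n &amp;&amp; line.[ty_end] = '{' then
        let key_end = skip_while is_key_char line (ty_end + 1) in
        if key_end &gt; ty_end + 1 &amp;&amp; key_end &lt; n &amp;&amp; line.[key_end] = '}' then
          let ty = String.sub line (at + 1) (ty_end - at - 1) in
          let key = String.sub line (ty_end + 1) (key_end - ty_end - 1) in
          go (key_end + 1) ((key, ty) :: acc)
        else go (at + 1) acc
      else go (at + 1) acc
  in
  go 0 []

(* "string?" becomes "string option"; other types are left unchanged. *)
let ocaml_type ty =
  let n = String.length ty in
  if n &gt; 0 &amp;&amp; ty.[n - 1] = '?' then String.sub ty 0 (n - 1) ^ " option" else ty

let () =
  (* Hard-coded sample line; a real tool would read the selection from stdin. *)
  let line = "SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}" in
  List.iteri
    (fun i (key, ty) -&gt;
      Printf.printf " %c %s : %s\n" (if i = 0 then '{' else ';') key (ocaml_type ty))
    (fields_of_line line);
  print_endline " }"</code></pre></div>
<p>It is noticeably longer than the Awk version, which is why I would only reach for a compiled language once the text manipulation gets more involved.</p>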