From fbdf1e613f90ad86419364f768e0d8074bf250bc Mon Sep 17 00:00:00 2001 From: Marc Coquand Date: Sun, 3 Mar 2024 15:10:17 -0600 Subject: Add new post: Using Awk with Kakoune --- posts/awk-with-kakoune.njk | 151 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+) create mode 100644 posts/awk-with-kakoune.njk (limited to 'posts') diff --git a/posts/awk-with-kakoune.njk b/posts/awk-with-kakoune.njk new file mode 100644 index 0000000..ccf4b1a --- /dev/null +++ b/posts/awk-with-kakoune.njk @@ -0,0 +1,151 @@ +--- +layout: post.njk +title: Using Awk to Enable Kakoune to Generate Types for OCaml +tags: post +date: 2024-03-03 +--- +

Lately I have been using the Kakoune text editor for my editing. What +absolutely blows my mind is how interoperable it is with Unix scripts +and how easy it is to create extensions.

+

I am working on a Web server in OCaml. In my work, I levarage a +library called ppx_rapper, which +allows you to write SQL queries which are then converted to OCaml types. So a select query looks something like +this:

+
let my_query =
+  [%rapper
+    get_opt
+      {sql|
+      SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}
+      FROM users
+      WHERE username <> %string{wrong_user} AND id > %int{min_id}
+      |sql}]
+

my_query becomes a function that takes an +int and sql_connection and returns an n-tuple, +(int,string,bool,string). This can be hard to work with, especially a larger query can return many values of the same type and it becomes easy to mix them up. If you have a record type defined +that looks exactly the same as the return type of the SQL statement, you +can therefore add a record_out statement, which ppx_rapper will handle and +automatically return a record that matches its structure instead of a tuple:

+
type my_query_result = 
+ { id : int
+ ; username : string
+ ; following : bool
+ ; bio : string option
+ }
+let my_query =
+  [%rapper
+    get_opt
+      {sql|
+      SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}
+      FROM users
+      WHERE username <> %string{wrong_user} AND id > %int{min_id}
+      |sql} record_out] (* notice record out *)
+

Now my_query will return a typed record +my_query_result instead of a tuple. Records are easier to work with; they +help you avoid mixing up the values and work as self-documentation.

+

Of course, writing these types out by hand quickly becomes tedious, so yesterday I conjured up an Awk script +that can be used to generate the types for you.

+
#!/usr/bin/env awk -f
+
+BEGIN {
+  inside_rapper = 0
+  statement_name = ""
+  add_bracket = 0
+}
+
+# Extract function name so that we can use that
+# for the type name
+/let [a-zA-Z0-9_]+ =/ && inside_rapper == 0 {
+  statement_name = $2
+}
+
+# If [%rapper, begin extraction
+/\[%rapper/ && inside_rapper == 0 {
+  inside_rapper = 1
+  add_bracket = 1
+  printf "type " statement_name "_result = \n"
+}
+
+# Match patterns like @int{string}
+inside_rapper == 1 && /@[a-z?]+{[a-z_0-9]+}/g {
+    from = 0
+    # Remove everything before "@" and after "}"
+    pos = match ($0, /@[a-z?]+{[a-z_0-9]+}/, val)
+    while (0 < pos) {
+
+        # Split the line based on curly braces
+        split(val[0], parts, /{/)
+
+        # Extract @type, located in parts[1]
+        # Use substr to remove the @
+        type = substr(parts[1], 2)
+
+        # Remove the last character (}) of key
+        key = substr(parts[2], 1, length(parts[2])-1)
+
+        # ? = optional, so convert that to option
+        if (substr(type, length(type)) == "?") {
+            type = substr(type, 1, length(type) - 1) " option"
+        }
+
+
+        if (add_bracket == 1) {
+          print " { " key " : " type
+          add_bracket = 0
+        }
+        else {
+          # Print the key
+          print " ; " key " : " type
+        }
+
+        # In case there is more than 1 key per line, make sure we get all of them.
+        # Awk matches on the first occurance though, so we need to remove the
+        # previous match from the string and run again.
+        from += pos + val[0, "length"]
+        pos = match( substr( $0, from ), /@[a-z?]+{[a-z_0-9]+}/, val )
+    }
+}
+
+# If we see |sql], that means we are done. 
+/\|sql}/ && inside_rapper == 1 {
+   printf " }\n"
+   inside_rapper = 0
+ }
+

The my_query_result type you see above is generated with this script. I will not go into detail how the script work, but I have added comments explaining the steps. I chose Awk because it is specifically built for simple line +manipulation, a perfect fit for this problem.

+

You can run the script above with

+
chmod +x my-awk-script
+./my-awk-script file_with_my_query.ml
+

I used the Awk in 20 +Minutes tutorial to learn how to write Awk scripts, and levaraged +that to write the code above. The syntax feels foreign at first but the language is super easy to use.

+

Using it in Kakoune

+

Already we have a CLI tool, but ideally I would like to use it within +my editor. This is where Kakoune makes it so easy.

+

Kakoune allows you to run any POSIX script, pipe the selected input, +and have the output printed in the editor. So I do not even have to +write a plugin to use it with the editor (!).

+

Allow me to demonstrate: https://asciinema.org/a/644993.

+

In this video, I show how, without even creating any plugin, I could +use my script in my editing flow to generate the OCaml types. All I had +to do was:

+
    +
  1. Select the let statement
  2. +
  3. Press <!>
  4. +
  5. Run the awk script above, which I have named +ocaml-rapper-helper.
  6. +
+

You can easily levarage this with many other tools. For example, you +can use Chatgpt CLI or Ollama to +have an LLM autocomplete or explain code for you, similar to codepilot but free.

+

I hope this demonstrates why I am so in love with this editor. It +comes with a bunch of sane standards OOB, a great LSP integration, but +it is also so easy to extend and add your own functionality. Even if it +does not have many plugins on its own due to a smaller community, it +interops so well with other UNIX tools that it makes up for it. Had I +wanted to write some more complex text manipulation, I could have used a language +like Rust, Go or OCaml that compiles down to a single binary and use +that instead.

-- cgit v1.2.3