--- layout: post.njk title: Using Awk with Kakoune to Generate SQL Types for OCaml tags: post date: 2024-03-03 ---

Lately I have been using the Kakoune text editor for my editing. What blows my mind is how interoperable the editor is with Unix scripts and how easy it is to create extensions.

So I am working on a Web server in OCaml. In my work, I levarage a library called ppx_rapper, which allows you to write SQL queries which are then converted to OCaml types. So a select query looks something like this:

let my_query =
  [%rapper
    get_opt
      {sql|
      SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}
      FROM users
      WHERE username <> %string{wrong_user} AND id > %int{min_id}
      |sql}]

my_query becomes a function that takes an int and sql_connection and returns an n-tuple, (int,string,bool,string). This can be hard to work with, especially a larger query can return many values of the same type and it becomes easy to mix them up. If you have a record type defined that looks exactly the same as the return type of the SQL statement, you can therefore add a record_out statement, which ppx_rapper will handle and automatically return a record that matches its structure instead of a tuple:

type my_query_result = 
 { id : int
 ; username : string
 ; following : bool
 ; bio : string option
 }
let my_query =
  [%rapper
    get_opt
      {sql|
      SELECT @int{id}, @string{username}, @bool{following}, @string?{bio}
      FROM users
      WHERE username <> %string{wrong_user} AND id > %int{min_id}
      |sql} record_out] (* notice record out *)

Now my_query will return a typed record my_query_result instead of a tuple. Records are easier to work with; they help you avoid mixing up the values and work as self-documentation.

Of course, writing these types out by hand quickly becomes tedious, so yesterday I conjured up an Awk script that can be used to generate the types for you.

#!/usr/bin/env awk -f

BEGIN {
  inside_rapper = 0
  statement_name = ""
  add_bracket = 0
}

# Extract function name so that we can use that
# for the type name
/let [a-zA-Z0-9_]+ =/ && inside_rapper == 0 {
  statement_name = $2
}

# If [%rapper, begin extraction
/\[%rapper/ && inside_rapper == 0 {
  inside_rapper = 1
  add_bracket = 1
  printf "type " statement_name "_result = \n"
}

# Match patterns like @int{string}
inside_rapper == 1 && /@[a-z?]+{[a-z_0-9]+}/g {
    from = 0
    # Remove everything before "@" and after "}"
    pos = match ($0, /@[a-z?]+{[a-z_0-9]+}/, val)
    while (0 < pos) {

        # Split the line based on curly braces
        split(val[0], parts, /{/)

        # Extract @type, located in parts[1]
        # Use substr to remove the @
        type = substr(parts[1], 2)

        # Remove the last character (}) of key
        key = substr(parts[2], 1, length(parts[2])-1)

        # ? = optional, so convert that to option
        if (substr(type, length(type)) == "?") {
            type = substr(type, 1, length(type) - 1) " option"
        }


        if (add_bracket == 1) {
          print " { " key " : " type
          add_bracket = 0
        }
        else {
          # Print the key
          print " ; " key " : " type
        }

        # In case there is more than 1 key per line, make sure we get all of them.
        # Awk matches on the first occurance though, so we need to remove the
        # previous match from the string and run again.
        from += pos + val[0, "length"]
        pos = match( substr( $0, from ), /@[a-z?]+{[a-z_0-9]+}/, val )
    }
}

# If we see |sql], that means we are done. 
/\|sql}/ && inside_rapper == 1 {
   printf " }\n"
   inside_rapper = 0
 }

The my_query_result type you see above is generated with this script. I will not go into detail how the script work, but I have added comments explaining the steps. I chose Awk because it is specifically built for simple line manipulation, a perfect fit for this problem.

You can run the script above with

chmod +x my-awk-script
./my-awk-script file_with_my_query.ml

I used the Awk in 20 Minutes tutorial to learn how to write Awk scripts, and levaraged that to write the code above. The syntax feels foreign at first but the language is super easy to use.

Using it in Kakoune

Already we have a CLI tool, but ideally I would like to use it within my editor. This is where Kakoune makes it so easy.

Kakoune allows you to run any POSIX script, pipe the selected input, and have the output printed in the editor. So I do not even have to write a plugin to use it with the editor (!).

Allow me to demonstrate: https://asciinema.org/a/644993.

In this video, I show how, without even creating any plugin, I could use my script in my editing flow to generate the OCaml types. All I had to do was:

  1. Select the let statement
  2. Press <!>
  3. Run the awk script above, which I have named ocaml-rapper-helper.

You can easily levarage this with many other tools. For example, you can use Chatgpt CLI or Ollama to have an LLM autocomplete or explain code for you, similar to codepilot but free.

I hope this demonstrates why I am so in love with this editor. It comes with a bunch of sane standards OOB, a great LSP integration, but it is also so easy to extend and add your own functionality. Even if it does not have many plugins on its own due to a smaller community, it interops so well with other UNIX tools that it makes up for it. Had I wanted to write some more complex text manipulation, I could have used a language like Rust, Go or OCaml that compiles down to a single binary and use that instead.