Finding all key sequence paths in a nested Clojure map

In Clojure when you need to work with maps you get the option to choose from a number of builtin core functions such as get, assoc, dissoc, merge, select-keys, zipmap, keys, vals and many others.

Getting the value associated with a given key is as easy as doing.

user=> (get {:a 1 :b 2 :c 3} :c)
3

Or if you need to have a default value when a given key does not exists you can do this instead.

user=> (get {:a 1 :b 2 :c 3} :not-found 42)
42

Associating a key/value pair or removing a key from a map can be done using the assoc and dissoc functions.

user=> (assoc {:a 1 :b 2} :c 3)
{:a 1, :b 2, :c 3}
user=> (dissoc {:a 1 :b 2 :c 3} :c)
{:a 1, :b 2}

The get, assoc and dissoc functions work only with top-level map keys, so if you need to retrieve a value from a nested map you need to use the get-in function instead.

So for example if we need to retrieve the innermost key’s value we can do it like this.

user=> (get-in {:a {:b {:c 1}}} [:a :b :c])
1

In order to associate a key/value pair in a nested map you would use the assoc-in function.

Another useful function is the keys function which returns a sequence of the keys in a given map.

user=> (keys {:a 1 :b 2 :c 3})
(:a :b :c)

But notice what happens if we use keys with a nested map instead.

user=> (keys {:a {:b {:c 1}}})
(:a)

When we apply keys function to a nested a map we only get the top-level keys of the map and not all nested keys as well.

Your first guess here would probably be that there is a keys-in function similar to how we have the get-in, assoc-in and update-in functions which work with nested maps. Unfortunately these is no builtin function like that.

However, we can build one ourselves. First, we will create a sample nested map that we can work with.

user=> (def m {:a 1 :b 2 :c 3 :d {:e 4 :f {:g {:h 5}}}})
#'user/m

Given the above example nested map we want to create a function which will return all key sequence paths. For the above example map the key sequences that we have are.

([:a] [:b] [:c] [:d] [:d :e] [:d :f] [:d :f :g] [:d :f :g :h])

This task can be solved in a number of ways. Possible solutions can be achieved using DFS or BFS traversal, another solution would be to use a zipper.

In this post we will see how to create a function, which returns all key sequence paths using depth-first search traversal.

In order to do that we will use the builtin tree-seq function, which returns a lazy sequence of the nodes in a tree via a depth-first search traversal.

user=> (doc tree-seq)
-------------------------
clojure.core/tree-seq
([branch? children root])
  Returns a lazy sequence of the nodes in a tree, via a depth-first walk.
   branch? must be a fn of one arg that returns true if passed a node
   that can have children (but may not).  children must be a fn of one
   arg that returns a sequence of the children. Will only be called on
   nodes for which branch? returns true. Root is the root node of the
  tree.
nil

Now, let us create our own implementation of the keys-in function.

(defn keys-in
  "Returns a sequence of all key paths in a given map using DFS walk."
  [m]
  (letfn [(children [node]
            (let [v (get-in m node)]
              (if (map? v)
                (map (fn [x] (conj node x)) (keys v))
                [])))
          (branch? [node] (-> (children node) seq boolean))]
    (->> (keys m)
         (map vector)
         (mapcat #(tree-seq branch? children %)))))

The keys-in function works in the following way. We have defined two functions in a letfn binding form.

The children function gets the value of the key we are currently exploring and if it is a map it returns a sequence of the inner map’s keys by prepending the path of currently explored node. This gives us the full path to the keys of the nested map.

The branch? function simply uses children to return either true or false depending on whether the node has any children or not.

Finally we get all top-level keys of the map, then transform each key to a vector, so that we represent them using their full path and we apply each path to the tree-seq function. The results of traversing each top-level key are concatenated and returned.

And this is what we get when we use our keys-in function with the nested map we have created previously.

user=> (keys-in m)
([:a] [:b] [:c] [:d] [:d :e] [:d :f] [:d :f :g] [:d :f :g :h])

We can see that the result contains not just top-level key sequence paths, but also it includes the paths of any nested maps as well.

Written on June 9, 2018