A short introduction to functional programming in Julia

Julia
functional programming
Advent of Code
How to structure the code using nothing but circles and pipes.
Author

Szymon Bęczkowski

Published

December 12, 2023

Advent of code is a quaint site that posts one programming problem each day just before Christmas. Just so we can flex our programming muscle. More importanetly, we can eavsdrop on other people’s code. Looking at some of the solutions motivated me to write this tutorial.

A lot of people start learning programming with C++ or MATLAB. This inevitably leads to a very verbose and error-prone code. In this tutorial, I aim to illustrate how modern programming practices and a good language can lead to elegant solutions and maintainable code.

The problem

Advent of code day one problem. The problem boils down to: given a string, extract first and last digits in each line; join the two digits to form a two-digit number and add all resulting numbers.

calibration = """
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
"""

Interlude I: splitting hair lines

Text manipulation is a fundamental challenge in programming, and nearly every programming language provides built-in primitives for this purpose.

Julia’s function eachline is a conveninent tool for splitting a string into individual lines. However, attempting to pass our calibration variable into eachline we will result in an error. Fortunately, the docs provide guidance on this issue:

To iterate over each line of a String, eachline(IOBuffer(str)) can be used.

for cal_line in eachline(IOBuffer(calibration))
    @show cal_line
end
cal_line = "1abc2"
cal_line = "pqr3stu8vwx"
cal_line = "a1b2c3d4e5f"
cal_line = "treb7uchet"

Now that we understand how to split the calibration string into individual lines, the next step is to “simpy” extract the two-digit numbers from each line. Adding them is straightforward.

The actual problem

Looking within individual line, our problem can now be states as follows: given a string, extract first and last digits, and then join the two digits to form a two-digit number.

Let’s take one of the lines above and use it as a test case.

cal_line = "a1b2c3d4e5f"

Here’s a solution that is quite typical for students after completing their first MATLAB course:

number = ""
@show number

for i in 1:length(cal_line)
    s = cal_line[i]
    if isdigit(s)
        number = number * s
        break
    end
end

@show number

for i in length(cal_line):-1:1
    s = cal_line[i]
    if isdigit(s)
        number = number * s
        break
    end
end

@show number

parse(Int, number)
number = ""
number = "1"
number = "15"
15

It works but is also suuuuper long. If you are using an editor, like VSCode, you might notice some code underlined with a squiggly line. This is Julia’s linter telling us that:

Indexing with indices obtained from length, size etc is discouraged. Use eachindex or axes instead.

A linter is a tool that continously checks your code for possible errors and anti-patterns. In this case, it points out that for i in 1:length(cal_line) is not cool. Let’s implement the proposed fix.

Rather than creating a vector of indexes 1:length(cal_line), we can directly obtain using eachindex(cal_line) for the first case and Iterators.reverse(eachindex(cal_line)) for the second case. This approach is advisable because arrays in Julia can have arbitrary indexes. Forget about the war between 0-based and 1-based indexing styles. Embrace Star Wars movies order indexing.

Notice that we did not use the i variable for anything other than indexing an element of the input string. We should take advantage of the fact that we can just iterate over anything that is iterable. Any String or Array or similar data structure is. If a data structure can be described similarly to: a string is a collection of characters or an array is a collection of numbers then it most likely is iterable.

number = ""
@show number

for s in cal_line
    if isdigit(s)
        number = number * s
        break
    end
end

@show number

for s in reverse(cal_line)
    if isdigit(s)
        number = number * s
        break
    end
end

@show number

parse(Int, number)
number = ""
number = "1"
number = "15"
15

The code is now a bit prettier but it still appears repetitive and somewhat verbose. The first loop seeks the first numeric character, while the second loop seeks the last numeric character. In essence, they perform nearly identical tasks. They iterate over a collection of characters until they encounter a digit.

What if we first remove all the letters (non-digits) from the string? Then our search for digits will be much simpler. So, how do we go from "a1b2c3d4e5f" to "12345"?

Interlude II: map filter reduce

The three functions–map, filter and reduce–are the workhorses of functional programming.

  • map applies a function element-wise to a collection
  • filter yields a sub-collection based on specified criteria
  • reduce accumulates an output based on subsequent elements

These three functions are powerful tools in functional programming paradigm and can be employed for a wide range of tasks.

function clip(x, x_min=-0.5, x_max=0.5)
    if x > x_max return x_max end
    if x < x_min return x_min end
    return x
end

@show clip(-1)
@show clip(0)
@show clip(0.7)

x = -π:0.01:π
y = sin.(x)
y_clipped = map(clip, y) # map does the heavy lifting here

using Plots
plot(x, y, label="sin(x)", xticks=(-π:π:π,["-π","0","π"]))
plot!(x, y_clipped, label="clipped sin(x)")
clip(-1) = -0.5
clip(0) = 0
clip(0.7) = 0.5

Let the data flow

First lets use the newly acquired powers to get rid of the letters in our line. Any function, that returns a Boolean, can be used inside a filter.

cal_digits = filter(isdigit, cal_line)
"12345"

Notice we do don’t explicitly provide any input to the isdigit function, instead filter does it for us. It applies the isdigit function to every element of the collection, and if the result is True, the corresponding element gets to live a bit longer in the new, filtered collection.

Now, we can proceed to extract the first and last digits from the refined string.

first_cal_digit = first(cal_digits)
last_cal_digit = last(cal_digits)
cal_number = first_cal_digit * last_cal_digit
parse(Int, cal_number)
15

We’ve achieved a pretty concise code. More importanetly, it is self-documenting as variable names clearly convey our intentions. I, however, don’t like it. first_cal_digit variable name duplicates what first(cal_digits) communicates. Furthermore, its lifespan is pretty short so we don’t really need it. These short-lived variables can be ommited by using nested function calls:

first(filter(isdigit, cal_line))
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)

However, nesting has a nasty habit of creating a sea of parentheses

make(something(important(with(this(data)))))

This was typically remedied by tools like Rainbow Brackets. However, there are more elegant solutions.

One such approach is function composition. Picture glueing the output of one function to the input of another. This enables us to construct new function without the need for intermediate variables.

function_chain = make  something  important  with  this
function_chain(data)

The operator composes (glues) functions together. No intermediate variables necessary.

Similar solution can also be achieved by piping the data with a |> (pipe) operator.

data |> this |> with |> important |> something |> make

Notice how the flow of data changes direction. The input data, on the left, is piped through successive functions. It’s important to emphasize that the execution order remains unchanged. With composed functions, the last function in the chain would operate on the data first.

Some languages allow for piping <| both ways |>. Julia does not. This can lead to some confusing code if |> and are used together.

π |> round  sin
0.0

π is first applied to sin and the result is then rounded.

We can use piping to find the first digit in our filtered string:

filter(isdigit, cal_line) |> first
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)

As you can see, piping is a neat way of pushing data through a chain of operations.

output = data |> massage |> this |> into |> something |> different

flowchart LR
  Input(data):::data --> A[massage] --> B[this] --> C[into] --> D[something] --> E[different] --> Result(output):::data
  classDef data fill:#C5F4E0

There is, however, a small problem. These functions take one input and return one output. Yet, our task involves extracting two things: the first and last digits. Subsequently, we need to sum these two numbers.

flowchart LR
  Input(cal_line):::data --> Filter[filter numbers]
  Filter --> First[get first number] & Last[get last number] --> Join[join] --> Result(cal_number):::data
  classDef data fill:#C5F4E0

We can massage our flow diagram, using Julia’s broadcasting mechanism, until we get a nice straight diagram:

flowchart LR
  Input(cal_line):::data --> Filter[filter numbers]
  Filter -.-> Numbers[[get first and last numbers]] -.-> Join[join] --> Result(cal_number):::data
  classDef data fill:#C5F4E0

Massaging the chain

filter(isdigit, cal_line) .|> (first, last)
('1', '5')

Two crucial things happen here. First, (first, last) is a tuple (collection) of functions.

Julia offers many ways of grouping things into a some form of collection: tuples, named tuples, arrays, dictionaries, structs, …. You can thing of tuples as a versatile container that holds multiple things without much boilerplate. They prove especially usefull when joining things for a moment, just as we need here.

The dot represents a broadcast operator. .|> is a way of piping the data into multiple things at once, broadcasting data into individual inputs.

It functions similarly as cos.([π,π]), where two pies are shoved into a cosine even though the cosine function can only handle one pie at a time.

Now we do have our first and last digits captured, but they still are in a character form. We need to convert them to numbers. As you may observe, the output of the previous operation is ('1', '5') This is a tuple of characters. A join function can create a string out of that.

filter(isdigit, cal_line) .|> (first, last) |> join
"15"

The resulting string should be reinterpreted as a number. The parse function can do that for us but it requires two inputs: what to parse and how to parse it. To incorporate it into our streamlined code flow, we must embed one of the inputs.

parseint(number) = parse(Int, number) # parseint() function has one input and one output
filter(isdigit, cal_line) .|> (first, last) |> join |> parseint
15

Interlude III: Function-That-Must-Not-Be-Named

As with variables, we can get rid of naming this short lived function by using an anonymous function. An anonymous function is a function without a name. Consider this function:

double(x) = 2x
double (generic function with 1 method)

It doubles whatever we feed to it. Now, let’s create an anonymous version of this function:

x -> 2x
#13 (generic function with 1 method)

Julia made it for us. It is function #13. Where is it then? How do we call it?

Function #13 is indeed quite unlucky. We did not catch it in time1 and it went to the void. Confusingly, we can name an anonymous function.

1 A Pokémon function?

twofold = x -> 2x
twofold(5)
10

Most common use of these anonymous functions is existence in a chain of functions or being passed to another function.

map(x->2x, [1,2,3]) # x->2x is an anonymous function that is passed to a map function
3-element Vector{Int64}:
 2
 4
 6

We will leverage this technique to parse a string into an integer. Now, the line parsing is complete.

filter(isdigit, cal_line) .|> (first, last) |> join |> n -> parse(Int, n)
15

Answer to the problem

We can now assemble all the pieces to solve the Advent of Code day 1 problem. We take the line parser we just wrote and wrap it in a function so we can parse any line we throw at it. Then, for each line in our calibration string, extract a calibration value. The sum of all calibration values is our answer.

cal_value(line) = filter(isdigit, line) .|> (first, last) |> join |> n -> parse(Int, n)
IOBuffer(calibration) |> eachline .|> cal_value |> sum
142

The answer to the problem is 142.

Takeaways

Try to describe the solution algorithm in a sentence. Don’t focus on implementation just yet. Each verb in your description is a good candidate for a function. Each function can be thought as a sub-problem.

Breaking down large problems into smaller, well-defined sub-problems not only makes makes the code more manageable but also enhances readability and maintainability. Strive to solve these isolated problems as pure functions. This kind of code organisation also works really well with test-driven development (TDD).

The smaller the problem the less code you need to write to solve it. Long and complex code can be challenging to navigate and reason about. Try keeping related code on one monitor screen2 for better comprehension, easier debugging and more efficient development.

2 Rotating the monitor helps.

Avoid iterating over data with a for loop unless you really have to. It is a short way to off-by-one and out-of-bounds errors. You will also be tempted to change variables inside the for loop and to create loops inside loops. Good luck debugging that.