Ruby/Crystal/Rust Advent of Code 2022 - Day 7

Ruby/Crystal/Rust Advent of Code 2022 - Day 7

Day 7 finds us still working with that pesky communication device. For today's task, though, all that we have to do is to analyze its filesystem. No worries, right? Let's see! Can pattern matching come to the rescue?

The Repository

All of the code for this day's solutions, as well as the data input file, can be found at:

https://github.com/wyhaines/advent-of-code-2022/tree/main/day_7

Part 1

To see the full details of the challenge, visit the Advent of Code 2022 - Day 7 page. However, the tl;dr version of the challenge is as follows:

You find yourself on an expedition, surrounded by the sounds of nature like birds chirping and raindrops hitting leaves. But as you proceed, you run into a problem with the device that the helpful elves gave you. Not only is the communication system giving you trouble, but when you try to run a system update, you get an error message: "No space left on device." Looks like you'll have to clear up some space before you can proceed.

That's where browsing around the filesystem comes in. You navigate through the directories, moving in and out, and listing the contents of the directories as you go. The filesystem is structured as a tree of files and directories, with the outermost directory being /. You take note of the commands you executed, such as "cd" (change directory) and "ls" (list), as well as the output from the terminal.

For example, when you enter "cd /", you switch the current directory to the outermost directory. "cd x" moves in one level, looking for the directory named x and making it the current directory. And "cd .." moves out one level, finding the directory that contains the current directory and making that the new current directory. As for "ls", it prints out all the files and directories within the current directory. So, "123 abc" means that the current directory contains a file named "abc" with a size of 123, while "dir xyz" means that the current directory contains a directory named "xyz".

With all that in mind, your task is to find all the directories with a total size of at most 100000 and then calculate the sum of their total sizes. You can use the example provided to give you an idea of what the finished result should look like. The directories "a" and "e" in the example fit this criterion with total sizes of 94853 and 584 respectively, resulting in a sum of 95437. So, take a deep breath and start browsing that filesystem, looking for those directories that meet the criteria and then summing up their total sizes. And good luck!

So, for example, consider the following data:

$ cd /
$ ls
dir fts
16394 tllvcdr.sjl
195891 zbdp.gqb
dir zddrb
$ cd fts
$ ls
dir dlqtffw
dir rbfmmjvd
254713 wvwhrb.dhh
$ cd dlqtffw
$ ls
73533 nqbvg.fgd
$ cd ..

The task is to follow the commands as they move through the filesystem, and calculate the size of each directory as this is done.

  • Starting from /, there are two files with sizes of 16394 and 195891.

  • Descending into /fts, there is one file with a size of 254713.

  • Descending into /fts/dlqtffw, there is one file with a size of 73533.

If that was the extent of the filesystem tree, then:

  • /fts/dlqtffw has a size of 73533.

  • /fts has a size of 73533 + 254713 == 328246

  • / has a size of 328246 + 16394 + 105891 == 450531

The Approach

This problem suggests a recursive algorithm.

  • Read the input into an array-like structure.

  • Maintain an array to store directory sizes.

  • Analyze filesystem data

    • For each line:

      • $ cd .. - Add whatever the current directory size is to the array of directory sizes.

      • $ cd _ - Entering a new directory; call the filesystem analyzer, and add its return value to the current directory's size.

      • $ ls - This command is irrelevant; it can be ignored.

      • dir _ - This command is irrelevant; only directories that are entered via cd matter, so this can be ignored.

      • 12345 foo.xyz - Add anything that starts with a number to the size of the current directory

  • Search the directory size array for all elements under 100000, and keep the smallest.

This particular problem comes together in a surprisingly similar way across all three languages.

Ruby solution

# frozen_string_literal: true

class NoSpaceLeftOnDevice
  def initialize(filename)
    @data = parse_device_data(filename)
    @directory_sizes = []
  end

  def run
    analyze_directory_contents
    puts "Sum of all directory sizes less than 100000: #{small_directories.sum}"
  end

  def small_directories
    @directory_sizes.select { |size| size <= 100000 }
  end

  def analyze_directory_contents
    final_size = 0

    while line = @data.shift
      line_parts = line.split
      case line_parts
        in ['$', 'cd', '..']
          @directory_sizes << final_size
          return final_size
        in ['$', 'cd', _]
          final_size += analyze_directory_contents
        in [file_size, _]
          final_size += file_size.to_i
      end
    end

    final_size
  end

  def parse_device_data(filename)
    File.read(filename).lines
  end

end

NoSpaceLeftOnDevice.new(ARGV[0] || 'input.txt').run

Starting with version 2.7, Ruby supports pattern matching. It might be easy to confuse this feature as having something to do with regular expressions, but this feature is all about matching structured data. We can use it to make the handling of the filesystem data very clear and easy to reason about.

After reading the input data into a simple Array, the analyze_directory_contents method is called.

def analyze_directory_contents
  final_size = 0

  while line = @data.shift
    line_parts = line.split
    case line_parts
      in ['$', 'cd', '..']
        @directory_sizes << final_size
        return final_size
      in ['$', 'cd', _]
        final_size += analyze_directory_contents
      in [file_size, _]
        final_size += file_size.to_i
    end
  end

  final_size
end

This method operates on the @data instance variable, shifting one line off the beginning of the @data variable and then splitting it by whitespace. It then uses it as the argument in a pattern matching case statement.

In Ruby, a when statement within a case is the standard case/=== comparison. However, when used with in, the argument is assumed to be a pattern to match against. This task only has three patterns that have to be considered for this task.

  • ['$', 'cd', '..'] - This pattern is a literal match. If the first three elements passed into the case statement exactly match these three elements, then the code that follows:
    @directory_sizes << final_size return final_size

  • ['$', 'cd', _] - This pattern matches an array with three elements, where the first two are literal matches, and the third can be anything; it is assigned to the _ variable because the actual value of it is irrelevant to this task.

  • [file_size, _] - This pattern matches an array with two elements, assigning one to a variable, file_size, and another to the _ variable. The file_size is converted to an integer, and added to the final_size.

Ruby requires that pattern matches be exhaustive. This means that there will be a NoMatchingPatternError error something like this if a given term doesn't match a pattern.

day_7_1.rb:25:in `analyze_directory_contents': ["$", "cd", "/"] (NoMatchingPatternError)

This characteristic means that an array like ['$', 'ls'] or ['dir', 'ftbqmdl'] will both match to that last pattern, the [file_size, _]. It might be better to have a couple of additional lines that match those patterns, but do nothing with them, like this:

      in ['$', 'ls']
      in ['dir', _]

However, Ruby interprets any string that isn't a number as a zero, so if the execution falls through to there, any patterns which don't have a number attached won't affect the final outcome, so the above lines, while technically cleaner, aren't strictly necessary to get a correct answer.

There's only one bit left to get the answer that we need. All of the directory sizes that are collected into the @directory_sizes instance variable needs to be filtered to find only those directories which are at or under 100000 in size.

  def small_directories
    @directory_sizes.select { |size| size <= 100000 }
  end

This can be done with a simple select call on the array. Then the code sums those values for the answer.

  def run
    analyze_directory_contents
    puts "Sum of all directory sizes less than 100000: #{small_directories.sum}"
  end

Ruby provides all of the tools necessary to write this solution quickly and cleanly.

Crystal solution

class NoSpaceLeftOnDevice
  @data = Deque(String).new

  def initialize(filename)
    parse_device_data(filename)
    @directory_sizes = [] of Int32
  end

  def run
    analyze_directory_contents
    puts "Sum of all directory sizes less than 100000: #{small_directories.sum}"
  end

  def small_directories
    @directory_sizes.select { |size| size <= 100000 }
  end

  def analyze_directory_contents
    final_size = 0

    while line = @data.shift?
      line_parts = line.split
      case {line_parts[0], line_parts[1], line_parts[2]?}
      when {"$", "cd", ".."}
        @directory_sizes << final_size
        return final_size
      when {"$", "cd", _}
        final_size += analyze_directory_contents
      when {"$", "ls", _}
      when {"dir", _, _}
      else
        final_size += line_parts.first.to_i
      end
    end

    final_size
  end

  def parse_device_data(filename)
    File.open(filename).each_line { |line| @data << line }
  end
end

NoSpaceLeftOnDevice.new(ARGV[0]? || "input.txt").run

The Crystal version returns to the pattern of being nearly identical to the Ruby version, with a couple of small tweaks. The first is that the Ruby version uses an Array to store the filesystem data that is to be traversed.

This works, because Ruby supports Array#shift, so information can be removed from the front of an Array instead of popping it from the back. From the viewpoint of efficiency, this isn't a great solution, because Ruby arrays are built as LIFO (last-in-first-out) data structures. Removing data from their front requires the Array implementation to do a lot of extra work. Crystal's Array is built similarly, and it would be fine to use an Array with Crystal, just like with Ruby, but Crystal has another data structure that is part of its core libraries, the Deque. A Deque is used just like an Array, but internally, it is built to efficiently remove data from either end of the structure. This is a better match for what this program needs, so @data is declared as a Deque(String), and the file is read into it instead of into an Array.

Secondly, Crystal doesn't have the same sort of pattern-matching facilities that Ruby has, but it has, broadly speaking, some similar behaviors available.

def analyze_directory_contents
  final_size = 0

  while line = @data.shift?
    line_parts = line.split
    case {line_parts[0], line_parts[1], line_parts[2]?}
    when {"$", "cd", ".."}
      @directory_sizes << final_size
      return final_size
    when {"$", "cd", _}
      final_size += analyze_directory_contents
    when {"$", "ls", _}
    when {"dir", _, _}
    else
      final_size += line_parts.first.to_i
    end
  end

  final_size
end

Crystal's case statement works, superficially, like Ruby's does, using the === method for comparisons. However, when it is given a Tuple instead of an Array, the behavior becomes a little more like Ruby's pattern matching. It does not support the (very useful) feature that Ruby has, where local variables can be assigned within the matched pattern, nor does it allow variable length patterns -- they all have to match the length of the tuple given to the case statement -- but the _ character can be used as a placeholder that will match any value. This lets the case statement end up being fairly similar to the Ruby one.

Crystal does support the in keyword like Ruby does, for case statements, but with Crystal, in indicates that it is a compiler error for the case statement to not be exhaustive.

One other change from Ruby to draw your attention to is that the Crystal version has lines for the two types of input that are simply ignored in the Ruby version.

In Ruby, "$".to_i will return a 0. In Crystal, however, that is an ArgumentError.

icr:1> "$".to_i
Unhandled exception: Invalid Int32: "$" (ArgumentError)

So, to avoid either of those types of lines falling through to the catchall statement that handles the file size lines, they have to be consumed by matching when clauses first.

The remainder of the code is essentially identical to the Ruby code.

Rust solution

use std::collections::VecDeque;
use std::env;

struct NoSpaceLeftOnDevice {
    data: VecDeque<String>,
    directory_sizes: Vec<i32>,
}

impl NoSpaceLeftOnDevice {
    fn new(filename: &str) -> NoSpaceLeftOnDevice {
        let data = Self::parse_device_data(filename).unwrap();
        NoSpaceLeftOnDevice {
            data: data,
            directory_sizes: Vec::<_>::new(),
        }
    }

    fn parse_device_data(filename: &str) -> Option<VecDeque<String>> {
        let data = std::fs::read_to_string(filename).ok()?;
        Some(data.lines().map(String::from).collect())
    }

    fn run(&mut self) {
        self.analyze_directory_contents();

        println!(
            "Sum of all directory sizes less than 100000: #{}",
            self.small_directories().iter().sum::<i32>()
        );
    }

    fn small_directories(&self) -> Vec<i32> {
        self.directory_sizes
            .iter()
            .filter(|&size| size < &100000)
            .map(|size| *size)
            .collect::<Vec<i32>>()
    }

    fn analyze_directory_contents(&mut self) -> i32 {
        let mut final_size = 0;

        while let Some(line) = self.data.pop_front() {
            let line_parts = line.split(" ").collect::<Vec<&str>>();
            match line_parts.as_slice() {
                ["$", "cd", ".."] => {
                    self.directory_sizes.push(final_size);
                    return final_size;
                }
                ["$", "cd", _] => {
                    final_size += self.analyze_directory_contents();
                }
                ["$", "ls"] => {
                    // Irrelevant. Consume it and go on.
                }
                ["dir", _] => {
                    // Also irrelevant. Consume it and go on.
                }
                [file_size, _] => {
                    final_size += file_size.parse::<i32>().unwrap();
                }
                _ => {}
            }
        }
        final_size
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let mut filename = "input.txt";
    if let Some(arg) = args.get(1) {
        filename = arg;
    }

    NoSpaceLeftOnDevice::new(filename).run();
}

Rust is always more lines of code, but this particular program, if you put aside the basic syntactical differences, is remarkably close to the Ruby/Crystal solution.

Like Crystal, this version uses a deque structure, VecDec, instead of a plain Vec. Rust also makes it quite simple to read the input file data into this, instead of into a Vec.

    fn parse_device_data(filename: &str) -> Option<VecDeque<String>> {
        let data = std::fs::read_to_string(filename).ok()?;
        Some(data.lines().map(String::from).collect())
    }

The second part of the similarity lies with the use of pattern matching. Rust's match keyword is immensely powerful, and it is put to good use in solving this task.

fn analyze_directory_contents(&mut self) -> i32 {
    let mut final_size = 0;

    while let Some(line) = self.data.pop_front() {
        let line_parts = line.split(" ").collect::<Vec<&str>>();
        match line_parts.as_slice() {
            ["$", "cd", ".."] => {
                self.directory_sizes.push(final_size);
                return final_size;
            }
            ["$", "cd", _] => {
                final_size += self.analyze_directory_contents();
            }
            ["$", "ls"] => {}
            ["dir", _] => {}
            [file_size, _] => {
                final_size += file_size.parse::<i32>().unwrap();
            }
            _ => {}
        }
    }
    final_size
}

Like Crystal, Rust patterns can use _ to signify a wildcard match to a value that isn't going to be used. Also, like Crystal, Rust is not well pleased if it is asked to turn a string like $ or dir into an integer.

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseIntError { kind: InvalidDigit }', day_7_1.rs:60:60
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

So, like Crystal, the Rust pattern needs matches for those lines that are irrelevant and are to be ignored.

["$", "ls"] => {}
["dir", _] => {}

And like Ruby, Rust pattern matching can capture matched values to a locally scoped variable, which is used to capture the file sizes.

[file_size, _] => {
    final_size += file_size.parse::<i32>().unwrap();
}

Though unlike Ruby, pattern matching in Rust is always exhaustive, meaning that there needs to be a pattern (_ = {}) that is guaranteed to match anything that might fall through all of the other matches.

The rest may take a few more lines than the other implementations, but is either something that I have talked about during an earlier AoC article, or is conceptually much the same as the other two languages. Thanks to pattern matching, this task turned out to be as simple in Rust as in the other languages.

Part 2

Now you're ready to choose a directory to delete.

The task is to select a directory to delete in order to free up enough space on the filesystem to run an update. The total disk space available is 70000000 and the update requires at least 30000000 unused space. The current unused space is 21618835, which is not enough. The options for directories to delete and the amount of space they would free up are: directory e (584), directory a (94853), directory d (24933642), and directory / (48381165). Out of these options, select the smallest directory that would free up enough space to run the update. The total size of that selected directory should be returned.

This is straightforward enough. Find all of the directory sizes which are equal to or greater than the amount of space needed, and of those, select the smallest.

The Approach

  • Do everything up to and including filesystem analysis the same as Part 1.

  • After the filesystem analysis, calculate the amount of needed space.
    30000000 - (70000000 - space_used)

  • Select all directories greater than or equal to the space_needed, and then find the smallest one of those.

Ruby solution

# frozen_string_literal: true

class NoSpaceLeftOnDevice
  def initialize(filename)
    @data = parse_device_data(filename)
    @directory_sizes = []
  end

  def run
    space_used = analyze_directory_contents
    space_needed = 30000000 - (70000000 - space_used)

    puts "The size of the smalled directory that can be deleted: #{small_directory_to_delete(space_needed)}"
  end

  def small_directory_to_delete(space_needed)
    @directory_sizes.select  {|size| size >= space_needed}.min
  end

  def analyze_directory_contents
    final_size = 0

    while line = @data.shift
      line_parts = line.split
      case line_parts
        in ['$', 'cd', '..']
          @directory_sizes << final_size
          return final_size
        in ['$', 'cd', _]
          final_size += analyze_directory_contents
        in [file_size, _]
          final_size += file_size.to_i
      end
    end

    final_size
  end

  def parse_device_data(filename)
    File.read(filename).lines
  end

end

NoSpaceLeftOnDevice.new(ARGV[0] || 'input.txt').run

This variation from Part 1 only changes a few lines.

  def run
    space_used = analyze_directory_contents
    space_needed = 30000000 - (70000000 - space_used)

    puts "The size of the smalled directory that can be deleted: #{small_directory_to_delete(space_needed)}"
  end

  def small_directory_to_delete(space_needed)
    @directory_sizes.select  {|size| size >= space_needed}.min
  end

The code for Part 1 already collects all of the information that is needed for this task. All that really changes is the search through the @directory_sizes. I won't belabor this. Ruby's expressiveness makes this look very much like a coded version of the list in The Approach.

Crystal solution

class NoSpaceLeftOnDevice
  @data = Deque(String).new

  def initialize(filename)
    parse_device_data(filename)
    @directory_sizes = [] of Int32
  end

  def run
    space_used = analyze_directory_contents
    space_needed = 30000000 - (70000000 - space_used)

    puts "The size of the smalled directory that can be deleted: #{small_directory_to_delete(space_needed)}"
  end

  def small_directory_to_delete(space_needed)
    @directory_sizes.select { |size| size >= space_needed }.min
  end

  def analyze_directory_contents
    final_size = 0

    while line = @data.shift?
      line_parts = line.split
      case {line_parts[0], line_parts[1], line_parts[2]?}
      when {"$", "cd", ".."}
        @directory_sizes << final_size
        return final_size
      when {"$", "cd", _}
        final_size += analyze_directory_contents
      when {"$", "ls", _}
      when {"dir", _, _}
      when {_, _, _}
        final_size += line_parts.first.to_i
      end
    end

    final_size
  end

  def parse_device_data(filename)
    File.open(filename).each_line { |line| @data << line }
  end
end

NoSpaceLeftOnDevice.new(ARGV[0]? || "input.txt").run

The changes for the Crystal version are identical to the changes for the Ruby version.

Rust version

use std::collections::VecDeque;
use std::env;

struct NoSpaceLeftOnDevice {
    data: VecDeque<String>,
    directory_sizes: Vec<i32>,
}

impl NoSpaceLeftOnDevice {
    fn new(filename: &str) -> NoSpaceLeftOnDevice {
        let data = Self::parse_device_data(filename).unwrap();
        NoSpaceLeftOnDevice {
            data: data,
            directory_sizes: Vec::<_>::new(),
        }
    }

    fn parse_device_data(filename: &str) -> Option<VecDeque<String>> {
        let data = std::fs::read_to_string(filename).ok()?;
        Some(data.lines().map(String::from).collect())
    }

    fn run(&mut self) {
        let space_used = self.analyze_directory_contents();
        let space_needed = 30000000 - (70000000 - space_used);

        println!(
            "The size of the smalled directory that can be deleted: #{}",
            self.small_directory_to_delete(space_needed)
        );
    }

    fn small_directory_to_delete(&self, space_needed: i32) -> i32 {
        self.directory_sizes
            .iter()
            .filter(|&size| size >= &space_needed)
            .map(|size| *size)
            .min()
            .unwrap()
    }

    fn analyze_directory_contents(&mut self) -> i32 {
        let mut final_size = 0;

        while let Some(line) = self.data.pop_front() {
            let line_parts = line.split(" ").collect::<Vec<&str>>();
            match line_parts.as_slice() {
                ["$", "cd", ".."] => {
                    self.directory_sizes.push(final_size);
                    return final_size;
                }
                ["$", "cd", _] => {
                    final_size += self.analyze_directory_contents();
                }
                ["$", "ls"] => {}
                ["dir", _] => {}
                [file_size, _] => {
                    final_size += file_size.parse::<i32>().unwrap();
                }
                _ => {}
            }
        }
        final_size
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let mut filename = "input.txt";
    if let Some(arg) = args.get(1) {
        filename = arg;
    }

    NoSpaceLeftOnDevice::new(filename).run();
}

Again, while slightly more lines of code are involved, the changes for the Rust version are basically the same as the changes for the other two languages.

Conclusion

Pattern-matching facilities are powerful, and very useful. Today's examples barely scratch the surface of what can be done with them, and while it is true that nothing that was done with pattern matching in the case would have been difficult to do with a traditional case statement, or even with a few if/elsif type statements, the use of pattern matching made the code easy to write, and also easy to look at and reason about.

That ability to understand what a piece of code is doing at a glance shouldn't be underemphasized. In real-world code, everyone who comes after you, who has to understand or maintain what you did, will be thankful if you make choices to favor clarity and readability in your code. Today, with AoC Day 7, patterns did that for us, and they made this task pretty simple.