The improvements to `byte_pair_merge` are:
- Changing the `parts` vector to avoid repetition of data.
This vector used to store ranges for which the invariant `parts[i].end
== parts[i + 1].start` holds, which makes the vector twice as big as it
needs to be.
Keeping this vector small improves CPU-cache efficiency.
- Using `usize::MAX` as a sentinel in lieu of `Optional` for the
computation of the minimum rank.
This change removes branching from the loop to compute the minimum rank,
generating assembly that uses conditional moves instead.
Ideally, we could keep the `Optional` and inform it of the sentinel much
like `Optional<NonZeroUsize>`. As far as I could tell, specifying custom
sentinels for `Optional` has an old Rust
[RFC](https://github.com/rust-lang/rfcs/pull/41) that has stalled, so we
don't get to have nice things.
- Minimizing the number of lookups into `ranks` by looking up ranks once
and iteratively updating them after each merge.
This reduces the number of rank lookups from `n*m` to `n + O(m)`