Thanks to Amir Livne Bar-on, who implemented this variant from a
description posted in the Mastodon thread following up on my recent
blog post about loop-finding.
The revised algorithm has the same asymptotic complexity as the one I
already had. But it has better constant factors, and less code: it can
run in a single depth-first pass over the graph instead of three
separate passes, and it needs to store fewer variables per vertex.
This version relies on the insight that if you DFS over an undirected
graph and imagine it constructing a rooted spanning forest with each
component's tree rooted at whatever vertex you started that component
from, then every edge that the DFS visits without making it part of
the spanning forest must join a vertex to one of its direct ancestors
in that component's tree.
(Because the other options are that it joins the DFS's current vertex
to one the search hasn't visited at all yet – in which case the DFS
_would_ follow it, and make it a forest edge after all. Or else it
joins this vertex to a cousin in an earlier finished subtree – but
then when the DFS processed that subtree, it would have explored the
same edge in the other direction, and added our current vertex to that
subtree, which by assumption it didn't.)
Hence, instead of assigning every vertex a distinct integer label and
calculating the min/max label reachable from each subtree, we can
assign each vertex its tree depth, and simply calculate the minimum
_depth_ of any vertex reachable from each subtree: if a subtree
starting at depth D can reach a vertex at depth <D, it's because
there's one of those non-tree edges to a vertex outside the subtree,
so the tree edge entering the subtree isn't a bridge.
And since every non-tree edge must point to a vertex we've already
seen (and hence assigned a depth to), this can be done in the same
pass as calculating the depths in the first place – and we don't even
need to _store_ the spanning forest we generate.
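
As a rough illustration of the single-pass idea, here is a minimal
recursive sketch. It is not the code in findloop.c: the adjacency-matrix
struct graph, the MAXV limit and all the function names are made up for
this example, and it assumes a simple graph (no self-loops or parallel
edges).

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 16 /* hypothetical size limit, for illustration only */

/* Hypothetical adjacency-matrix graph, assumed simple and undirected. */
struct graph {
    int n;
    bool adj[MAXV][MAXV];
};

struct bridge_state {
    int depth[MAXV];             /* tree depth, or -1 if not yet visited */
    bool bridge_to_parent[MAXV]; /* is the tree edge entering v a bridge? */
};

static void add_edge(struct graph *g, int u, int v)
{
    assert(0 <= u && u < g->n && 0 <= v && v < g->n && u != v);
    g->adj[u][v] = g->adj[v][u] = true;
}

/* Visit v at depth d; return the minimum depth of any vertex that is
 * reachable from v's subtree by tree edges plus one non-tree edge. */
static int visit(const struct graph *g, struct bridge_state *s,
                 int v, int parent, int d)
{
    int mindepth = d;
    s->depth[v] = d;
    for (int w = 0; w < g->n; w++) {
        if (!g->adj[v][w] || w == parent)
            continue;
        if (s->depth[w] < 0) {
            /* Unvisited: v-w becomes a tree edge. It is a bridge
             * unless w's subtree reaches something above w. */
            int sub = visit(g, s, w, v, d + 1);
            s->bridge_to_parent[w] = (sub > d);
            if (sub < mindepth)
                mindepth = sub;
        } else if (s->depth[w] < mindepth) {
            /* Non-tree edge to an ancestor we have already seen. */
            mindepth = s->depth[w];
        }
    }
    return mindepth;
}

static void find_bridges(const struct graph *g, struct bridge_state *s)
{
    for (int v = 0; v < g->n; v++) {
        s->depth[v] = -1;
        s->bridge_to_parent[v] = false;
    }
    for (int v = 0; v < g->n; v++)
        if (s->depth[v] < 0)
            visit(g, s, v, -1, 0); /* root of a new component's tree */
}

static bool is_bridge(const struct graph *g, const struct bridge_state *s,
                      int u, int v)
{
    if (!g->adj[u][v])
        return false;
    /* In a simple graph, an edge whose ends differ in depth by exactly
     * 1 must be a tree edge, so no parent pointers are needed here. */
    if (s->depth[v] == s->depth[u] + 1)
        return s->bridge_to_parent[v];
    if (s->depth[u] == s->depth[v] + 1)
        return s->bridge_to_parent[u];
    return false; /* non-tree edges always lie on a loop */
}
```

On a triangle 0-1-2 with an extra edge 2-3 hanging off it, this should
report 2-3 as the only bridge. A real implementation would probably
want to avoid the recursion, which this sketch uses only for brevity.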
This week I expanded that comment into a blog post:
https://www.chiark.greenend.org.uk/~sgtatham/quasiblog/findloop/
which improves on the comment in three ways:
 1. diagrams
 2. adds a further reason why the footpath-dsf algorithm was
    unsatisfactory, pointed out by a Mastodon comment after I
    published the original version of the blog post
 3. adds the punchline that the loop tracing approach _could_ have
    been made to work after all!
So I've deleted the comment and replaced it with a link to the article.
This one tells you if a graph edge _is_ a bridge (i.e. it has the
inverse sense to the existing is_loop_edge query). But it also returns
auxiliary data, telling you: _if_ this edge were removed, and thereby
disconnected some connected component of the graph, what would be the
sizes of the two new components?
The data structure built up by the algorithm very nearly contained
enough information to answer that already: because we assign
sequential indices to all the vertices in a traversal of our spanning
forest, and any bridge must be an edge of that forest, it follows that
we already know the size of _one_ of the two new components, just by
looking up the (minindex,maxindex) values for the vertex at the child
end of the edge.
To determine the other subcomponent's size, we subtract that from the
size of the full component. The full size is not quite so easy to
find, because we don't already store it - but it's trivial to annotate
every vertex's data with a pointer back to the topmost node of the
spanning forest in its component, and then we can look at the
(minindex,maxindex) pair of that node. So now we know the size of the
original component and the size of one of the pieces it would be split
into, and the difference between them is the size of the other piece.
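
The arithmetic can be sketched as follows. The struct layout and the
function name here are hypothetical, not the ones in findloop.c, and
the sketch assumes that (minindex,maxindex) is an inclusive range
covering exactly the traversal indices in a vertex's subtree.

```c
#include <assert.h>

/* Hypothetical per-vertex record, for illustration only. */
struct vertex_info {
    int minindex, maxindex; /* inclusive index range of this subtree */
    int component_root;     /* topmost spanning-forest node of component */
};

/* Given the vertex at the child end of a bridge, report the sizes of
 * the two components that removing the bridge would create. */
static void bridge_component_sizes(const struct vertex_info *vi, int child,
                                   int *child_side, int *other_side)
{
    int root = vi[child].component_root;
    int sub = vi[child].maxindex - vi[child].minindex + 1;
    int whole = vi[root].maxindex - vi[root].minindex + 1;
    assert(sub >= 1 && whole > sub); /* a bridge splits off a proper part */
    *child_side = sub;
    *other_side = whole - sub;
}
```

For a path 0-1-2-3 traversed from vertex 0, vertex 2 has the range
(2,3) and the root has (0,3), so removing the edge 1-2 gives two
pieces of size 2.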
This is the main bulk of this boolification work, but although it's
making the largest actual change, it should also be the least
disruptive to anyone interacting with this code base downstream of me,
because it doesn't modify any interface between modules: all the
inter-module APIs were updated one by one in the previous commits.
This just cleans up the code within each individual source file to use
bool in place of int where I think that makes things clearer.
This commit removes the old #defines of TRUE and FALSE from puzzles.h,
and does a mechanical search-and-replace throughout the code to
replace them with the C99 standard lowercase spellings.
This shouldn't be a disruptive change at all: findloop_run and
findloop_is_loop_edge now return bool in place of int, but client code
should automatically adjust without needing any changes.
In the course of another recent project I had occasion to read up on
Tarjan's bridge-finding algorithm. This analyses an arbitrary graph
and finds 'bridges', i.e. edges whose removal would increase the
number of connected components. This is precisely the dual problem to
error-highlighting loops in games like Slant that don't permit them,
because an edge is part of some loop if and only if it is not a
bridge.
Having understood Tarjan's algorithm, it seemed like a good idea to
actually implement it for use in these puzzles, because we've got a
long and dishonourable history of messing up the loop detection in
assorted ways and I thought it would be nice to have an actually
reliable approach without any lurking time bombs. (That history is
chronicled in a long comment at the bottom of the new source file, if
anyone is interested.)
So, findloop.c is a new piece of reusable library code. You run it
over a graph, which you provide in the form of a vertex count and a
callback function to iterate over the neighbours of each vertex, and
it fills in a data structure which you can then query to find out
whether any given edge is part of a loop in the graph or not.
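
To give a flavour of the "vertex count plus neighbour callback" style
of interface, here is a small sketch. The callback signature, the
struct grid client and the count_edges consumer are all invented for
this example and are not the actual findloop.c API; the point is only
that the library side never needs to see the caller's graph
representation.

```c
#include <assert.h>

/* Hypothetical callback shape: return the i-th neighbour of `vertex`,
 * or -1 once i runs past the last one. */
typedef int (*neighbour_fn)(int vertex, int i, void *ctx);

/* Example client graph: a w x h grid, never materialised in memory. */
struct grid {
    int w, h;
};

static int grid_neighbour(int vertex, int i, void *vctx)
{
    const struct grid *g = vctx;
    static const int dx[4] = {1, 0, -1, 0}, dy[4] = {0, 1, 0, -1};
    int x = vertex % g->w, y = vertex / g->w;

    for (int dir = 0; dir < 4; dir++) {
        int nx = x + dx[dir], ny = y + dy[dir];
        if (nx < 0 || nx >= g->w || ny < 0 || ny >= g->h)
            continue; /* off the edge of the grid */
        if (i-- == 0)
            return ny * g->w + nx;
    }
    return -1; /* no more neighbours */
}

/* Library side: knows only the vertex count and the callback. Here it
 * just counts edges, standing in for a real graph analysis. */
static int count_edges(int nvertices, neighbour_fn neighbour, void *ctx)
{
    int total = 0;
    for (int v = 0; v < nvertices; v++) {
        assert(neighbour(v, 0, ctx) != v); /* assume no self-loops */
        for (int i = 0; neighbour(v, i, ctx) >= 0; i++)
            total++;
    }
    return total / 2; /* each undirected edge was seen from both ends */
}
```

A caller with a 3x2 grid would pass 6 as the vertex count,
grid_neighbour as the callback and a pointer to its struct grid as the
context.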