Solving 3-SAT: from brute force to local heuristics (GSAT, WalkSAT, ILS)
TLDR
- The problem is 3-SAT: find an assignment of boolean variables that satisfies all clauses of a CNF formula.
- I implemented multiple algorithms: brute-force search, hill climbing, an improved hill-climbing variant with dynamic weighting, GSAT, WalkSAT, and Iterated Local Search.
- My focus was on how algorithms behave when they hit local optima, and how much random restarts, noise, and perturbations help.
Motivation
3-SAT is one of the classic combinatorial optimization problems: easy to explain, but it quickly becomes computationally hard. I like it as a testbed for different search strategies:
- What happens if I just climb the fitness as long as it improves?
- What does escaping a local optimum via randomness look like?
- How much do restarts and heuristics like GSAT/WalkSAT help?
This project was a practical way for me to connect theory (NP-completeness, search space) with implementation and experimentation.
Problem
In 3-SAT, we have boolean variables and a set of clauses, where each clause is a disjunction of exactly three literals (e.g. (x1 ∨ ¬x4 ∨ x7)).
Goal: find an assignment x ∈ {0,1}^n such that all clauses are satisfied.
This is an NP-complete problem → brute force works as a baseline for small instances, but it quickly becomes infeasible as the number of variables grows.
Input format and evaluation
Input instances are in DIMACS CNF format (e.g. uf20-01.cnf).
Internally, I parsed clauses into a list of literals in the form (variable_index, required_value), where:
- a positive literal x_i maps to (i, 1)
- a negative literal ¬x_i maps to (i, 0)
Fitness (score) in most algorithms is:
- the number of satisfied clauses
- the goal is to reach satisfied_count = clause_count (see the parsing sketch after this list)
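A minimal sketch of this representation, assuming a Python implementation; parse_dimacs and satisfied_count are illustrative names rather than the original code, and variable indices are shifted to 0-based for list indexing:

```python
# Sketch: parse a DIMACS CNF file into clauses of (variable_index, required_value)
# pairs and score an assignment by the number of satisfied clauses.

def parse_dimacs(path):
    """Return (num_vars, clauses); each clause is a list of (var, required_value)."""
    clauses, num_vars = [], 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(("c", "p", "%", "0")):
                if line.startswith("p"):
                    num_vars = int(line.split()[2])   # "p cnf <vars> <clauses>"
                continue
            lits = [int(tok) for tok in line.split() if tok != "0"]
            # positive literal x_i -> (i-1, 1), negative literal ¬x_i -> (i-1, 0)
            clauses.append([(abs(l) - 1, 1 if l > 0 else 0) for l in lits])
    return num_vars, clauses


def satisfied_count(clauses, assignment):
    """Fitness: number of clauses with at least one literal matching the assignment."""
    return sum(any(assignment[v] == val for v, val in clause) for clause in clauses)
```

The later sketches reuse satisfied_count as the shared fitness function.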
Approach and algorithms
1) Brute-force search (baseline)
It iterates over all 2^n assignments and checks, for each one, whether it satisfies every clause.
As the name suggests, this is exhaustive and exponential.
This is the kind of case where an exact method quickly stops making sense (for n=20 there are already over a million possible assignments, and the count doubles with every additional variable), so we have to turn to heuristics.
- Pros: always finds a solution if one exists
- Cons: explodes with n (at n=50 it’s practically infeasible)
I used it as a reference and a sanity check for small instances.
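A brute-force sketch under the same representation, reusing the satisfied_count helper from the parsing sketch above (the early exit on the first satisfying assignment is an assumption; the original may differ):

```python
# Sketch: enumerate all 2^n assignments and return the first satisfying one.
from itertools import product

def brute_force(num_vars, clauses):
    for bits in product((0, 1), repeat=num_vars):
        if satisfied_count(clauses, bits) == len(clauses):
            return list(bits)   # first satisfying assignment found
    return None                 # no assignment satisfies the formula
```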
2) Hill climbing
I start from a random assignment and look at the neighborhood obtained by flipping one variable.
- pick the neighbor with the highest number of satisfied clauses
- if there is no better neighbor → local optimum (stagnation)
This approach can be fast, but it’s sensitive to local optima.
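A sketch of this best-improvement hill climbing, again reusing satisfied_count (the iteration cap is an assumed safeguard, not something from the original):

```python
# Sketch: greedy hill climbing over one-flip neighbors; stops at a local optimum.
import random

def hill_climbing(num_vars, clauses, max_iters=100_000):
    assign = [random.randint(0, 1) for _ in range(num_vars)]
    best = satisfied_count(clauses, assign)
    for _ in range(max_iters):
        if best == len(clauses):
            return assign                      # all clauses satisfied
        # evaluate every one-flip neighbor and keep the best one
        scores = []
        for v in range(num_vars):
            assign[v] ^= 1
            scores.append((satisfied_count(clauses, assign), v))
            assign[v] ^= 1
        score, v = max(scores)
        if score <= best:                      # no better neighbor -> stagnation
            return None
        assign[v] ^= 1
        best = score
    return None
```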
3) “Weighted” hill climbing (dynamic focus on hard clauses)
This is a hill-climbing variant where fitness is not just “number of satisfied clauses”, but includes a correction that favors satisfying clauses that are often problematic.
Intuition:
- if a clause frequently remains unsatisfied, I want the algorithm to start “caring more” about it
- this mitigates the typical issue where hill climbing gets stuck because it keeps optimizing the “easy-to-satisfy” parts (see the sketch below)
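One way to implement this dynamic weighting is a clause-weighting scheme: every clause gets a weight, weights of clauses that stay unsatisfied are bumped when the search stalls, and the score is the total weight of satisfied clauses. This is my reading of the idea above, not necessarily the exact correction used; it reuses satisfied_count from the parsing sketch:

```python
# Sketch: hill climbing on a weighted score; "hard" clauses gain weight over time.
import random

def weighted_hill_climbing(num_vars, clauses, max_iters=100_000):
    weights = [1.0] * len(clauses)
    assign = [random.randint(0, 1) for _ in range(num_vars)]

    def clause_sat(cl, a):
        return any(a[v] == val for v, val in cl)

    def weighted_score(a):
        return sum(w for w, cl in zip(weights, clauses) if clause_sat(cl, a))

    for _ in range(max_iters):
        if satisfied_count(clauses, assign) == len(clauses):
            return assign
        best_v, best_s = None, weighted_score(assign)
        for v in range(num_vars):
            assign[v] ^= 1
            s = weighted_score(assign)
            assign[v] ^= 1
            if s > best_s:
                best_v, best_s = v, s
        if best_v is None:
            # stuck: make the currently unsatisfied clauses "heavier"
            for i, cl in enumerate(clauses):
                if not clause_sat(cl, assign):
                    weights[i] += 1.0
        else:
            assign[best_v] ^= 1
    return None
```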
4) GSAT
A classic local SAT algorithm:
- in each iteration, flip the variable that reduces the number of unsatisfied clauses the most
- run with multiple restarts (tries) and a limited number of flips
GSAT is often a good compromise: simple, but much more robust than pure hill climbing.
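A GSAT sketch following this description, reusing satisfied_count; max_tries and max_flips are placeholder values, and the random tie-breaking between equally good flips is a common convention I'm assuming here:

```python
# Sketch: GSAT with restarts; each flip maximizes the number of satisfied clauses.
import random

def gsat(num_vars, clauses, max_tries=50, max_flips=10_000):
    for _ in range(max_tries):
        assign = [random.randint(0, 1) for _ in range(num_vars)]
        for _ in range(max_flips):
            if satisfied_count(clauses, assign) == len(clauses):
                return assign
            best_score, best_vars = -1, []
            for v in range(num_vars):
                assign[v] ^= 1
                s = satisfied_count(clauses, assign)
                assign[v] ^= 1
                if s > best_score:
                    best_score, best_vars = s, [v]
                elif s == best_score:
                    best_vars.append(v)
            assign[random.choice(best_vars)] ^= 1   # best flip, even if not improving
    return None
```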
5) WalkSAT (random walk + greedy)
This is one of my favorite heuristics because it clearly shows why “a bit of noise” helps:
- pick an unsatisfied clause
- with probability P, flip a random variable from that clause (random walk)
- otherwise, flip the variable that gives the best local improvement (greedy)
This gives the algorithm a mechanism to escape local optima without overcomplicating things.
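A WalkSAT sketch of this loop, reusing satisfied_count. The greedy step here picks the clause variable whose flip satisfies the most clauses overall (classic WalkSAT scores by break count, so treat this as one reasonable reading of "best local improvement"); p and max_flips are placeholder parameters:

```python
# Sketch: WalkSAT-style search mixing random-walk and greedy flips.
import random

def walksat(num_vars, clauses, p=0.5, max_flips=1_000_000):
    assign = [random.randint(0, 1) for _ in range(num_vars)]
    for _ in range(max_flips):
        unsat = [cl for cl in clauses
                 if not any(assign[v] == val for v, val in cl)]
        if not unsat:
            return assign                     # every clause is satisfied
        clause = random.choice(unsat)         # focus on one unsatisfied clause
        if random.random() < p:
            v = random.choice(clause)[0]      # random-walk step
        else:
            def flip_score(var):              # score after flipping this variable
                assign[var] ^= 1
                s = satisfied_count(clauses, assign)
                assign[var] ^= 1
                return s
            v = max((var for var, _ in clause), key=flip_score)   # greedy step
        assign[v] ^= 1
    return None
```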
6) Iterated Local Search (ILS)
Idea:
- run local search (hill climbing) → get a locally good solution
- apply a perturbation (flip some variables)
- run local search again
- if it’s better, accept it (or keep the best)
This is basically hill climbing + a controlled reset, and it often works surprisingly well.
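A sketch of this loop, with an assumed perturbation of a fixed number of random bit flips and a simple first-improvement local search (perturb_bits and max_rounds are placeholder parameters; the acceptance rule keeps the best assignment found so far):

```python
# Sketch: Iterated Local Search = local search + perturbation + acceptance.
import random

def local_search(clauses, assign):
    """First-improvement hill climbing; returns the final (possibly locally optimal) assignment."""
    n = len(assign)
    while True:
        best = satisfied_count(clauses, assign)
        improved = False
        for v in range(n):
            assign[v] ^= 1
            if satisfied_count(clauses, assign) > best:
                improved = True
                break                          # keep the flip, climb again
            assign[v] ^= 1
        if not improved:
            return assign

def iterated_local_search(num_vars, clauses, perturb_bits=5, max_rounds=1_000):
    best = local_search(clauses, [random.randint(0, 1) for _ in range(num_vars)])
    best_score = satisfied_count(clauses, best)
    for _ in range(max_rounds):
        if best_score == len(clauses):
            return best                        # all clauses satisfied
        # perturbation: flip a few random bits of the best assignment so far
        candidate = best[:]
        for v in random.sample(range(num_vars), perturb_bits):
            candidate[v] ^= 1
        candidate = local_search(clauses, candidate)
        score = satisfied_count(clauses, candidate)
        if score >= best_score:                # accept if at least as good
            best, best_score = candidate, score
    return best if best_score == len(clauses) else None
```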
Results (uf50-010)
I ran each algorithm 20 times (with different random starts). The table reports wall-clock time per run, summarized over those 20 runs.
| Algorithm | Success | Median (s) | Average (s) | Min-Max (s) |
|---|---|---|---|---|
| Hill climbing | 6/20 | 30.05 | 37.54 | 0.37-92.29 |
| Weighted hill | 20/20 | 13.21 | 16.68 | 0.22-64.04 |
| GSAT | 20/20 | 18.68 | 25.79 | 0.97-96.67 |
| WalkSAT | 20/20 | 1.47 | 2.50 | 0.09-13.05 |
| ILS | 20/20 | 8.64 | 42.64 | 1.02-246.29 |
Note for hill climbing: when it succeeds, the median is 4.56 s; when it fails, the median is 56.51 s.
Observations
- WalkSAT is by far the fastest in this experiment: low median and a relatively “tight” time distribution. In practice, this looks like the best speed/reliability tradeoff on UF50.
- Basic hill climbing is “hit or miss”. If it happens to start in a good basin of attraction, it solves quickly; if it gets stuck, it can spend a lot of time without a solution.
- Weighted hill nicely shows how a small change in the score function can change stability: here it’s 20/20 successes, but still with visible variance.
- GSAT is reliable, but on average slower than Weighted hill and noticeably slower than WalkSAT. I’d say that’s the cost of more systematic flip selection.
- ILS has a good median, but a heavy tail (a few runs take extremely long). That usually means it would be worth tuning the perturbation further (how many bits to flip, how often, when to reset).
An additional detail: the returned assignments differ across runs (the instance has multiple satisfying solutions), which is visible in the different 50-bit outputs.
Conclusion
For this instance and this implementation, WalkSAT is the most practical choice for me: fast and stable across repetitions. Hill climbing without a mechanism to escape local optima isn’t reliable enough, while ILS can be great, but needs tuning to avoid extreme “slow” cases.
If I expand this further, my next steps would be:
- automated parameter tuning (P, number of flips, perturbation rate),
- measuring across more UF instances (not only uf50-010),
- and comparing against a real SAT solver as a baseline (just to get a sense of the gap).