Solving 3-SAT: from brute force to local heuristics (GSAT, WalkSAT, ILS)
TLDR
- The problem is 3-SAT: find an assignment of boolean variables that satisfies all clauses of a CNF formula.
- I implemented multiple algorithms: brute-force search, hill climbing, an improved hill-climbing variant with dynamic weighting, GSAT, WalkSAT, and Iterated Local Search.
- My focus was on how algorithms behave when they hit local optima, and how much random restarts, noise, and perturbations help.
Motivation
3-SAT is one of the classic combinatorial optimization problems: easy to explain, but it quickly becomes computationally hard. I like it as a testbed for different search strategies:
- What happens if I just climb the fitness as long as it improves?
- What does escaping a local optimum via randomness look like?
- How much do restarts and heuristics like GSAT/WalkSAT help?
This project was a practical way for me to connect theory (NP-completeness, search space) with implementation and experimentation.
Problem
In 3-SAT, we have boolean variables and a set of clauses, where each clause is a disjunction of exactly three literals (e.g. (x1 ∨ ¬x4 ∨ x7)).
Goal: find an assignment x ∈ {0,1}^n such that all clauses are satisfied.
This is an NP-complete problem → brute force works as a baseline for small instances, but it quickly becomes infeasible as the number of variables grows.
Input format and evaluation
Input instances are in DIMACS CNF format (e.g. uf20-01.cnf).
Internally, I parsed clauses into a list of literals in the form (variable_index, required_value), where:
- a positive literal x_i maps to (i, 1)
- a negative literal ¬x_i maps to (i, 0)
Fitness (score) in most algorithms is:
- the number of satisfied clauses
- the goal is to reach satisfied_count = clause_count (see the parsing sketch after this list)
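A minimal sketch of this representation, assuming a Python implementation; parse_dimacs and satisfied_count are illustrative names rather than the original code, and variable indices are shifted to 0-based for list indexing:

```python
# Sketch: parse a DIMACS CNF file into clauses of (variable_index, required_value)
# pairs and score an assignment by the number of satisfied clauses.

def parse_dimacs(path):
    """Return (num_vars, clauses); each clause is a list of (var, required_value)."""
    clauses, num_vars = [], 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(("c", "p", "%", "0")):
                if line.startswith("p"):
                    num_vars = int(line.split()[2])   # "p cnf <vars> <clauses>"
                continue
            lits = [int(tok) for tok in line.split() if tok != "0"]
            # positive literal x_i -> (i-1, 1), negative literal ¬x_i -> (i-1, 0)
            clauses.append([(abs(l) - 1, 1 if l > 0 else 0) for l in lits])
    return num_vars, clauses


def satisfied_count(clauses, assignment):
    """Fitness: number of clauses with at least one literal matching the assignment."""
    return sum(any(assignment[v] == val for v, val in clause) for clause in clauses)
```

The later sketches reuse satisfied_count as the shared fitness function.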
Approach and algorithms
1) Brute-force search (baseline)
It iterates over all 2^n assignments and checks, for each one, whether it satisfies every clause.
As the name suggests, this is exhaustive and exponential.
This is the kind of case where an exact method quickly stops making sense (for n=20 there are already over a million possible assignments, and the count doubles with every additional variable), so we have to turn to heuristics.
- Pros: always finds a solution if one exists
- Cons: explodes with n (at n=50 it’s practically infeasible)
I used it as a reference and a sanity check for small instances.
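A brute-force sketch under the same representation, reusing the satisfied_count helper from the parsing sketch above (the early exit on the first satisfying assignment is an assumption; the original may differ):

```python
# Sketch: enumerate all 2^n assignments and return the first satisfying one.
from itertools import product

def brute_force(num_vars, clauses):
    for bits in product((0, 1), repeat=num_vars):
        if satisfied_count(clauses, bits) == len(clauses):
            return list(bits)   # first satisfying assignment found
    return None                 # no assignment satisfies the formula
```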
2) Hill climbing
I start from a random assignment and look at the neighborhood obtained by flipping one variable.
- pick the neighbor with the highest number of satisfied clauses
- if there is no better neighbor → local optimum (stagnation)
This approach can be fast, but it’s sensitive to local optima.
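A sketch of this best-improvement hill climbing, again reusing satisfied_count (the iteration cap is an assumed safeguard, not something from the original):

```python
# Sketch: greedy hill climbing over one-flip neighbors; stops at a local optimum.
import random

def hill_climbing(num_vars, clauses, max_iters=100_000):
    assign = [random.randint(0, 1) for _ in range(num_vars)]
    best = satisfied_count(clauses, assign)
    for _ in range(max_iters):
        if best == len(clauses):
            return assign                      # all clauses satisfied
        # evaluate every one-flip neighbor and keep the best one
        scores = []
        for v in range(num_vars):
            assign[v] ^= 1
            scores.append((satisfied_count(clauses, assign), v))
            assign[v] ^= 1
        score, v = max(scores)
        if score <= best:                      # no better neighbor -> stagnation
            return None
        assign[v] ^= 1
        best = score
    return None
```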
3) “Weighted” hill climbing (dynamic focus on hard clauses)
This is a hill-climbing variant where fitness is not just “number of satisfied clauses”, but includes a correction that favors satisfying clauses that are often problematic.
Intuition:
- if a clause frequently remains unsatisfied, I want the algorithm to start “caring more” about it
- this mitigates the typical issue where hill climbing gets stuck because it keeps optimizing the “easy-to-satisfy” parts (see the sketch below)
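One way to implement this dynamic weighting is a clause-weighting scheme: every clause gets a weight, weights of clauses that stay unsatisfied are bumped when the search stalls, and the score is the total weight of satisfied clauses. This is my reading of the idea above, not necessarily the exact correction used; it reuses satisfied_count from the parsing sketch:

```python
# Sketch: hill climbing on a weighted score; "hard" clauses gain weight over time.
import random

def weighted_hill_climbing(num_vars, clauses, max_iters=100_000):
    weights = [1.0] * len(clauses)
    assign = [random.randint(0, 1) for _ in range(num_vars)]

    def clause_sat(cl, a):
        return any(a[v] == val for v, val in cl)

    def weighted_score(a):
        return sum(w for w, cl in zip(weights, clauses) if clause_sat(cl, a))

    for _ in range(max_iters):
        if satisfied_count(clauses, assign) == len(clauses):
            return assign
        best_v, best_s = None, weighted_score(assign)
        for v in range(num_vars):
            assign[v] ^= 1
            s = weighted_score(assign)
            assign[v] ^= 1
            if s > best_s:
                best_v, best_s = v, s
        if best_v is None:
            # stuck: make the currently unsatisfied clauses "heavier"
            for i, cl in enumerate(clauses):
                if not clause_sat(cl, assign):
                    weights[i] += 1.0
        else:
            assign[best_v] ^= 1
    return None
```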
4) GSAT
A classic local SAT algorithm:
- in each iteration, flip the variable that reduces the number of unsatisfied clauses the most
- run with multiple restarts (tries) and a limited number of flips
GSAT is often a good compromise: simple, but much more robust than pure hill climbing.
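A GSAT sketch following this description, reusing satisfied_count; max_tries and max_flips are placeholder values, and the random tie-breaking between equally good flips is a common convention I'm assuming here:

```python
# Sketch: GSAT with restarts; each flip maximizes the number of satisfied clauses.
import random

def gsat(num_vars, clauses, max_tries=50, max_flips=10_000):
    for _ in range(max_tries):
        assign = [random.randint(0, 1) for _ in range(num_vars)]
        for _ in range(max_flips):
            if satisfied_count(clauses, assign) == len(clauses):
                return assign
            best_score, best_vars = -1, []
            for v in range(num_vars):
                assign[v] ^= 1
                s = satisfied_count(clauses, assign)
                assign[v] ^= 1
                if s > best_score:
                    best_score, best_vars = s, [v]
                elif s == best_score:
                    best_vars.append(v)
            assign[random.choice(best_vars)] ^= 1   # best flip, even if not improving
    return None
```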
5) WalkSAT (random walk + greedy)
This is one of my favorite heuristics because it clearly shows why “a bit of noise” helps:
- pick an unsatisfied clause
- with probability P, flip a random variable from that clause (random walk)
- otherwise, flip the variable that gives the best local improvement (greedy)
This gives the algorithm a mechanism to escape local optima without overcomplicating things.
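A WalkSAT sketch of this loop, reusing satisfied_count. The greedy step here picks the clause variable whose flip satisfies the most clauses overall (classic WalkSAT scores by break count, so treat this as one reasonable reading of "best local improvement"); p and max_flips are placeholder parameters:

```python
# Sketch: WalkSAT-style search mixing random-walk and greedy flips.
import random

def walksat(num_vars, clauses, p=0.5, max_flips=1_000_000):
    assign = [random.randint(0, 1) for _ in range(num_vars)]
    for _ in range(max_flips):
        unsat = [cl for cl in clauses
                 if not any(assign[v] == val for v, val in cl)]
        if not unsat:
            return assign                     # every clause is satisfied
        clause = random.choice(unsat)         # focus on one unsatisfied clause
        if random.random() < p:
            v = random.choice(clause)[0]      # random-walk step
        else:
            def flip_score(var):              # score after flipping this variable
                assign[var] ^= 1
                s = satisfied_count(clauses, assign)
                assign[var] ^= 1
                return s
            v = max((var for var, _ in clause), key=flip_score)   # greedy step
        assign[v] ^= 1
    return None
```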
6) Iterated Local Search (ILS)
Idea:
- run local search (hill climbing) → get a locally good solution
- apply a perturbation (flip some variables)
- run local search again
- if it’s better, accept it (or keep the best)
This is basically hill climbing + a controlled reset, and it often works surprisingly well.
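A sketch of this loop, with an assumed perturbation of a fixed number of random bit flips and a simple first-improvement local search (perturb_bits and max_rounds are placeholder parameters; the acceptance rule keeps the best assignment found so far):

```python
# Sketch: Iterated Local Search = local search + perturbation + acceptance.
import random

def local_search(clauses, assign):
    """First-improvement hill climbing; returns the final (possibly locally optimal) assignment."""
    n = len(assign)
    while True:
        best = satisfied_count(clauses, assign)
        improved = False
        for v in range(n):
            assign[v] ^= 1
            if satisfied_count(clauses, assign) > best:
                improved = True
                break                          # keep the flip, climb again
            assign[v] ^= 1
        if not improved:
            return assign

def iterated_local_search(num_vars, clauses, perturb_bits=5, max_rounds=1_000):
    best = local_search(clauses, [random.randint(0, 1) for _ in range(num_vars)])
    best_score = satisfied_count(clauses, best)
    for _ in range(max_rounds):
        if best_score == len(clauses):
            return best                        # all clauses satisfied
        # perturbation: flip a few random bits of the best assignment so far
        candidate = best[:]
        for v in random.sample(range(num_vars), perturb_bits):
            candidate[v] ^= 1
        candidate = local_search(clauses, candidate)
        score = satisfied_count(clauses, candidate)
        if score >= best_score:                # accept if at least as good
            best, best_score = candidate, score
    return best if best_score == len(clauses) else None
```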
Results (uf50-010)
I ran each algorithm 20 times (with different random starts). The table reports wall-clock time per run, summarized over those 20 runs.
| Algorithm | Success | Median (s) | Average (s) | Min-Max (s) |
|---|---|---|---|---|
| Hill climbing | 6/20 | 30.05 | 37.54 | 0.37-92.29 |
| Weighted hill | 20/20 | 13.21 | 16.68 | 0.22-64.04 |
| GSAT | 20/20 | 18.68 | 25.79 | 0.97-96.67 |
| WalkSAT | 20/20 | 1.47 | 2.50 | 0.09-13.05 |
| ILS | 20/20 | 8.64 | 42.64 | 1.02-246.29 |
Note for hill climbing: when it succeeds, the median is 4.56 s; when it fails, the median is 56.51 s.
Observations
- WalkSAT is by far the fastest in this experiment: low median and a relatively “tight” time distribution. In practice, this looks like the best speed/reliability tradeoff on UF50.
- Basic hill climbing is “hit or miss”. If it happens to start in a good basin of attraction, it solves quickly; if it gets stuck, it can spend a lot of time without a solution.
- Weighted hill nicely shows how a small change in the score function can change stability: here it’s 20/20 successes, but still with visible variance.
- GSAT is reliable, but on average slower than Weighted hill and noticeably slower than WalkSAT. I’d say that’s the cost of more systematic flip selection.
- ILS has a good median, but a heavy tail (a few runs take extremely long). That usually means it would be worth tuning the perturbation further (how many bits to flip, how often, when to reset).
An additional detail: the returned assignments differ across runs (the instance has multiple satisfying solutions), which is visible in the different 50-bit outputs.
Conclusion
For this instance and this implementation, WalkSAT is the most practical choice for me: fast and stable across repetitions. Hill climbing without a mechanism to escape local optima isn’t reliable enough, while ILS can be great, but needs tuning to avoid extreme “slow” cases.
If I expand this further, my next steps would be:
- automated parameter tuning (P, number of flips, perturbation rate),
- measuring across more UF instances (not only uf50-010),
- and comparing against a real SAT solver as a baseline (just to get a sense of the gap).