Find Broken Links On a Website

January 29, 2016

This website has zero broken links, including links to external websites. At least, it did the last time I checked. I used to handle some of this automatically on the back end, but now I mostly rely on the following method because it catches all cases. It's actually surprising how many websites I link to end up dropping off the face of the Internet. The Internet is a fickle beast.

I published a Perl script a while back, which I sometimes update as I find issues, that I use to crawl a website starting from a root URL. It crawls all internal links on the website, and optionally external ones, to make sure that they exist and can be properly resolved.
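To give a feel for the core of a crawler like this, here is a minimal sketch in Python (not the Perl script itself) of the link-gathering step: pull every link out of a page, resolve it against the page's URL, and decide whether it is internal or external. The names LINK_ATTRS, LinkCollector, extract_links and is_internal are made up for illustration, and the list of tags is an assumed subset rather than what the real script checks.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Which attribute carries the link for each tag.
# Illustrative subset only; the real script may check more (or different) tags.
LINK_ATTRS = {"a": "href", "img": "src", "link": "href", "script": "src"}


class LinkCollector(HTMLParser):
    """Collects (tag, absolute_url) pairs from a single HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the URL of the page.
                self.links.append((tag, urljoin(self.base_url, value)))


def extract_links(html, base_url):
    """Return every (tag, absolute_url) pair found in the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def is_internal(url, root_url):
    """Treat a link as internal when it shares the root URL's host."""
    return urlparse(url).netloc == urlparse(root_url).netloc
```

A full crawler would then fetch each internal link it has not seen yet and repeat, while external links only need a single request to confirm that they resolve.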

To do it, I execute the script and wait a long, long time for it to complete.

./ -v -x | tee results.txt

The resulting output looks something like this.

GOOD 1 <img> file/483/268/full/circuit.jpg
GOOD 1 <a>
GOOD 1 <img> /blog/file/472/268/full/DSC02163.JPG
GOOD 1 <img> /blog/file/474/268/full/DSC02158.JPG
GOOD 1 <img> /blog/file/475/268/full/DSC02161.JPG
GOOD 1 <img> /blog/file/476/268/full/DSC02155.JPG
GOOD 1 <a> /blog/trait

This shows it grabbing a page, which it prefixes with ROOT:, and then checking all of the links it finds in that page. It checks images, CSS files, basically any link, not just hrefs.
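Since the output lands in results.txt, it is easy to filter afterwards. The following Python sketch parses lines shaped like the sample above and keeps the ones whose first field is not GOOD. It rests on assumptions: the sample only shows GOOD lines, so I am guessing that failures use the same layout with a different first word, and I don't assign any meaning to the number in the second field. The names LINE_RE, parse_line and non_good are hypothetical.

```python
import re

# A status word, a number, a tag in angle brackets, then an optional URL.
# This layout is inferred from the sample output only.
LINE_RE = re.compile(r"^(\S+)\s+(\d+)\s+<(\w+)>\s*(.*)$")


def parse_line(line):
    """Parse one result line into a dict, or return None if it doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. ROOT: lines or blank lines
    status, number, tag, url = match.groups()
    return {"status": status, "number": int(number), "tag": tag, "url": url}


def non_good(lines):
    """Return the parsed lines whose status is anything other than GOOD."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p["status"] != "GOOD"]
```

Calling non_good(open("results.txt")) would then give a short list to go through by hand instead of scrolling through the whole file.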
