Does HTML structure matter for SEO?

HTML structure matters for SEO, but not in the ways most might think. Read on to uncover the real role of HTML for SEO.


In case you missed the latest podcast episode of Search Off the Record, Google’s Gary Illyes stirred up some controversy when he made a comment about HTML structure not mattering much for SEO.

He later clarified on LinkedIn that “not mattering much” doesn’t mean “doesn’t matter at all.” Still, that didn’t stop the controversy.

Being unable to avoid controversy, I couldn’t help but jump in – in an attempt to clarify a few common points and misconceptions I keep seeing pop up.

So, does HTML structure matter for SEO? 

The answer is: It depends. 

When Illyes was talking about HTML structure, he was likely referring to some of the things SEOs like to obsess about:

  • The number of H1 tags on a page.
  • The order of H tags.
  • Whether something is a <b> or a <strong> tag.
  • The use of tables versus CSS for styling.
  • How high up in the source code text appears. 

These are all things I’ve seen SEOs discuss over the years, and while some of them mattered in the old days of SEO, that’s not how things work anymore.

Before diving into when HTML does and doesn’t matter for SEO, we need to get some caveats out. 

HTML structure 100% still matters for accessibility. 

Accessibility is not a direct ranking factor, though, so it’s a bit outside the scope of this article.

I will note, as others pointed out on X, that if your site isn’t accessible, it is less likely people will link to it or click on it in the future, so that can potentially affect your SEO rankings.

The recently updated Google SEO Starter Guide even specifically mentions heading tags and accessibility vs. SEO:

“Having your headings in semantic order is fantastic for screen readers, but from Google Search perspective, it doesn’t matter if you’re using them out of order. The web in general is not valid HTML, so Google Search can rarely depend on semantic meanings hidden in the HTML specification.

There’s also no magical, ideal amount of headings a given page should have. However, if you think it’s too much, then it probably is.”

But what about the rest of HTML structure?

The main issue here is our mental model of how search engines work. For most people, that model hasn’t changed since the ’90s, when search engines relied almost entirely on lexical search: finding the documents with the most mentions of the query term.

Those search engines had scoring functions that gave extra weight to occurrences of the term in bold and counted an H1 more than an H2, etc.

Unfortunately for our mental model, search has moved away from the lexical approach and more toward a semantic approach.

In semantic search, the content is converted to vectors, and models like BERT and RankBrain are used to interpret the “meaning” of the query and the content rather than just looking at which words they contain. In the process of converting the content to vectors, most of the HTML is lost.

It’s not just vectors that come into play here but also rendering. Before search engines could render JavaScript and examine the DOM, they had to rely on HTML hints – but those days are gone.

Just as they can use algorithms like passage-bert to identify the most relevant snippet on the page, they can also use various algorithms to determine the main heading – even if it’s not in the <h1> tag. 

Sure, <h1> is a hint here, but so are font size, placement relative to the content, and the actual sentence itself. We’ve all seen SEOs mark up a tiny bit of the navigation with an H1 despite having giant 30-point text in the middle of the screen that’s just a <span> tag.

In the old days, search engines would struggle here, but these days, they can more often than not correctly identify that giant <span> tag as the “heading” of the page. 

That doesn’t mean you shouldn’t use proper H tags and nested elements. Remember, accessibility still matters, and proper markup still gives search engines a hint. It’ll be cleaner, easier, more accessible and just overall better if you do it. I’m just saying search engines aren’t stuck relying on the markup.

Another misconception concerns multiple H1 tags. This is one of my biggest pet peeves.

With the introduction of HTML5 and various elements, it’s completely normal (and, in some accessibility cases, required) to have multiple H1 tags on a page. This isn’t something that will affect your SEO efforts. (Unless you’re keyword stuffing and marking up everything as an H1, which may trip some spam flags.)

So, what does a search engine do? (I’ll over-simplify here as I could go in-depth on information retrieval and would love to do that over beers anytime.)

Simply put:

  • They will detect the title tag, key headings (that may or may not be H1, H2, etc), and body copy.
  • They’ll then run both lexical (e.g., BM25) and semantic (e.g., cosine similarity) measures to determine those sections’ relevance to the query before feeding them all into a machine learning algorithm and ranker. 
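To make the two kinds of measures concrete, here is a minimal sketch in Python. It is not how any search engine actually implements ranking: the tokenizer is a plain whitespace split, the BM25 parameters (k1=1.5, b=0.75) are common textbook defaults, and the vectors passed to cosine_similarity would come from an embedding model in practice. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumption; real systems do far more)."""
    return text.lower().split()


def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Lexical relevance: score one document against a query with BM25."""
    doc_tokens = tokenize(doc)
    term_freq = Counter(doc_tokens)
    n_docs = len(corpus)
    avg_len = sum(len(tokenize(d)) for d in corpus) / n_docs
    score = 0.0
    for term in tokenize(query):
        # Number of documents in the corpus that contain the term.
        doc_freq = sum(1 for d in corpus if term in tokenize(d))
        idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
        freq = term_freq[term]
        # Length normalization: long documents get less credit per mention.
        norm = freq + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * freq * (k1 + 1) / norm
    return score


def cosine_similarity(a, b):
    """Semantic relevance: cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Notice that neither function takes a tag name as input. By the time text reaches this stage, whether it came from an H1, an H2 or a <span> is no longer part of the calculation.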

The takeaway is that they most likely no longer really care if it’s an H1 or H2 – just that their algorithm identified it as a “heading” of the page. 

The same goes for bold text, span and div tags, etc. It’s all about whether the algorithm (e.g., BERT) says it’s relevant to the query. 



So, where does HTML structure matter? 

HTML structure can actually make or break your SEO strategy in lots of instances. For example, a canonical tag placed in the <body> instead of the <head> will be ignored.

Likewise, if you put a <div> in your <head> tag, Googlebot’s version of Chrome will assume you forgot to close the head and open the body, and it will do both for you, potentially moving some of your important SEO tags into the body, where they will be ignored.

You won’t believe how often I see this. It just takes one person to accidentally paste code into the wrong place in Google Tag Manager to break your whole site. For this reason alone, I tell clients to make sure their SEO tags are all higher up in the <head> than any other tags.
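If you want to catch this before it ships, a rough check is easy to script. The sketch below uses Python’s built-in html.parser and is only an approximation of what a browser does: it assumes the page has explicit <head> and <body> tags, it treats the first element that isn’t allowed in the head as the point where the head closes, and it only looks for canonical and robots tags. The HeadAudit class and its attribute names are hypothetical.

```python
from html.parser import HTMLParser

# Elements the HTML standard permits inside <head>.
HEAD_ALLOWED = {"base", "link", "meta", "noscript", "script", "style", "template", "title"}


class HeadAudit(HTMLParser):
    """Flag SEO tags that sit in the body or after the head would implicitly close."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.broken_by = None  # first tag that would force <head> to close early
        self.stranded = []     # SEO tags that would end up outside the head

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.in_head = True
            return
        if tag == "body":
            self.in_head = False
            return
        if self.in_head and self.broken_by is None and tag not in HEAD_ALLOWED:
            self.broken_by = tag
        is_seo_tag = (
            (tag == "link" and (attrs.get("rel") or "").lower() == "canonical")
            or (tag == "meta" and (attrs.get("name") or "").lower() == "robots")
        )
        outside_head = not self.in_head or self.broken_by is not None
        if is_seo_tag and outside_head:
            self.stranded.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


audit = HeadAudit()
audit.feed(
    "<html><head><title>Demo</title><div>pasted by mistake</div>"
    '<link rel="canonical" href="https://example.com/"></head>'
    "<body></body></html>"
)
print(audit.broken_by, audit.stranded)
```

Running it against the rendered HTML rather than the raw template is the safer choice, since tag managers inject their code at runtime.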

Other HTML coding techniques can harm SEO, too.

For example, if instead of using an <a> tag with href attribute, your site has a <span> with an onclick= event, search engines will not count that as a link, even though users won’t tell the difference. It’s got some accessibility issues, too, so please stop doing that.
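Here is a small sketch of how you might audit a page for this, again with Python’s built-in html.parser. It assumes that only an <a> tag with a non-empty href counts as a link, which matches the point above, and it flags any element that relies on onclick instead. The LinkAudit class is a hypothetical helper, not part of any SEO tool.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Separate real links from elements that only navigate through onclick."""

    def __init__(self):
        super().__init__()
        self.crawlable = []   # href values from <a href="...">
        self.click_only = []  # tag names that use onclick with no usable href

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if tag == "a" and href:
            self.crawlable.append(href)
        elif "onclick" in attrs:
            self.click_only.append(tag)


audit = LinkAudit()
audit.feed(
    '<a href="/pricing">Pricing</a>'
    '<span onclick="go()">Contact</span>'
    '<a onclick="go()">About</a>'
)
print(audit.crawlable)
print(audit.click_only)
```

Both the <span> and the <a> without an href look clickable to a user, but only the first element in the sample gives a crawler a URL to follow.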

When it comes to images, search engines require an <img> tag with a src= attribute. You’d be surprised how many lazy loading plugins omit the src= in favor of srcset=, which, as of my latest testing, works in modern browsers but isn’t treated as an “image” by Google for image ranking. 
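A quick way to spot the problem is to list every <img> tag that has no src. The sketch below is a minimal example using Python’s built-in html.parser; the ImageAudit class is hypothetical, and it only reports the srcset value so you can tell which image is affected.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Collect <img> tags that have no src attribute."""

    def __init__(self):
        super().__init__()
        self.missing_src = []  # srcset value (or "") for each image without src

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        if not attrs.get("src"):
            self.missing_src.append(attrs.get("srcset") or "")


audit = ImageAudit()
audit.feed(
    '<img src="/a.jpg" srcset="/a-2x.jpg 2x">'
    '<img srcset="/b.jpg 1x, /b-2x.jpg 2x">'
)
print(audit.missing_src)
```

As with the other checks, run this on the rendered page if a plugin rewrites your image tags with JavaScript.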

I don’t think any of these examples were what Illyes meant when he talked about HTML structure. I believe he was referencing the common arguments of heading nesting, bold tags, etc. 

TL;DR

Should I worry about my use of H1s, H2s etc?

Yes, always, but not for SEO. Mark stuff up in a way that’s accessible and makes sense for users. Don’t stress over forcing in that <h1> tag that’s styled to look like regular text. 

Should I validate my HTML?

Yes, but not for SEO rankings. Valid HTML isn’t a ranking factor, but it will help prevent technical issues affecting SEO and potentially lessen your accessibility work. I’m a huge fan of the W3C Validator.

Does HTML structure matter for SEO?

It depends. (Sorry, couldn’t resist!) If your markup causes stuff to be inaccessible or not seen, yes, it matters a ton. If you’re hoping to get a ranking boost by re-ordering some headings or bolding some text, it’s likely not going to happen. 


Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.


About the author

Ryan Jones
Contributor
Ryan Jones is a Senior Vice President of SEO at Razorfish where he co-leads the SEO practice. Prior to being an SEO, Ryan worked as a software engineer. His vast technical and marketing experience gives him a unique lens into SEO and technical problems as well as the ability to rapidly prototype or speak to various stakeholders. Ryan has created several industry tools including SEOdataviz.com and serverheaders.com as well as the satirical blog WTFSEO.com. When he's not doing SEO Ryan enjoys playing hockey, softball, golf, and attempting to take over the world - which he would have already gotten away with had it not been for those meddling kids and their dog.
