SEO for Developers

The first rule of search engine optimization is to create really good content. If human beings don't want to engage with your content, then Google doesn't want to either. We live in the age of machine learning and quantum computing; you can't just stuff a bunch of keywords into a page and expect to do well. When Google first came about in the late 90s, it was based on an algorithm called PageRank, which weighted relevance and search ranking based on the number of inbound links a site had. People quickly learned how to exploit the algorithm by spamming backlinks all over the internet to increase a site's PageRank, because a high ranking in Google can literally be worth millions of dollars.

It brought us an entire industry of SEO experts: the good guys wear white hats, the hackers wear black hats, but the most effective ones wear grey hats. Some say it's a dying industry, though, because it's becoming harder and harder to manipulate Google's technology. There are over 200 factors that go into a site's ranking, and they're geared mostly towards how useful a user found your site: did they immediately bounce by clicking the back button, or did they dwell on the page for a long time and click other links?

Content is king, but the second rule of SEO is to render HTML that can be reliably understood by bots. Your main content goes inside the body tags, and when Google crawls your site, it will use semantic HTML elements to understand the content on the page. You might put your main content in an article tag, then put your most important keywords in headings, or <h> tags, to signal what your page is about. Furthermore, your HTML should be accessible, using the alt attribute on images and ARIA attributes wherever needed to make your site usable on assistive devices. In the head of the document, we have metadata that's not directly shown to the end-user, but bots can use this data to further understand the page and format the actual appearance of your search result.
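As a minimal sketch of that structure (the page content here is made up), a bot-friendly document might look like this:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Metadata for bots and SERPs; never rendered to the user -->
    <title>SEO for Developers</title>
    <meta name="description" content="Three rules for ranking well on Google." />
  </head>
  <body>
    <main>
      <!-- Semantic element marking the primary content of the page -->
      <article>
        <h1>SEO for Developers</h1>
        <h2>Rule 1: Create good content</h2>
        <p>...</p>
        <!-- alt text keeps images legible to crawlers and screen readers -->
        <img src="/img/serp.png" alt="A search engine results page with a highlighted listing" />
      </article>
    </main>
  </body>
</html>
```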

The third rule of SEO is to get your fully rendered HTML loaded fast. If you have megabytes of blocking images, styles, and JavaScript, both users and bots will pass on your site. But going fast is easier said than done. That's why today we're going to look at the many different strategies we have to render HTML and how they impact search engine optimization.

The three most important rules for SEO, in my opinion, are creating awesome content, rendering properly formatted HTML, and loading your HTML quickly. The first rule is very subjective and depends entirely on your audience, but the general goal is that when someone clicks on a link to your site from a search engine results page, they should engage with your site as long as possible. There are a few metrics that you'll want to be aware of here. The first one is the click-through rate, or CTR, which defines how likely a user is to click on your link when it's displayed on a search engine results page, or SERP. The higher the CTR the better, and a high CTR usually means you have a very relevant title and description. If a user clicks on your link and then immediately clicks the back button, that's called a bounce, and the higher your bounce rate, the less likely your site is to rank well in the long term, because apparently the content on the page is not very relevant. If the user does stay on the page, Google will keep track of the dwell time, which is the amount of time they spend there before clicking back to the search results. The longer the dwell time, the better. The best possible thing that can happen is that the user never clicks back at all: their session lasts forever and they never need to go to another website again. That doesn't happen very often, so what you keep track of instead is the average session duration and the average number of pages viewed per session, metrics that you want to maximize.

There's no absolute rule for creating engaging content, but the first thing the user sees should hook them into wanting to read more. If you look at something like BuzzFeed, all you have to do is put an animated GIF at the top, then maybe a few more in the body, and you should be good.
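To make those definitions concrete, here's a quick TypeScript sketch of how the metrics are computed (every number here is invented for illustration):

```ts
// Illustrative engagement metrics; all values are made up
const impressions = 1000; // times your link appeared on a SERP
const clicks = 80;        // times a user actually clicked it
const ctr = clicks / impressions; // 0.08 → an 8% click-through rate

const sessions = 80;
const bounces = 30; // sessions that ended with an immediate back-click
const bounceRate = bounces / sessions; // 0.375 → a 37.5% bounce rate

const totalSessionSeconds = 9600;
const avgSessionDuration = totalSessionSeconds / sessions; // 120 seconds per session
```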

Let's move on to rule two, where we talk about the actual structure of the HTML. I'll be using the site fireship.io as an example. On a lesson or article page, you can right-click and hit inspect element, or hit "Control + Shift + I". This will bring up the Elements tab in Chrome DevTools, showing you the fully rendered HTML markup. We have a head and a body. Let's go ahead and open up the body and find the main element. Inside the main element, you'll notice we have an article. An article element has semantic meaning, and although the tag itself is never seen by the end-user, it tells the search engine, "here is the main content of the page." In addition, you'll notice a couple of extra attributes here: one is itemscope and the other is itemtype.

The itemtype marks this as a schema.org Article. This markup is totally optional, and whether or not it will improve your search engine ranking is debatable, but what schema.org allows you to do is define a bunch of metadata about the actual content on your page, making it easier for search engines to interpret. It's especially powerful if your content is something like a recipe or a star rating, because Google can then take the schema data and format it properly on a SERP. In this case, we have a bunch of metadata that makes up an article, and one thing that is known to improve search ranking is when an article is written by a known author. Further down the HTML tree, you'll notice we have an itemprop of author, which points to the author's page. That link goes to another page on fireship.io, and on that page we also see an article element, this time with a schema.org itemtype for the author, along with a bunch of links that point to authoritative sites for that author.
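A simplified sketch of that microdata pattern (the names and URLs are placeholders, and the author entity uses schema.org's Person type):

```html
<!-- itemscope/itemtype mark this element up as a schema.org Article -->
<article itemscope itemtype="https://schema.org/Article">
  <h1 itemprop="headline">Server-Side Rendering Explained</h1>
  <!-- itemprop="author" links the article to a nested Person entity -->
  <div itemprop="author" itemscope itemtype="https://schema.org/Person">
    <a itemprop="url" href="/contributors/jane-doe/">
      <span itemprop="name">Jane Doe</span>
    </a>
  </div>
</article>
```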

Outbound links on a page are really important because they further signal what the page is about. In this case, Google will first crawl the article then crawl the author's page then crawl these other sites to understand who that author is. A good strategy is to use outbound links to other really good sites that are related to the content on a given page. In addition to schema.org, there are other ways you can add metadata to your content and this can be very important for SEO and also accessibility.

One of the most fundamental techniques is to add an alt attribute to images which is basically just some text that describes the image. This metadata can be used by search engines and also by screen readers for those with disabilities.
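For example (the image and description are invented):

```html
<!-- The alt text describes the image for crawlers and screen readers alike -->
<img src="/img/chart.png" alt="Bar chart comparing page load times across rendering strategies" />
```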

For other elements that are a little more complicated, like a progress bar, you can use ARIA attributes; ARIA stands for Accessible Rich Internet Applications. These attributes help provide additional meaning for highly interactive widgets on the page.
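For instance, a custom progress bar might be annotated like this (a sketch using standard ARIA roles and states):

```html
<!-- role and aria-* attributes describe this widget to assistive technology -->
<div role="progressbar" aria-label="Course progress"
     aria-valuemin="0" aria-valuemax="100" aria-valuenow="75">
  <div class="bar" style="width: 75%"></div>
</div>
```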

At this point, we've only been looking in the body of the document, but the head contains all kinds of useful metadata for SEO. Most importantly, this is where you have the title. You should choose your title carefully, because it's displayed on a SERP and will ultimately drive your CTR. In addition to the title, you may also want meta tags here, which define things like the description, featured image, author, canonical URL, and so on. These meta tags are also essential if you want your content to be shared on social media sites like Twitter or Facebook. When you post a hyperlink on social media, the site fetches that page and looks for the meta tags to understand what image and title to display. If you want to see how your site is doing right now, you can paste a link into the Twitter Card Validator and it will tell you whether or not it can use your current meta tags. That gives you some things to think about when it comes to the actual structure of your HTML. But the bigger question is how you render that HTML, or in other words, what part of your tech stack is responsible for generating the actual HTML markup that is received by a bot or end-user.
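A typical head section combining standard metadata with Open Graph and Twitter tags might look like this (all values are placeholders):

```html
<head>
  <title>SEO for Developers</title>
  <meta name="description" content="A developer's guide to ranking well on Google." />
  <link rel="canonical" href="https://example.com/seo-for-developers/" />
  <!-- Open Graph tags, read by Facebook and most link-preview bots -->
  <meta property="og:title" content="SEO for Developers" />
  <meta property="og:description" content="A developer's guide to ranking well on Google." />
  <meta property="og:image" content="https://example.com/img/cover.png" />
  <!-- Twitter-specific tag choosing the card layout -->
  <meta name="twitter:card" content="summary_large_image" />
</head>
```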

There are three fundamental ways to render HTML. The first one we'll look at is client-side rendering. If you're building an app with something like React or Angular, the default mode is client-side rendering, also known as a single-page application.

On the initial page load, the user gets a shell of HTML without any meaningful content. The JavaScript code then bootstraps and asynchronously fetches any additional data needed for the UI. This is great for interactivity, because it gives the end-user an app-like feel similar to what you'd expect on iOS or Android. The problem is that the initial HTML is just a shell, and search engines may have a hard time understanding and indexing it. If you take a link from a single-page application and post it on Twitter, you'll only see the initial shell; you won't see any meta tags that were generated by JavaScript after the fact. That's not great for social media. Google, as a search engine, is able to index client-rendered apps, but the reliability is questionable, and personally I wouldn't trust client-side rendering if SEO was a business-critical requirement.
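This is what that initial shell typically looks like (a generic sketch, assuming a React-style root element):

```html
<!-- Everything a bot sees on the first request: an empty shell -->
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>My App</title>
  </head>
  <body>
    <div id="root"><!-- content is rendered here by JavaScript, later --></div>
    <script src="/bundle.js"></script>
  </body>
</html>
```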

So another option is to pre-render, or statically generate, HTML in advance. Let's imagine your web app has a hundred different routes or pages. Instead of sending a shell down to the user, we could generate all the HTML for those pages in advance, then upload the static files to a storage bucket that can be cached on a global CDN. The first thing the user sees is fully rendered content, and the JavaScript loads after that to make the page fully interactive. That's great for SEO, because bots get fully rendered HTML and can easily interpret the content on the page. It's also highly efficient: if you're fetching data from a database, you only have to do that once at build time, then you can cache the page on a CDN and serve it to millions of people without having to re-fetch your data. The trade-off with this approach, though, is that the data in the pre-rendered content can become stale, which means bots will be getting outdated information until you rebuild and redeploy the entire site. That's no big deal if you have a few hundred pages that don't change very often, but if you have millions of pages with highly dynamic data, it doesn't really scale, and that brings us to option number three.
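In Next.js (which comes up again below), static generation looks something like this sketch; the file path and data helpers are hypothetical:

```tsx
// pages/posts/[slug].tsx — each post is rendered once, at build time

type Post = { slug: string; title: string; body: string };

// Hypothetical data source; swap in your real CMS or database call
async function fetchAllPosts(): Promise<Post[]> {
  return [{ slug: "seo-for-devs", title: "SEO for Developers", body: "..." }];
}
async function fetchPost(slug: string): Promise<Post> {
  return (await fetchAllPosts()).find((p) => p.slug === slug)!;
}

export async function getStaticPaths() {
  const posts = await fetchAllPosts();
  return {
    paths: posts.map((post) => ({ params: { slug: post.slug } })),
    fallback: false, // only pages generated at build time exist
  };
}

export async function getStaticProps({ params }: { params: { slug: string } }) {
  // Runs at build time only; the result is baked into static HTML
  const post = await fetchPost(params.slug);
  return { props: { post } };
}

export default function PostPage({ post }: { post: Post }) {
  return (
    <article>
      <h1>{post.title}</h1>
      <div>{post.body}</div>
    </article>
  );
}
```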

Server-side rendering: in this paradigm, when the user makes a request to a page, the HTML is generated on the server. This is also great for SEO, because bots get fully rendered HTML on the initial request. In addition, the data will always be fresh, because you're making a new request to the server each time. The drawback is that it's generally less efficient: you might be fetching and rendering the same HTML over and over again. It is possible to do server-side caching, but that's not as efficient as edge caching on a CDN and will cost a lot more to operate at scale. If things aren't cached efficiently, that means a slower time to first meaningful content, which can negatively impact SEO.
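The Next.js equivalent is a per-request data function; again a sketch, with a hypothetical query helper standing in for a real database:

```tsx
// pages/search.tsx — HTML is rendered on the server for every request

type Result = { id: string; title: string };

// Hypothetical query helper; swap in a real database call
async function searchDatabase(q: string): Promise<Result[]> {
  return [{ id: "1", title: `Results for ${q}` }];
}

export async function getServerSideProps({ query }: { query: { q: string } }) {
  // Runs on every request, so the data is always fresh,
  // but the page can't be cached as a plain static file
  const results = await searchDatabase(query.q);
  return { props: { results } };
}

export default function Search({ results }: { results: Result[] }) {
  return (
    <ul>
      {results.map((r) => (
        <li key={r.id}>{r.title}</li>
      ))}
    </ul>
  );
}
```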

So between these three methods, we have a trade-off between data freshness, performance, and client-side interactivity. But what if there were a way we could have our cake and eat it too? Allow me to introduce you to incremental static regeneration. This is a new form of rendering available in the Next.js framework. Remember, earlier I said the drawback with static pages is that the data may become stale and require a re-deploy of your site. What ISR does is allow you to statically generate your pages and then rebuild and redeploy them on the fly, in the background, as new requests come to your site. That means you get all the performance benefits of static pages while ensuring that those pages always contain fresh data, which eliminates the trade-offs we've talked about.
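In Next.js, ISR is opted into by returning a revalidate interval from getStaticProps. This sketch reuses the hypothetical fetchPost helper and Post component from the static-generation example above:

```tsx
// pages/posts/[slug].tsx — static generation plus background regeneration

export async function getStaticProps({ params }: { params: { slug: string } }) {
  const post = await fetchPost(params.slug); // hypothetical helper from above
  return {
    props: { post },
    // ISR: keep serving the cached static page, but regenerate it in the
    // background at most once every 60 seconds as new requests arrive
    revalidate: 60,
  };
}
```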

However, it's not without a cost. Deploying a static site is as easy as uploading your files to a storage bucket, but incremental static regeneration requires a more complex back-end deployment. For most of us, that means paying for a host that supports it, like Vercel, and hosting anywhere else will likely be much more painful until more companies start adopting these techniques.

One very cool thing going on in the web development world right now is that more frameworks, like Next and Angular, are supporting hybrid rendering. That means you can implement some routes as static pages, configure other routes to use full server-side rendering, and leave still other routes fully client-rendered. You're not pigeon-holed into just one rendering technique; you can pick and choose what works best for a given page, and in my opinion, that's the future of full-stack web development.
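As a final sketch (hypothetical file names, Next.js pages-router conventions), each route simply opts into its own strategy:

```tsx
// pages/about.tsx — exports no data functions, so Next.js pre-renders it statically
export default function About() {
  return <article><h1>About us</h1></article>;
}

// pages/dashboard.tsx — getServerSideProps opts this one route into SSR
export async function getServerSideProps() {
  return { props: { now: Date.now() } }; // computed fresh on every request
}
export default function Dashboard({ now }: { now: number }) {
  return <p>Rendered at {new Date(now).toISOString()}</p>;
}
```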