Artificial Intelligence and SEO: An Experiment
The spread of AI-powered tools, particularly large language models (LLMs), is changing the way users search for information. These changes could have a profound impact on how search engine optimization (SEO) is conducted. In fact, we might need to start talking about optimization for large language models and coin the term LLMO (Large Language Model Optimization)!
The SEO community has long studied how Google and other search engines work: how they crawl information on websites, index it, and rank it in terms of importance before presenting it in search results.
While we don’t know exactly how Google works, we have learned a lot over the years. On the other hand, we know very little about how these AIs function.
Certainly, with the proliferation of these LLMs, a new way of searching for information is emerging. The way this information is collected and stored is also new; indeed, we might not even be able to talk about collection and storage in the traditional sense, since an LLM is not a database. Even from conversations with the creators of these tools, it becomes clear that their functioning is very complex and perhaps inscrutable. So, what do we do? I started with a small, simple experiment.
Recently, I asked ChatGPT (in the version based on GPT-4),
“How many employees does my company have?”
The response was,
“Develer has 50 employees.”
I replied that this was incorrect and that the correct number is 85.
Naturally, it responded,
“The number of employees at Develer is 85.”
When my colleague asked the same question shortly after, it responded,
“I don’t know.”
Fine. Today, a few days later, with the same question I get:
“Searched 6 sites.
Develer S.r.l. has 77 employees in 2024. This information was obtained from official sources reporting company employee numbers and other financial information (UfficioCamerale) (Revilaw).”
Well, 77, we are close. It might even be correct, depending on what is meant by employee.
I tried asking the same thing to Gemini (Gemini Pro 1.0):
“Unfortunately, it is not possible to provide an exact number of employees of Develer S.r.l. as of June 20, 2024, due to the following limitations:
- Outdated online information: Platforms like Aziende.it and ReportAziende.it report employee data based on financial statements from previous years (2021 and 2018, respectively).
- Lack of official data: Develer S.r.l. does not officially publish the number of employees on its website or other official channels.
However, it is possible to make some approximate estimates based on [..]”
Then, it made an assumption based on the 2021 revenue and the fact that Develer is classified as a small company by the Chamber of Commerce. It responded with “Between 20 and 50.” But the number isn’t what’s interesting. What struck me is that it referenced specific online platforms.
Not only that, it mentioned that the number of employees is not officially published on the company’s website. Since that number is actually on the site, I dug deeper:
“You said that Develer does not officially publish the number of employees on its website. Actually, I know that the data exists on the site. Can you tell me if it needs to be formatted in a particular way for you to detect it?”
From an SEO perspective, you can see why this is very interesting. With Google, for example, we know that it can help to include some data in the page’s code (using structured data as specified by schema.org).
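For readers unfamiliar with structured data, here is a minimal sketch in Python of what such markup could look like for an employee count. It assumes the schema.org `Organization` type with its `numberOfEmployees` property expressed as a `QuantitativeValue`; the function name `organization_jsonld` and the URL are placeholders invented for this illustration, and the figure 85 is simply the number I gave ChatGPT earlier. Whether any LLM actually reads this markup is exactly what we don't know.

```python
import json


def organization_jsonld(name, url, employees):
    """Build a JSON-LD <script> tag describing a company and its headcount.

    Hypothetical helper for illustration; only the schema.org vocabulary
    (Organization, numberOfEmployees, QuantitativeValue) is standard.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "numberOfEmployees": {
            "@type": "QuantitativeValue",
            "value": employees,
        },
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder URL; replace with the real company site.
snippet = organization_jsonld("Develer S.r.l.", "https://www.example.com/", 85)
print(snippet)
```

The resulting tag would go in the HTML of the page that mentions the team size, alongside the visible text rather than instead of it.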
The response was very interesting:
“I apologize for the error in my previous response. You are right, Develer S.r.l. does publish the number of employees on its website. Unfortunately, at the moment, I am not able to directly process information from websites to extract data such as the number of employees. I am still under development, and this functionality is not yet available. However, I can help you format the data so that I can process it more easily if you provide the information manually. To do this, I would need:
- The specific webpage where the employee number is found.
- The format in which the data is present:
- If it is a simple number (e.g., 25), no special formatting is required.
- If the number is written within a text, such as ‘The company has 25 employees,’ you might need to extract just the number (25) and provide it separately.
- If the data is present in an image or a graph, you might need to convert it to text format manually or use OCR (Optical Character Recognition) software before providing it.
Once you provide this information, I will do my best to process it and provide you with the number of employees at Develer S.r.l.
Please note that this is a temporary solution. I am working to improve my information processing capabilities from websites and, in the future, I will be able to extract data such as the number of employees directly from pages.”
We already have some interesting indications. But the number really was on my site, so I insisted and gave it the link to the page containing that data. Let’s see what Gemini says:
“Certainly, I have analyzed the webpage you provided and found the following information:
Develer has more than 75 employees.
The data is not expressed as an exact number but indicates that the number of employees is over 75.
You can find this information in the paragraph that talks about Develer’s team:
Develer is a rapidly growing company, with a team of over 75 passionate and competent people.
I hope this information is useful!”
So the number was readable, which I must say I hadn’t dared hope for: on the page it is ordinary formatted text, not structured data.
Well, good job Gemini. Interesting that it quoted the phrase “Develer is a rapidly growing company, with a team of over 75 passionate and competent people,” which is actually not present on the site. It’s probably one of those AI phenomena technically called “hallucinations.”
But I remembered that in its first response, Gemini had hinted at the online portals it refers to. And here I wasn’t going to let it off the hook: Google never tells us anything about its algorithms, but I hope an artificial intelligence can’t lie to its creators! So I asked it which sites it relies on:
“As a large language model, I rely on a variety of authoritative online platforms to find and update my information about companies. Some of the most important ones include:
Company websites: The primary source of information about a company is always its official website. Here you can find data like company descriptions, products or services offered, management, financial information, and the latest news.
Corporate databases: Various online platforms offer aggregated and profiled company data. Some of the best-known ones include:
These platforms provide information such as company names, registered offices, financial data, ownership structures, and connections with other companies.
News and financial publications: To stay updated on the latest corporate developments, I regularly consult financial news sources and journals specializing in specific sectors. Among these are:
Social media: Social media can provide valuable insights into corporate culture, employee engagement, and customer interactions. I follow companies’ official pages on social media and use analysis tools to monitor online conversations about the brand.
Analyst reports: Financial analysts and industry studies regularly publish reports and research on companies. These reports offer in-depth analyses of financial performance, corporate strategies, and future prospects.
Public registers: In some cases, it is possible to obtain information on companies from public registers, such as the business register or the patent office. These registers can provide data on ownership, executives, licenses, and patents.”
Interesting, isn’t it? Clearly, all this data should be taken with a grain of salt; maybe these sites are just examples among many, but we’ve learned something.
Shall I conclude? No, we’ve only just started to move into new territory.
This was just a small, somewhat playful experiment. These models operate probabilistically, so intuition cannot always lead us to certainty. If I’ve learned anything over these days, it’s precisely that we don’t know how they work internally (I know, I sound like Socrates). Probably there is much more under the hood of these tools than we know, and even those who created them can’t trace the path between a question and an answer. You can’t step into the same river twice: panta rhei, as the Greeks used to say. And indeed, if we repeat a question (or prompt, the technical term we’ll have to get used to hearing), we will likely get a different result each time.
To further confuse things, I asked ChatGPT the question again on the date this article was published, and here’s the response:
“Develer, an Italian company specializing in software development and technology consulting, has around 40 employees.
Would you like to know more about Develer?”
So how do we optimize our websites? For now, I’d say just as we’ve always done. Maybe with a bit more attention to data consistency, thinking not only about the website but also about social channels, other websites, and even more traditional media.
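To make "data consistency" a little more concrete, here is a rough sketch in Python that pulls the employee figure out of a few text snippets and measures how far apart they are. The snippets are the answers quoted in this article, and the source labels, the `employee_count` and `consistency_report` helpers, and the simple "number followed by employees" pattern are all assumptions made for the example. In practice you would feed it text from your own website, social profiles, and business directories.

```python
import re

# Answers quoted in this article; the labels are ad hoc descriptions.
claims = {
    "ChatGPT, first answer": "Develer has 50 employees.",
    "ChatGPT, with web search": "Develer S.r.l. has 77 employees in 2024.",
    "Company website, as read by Gemini": "Develer has more than 75 employees.",
    "ChatGPT, publication day": "Develer has around 40 employees.",
}


def employee_count(text):
    """Return the first integer directly followed by 'employees', or None."""
    match = re.search(r"(\d+)\s+employees", text)
    return int(match.group(1)) if match else None


def consistency_report(claims):
    """Map each source to its extracted figure and compute the max-min spread."""
    counts = {source: employee_count(text) for source, text in claims.items()}
    found = [value for value in counts.values() if value is not None]
    spread = max(found) - min(found) if found else 0
    return counts, spread


counts, spread = consistency_report(claims)
for source, value in counts.items():
    print(f"{source}: {value}")
print(f"Spread between lowest and highest figure: {spread}")
```

A large spread is a hint that the sources an AI might draw on disagree with each other, which is the kind of inconsistency worth cleaning up first.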
And then, why not, get some help from artificial intelligence. But keep your eyes peeled! 🙂