The growth of generative AI content has been rapid, and it will continue to gain momentum as more web managers and publishers look to maximize efficiency and streamline production via advanced digital tools.
But what happens when AI content overtakes human input? What becomes of the internet when everything is just a copy of a copy of a digital likeness of actual human output?
That’s the question many are now asking, as social platforms look to raise walls around their datasets, leaving AI start-ups scrambling for new inputs for their LLMs.
X (formerly Twitter), for example, has raised the price of its API access in order to restrict AI platforms from using X posts, as it develops its own “Grok” model on that same data. Meta has long limited API access, even more so since the Cambridge Analytica scandal, and it’s also touting its unmatched data pool as fuel for its Llama LLM.
Google recently struck a deal with Reddit to incorporate its data into its Gemini AI systems, and you can expect to see more arrangements like this, as social platforms that aren’t looking to build their own AI models seek new revenue streams from their data.
The Wall Street Journal reported today that OpenAI considered training its GPT-5 model on publicly available YouTube transcripts, amid concerns that the demand for valuable training data will outstrip supply within two years.
It’s a significant problem, because while the new raft of AI tools is able to pump out human-like text on virtually any topic, it’s not “intelligence” as such just yet. The current AI models use machine logic and statistical pattern-matching to place one word after another in sequence, based on human-created examples in their training data. But these systems can’t think for themselves, and they have no awareness of what the data they’re outputting means. It’s advanced math, in text and visual form, defined by a systematic logic.
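To illustrate the point, here’s a toy sketch (in Python, with invented data purely for illustration) of that “one word after another” process, using a tiny table of observed word frequencies in place of the billions of learned parameters in an actual LLM. The output can read as fluent, but there’s no comprehension anywhere in the loop, only a lookup of what’s statistically likely to come next.

```python
import random

# Toy "model": for each word, the counts of words observed to follow it in some
# training text. Real LLMs learn these patterns at enormous scale with neural
# networks, but the basic move - pick a likely next token - is the same.
next_word_counts = {
    "the": {"cat": 3, "dog": 2, "internet": 1},
    "cat": {"sat": 4, "ran": 1},
    "dog": {"ran": 3, "sat": 2},
    "sat": {"quietly": 2, "down": 3},
    "ran": {"away": 4, "home": 1},
}

def generate(start: str, length: int = 5) -> str:
    """Generate text by repeatedly sampling a statistically likely next word.

    There is no understanding of meaning here, only word-order statistics.
    """
    words = [start]
    for _ in range(length):
        options = next_word_counts.get(words[-1])
        if not options:  # no observed continuation for this word, so stop
            break
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat down" - fluent-looking, purely statistical
```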
Which means that LLMs, and the AI tools built on them, at present at least, are not a replacement for human intelligence.
That, of course, is the promise of “artificial general intelligence” (AGI), systems that can replicate the way that humans think, and come up with their own logic and reasoning to achieve defined tasks. Some suggest that this is not too far from being a reality, but again, the systems that we can currently access are nowhere close to what AGI could theoretically achieve.
That’s also where many of the AI doomers are raising concerns: that once we do achieve a system that replicates a human brain, we could render ourselves obsolete, with a new machine intelligence set to take over and become the dominant species on Earth.
But most AI academics don’t believe that we’re close to that next breakthrough, despite what we’re seeing in the current wave of AI hype.
Meta’s Chief AI Scientist Yann LeCun discussed this notion recently on the Lex Fridman podcast, noting that we’re not yet close to AGI for a number of reasons:
“The first is that there is a number of characteristics of intelligent behavior. For example, the capacity to understand the world, understand the physical world, the ability to remember and retrieve things, persistent memory, the ability to reason and the ability to plan. Those are four essential characteristic of intelligent systems or entities, humans, animals. LLMs can do none of those, or they can only do them in a very primitive way.”
LeCun says that the amount of data that humans take in is far beyond the limits of LLMs, which are reliant on human insights derived from the internet.
“We see a lot more information than we glean from language, and despite our intuition, most of what we learn and most of our knowledge is through our observation and interaction with the real world, not through language.”
In other words, it’s interactive capacity that’s the real key to learning, not replicating language. LLMs, in this sense, are advanced parrots, able to repeat what we’ve said back to us. But there’s no “brain” that can understand all the various human considerations behind that language.
With this in mind, it’s a misnomer, in some ways, to even call these tools “intelligence”, which is likely one of the contributors to the aforementioned doomer concerns. The current tools require data on how we interact in order to replicate it, but there’s no adaptive logic that understands what we mean when we pose questions to them.
In this respect, it’s doubtful that the current systems are even a step towards AGI; they’re more of a side note in the broader development effort. But again, the key challenge they now face is that, as more web content gets churned through these systems, the outputs we’re seeing are becoming less and less human, and that looks set to be a key shift moving forward.
Social platforms are making it easier and easier to augment your personality and insight with AI outputs, using advanced plagiarism to present yourself as something you’re not.
Is that the future we want? Is that really an advance?
In some ways, these systems will drive significant progress in discovery and process, but the side effect of systematic creation is that the color is being washed out of digital interaction, and we could potentially be left worse off as a result.
In essence, what we’re likely to see is a dilution of human interaction, to the point where we’ll need to question everything. Which will push more people away from public posting, and further into enclosed, private chats, where you know and trust the other participants.
In other words, the race to incorporate what’s currently being described as “AI” could end up being a net negative, and could see the “social” part of “social media” undermined entirely.
Which will leave less and less human input for LLMs over time, and erode the very foundation of such systems.