My A.I. language knows 149,178 words

Springfield computer hobbyist claims an artificial intelligence breakthrough

Thomas Yale uses a rule-based approach to natural language processing, an approach once popular but discarded by most A.I. developers in the late 1980s.
I’m something of an anomaly in Springfield, or anywhere else for that matter.

For most of my life, I’ve lived modestly, worked temporary jobs at minimum wage, ridden a bike for transportation, even during winter, and eaten at St. John’s Breadline to save on groceries. Having earned an associate’s degree in computer science, I’ve had minimal success getting hired for software development work. It’s not unheard of for people to go unhired in the very fields they’ve devoted their lives to studying.

Yet for three decades, I’ve pursued an intense hobby in a field normally reserved for Ph.D.s.

I work in the area of natural language processing, or NLP, a branch of artificial intelligence. Even if people don’t know it by name, NLP is usually what they picture when they think of AI. Since civilization depends on communicating through language, it’s only natural for technology to attempt to simulate – as closely as possible – human comprehension of language. Humans hardly have to think when speaking, typing or writing. From childhood, this ability is hard-wired into our brains. But encoding that ability in a computer is a horrendously difficult task. That’s why it’s referred to as “hard AI.” If this problem were sufficiently solved, many useful applications could be built on that core technology.

But what makes what I do different from what big-name companies like Google do? Plenty, actually. A little historical background is necessary to understand the difference.

From the 1950s until the late 1980s, all NLP systems were built from small sets of handcrafted rules for analyzing text, based on a few hundred examples. Developers meant to devise grammars broad enough to parse any conceivable sentence, but they fell short of that goal. They surmised that while human grammar is largely modular, the interaction of its components makes any such system unimaginably complex. Developers accepted this hypothesis and looked for other approaches, eventually settling on systems that analyze text through statistical analysis of corpora – large, annotated collections of English text drawn from books, newspapers, magazines, research papers and other sources. The computer compares new text against corpus examples that use the same words. More recently came language models.
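
To give a flavor of how a rule-based system works, here is a toy sketch written in Python with the freely available NLTK library. It is only an illustration of the general idea – a handful of handcrafted rules and a tiny lexicon – and nothing like my actual system:

```python
import nltk

# A handful of handcrafted grammar rules and a tiny lexicon.
toy_grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N
    VP  -> V NP
    Det -> 'the' | 'a'
    N   -> 'dog' | 'ball'
    V   -> 'chased'
""")

# The parser applies the rules to recover the sentence's structure.
parser = nltk.ChartParser(toy_grammar)
for tree in parser.parse("the dog chased a ball".split()):
    print(tree)
```

Rule-based systems of that era worked on the same principle, just with far larger grammars and dictionaries.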

Still, these conventional approaches have problems, which explains why we don’t routinely converse with our computers in English and have them truly comprehend us. Not only do such systems lack any sense of what words represent, they never ask for clarification as a human might. And they would have to ask, because how can computer programs be any better at resolving ambiguity than the human beings who built them? They mindlessly follow those statistical rules, merely making their best guess at interpreting the text.
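
The ambiguity problem is easy to demonstrate with the same kind of toy grammar (again, just an illustration, not my system). The classic sentence “I saw the man with the telescope” is licensed by the rules in two different ways, and nothing in the grammar – or in corpus statistics – says which reading the writer intended:

```python
import nltk

# Rules that allow a prepositional phrase to attach to either
# the verb phrase or the noun phrase it follows.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N | NP PP | 'I'
    VP  -> V NP | VP PP
    PP  -> P NP
    Det -> 'the'
    N   -> 'man' | 'telescope'
    V   -> 'saw'
    P   -> 'with'
""")

parser = nltk.ChartParser(grammar)
readings = list(parser.parse("I saw the man with the telescope".split()))

# Two parse trees come back: either the man has the telescope,
# or the telescope was used to see him.
print(len(readings))
for tree in readings:
    print(tree)
```

A statistical system simply picks whichever reading its training data makes more likely; a person would ask what you meant.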

In her groundbreaking paper, Timnit Gebru, a widely respected leader in AI ethics research, highlighted the risks of language models: their environmental and financial costs, as well as input data that encodes racist and sexist connotations. The latter is one reason for the rampant misinformation so common today, as seen in the turmoil of our elections, in propaganda and conspiracy theories, and in the COVID pandemic. Gebru herself wonders whether a coherent language model can ever exist.

NLP has made inroads into useful applications, but largely by narrowing their scope – shrinking the vocabularies and limiting the sentence structures they accept.

My approach, however, runs against the grain and has no such limitations. I revisited the rule-based approach and overcame its shortcomings, using resources that didn’t exist back in the 1980s. The advantages seemed obvious: a rule-based system offers not only a more plausible cognitive model, necessary for tasks that go beyond simulating language comprehension, but also complete control over the analysis, avoiding the large and harmful mistakes Gebru describes.

After three decades of rigorous, arduous experimentation and testing – mostly on my own, but with generous help from others along the way – I’ve developed a whole new NLP technology from the ground up, with results at least comparable to systems in use today. It has a 149,178-word vocabulary and 1,457 grammar rules, comparable to a college graduate’s fluency in English, with the ability to add more.

This is my life’s work: to build a means of helping us become better, more rational thinkers and to improve the quality of the decisions that determine our destinies and on which all our lives depend. Given my lack of academic credentials and business acumen, I may face challenges in validating the technology through peer review, or in collaborating with businesses to build practical applications on it. Whatever the outcome, that won’t stop me from pursuing a goal I passionately believe will help make the world a better place.

Thomas Yale is a temporary clerical worker at Manpower. He can be reached at [email protected] or 929-650-6993.
