Pink Lexical Goop: The Dark Side of Autocorrect


By Dmitry Mazin.

Illustrations by Sean McOmber.

We’ve awaited the age of artificial intelligence for decades. In our fantasies, AI is usually humanoid, straight out of the Jetsons. But while we anticipate the great arrival of the robotic butlers, AI has, in fact, already quietly permeated the fabric of our daily lives — from shopping, to driving, to communication.

Consider autocorrect, an AI-driven input assistant so ubiquitous that you likely don’t even realize how much it impacts your life. Without it, typing on a smartphone would be exceedingly difficult. That utility comes with a price, however, as autocorrect has begun to significantly alter the way we communicate.

Though you probably first encountered autocorrect as telltale squiggly red lines under your spelling mistakes, its breakthrough came with the smartphone. As you mash the tiny keys on your phone’s virtual keyboard, a sophisticated language model, working behind the scenes, determines which keys you actually intended to press. The iPhone, for example, invisibly enlarges those keys you are likely to hit next, so they are harder to miss[1]. Naturally, spelling is automatically checked in the process. This hybrid of input assistance and spellchecking is what we now know as autocorrect.

Prior to autocorrect, spellcheck was constrained to word processors. Its impact was limited, affecting primarily formal documents like letters and essays. Now, thanks to autocorrect, which mediates everything typed on a smartphone — casual and formal speech included — spellcheck is essentially universal. While the Standard English which spell check enforces may be preferable within the context of a formal document, this isn’t necessarily the case elsewhere.

Autocorrect’s insistence on “ducking” (instead of the much coarser exclamation) is infamous, but its rigidity goes beyond cursing. If you actually prefer the spelling “miniscule,” you must wrestle with autocorrect. And because actual humans adapt quickly to change (and even anticipate it), a human-edited dictionary like Merriam-Webster actually includes words that autocorrect doesn’t, such as “abridgement.”

Autocorrect fundamentally alters English. Since there are many ways to spell most English sounds, its spelling tends to drift. Autocorrect slows this evolution, enforcing Standard English in spaces where novel or informal spellings would have previously gone unmolested. Indeed, a 2011 study concluded that in a 20-year period prior to the introduction of autocorrect, spellcheck was already largely responsible for an accelerating death of English words, while the creation of new words contracted sharply, causing an actual shrinkage of the English lexicon[2].

Nevertheless, autocorrect undeniably provides a net benefit. Using our smartphones would simply be intractable without it. However, a new class of input assistant AIs operates on a level beyond spelling, affecting the very way we choose our words. These AIs cross into dangerous territory, threatening to render the English language into lexical pink slime.

In 2014, years after the iPhone’s initial release, typing on a smartphone apparently remained too slow[3]. With iOS 9, Apple launched a new product called QuickType, a small bar above the keyboard which automatically suggests the next word in a sentence, dramatically reducing the need for typing itself. “Typing as you know it might soon be a thing of the past,” Apple promised[4]. For simple phrases like “on my way,” QuickType works perfectly, and for more complex phrases, its suggestions are often good enough. Read more “Pink Lexical Goop: The Dark Side of Autocorrect”

Semi-autonomous software agents: A personal perspective.

by: The Doctor [412/724/301/703/415]

So, after going on for a good while about software agents you’re probably wondering why I have such an interest in them. I started experimenting with my own software agents in the fall of 1996 when I first started undergrad. When I went away to college I finally had an actual network connection for the first time in my life (where I grew up the only access I had was through dialup) and I wanted to abuse it. Not in the way that the rest of my classmates were but to do things I actually had an interest in. So, the first thing I did was set up my own e-mail server with Qmail and subscribed to a bunch of mailing lists because that’s where all of the action was at the time. I also rapidly developed a list of websites that I checked once or twice a day because they were often updated with articles that I found interesting. It was through those communication fora that I discovered the research papers on software agents that I mentioned in earlier posts in this series. I soon discovered that I’d bitten off more than I could chew, especially when some mailing lists went realtime (which is when everybody started replying to one another more or less the second they received a message) and I had to check my e-mail every hour or so to keep from running out of disk space. Rather than do the smart thing (unsubscribing from a few ‘lists) I decided to work smarter and not harder and see if I could use some of the programming languages I was playing with at the time to help. I’ve found over the years that it’s one thing to study a programming language academically, but to really learn one you need a toy project to learn the ins and outs. So, I wrote some software that would crawl my inbox, scan messages for certain keywords or phrases and move them into a folder so I’d see them immediately, and leave the rest for later. I wrote some shell scripts, and when those weren’t enough I wrote a few Perl scripts (say what you want about Perl, but it was designed first and foremost for efficiently chewing on data). Later, when that wasn’t enough I turned to C to implement some of the tasks I needed Leandra to carry out.

Due to the fact that Netscape Navigator was highly unreliable on my system for reasons I was never quite clear on (it used to throw bus errors all over the place) I wasn’t able to consistently keep up with my favorite websites at the time. While the idea of update feeds existed as far back as 1995 they didn’t actually exist until the publication of the RSS v0.9 specification in 1999, and ATOM didn’t exist until 2003, so I couldn’t just point a feed reader at them. So I wrote a bunch of scripts that used lynx -dump > ~/websites/`date ‘+%Y%m%d-%H:%M:%S’`.txt and diff to detect changes and tell me what sites to look at when I got back from class.

That was one of the prettier sequences of commands I had put together, too. This kept going on for quite a few years, and the legion of software agents I had running on my home machine grew out of control. As opportunities presented themselves, I upgraded Leandra as best I could, from an 80486 to an 80586 to P-III, with corresponding increases in RAM and disk space. As one does. Around 2005 I discovered the novel Accelerando by Charles Stross and decided to call this system of scripts and daemons my exocortex, after the network of software agents that the protagonist of the first story arc had much of his mind running on through the story. In early 2014 I got tired of maintaining all of the C code (which was, to be frank, terrible because they were my first C projects), shell scripts (lynx and wget+awk+sed+grep+cut+uniq+wc+…) and Perl scripts to keep them going as I upgraded my hardware and software and had to keep on top of site redesign after site redesign… so I started rewriting Exocortex as an object-oriented framework in Python, in part as another toy project and in part because I really do find myself enjoying Python as a thing that I do. I worked on this project for a couple of months and put my code on Github because I figured that eventually someone might find it helpful. When I had a first cut more or less stable I showed it to a friend at work, who immediately said “Hey, that works just like Huginn!”

I took one look at Huginn, realized that it did everything I wanted plus more by at least three orders of magnitude, and scrapped my codebase in favor of a Huginn install on one of my servers.

Porting my existing legion of software agents over to Huginn took about four hours… it used to take me between two and twelve hours to get a new agent up and running due to the amount of new code I had to write, even if I used an existing agent as a template. Just looking at the effort/payoff tradeoff there really wasn’t any competition. So, since that time I’ve been a dedicated Huginn user and hacker, and it’s been more or less seamlessly integrated into my day to day life as an extension of myself. I find it strange, somewhat alarming when I don’t hear from any of them (which usually means that something locked up, which still happens from time to time) but I’ve been working with the core development team to make it more stable. I find it’s a lot more stable than my old system was simply due to the fact that it’s an integrated framework, and not a constantly mutating patchwork of code in several languages sharing a back-end. Additionally, my original exocortex network was comprised of many very complex agents: One was designed to monitor Slashdot, another Github, another a particular IRC network; Huginn offers a large number of agents, each of which carries out a specific task (like providing an interface to an IMAP account or scraping a website). By customizing the configurations of instances of each agent type and wiring them together, you can build an agent network which can carry out very complex tasks which would otherwise require significant amounts of programming time. Read more “Semi-autonomous software agents: A personal perspective.”

Semi-autonomous agents: What are they, exactly?

by: The Doctor [412/724/301/703/415]

This post is intended to be the first in a series of long form articles (how many, I don’t yet know) on the topic of semi-autonomous software agents, a technology that I’ve been using fairly heavily for just shy of twenty years in my everyday life. My goals are to explain what they are, go over the history of agents as a technology, discuss how I started working with them between 1996e.v. and 2000e.v., and explain a little of what I do with them in my everyday life. I will also, near the end of the series, discuss some of the software systems and devices I use in the nebula of software agents that comprises what I now call my Exocortex (which is also the name of the project), make available some of the software agents which help to expand my spheres of influence in everyday life, and talk a little bit about how it’s changed me as a person and what it means to my identity.

So, what are semi-autonomous agents?

One working definition is that they are utility software that acts on behalf of a user or other piece of software to carry out useful tasks, farming out busywork that one would have to do oneself to free up time and energy for more interesting things. A simple example of this might be the pop-up toaster notification in an e-mail client alerting you that you have a new message from someone; if you don’t know what I mean play around with this page a little bit and it’ll demonstrate what a toaster notification is. Another possible working definition is that agents are software which observes a user-defined environment for changes which are then reported to a user or message queuing system. An example of this functionality might be Blogtrottr, which you plug the RSS feeds of one or more blogs into, and whenever a new post goes up you get an e-mail containing the article. Software agents may also be said to be utility software that observes a domain of the world and reports interesting things back to its user. A hypothetical software agent may scan the activity on one or more social networks for keywords which a statistically unusual number of users are posting and send alerts in response. I’ll go out on a limb a bit here and give a more fanciful example of what software agents can be compared to, the six robots from the Infocom game Suspended.

In the game, you the player are unable to act on your own because your body is locked in a cryogenic suspension tank, but the six robots (Auda, Iris, Poet, Sensa, Waldo, and Whiz) carry out orders given them, subject to their inherent limitations but are smart enough to figure out how to interpret those orders (Waldo, for example, doesn’t need to be told exactly how to pick up a microsurgical arm, he just knows how to do it). So, now that we have some working definitions of software agents, what are some of the characteristics that make them different from other kinds of software? For starters, agents run autonomously after they’re started up. In some ways you can compare them to daemons running on a UNIX system or Windows services, but instead of carrying out system level tasks (like sending and receiving e-mail) they carry out user level tasks. Agents may event in a reactive fashion in response to something they encounter (like an e-mail from a particular person) or in a proactive fashion (on a schedule, when certain thresholds are reached, or when they see something that fits a set of programmed parameters). Software agents may be adaptive to new operational conditions if they are designed that way. There are software agents which use statistical analysis to fine tune their operational parameters, sometimes in conjunction with feedback from their user, perhaps by turning down keywords or flagging certain things as false positives or false negatives. Highly sophisticated software agent systems may incorporate machine learning techniques to operate more effectively for their users over time, such as artificial neural networks, perceptrons, and Bayesian reasoning networks. Software agent networks may under some circumstances be considered to be implementations of machine learning systems because they can exhibit the functional architectures and behaviors of machine learning mechanisms. I wouldn’t make this characterization of every semi-autonomous agent system out there, though. Read more “Semi-autonomous agents: What are they, exactly?”