One of the many reasons I've just spent 3 years in Nicaragua was that I wanted to see how the computer technologies that mostly originated in the USA were making it out to the rest of the world. Furthermore, I wanted to see how other cultures thought and operated, to see what they had to offer a citizen of the USA in return.
I'm always interested in improving communication in our ever more connected world.
I lost a few misconceptions along the way, but that's not what I'm going to write about today...
If you're tired of the technical detail I've been posting lately, you can skip down to the part of this blog entry that describes the personal incident that drives me to work on internationalization issues. Hopefully you will understand, and laugh with me, rather than at me.
People can talk a lot faster than they can type or text. Even a trained professional transcriptionist, equipped with special hardware and software, can barely keep up with someone talking at a normal speed, and dictation software is rarely up to the job, either.
But writing things down is one of the best ways to make sure that what you've said is clear. Spoken words are often misremembered or misinterpreted, so when it comes to communicating across professions, cultures, or languages, writing stuff down is best.
I've just spent 3 years in the jungle, in part, trying to figure out how to make that level of communication better.
Coping with internationalization problems
A quick digression: in Spanish (Español), the letter "i" is pronounced like an English "e" (as in "me"). Is that an artifact of the Great Vowel Shift? Was there some sort of transcription error when people started writing things down? Or what? (Amusingly, it looks like the ñ isn't rendering correctly on the web page right now.)
Anyway. There's an accent on nearly every word you type in Spanish. This plays merry hell with the speed and quality of user input.
Text input device problems
I have a Spanish-specific keyboard, or so it claims to be. It has an extra key dedicated to ñ, next to the Enter key, which makes hitting Enter extra hard... and it has no easy way of entering the other accented vowels.
Under the X Window System (Linux/Unix), at least, there is a standard keyboard layout that solves the accent problem thoroughly and efficiently for most Latin-script European languages: the "USA International (AltGr dead keys)" keyboard model. It's one of the many keyboard models you can choose when installing a new machine.
By turning the right Alt key into the accent key, this layout makes typing in Spanish really easy: typing áéíóúñ is a matter of holding down right-Alt and pressing the letter. Capitals are right-Alt+Shift+the letter. ¿ is right-Alt+?, and so on. I can make the euro sign (€) easily, too, but not the córdoba...
Less common accents (for other languages) sit near the relevant letter on a QWERTY keyboard; the ä in my name, for example, is right-Alt+q.
Easy, right? It was years before I hit upon using this keyboard model. If it took ME years, how long would it take others? I would argue that most of the Spanish speakers I know have no idea how to get a Spanish keyboard model they can use effectively, either.
There are two other, more common keyboard models on other operating systems. In one, you hit Compose, then the accent, then the letter you want to type.
Other systems use "dead keys", where you hit the accent (twice, if you want it to stand alone) and then the letter you want to accent. The "hit the accent key twice to get just the accent" model raises merry hell with people like me who program a lot and need to use the keys ~'" on a regular, standalone basis.
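Under the hood, a dead key or Compose sequence just pairs a base letter with an accent. The sketch below is a minimal Python illustration, not how any particular OS implements it: it assumes a small hypothetical table mapping accent keys to Unicode combining marks, and uses NFC normalization to fold letter plus mark into one precomposed character.

```python
import unicodedata

# Hypothetical dead-key table (illustration only): accent key -> Unicode combining mark.
DEAD_KEYS = {
    "'": "\u0301",  # COMBINING ACUTE ACCENT
    "~": "\u0303",  # COMBINING TILDE
    '"': "\u0308",  # COMBINING DIAERESIS
    "`": "\u0300",  # COMBINING GRAVE ACCENT
}


def compose(accent_key: str, letter: str) -> str:
    """Combine a dead-key accent with the key pressed after it."""
    if letter == accent_key:
        # Pressing the accent twice yields the bare accent character.
        return accent_key
    mark = DEAD_KEYS.get(accent_key)
    if mark is None:
        # Not a dead key: pass both characters through unchanged.
        return accent_key + letter
    # NFC folds letter + combining mark into a single precomposed code point
    # when Unicode defines one, e.g. "a" + U+0301 -> "á" (U+00E1).
    return unicodedata.normalize("NFC", letter + mark)


print(compose("'", "a"), compose("~", "n"), compose('"', "a"), compose("'", "'"))
# prints: á ñ ä '
```

The double-press branch is exactly the part that trips up programmers: in this model every standalone ' or " costs two keystrokes.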
Try as I might, I was unable to train myself to use either of these methods.
What I see most people doing is grabbing for the mouse, then finding an option for "insert special character" in their word processor, finding the right character, clicking on it, and then continuing to the next word - an enormous productivity sump - and one that doesn't work in a browser or most tools outside of a word processing program.
All the major OSes can enable alternate input methods: a little popup window that works with most applications to enter individual characters and/or remap the keyboard somewhat...
But most people don't know how to enable it. I was just in a cybercafe where someone HAD enabled it, for Cyrillic, and I had the devil's own time turning it off!
Others resort to having the spell checker fix it or, most commonly, simply drop the accents from daily use and depend on context to tell words apart. In a language where "si" (if) differs from "sí" (yes), this presents a gradually increasing contextual problem for all but the best bilingual speakers.
... At least three quarters of the Spanish-speaking Facebook and IRC population I chat with just drop all the accents.
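What "dropping the accents" does to the text can be sketched in a few lines of Python. This is a minimal illustration using the standard `unicodedata` module, and `strip_accents` is a hypothetical helper name: decompose each character into base letter plus combining marks, then throw the marks away.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics: NFD splits "í" into "i" + U+0301, then the marks are dropped."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "Sí" (yes) and "Si" (if) collapse into the same string...
print(strip_accents("Sí") == strip_accents("Si"))  # True
# ...as do "papá" (dad) and "papa" (potato), and ñ loses its identity as a letter.
print(strip_accents("papá"), strip_accents("año"))  # papa ano
```

Once the marks are gone, no function can put them back; the reader has to supply that information from context.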
I sometimes resort to using Google Translate (I used to use Babel Fish) to do the job. I'll write a paragraph in English, translate it, fix it up for problems with gender (amigo or amiga?), tenses, or formality (usted? tú?), and move on.
Coping with keyboard input cripples the pace of human communication outside of English. Using the "USA International (AltGr dead keys)" keyboard model, as I do now, makes typing in Español as fluid as typing in English. Does it exist on other OSes?
I'm not going to go into the horrors of typing Hebrew and Japanese today - or texting... Gah!
Coping with Textual representations
“With enough eyeballs, all bugs are shallow” - ESR
On this quote, note the use of “ and ”. While typographically correct, almost nobody except publishers actually uses proper quotes on the web, mostly (I think) because there is no easy way to enter those characters.
I can't tell you the number of times I've copied and pasted a bit of shell script from the web that used some other symbol that looked identical to ", but wasn't, and had to do a global search and replace for it so that the code would actually work.
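A small cleanup pass for pasted code is easy to sketch. This Python version is an illustration with a hypothetical look-alike table. It covers only curly quotes and the no-break space, which are common offenders, and a real tool would need a longer list:

```python
import unicodedata

# Hypothetical look-alike table: typographic character -> the ASCII one code expects.
LOOKALIKES = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u00a0": " ",  # NO-BREAK SPACE
}
_TABLE = str.maketrans(LOOKALIKES)


def find_lookalikes(code: str):
    """Report (index, char, Unicode name) for each suspicious character."""
    return [(i, ch, unicodedata.name(ch)) for i, ch in enumerate(code) if ch in LOOKALIKES]


def desmarten(code: str) -> str:
    """Replace typographic look-alikes with their plain ASCII equivalents."""
    return code.translate(_TABLE)


pasted = "echo \u201chello world\u201d"
print(find_lookalikes(pasted))
print(desmarten(pasted))  # echo "hello world"
```

Reporting the offenders by Unicode name first, before replacing anything, makes it easier to spot the invisible ones, like the no-break space.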
Normal users have taken over the core characters of the languages we use to interact with computers, particularly noticeably with spaces and characters like ';" in filenames. Free-form ASCII in multicast DNS names also bothers me a lot. "Joe's Machine", oy vey!
In the 70s and 80s, when Unix was developed, these characters were reserved for programmers, and it's hard to cope with them being everywhere else now.
It would be nice if there was a universal "programmers" character set that common tools like shell scripting languages, perl, C, C++ and Java could all use, instead of ASCII.
Apple's opening of the entire character set to end users in the 80s made users happy, but wrecked decades of development in computer languages that reserved most of the special characters for programmers.
I know... "Tough. Deal with it. The users have spoken." Nowadays, the methods for dealing with universal character sets are so fragmented, incomplete, and domain- or language-specific that I often wish for a time machine, to go back and forestall at least the more user-friendly character sets until we'd settled fully on one universal character representation, Unicode and its UTF-8 encoding, neither of which was realized in final form for over a decade afterwards.
The famous "Little Bobby Tables" problem (SQL injection), URLs' use of the % convention for unrepresentable characters (like %20 for a space), and IDN's incredibly convoluted Punycode representation scheme are all artifacts of the intense difficulty we still have in mapping human communication to computer-based communication. Life was so much easier when programmers had all the extra characters to themselves...
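Each of those schemes smuggles "special" characters through a channel that reserves them. A minimal Python sketch using only the standard library (`urllib.parse.quote`, the built-in `idna` codec, which implements the older IDNA 2003 rules, and `sqlite3`) shows all three side by side. The hostname, strings, and table are made-up examples:

```python
import sqlite3
from urllib.parse import quote, unquote

# 1. Percent-encoding: bytes that can't appear literally in a URL become %XX.
path = quote("Joe's Machine")
print(path)  # Joe%27s%20Machine
assert unquote(path) == "Joe's Machine"

# 2. Punycode (via IDNA): non-ASCII hostnames are rewritten into an "xn--" ASCII form.
ascii_host = "münchen.de".encode("idna")
print(ascii_host)  # b'xn--mnchen-3ya.de'

# 3. "Little Bobby Tables": keep user text out of the SQL grammar with placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
bobby = "Robert'); DROP TABLE students;--"
# The ? placeholder binds the string as data; it is never parsed as SQL.
conn.execute("INSERT INTO students (name) VALUES (?)", (bobby,))
stored = conn.execute("SELECT name FROM students").fetchone()[0]
print(stored == bobby)  # True
```

All three are the same move: when a character is reserved, the channel needs an escape hatch, and every channel invented its own.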
But... given that we now have the *ability* to enter symbols like ×, ÷, ≠ and ∞... why don't we reserve some characters for programmers again?
What froze the development of computer languages into the ASCII character set? Did APL scar everyone that badly? Why does the language of math, in particular, have to remain so disjoint from the languages we use to program computers? Why NOT a programmer-specific Unicode character set?
Just to amuse myself, maybe I'll try writing a ≠ function in Clojure. I wonder how it will break?
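For comparison, some mainstream languages already accept non-ASCII *letters* in identifiers while still rejecting math symbols. Python is one example. This is Python's rule only, and other languages draw the line differently. The sketch below asks the standard library how it classifies a few characters; `compiles_as_name` is a hypothetical helper that simply tries `<char> = 1`:

```python
import unicodedata


def classify(ch: str):
    """Return (char, Unicode general category, valid Python identifier?)."""
    return ch, unicodedata.category(ch), ch.isidentifier()


def compiles_as_name(ch: str) -> bool:
    """Check whether the character works as a variable name by compiling an assignment."""
    try:
        compile(f"{ch} = 1", "<demo>", "exec")
        return True
    except SyntaxError:
        return False


for ch in ["π", "ñ", "≠", "×", "∞"]:
    print(*classify(ch), compiles_as_name(ch))
# Letters (category "Ll") pass; math symbols (category "Sm") are rejected.
```

So a language can be Unicode-aware and still refuse the very symbols mathematicians use.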
Digressions: 1) Sublanguages have emerged too, like the language of smileys. ♥? Where's the standardization committee for that? Or ♫?
2) Why don't we have automated tools that translate between English dialects? Translating from British English to American ought not to be hard: you could easily correct trivial spelling differences like theatre/theater on the fly, along with common differences in phrasing like "put something in the bin" (trash can). And something that could automatically translate from male English to female English would be a real boon to the world!
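The easy half of that idea, swapping known spellings, fits in a few lines. This Python sketch uses a tiny hypothetical word list and whole-word regex matching. It deliberately ignores the harder problem of context:

```python
import re

# Tiny hypothetical British -> American table; a real one would need thousands of entries.
UK_TO_US = {
    "theatre": "theater",
    "colour": "color",
    "bin": "trash can",
}
# Match whole words only, case-insensitively.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, UK_TO_US)) + r")\b", re.IGNORECASE)


def americanize(text: str) -> str:
    def swap(match):
        word = match.group(0)
        us = UK_TO_US[word.lower()]
        # Preserve a leading capital, e.g. "Theatre" -> "Theater".
        return us.capitalize() if word[0].isupper() else us

    return _PATTERN.sub(swap, text)


print(americanize("Put something in the bin before the Theatre opens."))
# Put something in the trash can before the Theater opens.
```

Even this toy shows where it gets hard: `americanize("/usr/bin")` returns "/usr/trash can", because a word list can't tell a British dustbin from a Unix directory.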
3) Is it hard, on operating systems other than Unix, to use multiple languages at once on a per-application basis? While working on translating Ardour into Spanish, I ran the rest of my system in English and merely set the LANG variable to es before firing up Ardour. Since I was already familiar with Ardour in English, having the Spanish version in front of me improved my sight-recognition vocabulary enormously.
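Under Unix, the per-application trick works because the locale is just environment that a child process inherits; the shell's `LANG=... command` prefix does it for one program. A minimal Python sketch of the same idea follows. The locale name `es_ES.UTF-8` and the `ardour` command name are assumptions: the locale must be installed, and the binary name varies by version.

```python
import os
import subprocess


def localized_env(lang: str, base=None) -> dict:
    """Return a copy of the environment with the locale overridden for one program."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = lang
    # LC_ALL and LC_MESSAGES take precedence over LANG, and GNU gettext also
    # consults LANGUAGE; drop them so LANG actually wins for this child only.
    for var in ("LC_ALL", "LC_MESSAGES", "LANGUAGE"):
        env.pop(var, None)
    return env


def launch_localized(lang: str, argv: list) -> subprocess.Popen:
    """Start argv with its own locale; the rest of the session is untouched."""
    return subprocess.Popen(argv, env=localized_env(lang))


# Hypothetical usage: run just Ardour in Spanish.
# launch_localized("es_ES.UTF-8", ["ardour"])
```

Because the override lives only in the child's copy of the environment, the desktop and every other program stay in English.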
It made the pain of adopting Spanish a little smaller.
Partial Solutions I'd like to see
* Standardization on fewer, better keyboard layouts and input methods
* Autodetection of locale on installation or upon logging into a network
* Translation bots for all the major chat protocols
* Spell checkers that can be set to multiple languages, simultaneously
* The ability to switch languages on a per application basis
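The multi-language spell checker wish, in particular, is simple to state in code: a word is fine if any enabled dictionary accepts it. Here is a toy Python sketch with hypothetical word lists. A real checker would load full dictionaries, such as Hunspell's.

```python
import re

# Hypothetical toy dictionaries; real ones hold tens of thousands of words.
DICTIONARIES = {
    "en": {"the", "keyboard", "is", "broken"},
    "es": {"el", "teclado", "está", "roto", "si", "sí"},
}


def misspelled(text: str, languages=("en", "es")) -> list:
    """Flag words that no enabled dictionary knows, so mixed text isn't all red."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if not any(w in DICTIONARIES[lang] for lang in languages)]


print(misspelled("The keyboard is broken, el teclado está roto"))  # []
print(misspelled("The keybaord está roto"))  # ['keybaord']
```

Note the catch: because "si" and "sí" are both real words, a checker like this can't flag the missing accent. That still takes grammar-aware checking.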
Back in 1995 I started an email correspondence with a lovely girl based in Nice? Lille?, France - Claude Derieppe - who was studying English. (I no longer remember exactly how to spell her last name.)
Every couple of days for over six months we corresponded. I'd write long emails in English; she would send short replies, also in English, full of enthusiasm and exclamation points!!! It was quite delightful to be making progress - having a bit of romance, even - with such a woman, so far away, from deep within my dark cubicle, while I slaved away at an 80-hour-per-week job at a startup. She was my lifeline to the rest of the "real" universe.
One day I wrote her a long and serious letter asking her about how she felt about increasing the intensity of our relationship, about religion, about having children, and about... Maybe... Um... Someday... coming for a visit?
... And I got it translated into French using some then-new translation software I'd just bought. I checked a few paragraphs against phrases in my French dictionary, and it seemed ok, so I emailed it off, without attaching the English original.
I know, you're laughing now. I can laugh, now, too...
Several weeks of silence ensued. She finally emailed me back, asking why I was talking about "eating babies", among numerous other mistranslations like that... She was mortally offended and stopped talking to me entirely soon afterwards.
I didn't know how bad the software was, and also didn't realize (until after that fateful message) that many email systems at the time stripped out all the accents on all the words I'd sent.
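MIME already had an answer for 7-bit mail channels: a Content-Transfer-Encoding such as quoted-printable, which spells each non-ASCII byte as `=XX`. Below is a minimal Python sketch with the standard `quopri` module. Next to it is one way a careless 7-bit relay could mangle the same word, by clearing the high bit of each Latin-1 byte. `strip_high_bit` is a hypothetical helper for illustration only.

```python
import quopri


def to_seven_bit(text: str) -> bytes:
    """Quoted-printable: non-ASCII bytes become =XX, which survives 7-bit relays."""
    return quopri.encodestring(text.encode("utf-8"))


def from_seven_bit(data: bytes) -> str:
    """Undo quoted-printable and decode the UTF-8 bytes back to text."""
    return quopri.decodestring(data).decode("utf-8")


def strip_high_bit(text: str) -> bytes:
    """Simulate a relay that clears bit 7 of every Latin-1 byte."""
    return bytes(b & 0x7F for b in text.encode("latin-1"))


wire = to_seven_bit("café")
print(wire)                    # b'caf=C3=A9'
print(from_seven_bit(wire))    # café
print(strip_high_bit("café"))  # b'cafi'
```

The encoded form is ugly but lossless. The stripped form looks like a plausible word and is silently wrong, which is arguably worse.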
And I realized then that the differences in how humans conceptualize and represent language are one way that wars start and relationships disintegrate. Hell, the differences in how men and women use English itself - supposedly a common language - are real and well documented.
The tools that we use to communicate via computers have come a long way since then, but still have a long way to go. I giggle insanely every time I see Star Trek's "universal translator" idea used in ways that simply could not work with any level of foreseeable technology.
I continue to revel in cultural differences and wish more people would do the same. Take our attitudes toward "public property" vs. "the Queen's land", as one example, or the "right to roam and camp" so common in Norway, vs. the plethora of land-use waivers, rights of way, and eminent domain common elsewhere...
I later immortalized the experience with Claude in my wistful song, "Cybernation".
Everything is grist for my mental mill, and every time I get stuck solving some internationalization issue in software, I flash on an image of Claude....
And these days, 15 years later, I regularly correspond with people all over the globe in multiple languages, with the aid of Google Translate. I'm equipped, now, with a little more basic knowledge about how human languages work, but I'm sure I still make horrific mistakes.
I deeply admire citizens of Europe for their mastery of two or more languages, because it seems like I will never have more than a third grader's vocabulary or grammar in Spanish, no matter how much more study I put into the language.
Mastering just one more language, Spanish, makes mastering Java look like child's play.
The news from my navel
Over the past month, I've been posting my most interesting failures from my personal backlog/stash.
I always try to write up a postmortem of my projects, whether they succeed, partially succeed, or fail, so that one day I can learn from them. Recently I nearly lost this personal stash of depressing documentation (the hard disk got rained on), and I decided it would be best to whip some of it into shape for the blog, get it out there, and try to learn from it personally. Maybe others can too. It still hurts to talk about some of this stuff... but as Spider Robinson says: "Shared pain is lessened; shared joy, increased."
I'm also in the middle of halting, temporarily and most likely permanently, two R&D projects that I had intended to spend 5 years working on, 3 years in. I'm a month into writing up the postmortems. There were plenty of successes, and more than a few failures. Given all the work I did, it will take me months more to finish writing up the descriptions of the projects and what went right and wrong... and I guess it's easier to look at the other things I did in the past that didn't work out, and finish writing THOSE up, than it is to turn the reams of lab notes and documentation I have on the last 3+ long, lonely years into readable text.
I tried to go into my last two projects with my eyes open, fully aware that I was doing R&D, and that R&D, almost by definition, doesn't go the way you want it to.
But after a while, I got pretty emotionally invested in them and couldn't see the forest for the trees. It took an enormous kick in the ass for me to get out of the jungle (the Survivor TV show rented the house I was living in out from under me) and gain enough distance from those projects to see that I cannot continue at the present state of technology. I just spent 4 months traveling the US, trying to gain clarity and to find somewhere other than California or Nicaragua where I might live and do something else, simpler, that I might succeed at, or leverage what I just did, notably with IPv6, to build out the rest of the Internet elsewhere.
While in the States, I did some interesting consulting on a lawsuit concerning patent 7035281, filed in September 2000, which basically patents most of the features in a Linux-based wireless router... for which my friend Greg and I had prior art in 1998. Portions of my blog are now in the court record, including this piece, where I counseled Dave Cinege:
I'm no stranger to getting so wrapped up in a project that I confuse it with growing a child. I've done it multiple times before, and I'll probably do it multiple times again, until I actually get around to growing a child. It hurts to give up a project that isn't rewarding - but it's not a child - you CAN and SHOULD abandon it if it isn't working out, and fill up that empty space with something else.
As I talked about this now 12-year-old project with Cisco's lawyers and others, and reflected on all the changes in the world since then, where we succeeded and failed back then, and what has happened since, I kept thinking it was long past time to take my own advice.
The rainy season in Nicaragua is driving me nuts! I have no Internet at my new (rented) home, 6 km out of town, and the road I live on is frequently impassable, so I have plenty of time to write, think, and plan... still, getting to town, and on Facebook, on occasion, is a comfort, given what I'm writing about and the size of the plans I'm trying to make.
I'm planning on either moving deeper in the jungle or to Colorado, in a few weeks. Mostly, I'm thinking, Colorado. I have friends and family there, I really enjoyed my visit there, it's a lovely place, the people are great, and there is at least some high-tech there that might need my skills.
It's been oddly comforting reviewing the four failures I've written up so far, years (in one case, decades) after they happened. Themes have emerged: being too early, or underfunded; Paul Graham's truism that "the best people didn't work for me"; and totally unanticipated, project-killing problems with the toolchains and chips themselves, which were completely outside my control and range of expertise at the time.
Gaining wisdom comes hard.
Probably the best news from my navel was that doing the write-ups - spending the time to coherently write and publish, in English, about what we did, as we did with the wireless HOWTO - was the smartest response to success AND failure anyone could have come up with.
In most cases, eventually, the technology progressed until something like the original vision came into existence.
I will continue writing up the failures series - and I hope that others have the courage to do the same - but I'm going to take a break from these writeups for a while and try to pull together my plans for the future. Maybe I'll take the time to write up a few successes, too.
I've missed living in Civilization. Perhaps it missed me.