Internationalizing myself
One of the many reasons I've just spent 3 years in Nicaragua was that I wanted to see how the computer technologies that mostly originated in the USA were making it out to the rest of the world. Furthermore, I wanted to see how other cultures thought and operated, to see what they had to offer a citizen of the USA in return.
I'm always interested in improving communication in our ever more connected world.
I lost a few misconceptions along the way, but that's not what I'm going to write about today...
If you're tired of the technical detail I've been posting lately, you can skip down to the part of this blog entry that describes the personal incident that drives me to work on internationalization issues. Hopefully you will understand, and laugh with me, rather than at me.
Moving on...
People can talk a lot faster than they can type or text. Even a trained, professional transcriptionist, equipped with special hardware and software, can barely keep up with someone talking at a normal speed, and dictation software is rarely up to the job, either.
But writing things down is one of the best ways to ensure that what you've said is clear - spoken words are often misremembered or misinterpreted, so when it comes down to trying to communicate across professions or cultures or languages, writing stuff down is best.
I've just spent 3 years in the jungle, in part, trying to figure out how to make that level of communication better.
Coping with internationalization problems
A quick digression: In Spanish (Español) the letter "i" is pronounced like the English "e" - is that an artifact of the Great Vowel Shift? Was there some sort of transcription error when they started writing things down? Or what? (Amusingly, it looks like I am not getting the ñ right on the web page right now.)
Anyway. There's an accent on nearly every word you type in Spanish. This plays merry hell with the speed and quality of user input.
Text input device problems
I have a Spanish-specific keyboard, or so it claims to be. It has an extra key dedicated to ñ, next to the enter key, making hitting enter extra hard... And it has no easy way of entering the accented vowels.
Under the X Window System (Linux/Unix), at least, there is a standard keyboard layout that solves the accent problem thoroughly and efficiently for most Indo-European languages: "USA International (AltGr dead keys)". It's one of the very many keyboard layouts you can choose when installing a new machine.
By remapping the right alt key to be the accent key, typing in Spanish is made really easy - typing áéíóúñ is a matter of holding down the right-alt key and hitting the letter. For capitals, it's right-alt+shift+the letter. ¿ is right-alt+?, and so on. I can make the euro sign (€) easily, too, but not the Córdoba...
Less common accent types (for other languages) sit near the relevant letter (on a QWERTY keyboard) - the ä in my name, for example, is right-alt+q.
Easy, right? It was years before I hit upon using this keyboard layout. If it took ME years, how long would it take others? I would argue that most of the Spanish speakers I know have no idea how to get a Spanish keyboard layout they can use effectively, either.
There are two other, more common keyboard input methods on other operating systems. In one, you hit compose, then the accent, then the letter you want to type.
Other systems use "dead keys", where you hit the accent (twice, if you want it to stand alone) and then the letter you want to accent. The "hit the accent key twice to get just the accent" approach raises merry hell with people, like me, who program a lot and need to use the keys ~'" on a regular, standalone basis.
Try as I might, I was unable to train myself to use either of these methods.
Worse...
What I see most people doing is grabbing for the mouse, then finding an option for "insert special character" in their word processor, finding the right character, clicking on it, and then continuing to the next word - an enormous productivity sump - and one that doesn't work in a browser or most tools outside of a word processing program.
All OSes do have the ability to enable alternate input methods - a little popup window that works with most applications to enter individual characters and/or remap the keyboard somewhat...
But most people don't know how to enable it. I was just in a cybercafe where someone HAD enabled it - for Cyrillic - and I had the devil's own time turning it off!
Others resort to having the spell checker fix it, or (the most common case) simply dropping the accents from daily use and depending on context to determine the differences between words. In a language where "Si" (if) differs from "Sí" (yes), this presents a gradually increasing contextual problem for all but the best bilingual speakers.
... At least three quarters of the Spanish-speaking Facebook and IRC population I chat with just drop all the accents.
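To make the cost of dropping accents concrete, here is a minimal Python sketch (not anything from the tools mentioned above; the function name `strip_accents` is made up for illustration). It decomposes each character and throws away the combining marks, which is roughly what an accent-dropping typist does by hand:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with all combining marks (accents, tildes) removed."""
    # NFD splits "í" into "i" plus a separate combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "Sí" (yes) and "Si" (if) become indistinguishable once the accent is gone.
print(strip_accents("Sí") == strip_accents("Si"))
```

After this, only context can tell the two words apart, which is exactly the problem described above.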
I sometimes resort to using Google Translate (I used to use Babel Fish) to do the job. I'll write a paragraph in English, translate it, fix it up for problems with gender (¿Amigo o amiga?) or tenses or formality (¿Usted? ¿Tú?), and move on.
Coping with keyboard input cripples the pace of human communication outside of English. Using the "USA International (AltGr dead keys)" keyboard layout, as I do now, makes typing in Español as fluid as it is to type in English. Does it exist on other OSes?
I'm not going to go into the horrors of typing Hebrew and Japanese today - or texting... Gah!
Coping with Textual representations
“Given enough eyeballs, all bugs are shallow” - ESR
On this quote, note the use of “ and ”. While typographically correct, nobody, except publishers, actually uses proper quotes on the web, mostly (I think) because there is no easy way to enter those characters.
I can't tell you the number of times I've copied and pasted a bit of shell script from the web that used some other symbol that looked identical to ", but wasn't, and had to do a global search and replace for it so that the code would actually work.
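The global search and replace can be sketched in a few lines of Python. This is only an illustration under an assumption: it covers the four most common typographic quote characters, and real pasted snippets may contain other look-alikes. The names `LOOKALIKES` and `asciify_quotes` are hypothetical.

```python
# Look-alike quote characters mapped to the ASCII ones a shell understands.
# Assumption: only these four are handled; other look-alikes exist.
LOOKALIKES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}
_TABLE = str.maketrans(LOOKALIKES)


def asciify_quotes(script: str) -> str:
    """Replace typographic quotes in pasted code with plain ASCII quotes."""
    return script.translate(_TABLE)


pasted = "echo \u201chello world\u201d"
print(asciify_quotes(pasted))
```

Running pasted text through something like this before saving it is cheaper than discovering the problem when the shell chokes on it.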
Normal users have taken over the core of the languages we use to interact with computers - particularly notable with spaces and things like ';" in filenames. Free-form ASCII usage in multicast DNS also bothers me a lot. "Joe's Machine", oy, vey!
These characters were reserved for programmers in the 70s and 80s, when Unix was developed, and it's hard to cope with them being everywhere else now.
It would be nice if there were a universal "programmers" character set that common tools like shell scripting languages, Perl, C, C++ and Java could all use, instead of ASCII.
Apple's opening of the entire character set to end users in the 80s made them happy, but wrecked decades of development in computer languages that reserved most of the special characters for programmers.
I know... "Tough. Deal with it. The Users have spoken." Nowadays, the methods for dealing with universal character sets are so fragmented and incomplete and domain or language specific that I often wish for a time machine to go back and forestall at least the more user-friendly character sets until we'd settled fully on one universal character set representation, be it Unicode or UTF-8 - neither of which was realized in final form for over a decade afterwards.
The famous "Little Bobby Tables" problem, the % convention URLs use for characters they can't carry directly (like %20 for space), and IDN's incredibly convoluted Punycode representation scheme are all artifacts of the intense difficulty we still have in mapping human communication to computer based communication. Life was so much easier when programmers had all the extra characters to themselves...
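Two of those escaping schemes can be seen side by side with nothing but the Python standard library. This is a small sketch, not a recommendation: the hostname `español.example` is made up, and Python's built-in `idna` codec implements the older IDNA 2003 rules, so newer tools may behave differently for some names.

```python
from urllib.parse import quote, unquote

# Percent-encoding: characters a URL can't carry directly become %XX escapes.
name = "Joe's Machine"
escaped = quote(name)
print(escaped)
print(unquote(escaped) == name)  # the escaping is reversible

# Punycode via IDNA: a non-ASCII hostname label becomes an ASCII "xn--" label.
# Assumption: "español.example" is a hypothetical name used for illustration.
host = "español.example"
wire = host.encode("idna")
print(wire)
print(wire.decode("idna") == host)  # and so is this one
```

Both transformations exist only to squeeze human text through channels that were designed around a small ASCII subset.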
But... given that we now have the *ability* to enter symbols like ×, ÷, ≠ and ∞... why don't we reserve some characters for programmers again??
What froze development of computer languages into the ASCII character set? Did APL scar everyone that badly? Why does the language of math, in particular, have to remain so disjoint from the languages we use to program computers?
Why NOT a programmer-specific Unicode/UTF-8 character set? Just to amuse myself, maybe I'll try writing a ≠ function in Clojure. I wonder how it will break?
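For comparison, here is how the same experiment goes in Python (used here as a stand-in; this says nothing about how Clojure behaves). Python accepts Unicode letters in identifiers but not math symbols, so ≠ cannot be a function name. A lookup table keyed by symbol is one workaround; the `OPS` table below is purely illustrative.

```python
import operator

# Python allows Unicode *letters* in names, but math symbols are rejected.
print("año".isidentifier())  # a letter-only name is fine
print("≠".isidentifier())    # a math symbol is not a valid name

# Workaround sketch: keep the symbols as data rather than as identifiers.
OPS = {
    "≠": operator.ne,
    "×": operator.mul,
    "÷": operator.truediv,
}


def apply(symbol: str, left, right):
    """Look up a math symbol in OPS and apply it to two operands."""
    return OPS[symbol](left, right)


print(apply("≠", 1, 2))
print(apply("×", 6, 7))
```

So the symbols are enterable and storable, but the language grammar itself still reserves nothing beyond ASCII for operators.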
Digressions: 1) And sublanguages have emerged, like the language of smileys. ♥? Where's the standardization committee for that? Or ♫?
2) Why don't we have automated tools that translate between English dialects? Translating from British English to American ought not to be hard: you could easily correct, on the fly, trivial re-spellings like theatre to theater, and common differences in phrasing like "Put something in the bin (trash can)" - and something that could automatically translate from male English to female English would be a real boon to the world!
3) Is it hard on operating systems other than Unix to use multiple languages at once on a per application basis? While working on translating Ardour into Spanish, I ran the rest of my system in English, and merely set the LANG variable to es before firing up Ardour. Being already familiar with Ardour in English, having the Spanish version in front of me improved my sight recognition vocabulary enormously.
It made the pain of adopting Spanish a little smaller.
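The per-application trick works because environment variables are per-process. A rough Python sketch of the same idea follows; the helper name `run_with_lang` is made up, and the child command here just echoes its LANG rather than launching a real application. Note that if LC_ALL or LC_MESSAGES is set in the parent environment it takes precedence over LANG, which this sketch does not handle.

```python
import os
import subprocess
import sys


def run_with_lang(cmd, lang):
    """Run cmd with LANG overridden for that one process only."""
    # Copy the current environment and change just LANG for the child.
    env = dict(os.environ, LANG=lang)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in for a real application: a child Python that reports its LANG.
child = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
result = run_with_lang(child, "es")
print(result.stdout.strip())

# The parent process keeps whatever LANG it had before.
print(os.environ.get("LANG"))
```

Swapping the stand-in command for a real program name gives the "one app in Spanish, everything else in English" setup described above, assuming the program has a Spanish translation installed.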
Partial Solutions I'd like to see
* Standardization on fewer, better, keyboard types and input methods
* Autodetection of locale on installation or upon logging into a network
* Translation bots for all the major chat protocols
* Spell checkers that can be set to multiple languages, simultaneously
* The ability to switch languages on a per application basis
Claude
Back in 1995 I'd started an email correspondence with a lovely girl based in Nice? Lille?, France - Claude Derieppe - who was studying English. (I'm weak on remembering how to spell her last name now)
Every couple of days for over six months we corresponded - I'd write long emails in English, she would send off short replies, also in English, full of enthusiasm and exclamation points!!! It was quite delightful to be making progress - having a bit of romance, even - with such a woman, so far away, from deep within my dark cubicle, while I slaved away at an 80-hour-per-week job at a startup. She was my lifeline to the rest of the "real" universe.
One day I wrote her a long and serious letter asking her about how she felt about increasing the intensity of our relationship, about religion, about having children, and about... Maybe... Um... Someday... coming for a visit?
... And I got it translated into French using some then-new translation software I'd just bought. I checked a few paragraphs against phrases in my French dictionary, and it seemed ok, so I emailed it off, without attaching the English original.
I know, you're laughing now. I can laugh, now, too...
Several weeks of silence ensued. She finally sent me an email back asking what was I doing talking about "Eating babies?" and numerous other mistranslations like that... She was mortally offended and stopped talking to me entirely soon afterwards.
I didn't know how bad the software was, and also didn't realize (until after that fateful message) that many email systems at the time stripped out all the accents on all the words I'd sent.
And I realized then that the differences in how humans conceptualize and represent language are one way that wars start and relationships disintegrate. Hell, the differences in how men and women use English itself - supposedly a common language - are real and well documented.
The tools that we use to communicate via computers have come a long way since then, but still have a long way to go. I giggle insanely every time I see Star Trek's "universal translator" idea used in ways that simply could not work with any level of foreseeable technology.
I do continue to revel in cultural differences and wish more people would do the same. Take our attitudes towards "public property" vs "the Queen's land", as one example, or the "right of trespass and camping" so common in Norway, vs the plethora of land use waivers, rights of way and eminent domain common elsewhere...
I later immortalized the experience with Claude in my wistful song, "Cybernation".
Everything is grist for my mental mill, and every time I get stuck on solving some internationalization issue in software, I flash on an image of Claude....
And these days, 15 years later, I regularly correspond with people all over the globe in multiple languages, with the aid of Google Translate. I'm equipped, now, with a little more basic knowledge about how human languages work, but I'm sure I still make horrific mistakes.
I deeply admire citizens of Europe for their mastery of two or more languages, because it seems like I will never have more than a 3rd grader's vocabulary or grammar in Spanish, no matter how much more study I put into the language.
Mastering just one language, Spanish, makes mastering Java look like child's play.
Labels: i18n, internationalization, keyboards, kvetch, spanish