OOXML-validating mailfilter

I take strolls with my roommate, Håvard, quite often. Today, we touched on the subject of ODF vs OOXML after a longer discussion of document validation in general. That's when we hit upon the idea of the OOXML-validating mailfilter.

The idea is quite simple: In addition to your anti-virus, spamfilter, greylisting, and other filtering functionality you have on your mailserver, you add a small script which relies on unzip and xsltproc to validate any OOXML document that passes your way, and automatically rejects any non-conforming document with a stern warning, such as:


It appears that the document <file name>, created with
<document producing program>, is corrupted. I will not be able to
read this document properly. Please consider resending a conforming
OOXML document, or use a different document format, such as ODF.

Pedantically yours,

<your name>
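
A minimal sketch of what the checking half of such a filter could look like, written in Python instead of as an unzip/xsltproc script. It only checks that the attachment is a readable ZIP package, that it contains a [Content_Types].xml part, and that every XML part is well-formed; real conformance checking against the OOXML schemas would need a good deal more. The names find_problems and rejection_notice are my own inventions for illustration, and hooking this into an actual mailserver is left out entirely.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Optional

REJECTION = (
    "It appears that the document {name} is corrupted. I will not be able to\n"
    "read this document properly. Please consider resending a conforming\n"
    "OOXML document, or use a different document format, such as ODF.\n"
)


def find_problems(data: bytes) -> List[str]:
    """Return a list of problems found in an OOXML package (empty if none)."""
    try:
        package = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a ZIP package"]

    problems = []
    with package:
        names = package.namelist()
        # Every OOXML package is expected to carry this part at the top level.
        if "[Content_Types].xml" not in names:
            problems.append("missing [Content_Types].xml")
        for name in names:
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                # Well-formedness only; no schema validation is attempted here.
                ET.fromstring(package.read(name))
            except ET.ParseError as err:
                problems.append(f"{name}: not well-formed ({err})")
    return problems


def rejection_notice(filename: str, data: bytes) -> Optional[str]:
    """Return the stern warning for a bad attachment, or None to let it pass."""
    if find_problems(data):
        return REJECTION.format(name=filename)
    return None
```

A filter would call rejection_notice() on each attachment and bounce the mail whenever it returns text instead of None.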

Organisations I currently allow myself to be sceptical of

<kynisk>

I have been following the OOXML affair from the sidelines for a while, and there has been no shortage of entertainment. Now I feel the time has come for a few oaths and a bit of bile from my own gall bladder.

According to digi.no, these organisations submitted the form letter handed out by Microsoft ahead of the vote on Norway's position to ISO in the OOXML matter.

  • Flekkefjord kommune
  • Sametinget
  • Sykehuset Asker og Bærum HF
  • Ullevål Universitetssykehus
  • Avenir
  • Hands
  • IT Partner Harstad
  • Mamut
  • Oslo Datasenter
  • Umoe IKT
  • Webstep
  • Abile
  • Accenture
  • Active Templates
  • Avanade Norge
  • Bouvet
  • Ementor Norge
  • Fønix Data
  • Holte byggsafe
  • Inmeta
  • IT Partner Bodø
  • IT Partner Møre
  • IT Partner Tromsø
  • iTet
  • Itum
  • Jensen Consulting
  • Maritech
  • Masterminds
  • Norsk Data Senter
  • Objectware
  • Osiris Data
  • Ravenholm Computing
  • RB Datatjenester
  • Steria
  • SuperOffice
  • Systempartner Sør
  • Umoe Consulting

Some of these are, of course, well-known Microsoft partners, but it both disappoints and surprises me that a number of public organisations have thrown themselves onto the OOXML wave, considering how rotten and highly dubious this whole process has been.

I am definitely in favour of Microsoft opening up its formats and trying to get them ISO-approved. What I react to with strong scepticism is the way they appear to be buying votes in this matter. I know that my naive wishes for a reasonably fair, democratic process in such matters are futile, but surely that does not mean one has to play corrupt banana republic. (It would probably have been both fun and challenging to play Junta against some of the actors in this case.)

It does not help my mood either that OOXML, from a technical standpoint, is a garbage format, full of meaningless warts. For example, the standard dictates that the year 1900 is a leap year, it consistently fails to use other ISO standards where doing so would be wise (vector graphics, mathematical formulas, localisation, crypto, the metric system), and it has a dreadful XML style (very inconsistent, with lovely attribute names such as outerShdw, ActiveWritingStyle, hdrShapeDefaults -- why the changes in casing, and in the use of vowels?). On top of that, it is a monster of a format at 6000+ pages, without an open reference implementation. Whoop-de-doo.
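
For reference, the Gregorian rule that makes 1900 an ordinary year fits in a few lines of Python. This is a sketch of the calendar rule only; the function name is mine, and nothing here is taken from the text of the OOXML specification.

```python
import calendar


def is_gregorian_leap_year(year: int) -> bool:
    """Divisible by 4, except century years, unless divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (1900, 1904, 2000):
    # The hand-written rule agrees with the standard library.
    assert is_gregorian_leap_year(year) == calendar.isleap(year)
    print(year, is_gregorian_leap_year(year))
```

Under this rule 1900 is not a leap year, while 1904 and 2000 are.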

The biggest problem with the proposed standard is that it is extremely complex, for no reason at all. It will take a massive effort to write programs that are bug-for-bug compatible with the standard, without this benefiting the end user in any way. Dealing with the ugly warts is necessary solely because of the "old fun" that has accumulated in the Office products over the years. The only other place I have seen such hopeless nonsense is Netscape Mail's Mork file format.

Based on earlier experience with using "standard formats" from Microsoft, I am worried about divergence. Take, for example, Advanced Streaming Format, which (fortunately!) never became a "final" RFC. It did not take long before the format produced by Microsoft products could no longer be read by the program that implemented the (draft) RFC correctly. We need not mention DNS, Kerberos, CHAP, PPP, and other decent standards that Microsoft has extended with its own junk, with incompatibility as the result. I fear that ISO-OOXML and MS-OOXML will diverge quite a lot, quite quickly. At that point there is not much use in ISO-OOXML any more, other than on the sales tour: "Sure, we are an ISO standard. Look, here is the stamp."

</kynisk>


Wanted: Ultra lightweight CMS

Dear Lazyweb.

We are looking to replace Drupal with something more lightweight. The virtual server we host boblycat.org on is pretty cramped for space, so the price of a couple of Apache instances running Drupal, plus MySQL sitting in the background, along with the rest of our services, is sometimes too much for the available RAM. The result: some not-so-surprising OOMs, and a script which prophylactically restarts Apache every now and again.

What I'm looking for is a solution that combines a gallery with a wiki with a blog. It must be multi-user, and each user must be able to have his own theme. LDAP integration would be nice for auth. These could be three different pieces of software, but proper gallery and blogging integration is a must for our current clientele, which lives under the delusion that photoblogging is fun ;P For the rest, integration between the wiki and blog is desirable.

If anybody knows of any gems hiding out there that fit this task, don't hesitate to get in touch.


Last days in Paris

I'm enjoying my last days in Paris. It's been a great time, apart from a week and a half of combating the local microflora. I've seen a lot, read a lot, hacked a lot, and met some fun and interesting new people.

Unfortunately, I forgot my camera back home, so the only snapshotting companion has been my trusty W810i. I'll see if I can't find a bit of time to post-process the snaps and put them in the gallery. I've been putting this off as there are no decent programs on Linux for doing simple photo editing, at least that I found. Blimp seems to fit the bill with respect to simple, but its user interface is rather cumbersome. F-spot is a mess that crashes all the time. The Gimp also has a totally useless interface. I guess I'm looking for something like iPhoto, only free and full of open-source goodness.

Also, having to use the web interface for uploading pictures to our gallery is extremely cumbersome. I suppose I should check if we can't get WebDAV working. But, then again, it would save me even more time if I just put all of this on Flickr (or similar)....


Proper navigation support for Spoofax

I finally figured out how to add proper navigation history support to Spoofax today. This one has been bugging me for quite some time. I remember spending far too much time diving through the documentation with the hopes of figuring out how this should be done properly. No luck.

Today I had a flash of inspiration, so I dug into the JDT code base. That code seemed to solve the same problem in a very complicated way, so I didn't want to copy their approach outright. Stymied, I started tracing exactly what happens with the navigation history when positions are placed into it. After a bit of fiddling around, I figured out that when I move the cursor, I should mark the position both before and after the cursor/focus moves to get the behaviour of JDT (which I tried to emulate). Until now, I had only ever tried saving the editor location state either before I changed it or afterwards, never both. I also tried all kinds of alternative calls on the EditorPart hierarchy in vain. I now use ITextEditor.setHighlightRange(), which appears to do the job, provided I call markInNavigationHistory() "properly".

Anyway, the lesson is simple: if you call AbstractTextEditor.markInNavigationHistory(), remember to do it twice -- once before you change the editor/focus and once afterwards.

Porting Eclipse IMP from Eclipse 3.2 to 3.3

It's official: I'm the bootstrapper. My hacking life in the last few weeks has hardly been anything but bootstrapping. I've already said a few things about the Stratego compiler hacking. Since it takes ~3-4 hours for a full build of the Stratego compiler in the Delft buildfarm, I've had a couple of other projects to dive into in parallel. One of these has been the porting of Eclipse IMP from Eclipse 3.2 to 3.3.

In short, IMP is an IDE generator based on Eclipse. It provides a set of plugins and wizards that make the development of programming language environments (a lot) easier. The basic workflow when building an IDE for your favourite language with IMP is: (1) provide a grammar defined using the LPG grammar language, (2) use the IMP-provided wizards inside Eclipse to generate things like syntax highlighting support, outline support, code folding support, templates, text hovers, etc., then (3) fill in the skeletons provided by the generator. My personal view (subject to change without warning) of the generated code is that it's a guide to which parts of the Eclipse framework you need to extend in order to provide a given piece of functionality. Sort of a little helpful gnome pointing you in the right direction. In some cases, the generated code will actually do all you want, but more often than not, you will want to go beyond it.

That was the backgrounder on IMP. A major drawback of the current IMP releases is that they will only work on 3.2. Oh, and, of course, that IMP requires IMP to build IMP. Getting this beast ported to 3.3 wasn't as straightforward as I'd hoped. It took a few iterations. The first was getting it to build properly without any problems on my plain 3.2 installation. That took me several days. All kinds of subtle bugs surfaced, presumably because I have a different set of development habits than the IMPers.

Once those were patched and fixed upstream, I managed to bootstrap my first version on 3.2. A battle with race conditions in the startup code of various plug-ins followed. I hate static initializers, but apparently not everybody does. In a multi-plugin architecture where the order of plugin loading is not guaranteed, I cannot see how you can safely assume the order of static initializers across plugins, but those questions are not for me to ponder. I ripped them out and replaced them with lazy initializers as far as possible, and that worked wonders. With that hurdle out of the way, it was all downhill: a couple of internal JFace and JDT classes had changed locations and APIs between 3.2 and 3.3, but it was quick enough to rewrite the offending code (another reason why depending on internal APIs is a bitch, though I realize that the features in question could not have been provided without doing so).
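
The difference between static and lazy initialization can be sketched without any Eclipse machinery. The Python below illustrates the pattern only and is not the IMP code: the Registry class and the get_registry function are hypothetical stand-ins for a resource that one plug-in creates and another one uses.

```python
import functools

load_order = []  # records when things actually get constructed


class Registry:
    """Hypothetical shared resource whose creation order matters."""

    def __init__(self) -> None:
        load_order.append("Registry created")


# Eager ("static initializer") style would be:
#     REGISTRY = Registry()
# which runs whenever this module happens to be loaded.


@functools.lru_cache(maxsize=None)
def get_registry() -> Registry:
    """Lazy style: construct on first use, then hand back the same instance."""
    return Registry()


load_order.append("all modules loaded")
first = get_registry()
second = get_registry()
print(load_order)
print(first is second)
```

The registry is only built at the first call, after everything has been loaded, so load order stops mattering. A multi-threaded plug-in would also want a lock around the first construction; that is left out here.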

It's a huge disappointment to realize that my patches are only a couple of hundred lines. I felt like I had to rewrite the world, at places... Anyway, here's hoping for their inclusion in one of the pending releases. I've updated our sdf2imp tool to use the 3.3-based IMP, so we're already seeing a return on my investment :)


Stack tracing improvements

A limitation of my previous stack tracing patches was that io-wrap and io-stream-wrap did not properly report traces on failure. The reason for this is easy to spot if we look at how the error is handled (this is where execution flow ends up when you call io-wrap):

  option-wrap(opts, usage, about, announce, s) =
    parse-options(opts, usage, about)
    ; announce
    ; (s; report-success <+ report-failure)

  report-failure =
      report-run-time
    ; <fprintnl> (stderr(), [ <whoami> (), ": rewriting failed"])
    ; <exit> 1

As you can imagine, even though the program now happily prints a stack trace when the main strategy exits with a failure, it will not be printed when exit is called.

I've introduced a couple of stack introspection strategies for dealing with this: stacktrace-get-current-frame-name returns the name of the current frame, stacktrace-get-all-frame-names returns a list of all frame names, and stacktrace-get-current-frame-index returns an integer holding the current depth of the stack. These are implemented by primitives in the Stratego Standard Library (SSL).

A caveat of these strategies is that calling them will of course alter the stack. Even in the wonderful world of computing, we're not entirely free of Heisenbergian effects, apparently. However, there's a simple workaround: call the primitives directly, since this bypasses the way the compiler registers the stack frames.
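
The same observer effect shows up in most languages with stack introspection. As an analogy only (this is Python, not Stratego, and the helper names are mine), a helper that captures the stack sees its own frame on top unless it deliberately drops it:

```python
import traceback


def frame_names_naive():
    """Capture the stack; the result includes this helper's own frame."""
    stack = traceback.extract_stack()
    return [frame.name for frame in stack]


def frame_names():
    """Capture the stack and drop the helper's own frame from the end."""
    stack = traceback.extract_stack()
    return [frame.name for frame in stack][:-1]


def zap():
    return frame_names_naive(), frame_names()


def bar():
    return zap()


naive, clean = bar()
print(naive[-3:])  # ['bar', 'zap', 'frame_names_naive']
print(clean[-2:])  # ['bar', 'zap']
```

Calling the Stratego primitive directly plays the role of the second helper: the observation is made without adding a registered frame of its own.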

With this trick in hand, I rewrote the two strategies above to include proper stack tracing for io-wrap:

  option-wrap(opts, usage, about, announce, s) =
    parse-options(opts, usage, about)
    ; announce
    ; (s; report-success <+ prim("SSL_stacktrace_get_all_frame_names") ; report-failure)

  report-failure =
      ?stacktrace
    ; report-run-time
    ; <fprintnl> (stderr(), [ <whoami> (), ": rewriting failed, trace:"])
    ; <reverse ; map(<fprintnl> (stderr(), ["\t", <id>]))> stacktrace
    ; <exit> 1

Applying the modified io-wrap on the following sample program

 
  main = io-wrap(my-wrap(foo))

  my-wrap(s) = s

  foo = debug(!"foo") ; bar

  bar = debug(!"bar") ; fap ; zap

  fap = debug(!"fap") ; id

  zap = debug(!"zap") ; debug ; fail

gives

./prog: rewriting failed, trace:
        main_0_0
        io_wrap_1_0
        option_wrap_5_0
        lifted144
        input_1_0
        lifted145
        output_1_0
        lifted0
        my_wrap_1_0
        foo_0_0
        bar_0_0
        zap_0_0

Due to the compiler lifting inner strategies into freshly named, top-level strategies, the trace will contain some lifted* entries. Also, should you call strategies or rules which are compiled with older versions of the compiler, there will be "dark spots" in your trace. It won't be truncated -- only the frames due to the old library will be hidden.

Would you like a stack trace with your "rewriting failed"?

Prompted by my visit to EPITA, I hacked together some very basic support for stack traces in Stratego that might come in handy when a Stratego program fails.

Here's a simple Stratego program, called prog (which, if you look at it closely, will always fail):

  main = foo

  foo = bar

  bar = fap ; zap

  fap = id

  zap = fail

On the latest and greatest version of the compiler (build 17522 and later), you will get the following trace when this program is executed:

prog: rewriting failed, trace:
        main_0_0
        foo_0_0
        bar_0_0
        zap_0_0 

There are a number of caveats with the tracing that I will try to get rid of, and, when there are only very hard problems left, explain myself out of, in a couple of future posts.

€U vs Micro$oft

I noticed yesterday that the EU hit Microsoft with a decently sized fine of 899 million EUR. About time!

It sounds like a lot of money, but net income for Microsoft for 2007 alone was $18.52 billion. Since the fine is supposed to cover transgressions committed over the time period 2004-2007 (inclusive), it amounts to about a week's net income for each of those four years. That would, I suppose, be comparable to the price of a speeding ticket every year for a person with an average income. If you get away with driving significantly above the speed limit for all those four years, the price might very well be worth it....
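
The back-of-the-envelope comparison can be checked in a few lines of Python. The fine and the net income are the figures quoted above; the exchange rate of 1.5 USD per EUR is my own rough assumption, and the 2007 income is used as a stand-in for all four years.

```python
FINE_EUR = 899e6            # fine, from the text
NET_INCOME_USD = 18.52e9    # Microsoft net income for 2007, from the text
USD_PER_EUR = 1.5           # ASSUMPTION: rough exchange rate, not from the text
YEARS = 4                   # 2004-2007 inclusive

fine_per_year_usd = FINE_EUR * USD_PER_EUR / YEARS
income_per_week_usd = NET_INCOME_USD / 52
weeks_of_income = fine_per_year_usd / income_per_week_usd

print(f"fine per year:   ${fine_per_year_usd / 1e6:.0f} million")
print(f"income per week: ${income_per_week_usd / 1e6:.0f} million")
print(f"weeks of income: {weeks_of_income:.2f}")
```

With these inputs the fine works out to a little under one week of net income per year covered.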

I'm only bothering posting about this because while the company employs a lot of great people, who are both brilliant and friendly, their corporate attitude has been stifling and pretty problematic in many of the previous jobs I've had. If this is what it takes for them to play nice and civilized, I'm all for it. I'm happy that the EU has the guts to do something tangible, as opposed to what the US has managed in this case (though not for want of trying -- I think there have been a lot of good intentions behind the measures put in place after the anti-trust case overseas, as well.)

It's also very good to notice that additional changes are afoot lately, with the actual opening of various specs. In previous years, there's been quite a lot of promising, but, ultimately, totally worthless press releases about how great and open the future will be with respect to interoperability between Microsoft and the world.


Visit to EPITA

I visited Akim Demaille and his posse at EPITA today, and apparently there still is such a thing as a free lunch (although, in my excitement over the good food, I kinda promised to help out with fixing some Stratego issues they are experiencing, so it was not entirely without entanglements).

I got to sit in on one of the bi-weekly status updates for the LRDE. The room numbered a little under 30 people, including students and faculty. They were kind enough to hold the meeting in English so that I could follow it. I found it surprising and very encouraging to have everybody report their progress (and, in a very few instances, lack thereof) in front of the entire lab. I've been missing this in many of the institutions I've been working at. It certainly increases the level of team feeling, and also makes it easier to uncover opportunities for collaboration between the various groups. For example, they all shared a lot of common infrastructure, including setups for newsgroups, a build farm, svn repos, etc.

I met two of the guys from the "previous" Transformers generation, Florian and Maxime. Florian was putting the finishing touches on a visualization tool for ambiguities in Transformers' attributed parse trees. It looked pretty sweet. Maxime was hacking a translator from a DSL for their Olena image processing library.

I also got to meet the new generation of Transformer students. I expect that I'll interact a lot more with them in the coming months, as they come to grips with Stratego.

