Parsing Life: 2012

Monday, 26 November 2012

An XML configuration file frakkup

Only recently, I came across an XML configuration file for a web app that looked something like this:

<environment name="global">
  ...
</environment>
<environment name="production" extends="global">
  ...
  <database>
    <host>IP1.IP1.IP1.IP1</host>
    <user>...</user>
    <password>...</password>
  </database>
</environment>
<environment name="staging" extends="production">
  ...
  <database>
    <host>IP2.IP2.IP2.IP2</host>
    <user>...</user>
    <password>...</password>
  </database>
</environment>
<environment name="development" extends="production">
  <database>
    <host>IP3.IP3.IP3.IP3</host>
    <user>...</user>
    <password>...</password>
  </database>
</environment>

I'm sure that, when they started things that way, there seemed to be good reasons for it. Still, I maintain that there are two Really Bad Ideas(TM) in there, and they are glaringly obvious:

1. Environments inherit from each other. This is just wrong - different environments may inherit from a common base, but they do not depend on each other in any way. Development machines are, by definition, different from the production machine. Remember, the reason we use inheritance is NOT to avoid code duplication, but to model realistic relationships between objects and classes of objects in code.

2. The database host IPs (along with other addresses not mentioned here) are hardcoded in every single environment.

At one point, they had to change the database entry of a staging environment to point at the production database for some reason. Then change it back. Then change it to a completely new address...

Add to that the fact that there were really two different staging environments. Add to that the fact that they had to do this in several git branches, with tiny but important differences. Under fierce time pressure, of course, and with one new guy on board who hadn't quite grokked the whole jazz yet. Imagine the chaos that necessarily ensued. (It gets even funnier once you know that the addresses often differed in only one digit, so one had to look twice and remember which one was correct.) Imagine how often this went wrong. Imagine the understandable frustration on the part of the developers, the CTO, and the customer.

The solution is so obvious that it's almost embarrassing to spell it out: Put the database entries into their own entities, and untangle the environments. Might take 30 minutes to do so, and will spare them tremendous suffering in the future.

<database name="productionDb">
  <host>IP1.IP1.IP1.IP1</host>
  <user>...</user>
  <password>...</password>
</database>
<database name="stagingDb">
  <host>IP2.IP2.IP2.IP2</host>
  <user>...</user>
  <password>...</password>
</database>
<database name="developmentDb">
  <host>IP3.IP3.IP3.IP3</host>
  <user>...</user>
  <password>...</password>
</database>
<environment name="global">
  ...
</environment>
<environment name="productionEnv" extends="global">
  <database references="productionDb" />
  ...
</environment>
<environment name="stagingEnv" extends="global">
  <database references="stagingDb" />
  ...
</environment>
<environment name="developmentEnv" extends="global">
  <database references="developmentDb" />
  ...
</environment>

Of course, it is perfectly possible that the XML specification doesn't allow for that, in which case... tough luck; it really just shifts the responsibility from whoever wrote the config, to whoever wrote the XML spec (I was unable to find it, sadly). In an XML describing a server configuration, you sure want to be able to reference every entity from every other entity. After all, it's basically just another way to describe objects and their relationships. And you surely wouldn't copy the database descriptor into every single object in any somewhat sane OOP environment!
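To show that resolving such references is not rocket science, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the <config> root element, the trimmed-down entries, and the database_host helper are mine, not the web app's actual format or loader.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: the <config> root and the shortened entries are
# assumptions for illustration, not the web app's real file.
CONFIG = """
<config>
  <database name="productionDb"><host>IP1.IP1.IP1.IP1</host></database>
  <database name="stagingDb"><host>IP2.IP2.IP2.IP2</host></database>
  <environment name="global"/>
  <environment name="productionEnv" extends="global">
    <database references="productionDb"/>
  </environment>
  <environment name="stagingEnv" extends="global">
    <database references="stagingDb"/>
  </environment>
</config>
"""


def database_host(root, env_name):
    """Follow an environment's <database references="..."/> to its host."""
    # Index the top-level <database name="..."> entries by name.
    databases = {db.get("name"): db for db in root.findall("database")}
    for env in root.findall("environment"):
        if env.get("name") == env_name:
            ref = env.find("database").get("references")
            return databases[ref].findtext("host")
    raise KeyError(env_name)


root = ET.fromstring(CONFIG)
print(database_host(root, "stagingEnv"))
```

Pointing staging at the production database, and back again, then means changing a single references attribute instead of hunting for near-identical IP addresses.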

The most interesting part, to me, is that this is not a major frakkup, not by any means. It's a detail. But those minor frakkups add up, they crawl on each other's shoulders and become major frakkups over time, and at some point, they can easily endanger the success of a project or a whole company.

And that, my friends, is just sad.

Thursday, 19 July 2012

If Vim Then Janus

If you're a vim user, then you should definitely check out janus (https://github.com/carlhuda/janus/). Installation is fully automated: Just do curl -Lo- https://bit.ly/janus-bootstrap | bash. Your old .vim/ and .vimrc will be backed up. If you want to add additional tweaks, you can put them into .vimrc.before and .vimrc.after, respectively.

What Janus will do for you:

Provide you with NerdTree, a file-explorer inside vim (\n).
Add automated syntax checking to your program sources upon :w.
Add Tagbar (\rt) for easy access to the ctags in your source tree (works with Exuberant Ctags only).
Add EasyMotion (\\<MOTION-COMMAND>), which is probably the most jaw-dropping vim plugin I have ever found.

Give it a try. There is nothing like an antediluvian text editor that doubles as a full-blown IDE. Or, as some might argue, its own operating system.

Oh and also... since we're on the topic: A simple google search for "vim configuration" will save you a LOT of time in the future. It's incredible what those vimthusiasts have already come up with, so why reinvent the vil?

Sunday, 15 July 2012

It Takes Two To Tango!

A friend just sent me the following OO brainteaser:

So what is the natural and clean design for an operation that represents the sale of a property? Say we have these objects: the buyer, the seller, the agent, the property and the contract. Which one of these would you prefer?

property.sell(buyer, seller, agent, contract)

seller.sell(property, buyer, agent, contract)

buyer.buy(property, seller, agent, contract)

agent.sell(property, buyer, seller, contract)

contract.sign(property, buyer, seller, agent)

My friend's suggestion was to create a Transaction object that abstracts the possession of property on the one hand and money on the other.

This is, I guess, a good idea. But I would like to point out another issue.

To some degree, it's a trick question: The way I see a sale, the agent really has nothing to do with it - it's actually three transactions, buyer/seller/contract, seller/agent and buyer/agent. I may be mistaken here, I've never sold property via an agent, but my impression is that the setup is obfuscating the problem that the creator of that brainteaser was aiming at. Let's just cut out the middleman, and reduce it to buyer/seller/property, which is probably the most frequent use case.

So, our options are

property.sell(buyer, seller)
property.buy(buyer, seller)
property.sellAndBuy(buyer, seller)
seller.sell(buyer, property)
buyer.buy(seller, property)

That's still an awful lot of equally viable options - and all of these have been seen in practice quite a few times! (Okay, the sellAndBuy option is more of a joke, but it's actually an apt description of what's going on, if you just find a more abstract word to replace the awkward "sellAndBuy".)

I think that the case in question is unsolvable, especially given the options presented. My preferred solution would depend on the context of the overall application. And that is a clear sign that something is amiss here. It's a code smell, albeit one that doesn't point to bad code, but to a limited language paradigm.

I think that my friend's solution is a very good one - abstract the whole thing into an independent entity to avoid the confusion. But it still makes my toenails curl in displeasure.

First off, I think that there are actually two separate problems hidden in the task above.

1. OOP is actually SOP

I've maintained for quite a while now that OOP is a misnomer. It should more aptly be called "SOP" - "Subject Oriented Programming". Call me a grammar nazi - I'm a literature major, after all. I still think I'm right.

See, in the real world, we deal with subjects and objects. Subjects interact with subjects, subjects act on objects, but objects do nothing themselves. In OOP, all you really have is subjects - entities that interact with other entities. (You could model an actual object as a class with only public members - a struct in C++ - but that's just ridiculous.)

seller.sell(buyer, property) is really just a formalized way to say "a seller sells a property to a buyer". In abstract terms, a subject conspires with another subject to do something to an object.

But in OOP, since we cannot distinguish between subjects and objects, a solution that has the contract as the primary object that drives the whole business (i.e., the subject) is perfectly acceptable. Which isn't bad per se, but it's a rather unintuitive solution. So I'd probably rule that one out.

That still leaves us with

seller.sell(buyer, property)
buyer.buy(seller, property)

2. Missing reciprocity

Now we arrive at the real beef: There is no concept of reciprocity in OOP. You have to bind an operation to one class, and one class only. But in reality, interactions are a reciprocal act between subjects: Buying always necessitates selling. It takes two to tango!

The awkward situation gets very obvious when you consider overloaded operators in C++ (in the case of commutative operators): seeing "a + b" as equivalent to "a.operator+(b)" does not make an awful lot of sense. The two objects are perfectly interchangeable. But you cannot easily model that in current OOP languages. (At least the ones I know.)

(Okay, admittedly it's not exactly the same problem, since we're dealing with objects of the same class here, but it still illustrates our issue.)

3. A humble suggestion

What we need is a syntactic expression for the fact that selling is the reciprocal action of buying: It should be possible to express that seller.sell acts on both the buyer and the seller, and you can use whatever is more obvious in the context of your code.

I'm thinking of something like

Reciprocal BusinessTransaction {
void sell(buyer, seller, property)
== buyer.buy(seller, property)
&& seller.sell(buyer, property)
&& property.sell(buyer, seller);
}

That way, every object can still mind its own business, and the coder could use any one of four semantically identical operations, depending on which perspective fits her code best. Triggering buyer.buy() would automatically run seller.sell().

Of course, this necessitates that the methods in the various classes are actually independent of each other and can be handled in every possible order without causing trouble. (Looking at what I wrote here, I wonder - why does it look so damn functional-y? Hmmmmmm. Must be something to it... something sinister...)

One issue yet to solve would be that it should be perfectly obvious, when you look at the code for buyer.buy(), that seller.sell() gets triggered in the background. I have no idea how to guarantee that without creating duplicate code. So, maybe, class BusinessTransaction is still the best solution. Sigh. Or putting the code into BusinessTransaction, but labeling buyer.buy() as an alias to that, some kind of shortcut or implicit delegate.

Or so I hope.
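For the record, the "alias" variant can at least be approximated in today's languages. The following is a rough sketch in Python, not the Reciprocal construct proposed above: the classes Party, Buyer and Seller, the holdings set, and the execute method are all hypothetical names I made up for illustration.

```python
class Party:
    def __init__(self, name):
        self.name = name
        self.holdings = set()


class BusinessTransaction:
    """Single home for the sale logic; all names here are hypothetical."""

    @staticmethod
    def execute(buyer, seller, prop):
        if prop not in seller.holdings:
            raise ValueError(f"{seller.name} does not own {prop}")
        seller.holdings.remove(prop)
        buyer.holdings.add(prop)


class Buyer(Party):
    def buy(self, seller, prop):
        # Alias: visibly delegates to the one shared implementation.
        BusinessTransaction.execute(self, seller, prop)


class Seller(Party):
    def sell(self, buyer, prop):
        # The same operation, seen from the other side.
        BusinessTransaction.execute(buyer, self, prop)


alice, bob = Seller("Alice"), Buyer("Bob")
alice.holdings.add("house")
bob.buy(alice, "house")
print(bob.holdings)
```

Whoever reads buy() or sell() sees the delegation at a glance, and the actual logic lives in exactly one place. What this does not give you is the declared equivalence between the two calls; that part remains wishful thinking.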

Monday, 18 June 2012

Announcing the Password Stupidity Contest

I am officially starting the

Worst possible password policy ever

contest.

Simultaneously, I am adding car2go.com as our first contestant.

Before we start, let me add that I totally adore their service. Car sharing is a brilliant idea, it helps me stay independent of public transport while not having to maintain and pay for my own car, and it is environmentally sane. Plus, it is a huge testament to the awesome possibilities that the internet has to offer.

The restrictions, as per their website, are the following:

Passwords must...

be between 8 and 25 chars long
start with a letter
contain at least one capital and one lowercase letter
contain at least one numerical digit
not contain combinations of the user's first/last names or the company name ('car2go')

Seriously. I am not bullshitting you. That's their rules.
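For what it's worth, here is how I read those rules when written down as code. It is a sketch in Python under my own assumptions: the function name, the messages, and the reading of the name rule as a case-insensitive substring check are mine, not anything car2go published.

```python
def violations(password, first_name, last_name, company="car2go"):
    """Return the list of rules a password breaks (empty means accepted)."""
    problems = []
    if not 8 <= len(password) <= 25:
        problems.append("length must be 8 to 25 characters")
    if not password[:1].isalpha():
        problems.append("must start with a letter")
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    if not (has_upper and has_lower):
        problems.append("needs an uppercase and a lowercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("needs a digit")
    # Assumption: "combinations of names" is read as a plain,
    # case-insensitive substring match.
    lowered = password.lower()
    banned = (first_name, last_name, company)
    if any(word.lower() in lowered for word in banned if word):
        problems.append("must not contain names or the company name")
    return problems


# "Jane" and "Doe" are placeholder names.
print(violations("correct horse battery staple", "Jane", "Doe"))
```

Under this reading, a four-word passphrase is turned away on three counts (too long, no capital letter, no digit), while a shorter, harder-to-remember string of mixed characters passes.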

Okay, I'll admit that the last one actually makes some kind of sense. The first part of the first one does so, too.

But the others?

The most glaring issue is the maximum limit on password length. The only explanation I can come up with is that they store cleartext passwords, which, considering that they take your bank account data, would be such a tremendous security issue as to make their whole service unusable.

But assuming that they do hash (and, depending on algorithm, salt) their passwords - why on earth would anyone limit them to 25 chars? To save bandwidth? To avoid DoS attacks? There are much better ways to do that. It doesn't make any sense at all.
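To illustrate why length shouldn't matter for storage, here is a small sketch using PBKDF2 from Python's standard library. It says nothing about how car2go actually stores passwords; the hash_password helper and the iteration count are illustrative assumptions.

```python
import hashlib
import os


def hash_password(password, salt, iterations=100_000):
    # PBKDF2-HMAC-SHA256: the digest is 32 bytes whatever the input length.
    # The iteration count is only an illustrative value.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )


salt = os.urandom(16)
short = hash_password("pArS1ngL1f3", salt)
long = hash_password("the first line of your favorite Klingon haiku " * 10, salt)
print(len(short), len(long))
```

Both digests come out at the same size, so the database column doesn't care whether the user typed eleven characters or several hundred.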

But more than that, I think that this points to a much much more general, ongoing issue.

See, we - we ITers, nerds, geeks, coders - have gotten used to, and have as a consequence gotten our users used to SHORT PASSWORDS. Or, to be more precise, to PASSWORDS. Instead of PASSPHRASES.

Now, I'm not a mathematician. But I'm fairly certain that the first line of your favorite Klingon haiku contains way more of that Sacred Spice... err.... I mean, that Sacred Entropy... than your cat's name in L33tspeak.
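A back-of-the-envelope estimate, with loud caveats: the sketch below assumes every symbol is picked uniformly at random, which is true neither of a haiku nor of a cat's name, and the 7776-word list is a hypothetical size chosen for illustration. It only shows how quickly length beats character variety.

```python
import math


def random_bits(alphabet_size, length):
    """Entropy in bits of `length` symbols drawn uniformly at random."""
    return length * math.log2(alphabet_size)


# 8 characters from the 94 printable, non-space ASCII symbols.
password_bits = random_bits(94, 8)
# 5 words from a hypothetical 7776-word list (the size is an assumption).
passphrase_bits = random_bits(7776, 5)
print(round(password_bits, 1), round(passphrase_bits, 1))
```

Under those assumptions, the five plain words come out at roughly 65 bits against roughly 52 for the eight-character jumble, and the words are the ones you will still remember next week.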

Entering a phrase is easy. Remembering a phrase is easy.

Why on earth did we go for passwords?

So we didn't have to add quotes when we passed it through pipes, perhaps?

I don't know. But it was a bad, bad, BAD decision. Everyone keeps trying to come up with weird combinations of uppercase, lowercase, digits and special chars, and of course then they forget whether it was 'p@r51ngL1f3' or 'pArS1ngL1f3' this time. And then they give up, and revert to using their spouse's birthdate. Duh.

This, folks, is madness.

But the real headscratcher, of course, is the rule to start a password with a letter. Did they imagine their passwords to be C identifiers? Do they not quote them? Do they use them as method names? WHY ON EARTH would a coder care whether the user's password starts with a letter or a digit?

It's a bit like thinking about the code in the PHP parser that distinguishes between a language construct and a function, so it can reject 'empty(foo())'. It's a bit like the Great Pit of Carkoon. You just don't want to go there.

So, if you happen to work at car2go, maybe you can send me a good explanation. I'm all open to revoking my points and praising your logic instead.

Tuesday, 29 May 2012

Psychoanalytic Apologetics

I happen to know a psychotherapist, Sabine Bösel, who works with the imago method. Recently, she had a public debate with local psychoanalyst Walter Hoffmann, which was printed in a national newspaper.

If you speak German, you can read the discussion here.

Mr Hoffmann, at one point, denounced the imago method as "infantile parroting" and claimed that "all people lie in therapy". That doesn't come across as very friendly and philanthropic to me, and it's certainly not a good way to get positive PR, but anyway.

He also said that imago cannot work because "it contains no knowledge about the unconscious". He went on to say that Mrs Bösel "tends to the religious needs of the clients and their wishful thinking", and basically accused her of being a religious fraud. Again, not very friendly. No good mojo.

It is also somewhat ironic, coming from an Official of the Church of the Holy Trinity of Id, Ego, Superego, whose therapies last forever and cost all kinds of money.

But mostly, this is a severe case of religious apologetics.

See, there is no such thing as "the unconscious". If it were an actual physical entity, you'd be able to measure its diameter, and its related theory would be falsifiable. It isn't, and you're not.

"The unconscious" is really just a more-or-less useful metaphor, a cipher for "things we don't quite understand, but need a short handy label for". The human mind is still a black box to a large degree, and everything we say about it amounts to little more than a guessing game. Somewhat educated guesses, in the best case. In that regard, the unconscious behaves much like god, personality types, or qi.

And, also much like god or qi, this is perfectly okay in and of itself. We don't know everything, and sometimes it is useful to make assumptions. Everyone can check out a few of those theories for him- or herself, gain a little experience here and there, find what suits them, and that's it.

Except it's not.

As I've probably already mentioned, and most assuredly will keep on repeating as long as apologists run wild on this planet, problems arise when people forget that their concepts are just concepts, and start treating them as actual things. People start investing emotions in those "things", and they get rather defensive when these concepts are challenged.

Such is the stuff that wars are made of. It's silly, and it's senseless, but that's how it is.

I'm currently reading a highly commendable book about medieval social history. Now, medieval doctors had no idea about germs and contagion. Instead, they assumed that illnesses were, among other things, the result of the four "bodily fluids" being out of balance. Of course, there was no empirical evidence for any of those fluids, and much less so for said balance being off. But they still went on treating people based on that wacky hypothesis. They didn't realize that it was just speculation. They took their own speculation for fact. And that's highly dangerous business. As in, people died of their cuppings and baths and other supposed remedies. Granted, at the time there was no useful alternative, so it didn't make that much difference anyway, but clinging less to those concepts might have sped up the process of finding one.

Now, Mr Hoffmann acted like a complete douchebag in that debate. That's deplorable, and I'm sure it is a reflection of his personal vulnerabilities and traumata - but it should not reflect negatively on psychoanalysis, just as Mrs Bösel's friendliness shouldn't reflect positively on imago therapy.

See, I'm not defending imago here. I have nothing invested in it. I've personally experienced it, and found it highly useful in a highly specific problem domain. But that doesn't mean that it's "true", or even that it's useful to just everyone. People may or may not respond to it. It is every bit as metaphysical as psychoanalysis, and I'm sure that therapists working with it are just as prone to falling for their own speculation. I'm sure there are advocates of imago who are every bit as apologetic and douchebaggy as Mr Hoffmann.

(In fact, they're probably not QUITE as much at risk, because analysis is much older and more established, and because it is firmly based on a much more intricate system of metaphysical concepts. Ever talked to an analyst? They seem to have an answer to, and a name for, everything, just like the Hare Krishnas, or Jehovah's Witnesses. It's somewhat frightening, actually.)

So, instead of advocating one method over the other, I advocate taking your speculations for what they really are - more or less useful abbreviations for highly complex and mostly unexplained processes - and focusing much more on the practices of any method, than on its theory.

The other challenge, of course, is how potential clients are supposed to find out how metaphysically challenged their potential therapist is. And that, I admit, is a question that I cannot answer. It would be interesting, at any rate, to get an answer to it from the pros. Maybe Mr Hoffmann or Mrs Bösel is up to responding to that challenge?

Monday, 23 April 2012

Case Insensitive Filesystem FAIL

I made the mistake of CamelCasing a package name in a Java project. It later occurred to me to be unwise, so I changed the name.

But of course, with the underlying filesystem on my MacBook being case insensitive, Eclipse was unable to change the directory names.

So I'm fairly certain that, while the project works nicely on my machine, it will fail on any Linux machine.

Blargh. Yeah sure sure, I'll just change it manually, but stuff like that just doesn't make life much easier.
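The usual manual workaround is to take a detour through a temporary name, so that no single step changes only the letter case. Here is a sketch of that in Python; the helper name and the temporary suffix are hypothetical, and whether a direct case-only rename fails at all depends on the tool and filesystem involved.

```python
import os
import tempfile


def rename_case_only(parent, old_name, new_name):
    """Rename parent/old_name to parent/new_name via a temporary name.

    Some tools balk at a rename that changes only letter case on a
    case-insensitive filesystem; the detour turns it into two ordinary
    renames. The helper name and suffix are hypothetical.
    """
    old_path = os.path.join(parent, old_name)
    tmp_path = os.path.join(parent, new_name + ".tmp-rename")
    os.rename(old_path, tmp_path)
    os.rename(tmp_path, os.path.join(parent, new_name))


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "myPackage"))
    rename_case_only(root, "myPackage", "mypackage")
    print(os.listdir(root))
```

The same two-step trick works by hand in a shell or a file manager; the point is only that the intermediate name differs from both the old and the new one in more than case.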