The Pragmatic Geographer

Conway's Law and The Ecology of Freedom

Melvin Conway coined the term "coroutine" and was heavily involved in the development of MUMPS, a fairly arcane programming language still used in the medical/health record industry. But perhaps his most famous contribution, at least in systems and software organization circles, is Conway's Law:

Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations

It's hard to avoid seeing it once you become aware of it. Why does Facebook have those specific items in its side panel? Better than even odds, each one maps to a VP-level subgroup within the larger organization.

One example I vividly recall was several years ago at a Seattle Python meetup. I asked an Uber employee to what extent they used the language: "Oh, half the engineering group actively writes in Python. But not the other half. It's from way back at the start of the company, when they outsourced the Marketplace." An early organizational decision effectively dictated technical ones (seemingly confirmed by Uber's blog here).

But is this just limited to organizations that design systems?

I was reminded of Conway's Law while I was reading The Dawn of Everything by David Graeber and David Wengrow. It's a fascinating work that throws a lot of assumptions about early human civilization into question, based on archeological evidence produced in the last few decades.

The chapter "The Ecology of Freedom" is about early farmers, and how our records of them suggest not a linear path of foraging -> cultivation -> farming -> states, but rather a long period of dabbling, with occasions of people abandoning farming or consciously resisting the practice (and why not: foraging often gave you better bang for your buck).

The title of their chapter comes from a book by Murray Bookchin, and it's the footnote they have referencing his work that jumped out at me:

...we cannot follow his own ideas about human prehistory or the origins of agriculture, which are based on information that is now many decades out of date. We do, however, find much to learn from his basic insight: that human engagements with the biosphere are always strongly conditioned by the types of social relationships and social systems that people form among themselves.

(emphasis mine)

I found it fascinating that a correspondence between the social relationships of software/systems engineers and the products they produce would echo some of the earliest known activities of our species.

Despite hundreds of years of technological innovation and an overarching economic system that demands efficiency at seemingly all costs, there is no escaping the fact that human social interactions and communication cannot be entirely Taylorized out of the story of how we build things, and what they look like when they are finished.

As deeply alienating and dehumanizing as that system can be - Ted Chiang famously likens it to the doomsday AI of science-fiction imagination, already here and doing damage without any machine intelligence needed - the fact that this aspect of the creator(s) shows up, even if only in internal systems, is heartening.

Crunch Mode Does Not Work

Killing the Crunch Mode Antipattern

If you want a “knowledge worker” to be as ineffective and produce the lowest level of quality possible, deprive them of their sleep and hold them to an unrealistic deadline….

Crunch mode makes people lazy and less productive. This may seem ironic, but when someone puts in heroic levels of effort, they start to place less value on each minute.

During the last crunch mode I experienced, virtually none of the code could be salvaged. After a reasonable review of the work, we realized more than half of it would need significant refactoring.

It turns out that there has been some study of when a project should be rewritten versus subjected to an extensive refactoring. It depends, but generally the cut-off is about 20-25% - if you need to change more than that, you are likely better off just rewriting it (Thomas, Delis, Basili, 1997).

REST and File Uploads/Attachments

Your web application will support uploading files. At first glance this looks like an action, so you might be tempted to model it as an RPC endpoint rather than REST - "upload" reads as a verb rather than a noun.

There isn’t anything really wrong with that, but I would argue there are significant advantages to treating it as a noun (a REST resource). Here are a few:

Staging an upload to external datastore

An upload may not come directly to you, and it might not even be performed by the requesting client – signed S3 forms, one-time URL endpoints, other protocols like BitTorrent, and other mechanisms allow direct client uploads.

Example: You POST to create an upload resource.

{
    "upload_to_url": "https://example.com/one/time/endpoint/hashhashhash",
    "signed_token": "blahblahblah",
    "expires": "2013-07-12T19:10:19.491Z",
    "etc": "..."
}

...and it returns the upload resource you created. It's not the raw file, but rather useful metadata about it (including the actual raw file location).
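Server-side, this pattern can be sketched in a few lines. This is a minimal illustration, not a real API: the function name, the `/one/time/endpoint/` path scheme, and the HMAC-based token are all assumptions standing in for whatever signing mechanism (e.g. S3 presigned URLs) you actually use.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

SECRET_KEY = b"server-side-secret"  # hypothetical signing key

def create_upload_resource(upload_id: str, ttl_minutes: int = 30) -> dict:
    """Build the metadata document returned when a client POSTs a new upload."""
    expires = datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes)
    # Sign the id + expiry so the one-time URL can be verified statelessly.
    payload = f"{upload_id}:{expires.isoformat()}".encode()
    token = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {
        "upload_to_url": f"https://example.com/one/time/endpoint/{upload_id}",
        "signed_token": token,
        "expires": expires.isoformat(),
    }

resource = create_upload_resource("hashhashhash")
print(json.dumps(resource, indent=2))
```

The key point is that the POST creates a *resource* describing the upload; the client then pushes the raw bytes wherever `upload_to_url` points.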

Tracking/auditing – both internally and externally

What if a user wants to see which uploads are currently in progress? All of the successful ones? The failures? Those are all useful metrics internally as well. And since, as above, you now have an upload resource, you can retrieve it with a GET:

{
    "status": "complete",
    "createuser": "https://example.com/user/1234",
    "modifieduser": "https://example.com/user/1234",
    "createdate": "2013-07-12T19:10:19.491Z",
    "modifieddate": "2013-07-12T19:10:19.491Z"
}

Attaching additional resources as a means of post-upload action

The file being uploaded is unlikely to exist in a vacuum. You will have related resources and possibly related actions. You can stick that stuff on here too. Consider, for instance, that you want to send alerts to some people when the upload is complete:

{
    "subscribers": [
      "https://example.com/user/1234",
      "https://example.com/user/288",
      "https://example.com/user/3"
    ],
    "etc": "..."
}
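Once subscribers live on the resource itself, post-upload actions become a simple fan-out. A hedged sketch, where `notify_subscribers` and the in-memory `notified` list are stand-ins for a real webhook or email dispatcher:

```python
# Hypothetical upload resource, as returned by a GET.
upload = {
    "status": "complete",
    "subscribers": [
        "https://example.com/user/1234",
        "https://example.com/user/288",
    ],
}

notified = []  # stand-in for a real delivery mechanism

def notify_subscribers(resource: dict) -> None:
    """Fan out a completion notice to every subscriber on the resource."""
    if resource.get("status") != "complete":
        return  # nothing to announce yet
    for subscriber_url in resource.get("subscribers", []):
        notified.append(subscriber_url)  # e.g. enqueue a webhook here

notify_subscribers(upload)
```

Because the subscriber list is part of the resource, clients can add or remove themselves with ordinary PUT/PATCH calls rather than a bespoke RPC.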

Explicit vs. implicit

Bottom line – your upload has state information. You are probably capturing it anyway in logs or other resources. If you have some subscribers as above, you want to make that information explicit, and in many cases, client controlled.

Testing Search (Haystack) in Django

Django’s built-in testing framework is extremely handy. As long as you use the ORM with a supported data store, a test database is used for the duration of the tests and cleaned up between unit tests. There is no need for elaborate mocking – something I had grown accustomed to in .NET.

Here is a quick sample, edited for brevity:

$ ./manage.py test appname -v 2

Creating test database for alias 'default' ('test_projectname')…
Syncing...
Creating tables ...
test_first (projectname.test.SampleTestClass) ... ok
test_second (projectname.test.SampleTestClass) ... ok
test_third (projectname.test.SampleTestClass) ... ok
Ran 3 tests in 1.260s
OK
Destroying test database for alias 'default' ('test_projectname')

But if you are using some external source of data, it is necessary to create a mock or some fake environment (as Django does).

Haystack is a handy library that abstracts away the details of various search engines. You get powerful features built into something like Elasticsearch – high availability, full-text search, spelling correction, more-like-this, etc. – in functions and data structures familiar to Django developers.

But if you are integration testing – the tests call your views directly, and your views update or retrieve data from an external search engine – you are potentially going to have a bad time. Data will persist between unit tests and your results will likely be inconsistent.

The solution is pretty simple actually. Fire up a new index, override the settings such that the new index is the target for the Haystack calls for the duration of tests, and clear the index between tests.

import haystack
from django.core.management import call_command
from django.test import TestCase
from django.test.utils import override_settings

# Point Haystack at a throwaway index so tests never touch the real one.
TEST_INDEX = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'TIMEOUT': 60 * 10,
        'INDEX_NAME': 'test_index',
    },
}


@override_settings(HAYSTACK_CONNECTIONS=TEST_INDEX)
class BaseTestCase(TestCase):

    def setUp(self):
        # Reload the connection so the overridden settings take effect.
        haystack.connections.reload('default')
        super(BaseTestCase, self).setUp()

    def tearDown(self):
        # Wipe the test index so results don't leak between tests.
        call_command('clear_index', interactive=False, verbosity=0)
        super(BaseTestCase, self).tearDown()

GIST here