A pattern for expressive security in a generic CMS

Hello everyone.  I’d like to spend some time discussing a fairly simple but (from what I can tell) unique security feature in Cyanic Business Automation Studio (the platform that powers Cyanic HSE) in the hope that it may help or inspire other developers working with or building solutions with generic CMS systems.

Imagine the following scenario: You are building a work-order management system for a company where office staff have administrative access to all work orders in the system, and they assign jobs to external contractors when required.  Contractors should only be able to see and update jobs that are assigned to them, and it’s not good enough to simply filter results on the page – we need real security that’s enforced down at the data layer.  If you’re using SharePoint, you could easily set up your Content Types and lists to model the data, but you’ll run into a problem with the security model.  SharePoint allows ACLs controlling blanket access to the list, and granular item-level ACLs, but nothing based on expressive rules.  In our example, you’d need some external process, driven by Event Receivers or SharePoint Workflow to continually update item-level access whenever the data changes.  If you’ve ever heard my thoughts on Event Receivers or SharePoint Workflow, you can imagine my opinion on the long-term reliability prospects of such a solution.  From what I can tell, other CMS’s are very much the same.

Of course, you would have none of these problems in a purely custom solution, as your data layer could enforce whatever business rules you need, but if you’re anything like us and trying to make good solutions at a price point that’s affordable to small and medium-sized customers, you really need to be using a generic framework and customizing to fit.  Anyway, we can solve this, and since we’re not using SharePoint, we still have enough will-to-live and intestinal fortitude that we can implement an elegant solution.

 

Some background on Cyanic Business Automation Studio’s data architecture:

All user data in CBAS is stored as JSON records in PostgreSQL (which offers excellent support for working with this kind of data and querying documents in SQL).  All logically related records are grouped into what we call collections which enforce schema constraints (specifically, JSON Schema) on its contents.  Collections are then grouped into realms, which are isolated security environments, each with their own sets of users and roles, typically wrapping up all of the functions and data belonging to a particular organization.  At a high level, the data model looks like this:

CBAS-data-model

A record is represented by a row in the database and along with holding metadata such as when the record was created and by whom, it contains a data column of the JSON type.  As such, we can hold complicated and nested structures with a very simple representation, and we can query those structures with familiar SQL and a minimum of joins.  On a side note: our opinion is that PostgreSQL is a better document store than MongoDB, since we’re still able to benefit from referential integrity, transactions, a powerful query language, and all the other niceties you’ve come to expect from a modern RDBMS.  Here’s what a data record looks like, coming out of our web api:

 

{
  "id": "8bf7c9f3-40bd-45e6-afc6-91b875112c21",
  "createdTime": "2014-04-08T21:08:58.685Z",
  "createdBy": "cf6600f4-57a4-4493-893f-658fb42e9c67",
  "createdByName": "System",
  "updatedTime": "2014-05-12T21:21:05.737Z",
  "updatedBy": "cf6600f4-57a4-4493-893f-658fb42e9c67",
  "updatedByName": "System",
  "data": {
    "WorkToBeDone": "Sprinkler system broken. Does not turn off",
    "TaskLocation": "11639 76 Ave, Edmonton",
    "PhoneNumber": "780-328-2400",
    "EstimatedArrival": "2014-04-09T19:00:00.000Z",
    "EmergencyMeetingLocation": "work van",
    "AssignedTo": {
      "id": "1aead7ed-9661-43e7-b01c-04afd5b8e87b"
    }
    "Start": "2014-04-09T19:14:00.000Z",
    "Tasks": [
      {
        "Task": "turn off valve",
        "Hazards": [
          {
            "Hazard": "trip and fall",
            "Controls": "watch step"
          }
        ]
      }
    ],
    "UseOfGlovesRequired": false,
    "WarningRibbonRequired": false,
    "WorkingAlone": false,
    "WorkingAloneExplanation": "",
    "AllDoorsLocked": true,
    "AreaCleanedUp": true,
    "Incident": false,
    "IncidentExplanation": "",
    "HazardsRemaining": true,
    "HazardsRemainingExplanation": "uncovered hole",
    "HazardsRemainingPhoto": {
      "id": "77084320-4ff2-4c24-8987-477575bdd988",
      "contentType": "image/jpeg"
    },
    "CustomerSignature": {
      "id": "1524b4d9-c0b5-4a8d-b134-4326cf3c51ed",
      "contentType": "image/png"
    },
    "End": "2014-04-09T19:33:00.000Z",
    "TaskLocationCoords": {
      "latitude": 53.514153,
      "longitude": -113.5285366
    }
  }
}

To query user data, Javascript expressions are passed to our data provider (internally, or through our REST web api).  The provider parses these expressions into expression trees (using the excellent Acorn module) and recursively generates parameterized SQL that can be run against the database.  Of course, only certain types of query semantics are supported, and any reference to a property not belonging to the schema, or unsupported expression or parameter type will generate a security fault.

Example: query all in-progress records (those with a start time set, but no end time) that belong to a specific user (in this case, the user’s id is ‘1aead7ed-9661-43e7-b01c-04afd5b8e87b’):

Query: $filter=data.AssignedTo.id == '1aead7ed-9661-43e7-b01c-04afd5b8e87b' && data.Start != null && data.End == null

Generated Expression tree:

{
  "type": "ExpressionStatement",
  "expression": {
    "type": "LogicalExpression",
    "left": {
      "type": "LogicalExpression",
      "left": {
        "type": "BinaryExpression",
        "left": {
          "type": "MemberExpression",
          "object": {
            "type": "MemberExpression",
            "object": {
              "type": "Identifier",
              "name": "data"
            },
            "property": {
              "type": "Identifier",
              "name": "AssignedTo"
            },
            "computed": false
          },
          "property": {
            "type": "Identifier",
            "name": "id"
          },
          "computed": false
        },
        "operator": "==",
        "right": {
          "type": "Literal",
          "value": "1aead7ed-9661-43e7-b01c-04afd5b8e87b",
          "raw": "'1aead7ed-9661-43e7-b01c-04afd5b8e87b'"
        }
      },
      "operator": "&&",
      "right": {
        "type": "BinaryExpression",
        "left": {
          "type": "MemberExpression",
          "object": {
            "type": "Identifier",
            "name": "data"
          },
          "property": {
            "type": "Identifier",
            "name": "Start"
          },
          "computed": false
        },
        "operator": "!=",
        "right": {
          "type": "Literal",
          "value": null,
          "raw": "null"
        }
      }
    },
    "operator": "&&",
    "right": {
      "type": "BinaryExpression",
      "left": {
        "type": "MemberExpression",
        "object": {
          "type": "Identifier",
          "name": "data"
        },
        "property": {
          "type": "Identifier",
          "name": "End"
        },
        "computed": false
      },
      "operator": "==",
      "right": {
        "type": "Literal",
        "value": null,
        "raw": "null"
      }
    }
  }
}

SQL:
SELECT ... WHERE r.id=$1 AND c.id=$2 AND (((d.data->$3->>$4) = $5 AND (d.data->>$6) IS NOT NULL) AND (d.data->>$7) IS NULL)

Where the query params are:

  • $1: <realmId>
  • $2: <collectionId>
  • $3: “AssignedTo”
  • $4: “id”
  • $5: “1aead7ed-9661-43e7-b01c-04afd5b8e87b”
  • $6: “Start”
  • $7: “End”

If you’ve heard me railing against ORMs due to their complexity and loss of control, I offer this pattern as a reasonable alternative. The code required to make this work isn’t large (looking at our query generator, it’s currently about 200 lines of code), is totally dynamic and gives you total control over what semantics are supported and how queries are generated.

 

Security enforcement

Now, how is security enforced on user queries?  The user’s security context is established and is available to every provider in the data layer.  Further, every type of entity in the system has ACLs mapping functional access to principals (either users or groups).  In the case of a collection, an ACL contains the following fields:

  • read: whether or not the principal can read the collection definition.
  • write: whether or not the principal can modify the collection definition.
  • item_read: whether or not the principal can read the records in the collection.
  • item_create: whether or not the principal can create new records in the collection.
  • item_update: whether or not the principal can update existing records in the collection.
  • item_delete: whether or not the principal can delete existing records in the collection.
  • item_read_expr: if item_read is true, an expression to mask the results of a record query.
  • item_update_expr: if item_update is true, an expression which must evaluate true on a record to allow updating of that record.
  • item_delete_expr: if item_delete is true, an expression which must evaluate true on a record to allow deletion of that record.

The ACL expressions are Javascript and like all other expresssions, are schema and form-validated.  These expressions, if defined, are spliced into the main query tree and act as a mask over what would normally be returned by the query.

The overall logic flow then looks like this:

  • Client submits query request to data provider.
  • Provider first checks if the user has access to any ACLs on the collection granting basic item_read access (required at a minimum).
  • Generate a query tree from the main query expression.
  • If there is an accessible ACL expression, parse it into another expression tree and combine it with the main query tree, each subtree connected with an ‘AND’ operator.  If there are multiple ACL expressions, the expressions in the ACL subtree are connected with an ‘OR’ operator.
  • Generate the parameterized SQL for the entire tree, and execute the query against the database.

 

Solution

Getting back to the original problem where contractors can only modify work orders that they have been assigned, we can see that the solution is a single ACL for the ‘work orders’ collection assigned to the ‘Contractors’ role, with the following:

  • read: true
  • write: false
  • item_read: true
  • item_create: false
  • item_update: true
  • item_delete: false
  • item_read_expr: “data.AssignedTo.id == context.userId”
  • item_update_expr: “data.AssignedTo.id == context.userId”
  • item_delete_expr: null

So, regardless of any queries submitted to the data layer by a Contractor, the “data.AssignedTo.id == context.userId” expression must always be satisfied before the record is returned.  Here, the contextual userId (‘1aead7ed-9661-43e7-b01c-04afd5b8e87b’) is substituted into the query at time of generation, but you can implement any kind of contextual support functions.  The limit is solely your imagination, and how big you want your query generators to get.

 

So, that’s basically it.  I hope this post gives someone out there some ideas, and if you have any questions, please post them below.

 

Happy hacking!