May 14, 2019 · scala hadoop ranger akka

The beautiful simplicity of Apache Ranger plugin

If you are here, you already know what Apache Ranger is. It is the most popular, if not the only, way to manage security in the Hadoop ecosystem. It integrates with Active Directory, Kerberos and various other systems for authentication, but I believe its most interesting feature is its authorization support. Being part of the Hadoop ecosystem, it unsurprisingly has built-in support (via plugins) for most frameworks in that ecosystem: Hive, HBase, HDFS etc. However, I've found that it's actually very easy to roll your own custom plugin for Ranger.

This post focuses on the simplicity of design in Ranger plugins and showcases how easy it is to build one ourselves. As an example, we'll build a Ranger plugin for managing access to a simple HTTP service written using Akka HTTP.

Note: You are not required to know Akka HTTP to follow this post. All you need to know is that Akka HTTP is just a way (albeit a great way) to build HTTP services.

The code behind this post is split into two repositories:

  1. Ranger HTTP plugin
  2. Ranger Managed Akka HTTP Service

Writing a plugin

To reiterate what we are attempting to do here, we are going to write a REST service and let Ranger manage the authorization for it.

Writing a Ranger plugin is actually a two-part problem - writing the server-side component and the application-side component.

  1. The server-side component is the code/configuration that resides on the Ranger side.

  2. The application-side component is the code that resides in our REST service. It invokes the Ranger service and checks whether the application's end user has access to the resource that they are requesting.

We'll look into these two things in detail. Let's attempt to write the server-side components first.

1. Server-side components :

For inspiration, if we open up the Ranger code base, we can see some of the built-in plugins.

List of plugins

Pictorially, within the Ranger code base, we have a bunch of plugins and we would like to add our own.

Coarse plugins

Zooming in on the previous picture, the server-side component of the plugin means writing:

  1. servicedef configuration
  2. A class that inherits RangerBaseService

So, there's literally "one" configuration and "one" class that you need to implement for the server-side.

Fine plugins

1. servicedef configuration

Let's look at Hive's servicedef configuration :

Hive servicedef

In my opinion, there are three important things that we are talking about here :

a. Resource:

In the Hive example, the "resources" that we are trying to protect are databases, tables and columns. For Kafka, the resource is the Kafka topic; for HDFS, it is a file path. For our HTTP service, the resource that we are trying to protect is the REST slug. Let's call it a "path".

"resources": [
    {
      "itemId": 1,
      "name": "path",
      "type": "path",
      "level": 10,
      "parent": "",
      "mandatory": true,
      "lookupSupported": true,
      "recursiveSupported": true,
      "excludesSupported": true,
      "matcher": "org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher",
      "matcherOptions": {
        "wildCard": true,
        "ignoreCase": true
      },
      "validationRegEx": "",
      "validationMessage": "",
      "uiHint": "",
      "label": "HTTP Path",
      "description": "HTTP Path"
    }
  ]
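
To make the matcherOptions above a little more concrete, here is a minimal sketch of what wildcard, case-insensitive matching means for a path. This is not Ranger's RangerPathResourceMatcher (which also deals with recursion and excludes). WildcardPathSketch and its matches function are hypothetical names used only for illustration, and the sketch assumes that * stands for any run of characters and ? for exactly one character.

```scala
import java.util.regex.Pattern

// Illustration only: a simplified stand-in for wildcard path matching.
// The object name and function are hypothetical, not part of Ranger.
object WildcardPathSketch {

  // Returns true when requestPath is covered by policyPath.
  // Assumption: '*' matches any run of characters, '?' matches one character.
  def matches(policyPath: String, requestPath: String, ignoreCase: Boolean = true): Boolean = {
    // Translate the wildcard expression into a regex, quoting every literal character.
    val regex = policyPath.toList.map {
      case '*' => ".*"
      case '?' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString
    val flags = if (ignoreCase) Pattern.CASE_INSENSITIVE else 0
    Pattern.compile(regex, flags).matcher(requestPath).matches()
  }
}
```

Under these assumptions, a policy path of /users/* covers a request to /Users/42 when ignoreCase is on, and stops covering it when ignoreCase is off.
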

b. Access Type:

An access type is simply the kind of access that the user requires. For Hive, select, create and delete would be examples. For HDFS, read, write and execute. For Kafka, publish and consume. For our HTTP service, the access types are the HTTP methods - GET, POST, DELETE.

"accessTypes": [
    {
      "itemId": 1,
      "name": "get",
      "label": "get"
    },
    {
      "itemId": 2,
      "name": "post",
      "label": "post"
    },
    {
      "itemId": 3,
      "name": "delete",
      "label": "delete"
    }
  ]
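
Since the access type names above are lowercase while HTTP methods usually arrive in uppercase, the application has to translate one into the other at some point. The following is a small sketch of that translation; AccessTypeSketch and fromHttpMethod are hypothetical names, and rejecting undeclared methods with None is an assumption made for this illustration, not something the servicedef enforces by itself.

```scala
import java.util.Locale

// Illustration only: maps an HTTP method to one of the access types
// declared in the servicedef above. Names here are hypothetical.
object AccessTypeSketch {

  // Mirrors the "name" values of the accessTypes section.
  val declared: Set[String] = Set("get", "post", "delete")

  // Returns the access type for a method, or None if the servicedef
  // does not declare a matching access type.
  def fromHttpMethod(method: String): Option[String] = {
    val name = method.trim.toLowerCase(Locale.ROOT)
    if (declared.contains(name)) Some(name) else None
  }
}
```

With this sketch, "GET" maps to the get access type, while a method such as PATCH, which the servicedef does not declare, yields None.
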

c. Configs:

We know that Ranger can manage security for several Kafka topics, HDFS and HBase clusters. Each of these services would be running on a different host, and the way to authenticate into each of them would be different. The configs section is the place to capture this information. For the sake of simplicity, we don't care about authentication for our HTTP service in this example. So, we just capture a URL that we can ping to ensure that our service is up and running.


"configs": [
    {
      "itemId": 1,
      "name": "services_list_url",
      "type": "string",
      "subType": "",
      "mandatory": true,
      "validationRegEx": "",
      "validationMessage": "",
      "uiHint": "",
      "label": "HTTP URL for the services list eg. http://localhost:8080/services"
    }
  ]

2. A class that inherits RangerBaseService

The second and last part of implementing our server-side component for the Ranger plugin is to write a class that inherits RangerBaseService.

Subclasses of RangerBaseService

The class expects two functions to be overridden:

  1. validateConfig: Recall the configs section of the servicedef. We accept values for those parameters, and validateConfig is the place where we validate them. For our HTTP service, the only config we accept is services_list_url, so the implementation uses a simple HTTP client to check whether the service is up and running.

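import java.util

import org.apache.ranger.plugin.service.RangerBaseService

class RangerServiceHTTP extends RangerBaseService {

  // Succeeds only if services_list_url is configured and the service responds.
  override def validateConfig(): util.Map[String, AnyRef] = {
    val serviceUp = configs.containsKey("services_list_url") &&
      HttpServiceClient.isServiceUp(configs.get("services_list_url"))
    if (serviceUp) retSuccessMap() else returnFailureMap()
  }

  // lookupResource, the second override, is described next.
}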

  2. lookupResource: This is an interesting function. Consider the following screenshot.

Policy lookup

Later, when we configure an access policy, we will be configuring the resources in it. This function is used to look up and autofill those resources. Say we are entering an HDFS resource or a Hive table: the number of options is large and it's easy to make a typo. In the case of Hive, this function connects to the metastore and populates the tables and databases for us.

In the case of the HTTP service, remember the services_list_url? That URL just returns a comma-separated list of REST resources. To implement this function, I just call the service again and tokenize the response.

  override def lookupResource(resourceLookupContext: ResourceLookupContext): util.List[String] = {
    val serviceUrl = configs.get("services_list_url")
    HttpServiceClient.getServicePaths(serviceUrl).asJava
  }
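
Both overrides above lean on an HttpServiceClient helper that this post does not show. As a rough idea of what such a helper could look like, here is a minimal sketch built only on the JDK. HttpServiceClientSketch, parsePaths, the 2 second timeouts and the choice to treat any failure as "down" or "no paths" are all assumptions made for illustration; the actual helper in the plugin repository may differ.

```scala
import java.net.{HttpURLConnection, URI}
import scala.io.Source
import scala.util.Try

// Illustration only: a hypothetical stand-in for the HttpServiceClient
// referenced by validateConfig and lookupResource.
object HttpServiceClientSketch {

  // Splits a comma-separated response body into trimmed, non-empty paths.
  def parsePaths(body: String): List[String] =
    body.split(",").map(_.trim).filter(_.nonEmpty).toList

  // Opens a GET connection with short timeouts (an arbitrary choice).
  private def open(url: String): HttpURLConnection = {
    val conn = URI.create(url).toURL.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("GET")
    conn.setConnectTimeout(2000)
    conn.setReadTimeout(2000)
    conn
  }

  // True only when the URL answers with HTTP 200; any error counts as "down".
  def isServiceUp(url: String): Boolean =
    Try {
      val conn = open(url)
      try conn.getResponseCode == HttpURLConnection.HTTP_OK
      finally conn.disconnect()
    }.getOrElse(false)

  // Fetches the services list and tokenizes it; any error yields an empty list.
  def getServicePaths(url: String): List[String] =
    Try {
      val conn = open(url)
      try {
        val source = Source.fromInputStream(conn.getInputStream, "UTF-8")
        try parsePaths(source.mkString) finally source.close()
      } finally conn.disconnect()
    }.getOrElse(Nil)
}
```

Keeping the tokenizing step in a pure function such as parsePaths makes it easy to check without a running service.
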

Now, as a final step in the code, we need to tie the RangerServiceHTTP class and the servicedef configuration together. We do this by configuring the class in the implClass property. Also notice that we are configuring the name of this Ranger plugin as httpservice:

{
  "name": "httpservice",
  "label": "HTTP Service",
  "description": "Rudimentary Ranger plugin to enforce security on top of a HTTP Service",
  "guid": "b8290b7f-6f69-44a9-89cc-06b6975ea676",
  "implClass": "com.arunma.ranger.http.RangerServiceHTTP",
  "version": 1,
  "isEnabled": 1,
  "resources": [
    {
      "itemId": 1,
      "name": "path",
      ...
      ...

The full configuration looks like this.

There are two more minor administrative steps:

  1. In order to ensure that our class is made available on the Ranger classpath, we'll bundle it into a jar and drop it at <RANGER_HOME>/ews/webapp/WEB-INF/classes/ranger-plugins/httpservice. The name of the folder httpservice corresponds to the name that is declared in the servicedef configuration.

TODO - HTTP service

  2. Upload our configuration into Ranger so that our service becomes visible in the Ranger UI.

curl -u admin:admin -X POST -H "Accept: application/json" -H "Content-Type: application/json" --data @http-ranger.json http://localhost:6080/service/plugins/definitions

Bounce the Ranger server.

Yaay! We now see HTTPSERVICE on our Ranger UI

TODO - Ranger UI HTTP Service

2. Application-side components :

On the application side, things couldn't get any simpler. In order to use the policies defined in Ranger, all that an application needs to do is call Ranger and check whether the user has access to a resource. The function is literally called isAccessAllowed.

TODO - Server and Client side components

The following code is pretty much all that needs to be written on the application side:

package com.arunma.ranger

import org.apache.ranger.plugin.audit.RangerDefaultAuditHandler
import org.apache.ranger.plugin.policyengine.{RangerAccessRequestImpl, RangerAccessResourceImpl}
import org.apache.ranger.plugin.service.RangerBasePlugin

import scala.collection.JavaConverters._

object RangerAuthorizer {
  lazy val plugin = {
    val plg = new RangerBasePlugin("httpservice", "httpservice")
    plg.setResultProcessor(new RangerDefaultAuditHandler)
    plg.init()
    plg
  }

  def authorize(path: String, accessType: String, userName: String, userGroups: Set[String] = Set("public")): Boolean = {
    val resource = new RangerAccessResourceImpl()
    resource.setValue("path", path)
    val request = new RangerAccessRequestImpl(resource, accessType, userName, userGroups.asJava)
    val result = plugin.isAccessAllowed(request)
    result != null && result.getIsAllowed
  }
}

The RangerBasePlugin("httpservice", "httpservice") constructor and the init() function serve as our entry point into the Ranger service. Note the httpservice parameter inside the RangerBasePlugin. This must match the name that was given in the servicedef configuration.

The authorize function is the one that gets called by the interceptor just before the client is given access to a REST resource. It constructs an access request (a RangerAccessRequestImpl), calls the plugin's isAccessAllowed function, and reads the allow/deny decision from the returned result via getIsAllowed.

The interceptor directive authorize invokes the function isRangerAuthorized which then calls the authorize function in RangerAuthorizer.


def isRangerAuthorized(path: String, httpMethod: String, userName: String): Boolean = RangerAuthorizer.authorize(path, httpMethod.toLowerCase, userName)  

lazy val userRoutes: Route =
    headerValueByName("username") { userName =>
      extractMethod { method =>
        pathPrefix("users") {
          extractMatchedPath { matchedPath =>
            authorize(isRangerAuthorized(matchedPath.toString(), method.name(), userName)) {
              concat(
                pathEnd {
                  concat(
                    get {
                      val users: Future[Users] =
                        (userRegistryActor ? GetUsers).mapTo[Users]
                      complete(users)

One last thing that we need to do is copy an audit XML and a security XML into our classpath. These are like the site XMLs for Ranger. For this exercise, we'll just place the XMLs in our resources directory.

The audit XML and the security XML can be copied from the Ranger code base. If you are running a local Ranger, the audit XML can remain as-is, but the security XML needs to be changed for our service. The easiest way to achieve this is to copy a sample XML from the Ranger code base and replace the service name with httpservice, like so:

Security XML

There's also one property that needs special attention. That's the property called ranger.plugin.httpservice.service.name. This property's value must be the same as the Service Name that you use in your Ranger UI.

<property>
	<name>ranger.plugin.httpservice.service.name</name>
	<value>MyService</value>
	<description>
		Name of the Ranger service containing policies for this httpservice instance
	</description>
</property>

Service Name

Narrow Service Name
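
A mismatch between the ranger.plugin.httpservice.service.name value and the Service Name in the Ranger UI is easy to miss, so it can help to read the property back and compare. Here is a minimal sketch that pulls a property out of a site-style XML string using the JDK's XML parser. SiteXmlSketch and its property function are hypothetical names, and the sketch assumes the file consists of property elements that each hold a name and a value element, as in the snippet above.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Illustration only: reads one property from a site-style XML document.
// Names here are hypothetical and not part of Ranger.
object SiteXmlSketch {

  // Text of the first child element with the given tag, if there is one.
  private def text(parent: Element, tag: String): Option[String] = {
    val nodes = parent.getElementsByTagName(tag)
    if (nodes.getLength > 0) Some(nodes.item(0).getTextContent.trim) else None
  }

  // Value of the property whose <name> equals the requested name.
  def property(xml: String, name: String): Option[String] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).iterator
      .map(i => props.item(i).asInstanceOf[Element])
      .find(p => text(p, "name").contains(name))
      .flatMap(p => text(p, "value"))
  }
}
```

Calling property with the contents of the security XML and ranger.plugin.httpservice.service.name should return the same string that was entered as the Service Name in the Ranger UI.
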

Test Ride

This involves two steps:

  1. Configure a Ranger Policy
  2. Verifying your HTTP Service

1. Configure a Ranger Policy

Policy screenshot

2. Verifying your HTTP Service

Let's verify the policy by bringing up our HTTP Service - start the com.arunma.RangerManagedHttpServer

Policy-configured user

curl -X GET -H 'username:arunma' http://localhost:8080/users

Curl True

Invalid user

curl -X GET -H 'username:nobody' http://localhost:8080/users

Curl Bad user

Summary

The Ranger plugin has two parts to it - a server-side component and a client-side component. For the server-side component, we created a servicedef JSON and a class that inherits RangerBaseService. For the client-side component, we just called the isAccessAllowed function of the plugin.

You now have a working Ranger authorized HTTP Service.

Thanks for reading. Happy Hacking !