The beautiful simplicity of Apache Ranger plugin
If you are here, you already know what Apache Ranger is. It is the most popular, if not the only, way to manage security in the Hadoop ecosystem. It integrates with Active Directory, Kerberos and various other systems for authentication, but I believe its most interesting feature is its authorization support. As part of the Hadoop ecosystem, Ranger unsurprisingly has built-in support (via plugins) for most frameworks in that ecosystem: Hive, HBase, HDFS, etc. However, I've found that it's actually very easy to spin up your own custom plugin for Ranger.
This post focuses on the simplicity of design in Ranger plugins and shows how easy it is to build one ourselves. As an example, we'll build a Ranger plugin for managing access to a simple HTTP service written with Akka HTTP.
Note: You are not required to know Akka HTTP to follow this post. All you need to know is that Akka HTTP is just a way (albeit a great way) to build HTTP services.
The code behind this post is split into two repositories:
Writing a plugin
To reiterate what we are attempting to do here, we are going to write a REST service and let Ranger manage the authorization for it.
Writing a Ranger plugin is actually a two-part problem: writing the server-side component and the application-side component.
- Server-side component: the code/configuration that resides on the Ranger side.
- Application-side component: the code that resides in our REST service, which invokes the Ranger service and checks whether the application's end user has access to the resource they are requesting.
We'll look into these two things in detail. Let's attempt to write the server-side components first.
1. Server-side components:
For inspiration, if we open up the Ranger code base, we can see some of the built-in plugins.
Pictorially, within the Ranger code base, we have a bunch of plugins, and we would like to add our own.
Zooming into the previous picture, the server-side component of the plugin means writing:
- a servicedef configuration
- a class that inherits RangerBaseService
So, there's literally "one" configuration and "one" class that you need to implement for the server-side.
1. servicedef configuration
Let's look at Hive's servicedef configuration:
In my opinion, there are three important things here:
a. Resource:
In the Hive example, the resources we are trying to protect are databases, tables and columns. For Kafka, the resource is the Kafka topic; for HDFS, it is a file path. For our HTTP service, the resource we are trying to protect is the REST slug. Let's call it a "path".
"resources": [
  {
    "itemId": 1,
    "name": "path",
    "type": "path",
    "level": 10,
    "parent": "",
    "mandatory": true,
    "lookupSupported": true,
    "recursiveSupported": true,
    "excludesSupported": true,
    "matcher": "org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher",
    "matcherOptions": {
      "wildCard": true,
      "ignoreCase": true
    },
    "validationRegEx": "",
    "validationMessage": "",
    "uiHint": "",
    "label": "HTTP Path",
    "description": "HTTP Path"
  }
]
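The matcherOptions above turn on wildcards and case-insensitive matching for the path resource. As a rough mental model only (this is not RangerPathResourceMatcher's actual code), matching a policy value such as /users/* against a requested path behaves something like the stdlib-only sketch below. Here * means "any run of characters" and ? means "exactly one character". The WildcardPath name is hypothetical.

```scala
import java.util.regex.Pattern

// Hypothetical helper: a simplified model of wildcard, case-insensitive path matching.
object WildcardPath {

  // Translate a wildcard pattern into a regex: '*' -> ".*", '?' -> ".", everything else quoted literally.
  private def toRegex(wildcard: String): String =
    wildcard.toList.map {
      case '*' => ".*"
      case '?' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString

  // True if the whole requested path matches the policy's wildcard pattern.
  def matches(policyPath: String, requestPath: String, ignoreCase: Boolean = true): Boolean = {
    val flags = if (ignoreCase) Pattern.CASE_INSENSITIVE else 0
    Pattern.compile(toRegex(policyPath), flags).matcher(requestPath).matches()
  }
}
```

With ignoreCase left at its default, a policy on /users/* would cover /USERS/42 as well as /users/42.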
b. Access Type:
Access types simply describe the kind of access a user would require. For Hive, select, create and delete would be examples; for HDFS, read, write and execute; for Kafka, publish and consume. For our HTTP service, the access types are the HTTP methods: GET, POST and DELETE.
"accessTypes": [
  {
    "itemId": 1,
    "name": "get",
    "label": "get"
  },
  {
    "itemId": 2,
    "name": "post",
    "label": "post"
  },
  {
    "itemId": 3,
    "name": "delete",
    "label": "delete"
  }
]
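The access type names above are lowercase HTTP method names. This means the application side has to normalise the incoming method before asking Ranger, which the route code in this post does with httpMethod.toLowerCase. The sketch below shows that mapping and also refuses methods the servicedef never declared. The AccessTypes object is hypothetical.

```scala
import java.util.Locale

// Hypothetical helper mirroring the accessTypes declared in the servicedef.
object AccessTypes {

  // Must match the "name" fields of the accessTypes entries exactly.
  val declared: Set[String] = Set("get", "post", "delete")

  // Map an HTTP method (any case) to a declared access type, if there is one.
  def forMethod(httpMethod: String): Option[String] = {
    val name = httpMethod.trim.toLowerCase(Locale.ROOT)
    if (declared.contains(name)) Some(name) else None
  }
}
```

An undeclared method such as PUT yields None. A caller could treat that as a denial instead of sending Ranger an access type it knows nothing about.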
c. Configs:
We know that Ranger can manage security for several Kafka topics and HDFS and HBase clusters. Each of these services may run on a different host, and the way to authenticate into each of them may be different. The place to capture this information is the configs part. For the sake of simplicity, we don't care about authentication for our HTTP service, so we just capture a URL that we can ping to ensure that our service is up and running.
"configs": [
  {
    "itemId": 1,
    "name": "services_list_url",
    "type": "string",
    "subType": "",
    "mandatory": true,
    "validationRegEx": "",
    "validationMessage": "",
    "uiHint": "",
    "label": "HTTP URL for the services list eg. http://localhost:8080/services"
  }
]
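The services_list_url config exists so that Ranger can ping the service. The post relies on the author's HttpServiceClient.isServiceUp for this, and its code isn't shown. A minimal JDK-only stand-in might look like the sketch below. The SimpleHttpCheck name, the "2xx means up" rule and the default timeout are all assumptions.

```scala
import java.net.{HttpURLConnection, URI}
import scala.util.Try

// Hypothetical stand-in for the author's HttpServiceClient.isServiceUp (not shown in the post).
object SimpleHttpCheck {

  // True only if a GET to `url` answers with a 2xx status within the timeout.
  // Any failure (malformed or relative URL, refused connection, timeout, non-HTTP scheme) counts as "down".
  def isServiceUp(url: String, timeoutMs: Int = 2000): Boolean =
    Try {
      val conn = URI.create(url).toURL.openConnection().asInstanceOf[HttpURLConnection]
      try {
        conn.setRequestMethod("GET")
        conn.setConnectTimeout(timeoutMs)
        conn.setReadTimeout(timeoutMs)
        val status = conn.getResponseCode
        status >= 200 && status < 300
      } finally conn.disconnect()
    }.getOrElse(false)
}
```

Folding every exception into false keeps validateConfig simple, but it also hides the reason for a failure. A real implementation might want to surface that reason in the failure map.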
2. A class that inherits RangerBaseService
The second and last part of implementing the server-side component of our Ranger plugin is to write a class that inherits RangerBaseService.
The class expects two functions to be overridden:
validateConfig: Remember the configs section of the servicedef? We accept values for those parameters, and validateConfig is where we validate the values that are passed. For our HTTP service, the only config we accept is services_list_url, so the implementation of this function uses a simple HTTP client to ping that URL and check whether the service is up and running.
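package com.arunma.ranger.http

import java.util

import org.apache.ranger.plugin.service.{RangerBaseService, ResourceLookupContext}

import scala.collection.JavaConverters._

class RangerServiceHTTP extends RangerBaseService {

  // Validates the values entered for the servicedef's configs
  override def validateConfig(): util.Map[String, AnyRef] = {
    if (configs.containsKey("services_list_url")) {
      // Ping the configured URL to confirm the HTTP service is up
      val serviceUp = HttpServiceClient.isServiceUp(configs.get("services_list_url"))
      // retSuccessMap/returnFailureMap are helpers from the author's code (not shown here)
      if (serviceUp) retSuccessMap() else returnFailureMap()
    }
    else {
      returnFailureMap()
    }
  }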
lookupResource: This is an interesting function. Consider the following screenshot.
Later, when we configure an access policy, we will be configuring the resources in it. This function is used to look up and autofill those resources. If we are entering an HDFS resource or a Hive table, there are a lot of options and it's easy to make a typo. In the case of Hive, this function would connect to the metastore and populate the tables and databases for us.
In the case of our HTTP service, remember the services_list_url? That URL just returns a comma-separated list of REST resources. To implement this function, I simply call the service again and tokenize the response.
  // Suggests values for the "path" resource while a policy is being edited
  override def lookupResource(resourceLookupContext: ResourceLookupContext): util.List[String] = {
    val serviceUrl = configs.get("services_list_url")
    HttpServiceClient.getServicePaths(serviceUrl).asJava
  }
}
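The tokenizing step inside getServicePaths is not shown in the post. Assuming the services URL returns a plain comma-separated body such as /users,/orders, the parsing might look like the sketch below. The ServicePaths name is hypothetical. The optional prefix filter is also an assumption and is not part of the original code, which returns every path. It is a natural refinement because Ranger's ResourceLookupContext carries what the user has typed so far (getUserInput).

```scala
import java.util.Locale

// Hypothetical helper: parse a comma-separated list of REST paths, e.g. "/users, /orders".
object ServicePaths {

  // Split on commas, trim whitespace and drop empty entries (e.g. from doubled or trailing commas).
  def parse(body: String): List[String] =
    body.split(",").iterator.map(_.trim).filter(_.nonEmpty).toList

  // Optionally narrow the suggestions to paths starting (case-insensitively) with the user's input.
  def suggest(body: String, userInput: String): List[String] = {
    val all = parse(body)
    if (userInput == null || userInput.isEmpty) all
    else {
      val prefix = userInput.toLowerCase(Locale.ROOT)
      all.filter(_.toLowerCase(Locale.ROOT).startsWith(prefix))
    }
  }
}
```

Inside lookupResource, such a filter would be fed with resourceLookupContext.getUserInput and the result converted with asJava, as in the original.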
Now, as a final step, we need to tie the RangerServiceHTTP class and the servicedef configuration together. We do this by setting the class in the implClass property. Also notice that we name this Ranger plugin httpservice:
{
  "name": "httpservice",
  "label": "HTTP Service",
  "description": "Rudimentary Ranger plugin to enforce security on top of a HTTP Service",
  "guid": "b8290b7f-6f69-44a9-89cc-06b6975ea676",
  "implClass": "com.arunma.ranger.http.RangerServiceHTTP",
  "version": 1,
  "isEnabled": 1,
  "resources": [
    {
      "itemId": 1,
      "name": "path",
      ...
      ...
The full configuration looks like this.
There are two more minor administrative steps:
- To make our class available on the Ranger classpath, we bundle it into a jar and drop it at <RANGER_HOME>/ews/webapp/WEB-INF/classes/ranger-plugins/httpservice. The folder name, httpservice, corresponds to the name declared in the servicedef configuration.
- Upload our configuration into Ranger so that our service becomes visible in the Ranger UI:
curl -u admin:admin -X POST -H "Accept: application/json" -H "Content-Type: application/json" --data @http-ranger.json http://localhost:6080/service/plugins/definitions
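If you'd rather script the upload from the JVM than shell out to curl, you can build the same request with the JDK 11+ java.net.http client. This is a sketch that mirrors the curl flags above: basic auth as admin:admin, a JSON content type, and a POST to /service/plugins/definitions. The RangerUpload name is hypothetical. Actually sending the request needs a running Ranger Admin.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.Base64

// Hypothetical helper mirroring the curl upload of http-ranger.json.
object RangerUpload {

  // Equivalent of curl's -u user:password (HTTP basic authentication).
  def basicAuth(user: String, password: String): String =
    "Basic " + Base64.getEncoder.encodeToString(s"$user:$password".getBytes(StandardCharsets.UTF_8))

  // Build the POST request; nothing is sent here.
  def request(rangerUrl: String, servicedefJson: String, user: String, password: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$rangerUrl/service/plugins/definitions"))
      .header("Authorization", basicAuth(user, password))
      .header("Accept", "application/json")
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(servicedefJson))
      .build()

  // Send the servicedef file and return the HTTP status code (requires a running Ranger Admin).
  def upload(rangerUrl: String, file: Path, user: String, password: String): Int = {
    val json = new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
    HttpClient.newHttpClient()
      .send(request(rangerUrl, json, user, password), HttpResponse.BodyHandlers.ofString())
      .statusCode()
  }
}
```

Keeping request construction separate from sending makes it possible to check the request shape without a Ranger instance.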
Then bounce (restart) the Ranger server.
Yaay! We now see HTTPSERVICE in our Ranger UI.
2. Application-side components:
On the application side, things couldn't get any simpler. To use the policies defined in Ranger, all an application needs to do is call Ranger and check whether the user has access to a resource. The function is literally called isAccessAllowed.
The following code is pretty much all that needs to be written on the application side:
package com.arunma.ranger

import org.apache.ranger.plugin.audit.RangerDefaultAuditHandler
import org.apache.ranger.plugin.policyengine.{RangerAccessRequestImpl, RangerAccessResourceImpl}
import org.apache.ranger.plugin.service.RangerBasePlugin

import scala.collection.JavaConverters._

object RangerAuthorizer {

  // Created and initialised once, on first use; "httpservice" must match the servicedef name
  lazy val plugin = {
    val plg = new RangerBasePlugin("httpservice", "httpservice")
    // Route access decisions to Ranger's audit framework
    plg.setResultProcessor(new RangerDefaultAuditHandler)
    // Loads the plugin configuration and starts fetching policies from Ranger Admin
    plg.init()
    plg
  }

  def authorize(path: String, accessType: String, userName: String, userGroups: Set[String] = Set("public")): Boolean = {
    // The "path" resource declared in the servicedef
    val resource = new RangerAccessResourceImpl()
    resource.setValue("path", path)
    val request = new RangerAccessRequestImpl(resource, accessType, userName, userGroups.asJava)
    val result = plugin.isAccessAllowed(request)
    // No result is treated as a denial
    result != null && result.getIsAllowed
  }
}
The RangerBasePlugin("httpservice", "httpservice") constructor and the init() call serve as our entry point into the Ranger service. Note the httpservice argument passed to RangerBasePlugin: the first argument is the service type, and it must match the name given in the servicedef configuration.
The authorize function in RangerAuthorizer is the one that gets called by the interceptor just before the client is given access to a REST resource. It simply constructs an access request (a RangerAccessRequestImpl) and calls the plugin's isAccessAllowed function. The returned result is then reduced to a Boolean: access is granted only if Ranger returned a result and that result says the access is allowed.
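To make the allow/deny contract concrete without a running Ranger, here is a deliberately tiny in-memory model of the decision that authorize delegates to Ranger. This is not how Ranger's policy engine works internally; the real engine also handles deny policies, exclusions, wildcards, roles and auditing. The ToyPolicy and ToyAuthorizer names are hypothetical. The sketch keeps only the shape of the request (path, access type, user, groups) and a fail-closed stance: anything not explicitly allowed is denied, in the same spirit as the result != null && result.getIsAllowed check.

```scala
// Hypothetical, allow-only policy: who may perform which access types on which exact path.
final case class ToyPolicy(path: String, accessTypes: Set[String], users: Set[String], groups: Set[String])

object ToyAuthorizer {

  // Allowed only if some policy on exactly this path grants the access type
  // to the user directly or to one of the user's groups; everything else is denied.
  def authorize(policies: Seq[ToyPolicy], path: String, accessType: String,
                userName: String, userGroups: Set[String] = Set("public")): Boolean =
    policies.exists { p =>
      p.path == path &&
        p.accessTypes.contains(accessType) &&
        (p.users.contains(userName) || p.groups.exists(userGroups.contains))
    }
}
```

With a single policy granting get on /users to arunma, a get by arunma is allowed. A get by nobody, a delete by arunma, or any access to /orders is denied.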
In the routes, Akka HTTP's authorize directive acts as the interceptor: it invokes isRangerAuthorized, which in turn calls the authorize function in RangerAuthorizer.
def isRangerAuthorized(path: String, httpMethod: String, userName: String): Boolean =
  RangerAuthorizer.authorize(path, httpMethod.toLowerCase, userName)

lazy val userRoutes: Route =
  headerValueByName("username") { userName =>
    extractMethod { method =>
      pathPrefix("users") {
        extractMatchedPath { matchedPath =>
          // Ask Ranger before letting the request reach the inner routes
          authorize(isRangerAuthorized(matchedPath.toString(), method.name(), userName)) {
            concat(
              pathEnd {
                concat(
                  get {
                    val users: Future[Users] =
                      (userRegistryActor ? GetUsers).mapTo[Users]
                    complete(users)
                  }
                  // ... remaining routes not shown
                )
              }
            )
          }
        }
      }
    }
  }
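Akka HTTP's authorize directive takes a by-name Boolean check and only runs the inner route when the check is true. Otherwise it rejects the request, and Akka's default rejection handling turns that rejection into 403 Forbidden. The same guard-then-handle shape, written without Akka so it can run on its own, looks roughly like the sketch below. Request, Response and Guard are hypothetical stand-ins, not Akka types.

```scala
// Hypothetical stand-ins for an HTTP request/response; not Akka HTTP types.
final case class Request(method: String, path: String, headers: Map[String, String])
final case class Response(status: Int, body: String)

object Guard {

  // Same idea as Akka's authorize(check): evaluate the check first and
  // run the handler only if it passes; otherwise answer 403 Forbidden.
  def authorize(check: => Boolean)(handler: => Response): Response =
    if (check) handler else Response(403, "Forbidden")

  // Pull the username header (400 if missing, like a missing required header in Akka)
  // and consult a Ranger-style callback such as isRangerAuthorized.
  def handle(req: Request, isAuthorized: (String, String, String) => Boolean)(handler: => Response): Response =
    req.headers.get("username") match {
      case None       => Response(400, "Missing username header")
      case Some(user) => authorize(isAuthorized(req.path, req.method, user))(handler)
    }
}
```

Because handler is by-name, the protected code never runs for a denied request. This is the property that makes the directive a safe interception point.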
One last thing we need to do is copy an audit XML and a security XML into our classpath. These are like site XMLs for the Ranger plugin. For this exercise, we'll just place the XMLs in our resources directory.
The audit XML and the security XML can be copied from the Ranger code base. If you are running Ranger locally, the audit XML can remain as-is, but the security XML needs to be changed for our service. The easiest way to achieve this is to copy a sample XML from the Ranger code base and replace the service name with httpservice, like so:
There's also one property that needs special attention: ranger.plugin.httpservice.service.name. Its value must be the same as the Service Name that you use in the Ranger UI.
<property>
<name>ranger.plugin.httpservice.service.name</name>
<value>MyService</value>
<description>
Name of the Ranger service containing policies for this httpservice instance
</description>
</property>
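A mismatch between ranger.plugin.httpservice.service.name and the service name in the UI is easy to make, so it may be worth checking the value at startup. The security XML uses the Hadoop configuration layout (property elements with name and value children), so a JDK-only sketch that reads one property from such a file could look like the one below. The SiteXml name is hypothetical; the plugin's real configuration loading is done by Ranger itself. If you load the file from the classpath, its name is an assumption to verify against your Ranger version. Plugins conventionally use ranger-&lt;service type&gt;-security.xml, for example ranger-httpservice-security.xml.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper: look up a <property> value in a Hadoop-style site XML.
object SiteXml {

  // Value of the property whose <name> equals `name`, if present.
  def property(xml: String, name: String): Option[String] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).iterator
      .map(i => props.item(i).asInstanceOf[Element])
      .find(p => text(p, "name").contains(name))
      .flatMap(p => text(p, "value"))
  }

  // Trimmed text of the first descendant element with the given tag, if any.
  private def text(parent: Element, tag: String): Option[String] = {
    val nodes = parent.getElementsByTagName(tag)
    if (nodes.getLength == 0) None else Some(nodes.item(0).getTextContent.trim)
  }
}
```

Comparing the returned value with the service name you created in the UI (MyService in the snippet above) catches the mismatch before any request is denied for a confusing reason.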
Test Ride
This involves two steps:
- Configure a Ranger policy
- Verify your HTTP service
1. Configure a Ranger Policy
2. Verify your HTTP Service
Let's verify the policy by bringing up our HTTP service: start com.arunma.RangerManagedHttpServer.
Policy-configured user
curl -X GET -H 'username:arunma' http://localhost:8080/users
Invalid user
curl -X GET -H 'username:nobody' http://localhost:8080/users
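The two curl calls above can also be scripted, for example as a smoke test. The sketch below uses the JDK 11+ HttpClient and assumes the service from this post is listening on localhost:8080. The SmokeTest name is hypothetical, and so is the reading of 403 as "denied". That reading rests on 403 being what Akka HTTP's default rejection handling returns when authorize fails.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical smoke test for the Test Ride; check() needs the HTTP service running locally.
object SmokeTest {

  // Same request as: curl -X GET -H 'username:<user>' <url>
  def request(url: String, user: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(url)).header("username", user).GET().build()

  // Interpret the status code: 2xx means allowed, 403 means rejected by the authorize directive.
  def outcome(status: Int): String = status match {
    case s if s >= 200 && s < 300 => "allowed"
    case 403                      => "denied"
    case other                    => s"unexpected status $other"
  }

  // Send the request and classify the response.
  def check(url: String, user: String): String = {
    val status = HttpClient.newHttpClient()
      .send(request(url, user), HttpResponse.BodyHandlers.discarding())
      .statusCode()
    outcome(status)
  }
}
```

Running SmokeTest.check("http://localhost:8080/users", "arunma") and the same call for nobody exercises both sides of the policy.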
Summary
The Ranger plugin has two parts to it: a server-side component and an application-side component. For the server-side component, we created a servicedef JSON and a class that inherited RangerBaseService. For the application-side component, we just called the plugin's isAccessAllowed function.
You now have a working Ranger authorized HTTP Service.
Thanks for reading. Happy Hacking!