Geospatial Apache Solr searching in Drupal 7 using the Search API module (Ubuntu version)

In this tutorial, I'll share my notes and code I've used to setup geospatial Apache Solr searching in Drupal 7 using the Search API module. For this tutorial I created a minimal Ubuntu server virtual machine. All the commands should be executed as a user with permission to modify files, or prefixed with "sudo".

The first thing I do with a fresh virtual machine is check for package upgrades.

$ apt-get update
$ apt-get upgrade

I find it cumbersome to type in a virtual machine window, so I'll install open-ssh and ssh from my Mac. If you plan to do so, you'll need to find your virtual machine's IP address using ifconfig. For this tutorial I added local DNS (/etc/hosts) to point "drupal7.vm" to my VM's IP.

$ apt-get install openssh-server

Install the LAMP stack. The following packages will install Apache httpd as a dependency.

$ apt-get install php5 php5-cli php5-common php5-curl php5-gd php5-mysql php-pear mysql-server

At this point, browsing to your VM/server's IP address will give you the standard Apache welcome message:
It works!
This is the default web page for this server.
The web server software is running but no content has been added, yet.

Install version control.

$ apt-get install git-core

Create a mysql database for Drupal 7.

$ mysql -u youruser -p
mysql> create database drupal7;
mysql> grant all privileges on drupal7.* to 'drupal7'@'localhost' identified by 'somepassword';
mysql> exit

Install drush via Pear.

$ pear upgrade-all
$ pear channel-discover pear.drush.org
$ pear install drush/drush

Verifying drush is installed.

$ which drush
/usr/bin/drush
$ drush --version
drush version 4.5

Create an Apache vhost directory

$ mkdir -p /var/www/vhosts

Download drupal via drush

$ cd /var/www/vhosts
$ drush dl drupal
# rename folder (as necessary)
$ mv drupal-7.10 drupal7

Integrate drupal file system with git

$ cd drupal7
$ git init
$ git add .
$ git commit -am "initial commit of drupal7"

Install drupal via drush

$ drush site-install standard --db-url=mysql://dbuser:pass@localhost/dbname

Add Apache2 vhost

$ cd /etc/apache2/sites-available
# create new file, called "drupal7" with contents:

  ServerName drupal7.vm
  DocumentRoot /var/www/vhosts/drupal7
  ErrorLog /var/log/apache2/drupal7-error_log
  CustomLog /var/log/apache2/drupal7-access_log combined
 
    AllowOverride All
 

# create symlink
$ cd ../sites-enabled
$ ln -s ../sites-available/drupal7 001-drupal7.conf

# enable apache2 mod_rewrite module
$ a2enmod rewrite

# restart apache2
$ /etc/init.d/apache2 restart

At this point, browsing to your VM/server's hostname should show a Drupal installation.

Part 2, Tomcat/Solr

Installing java jdk and tomcat6

$ apt-get install openjdk-6-jdk tomcat6 tomcat6-admin tomcat6-common tomcat6-user

Browsing to your VM/server's hostname on port 8080 (ex: http://drupal7.vm:8080) will show the generic Tomcat welcome message:
It works !
If you're seeing this page via a web browser, it means you've setup Tomcat successfully. Congratulations!

Installing Solr in Tomcat

$ mkdir ~/downloads
$ cd ~/downloads
# Download the latest stable version of Apache Solr from:
url: http://www.apache.org/dyn/closer.cgi/lucene/solr/
# example:
$ wget http://www.motorlogy.com/apache//lucene/solr/3.5.0/apache-solr-3.5.0.tgz
$ tar -xzf apache-solr-3.5.0.tgz

Copy/rename java war file into Tomcat webapps directory

$ cp ~/downloads/apache-solr-3.5.0/dist/apache-solr-3.5.0.war /var/lib/tomcat6/webapps/solr.war

Note: copying the java war file into the Tomcat webapps folder will create this directory automatically:

/var/lib/tomcat6/webapps/solr

Copy solr files

$ cp -r ~/downloads/apache-solr-3.5.0/example/solr/ /var/lib/tomcat6/solr/

Create Catalina config file to link war file to solr directory

$ cd /etc/tomcat6/Catalina/localhost
# create new file: "solr.xml", with the contents:
<?xml version="1.0" encoding="UTF-8"?>


Setup Tomcat admin user(s)

# edit file: /etc/tomcat6/tomcat-users.xml, ensure similar contents exist:
<?xml version='1.0' encoding='utf-8'?>




Update webapps WEB-INF/web.xml file

# edit file: /var/lib/tomcat6/webapps/solr/WEB-INF/web.xml, update "solr/home" section to reflect solr path:

  solr/home
  /var/lib/tomcat6/solr
  java.lang.String

Download search api drupal modules that contain solr xml configuration files, and copy into solr conf directory

$ mkdir -p /var/www/vhosts/drupal7/sites/all/modules/contrib
$ cd /var/www/vhosts/drupal7/sites/all/modules/contrib
$ drush dl search_api search_api_solr
$ cp /var/www/vhosts/drupal7/sites/all/modules/contrib/search_api_solr/solrconfig.xml /var/lib/tomcat6/solr/conf/
$ cp /var/www/vhosts/drupal7/sites/all/modules/contrib/search_api_solr/schema.xml /var/lib/tomcat6/solr/conf/

Reset tomcat permissions, and restart tomcat

$ cd /var/lib
$ chown -R tomcat6.tomcat6 tomcat6
$ /etc/init.d/tomcat6 restart

You should now be able to browse to the solr admin java page.
Example: http://drupal7.vm:8080/solr/admin/
Solr Admin Page

If things aren't working well at this point, check the Tomcat logs and look for SEVERE log entries

/var/log/tomcat6/catalina.out

In addition, the solr java module should be listed in the Tomcat Web Application Manager
Ex URL: http://drupal7.vm:8080/manager/html

Part 3, Drupal code

Getting the solr-php-client library from code.google.com

$ mkdir -p /var/www/vhosts/drupal7/sites/all/libraries
$ cd /var/www/vhosts/drupal7/sites/all/libraries

# URL: http://code.google.com/p/solr-php-client/downloads/list
# File: SolrPhpClient.r60.2011-05-04.tgz
$ wget http://solr-php-client.googlecode.com/files/SolrPhpClient.r60.2011-05-04...
$ tar -xzf SolrPhpClient.r60.2011-05-04.tgz

Downloading and installing contrib drupal modules

$ cd /var/www/vhosts/drupal7
$ drush dl entity views ctools facetapi
$ drush en search_api search_api_views search_api_solr search_api_facetapi entity views views_ui ctools facetapi

(Optionally) I install devel, admin_menu, and disable overlay/toolbar

$ drush dl devel admin_menu
$ drush en devel admin_menu
$ drush dis overlay toolbar

Add the tomcat/solr server to Search API configuration.
- URL: /admin/config/search/search_api
- click on "+ Add Server"
- server name: Solr 3.5.0
- Service class: Solr service
- Solr host: localhost
- Solr port: 8080
- Solr path: /solr
- click Create Server

You should receive some confirmation messages:
The server was successfully created.
The Solr server could be reached (latency: # ms).
If not, ensure tomcat/solr is reachable at the url you specified and the tomcat service is running.

At this point Solr is ready to send/receive data and index content, but there is nothing to index. For this tutorial, I decided to build off of user profiles and store latitude and longitude using the geolocation field module.

$ drush dl geolocation
$ drush en geolocation

Adding some user profile fields:
- URL: /admin/config/people/accounts/fields
- First Name | field_name_first | Text
- Last Name | field_name_last | Text
- Geolocation | field_geolocation | Geolocation | Latitude/Longitude

I then added a bunch of users with latitude/longitude coordinates.
- URL: /admin/people/create
- note: I used Google Geocoding API to fetch the coordinates: http://code.google.com/apis/maps/documentation/geocoding/

Adding the search api index.
- URL: /admin/config/search/search_api
- click "+ Add index"
- Index name: People
- Item type: User
- Server: Solr 3.5.0
- click: Create Index

On the next admin page, you can select which fields to index. For this tutorial, I chose: User ID, Name, Email, URL, First Name, and Last Name. Unfortunately, at the time of writing this, the geolocation lat/lng fields are not exposed to the Entity API. I assume this is a temporary problem, and there are numerous patches in the geolocation issue queue.
@see (for example):
Property Info callback for Entity API - http://drupal.org/node/1366642
Fix for Search API not picking up the entity to index it's fields - http://drupal.org/node/1320564

I copied code directly from the issues queue, made some modifications, and created a custom module to expose the geolocation field data to the entity api module. In addition, I added a new property "lat_lon" that concatenates lat and lng together with a comma. @see: http://wiki.apache.org/solr/SpatialSearch

<?php
/**
* Implements hook_field_info_alter()
*/
function MYMODULE_field_info_alter(&$info) {
  if (isset(
$info['geolocation_latlng'])) {
   
$info['geolocation_latlng']['property_type'] = 'geolocation';
   
$info['geolocation_latlng']['property_callbacks'] = array('geolocation_property_info_callback');
  }
}

function geolocation_property_info_callback(&$info, $entity_type, $field, $instance, $field_type) {
 
$name = $field['field_name'];
 
$property = &$info[$entity_type]['bundles'][$instance['bundle']]['properties'][$name];

  $property['type'] = ($field['cardinality'] != 1) ? 'list' : 'geolocation';
 
$property['getter callback'] = 'entity_metadata_field_verbatim_get';
 
$property['setter callback'] = 'entity_metadata_field_verbatim_set';
 
$property['auto creation'] = 'geolocation_default_values';
 
$property['property info'] = geolocation_data_property_info();

  unset($property['query callback']);
}

function geolocation_default_values() {

  return array(
    'lat' => '',
   
'lng' => '',
   
'lat_sin' => '',
   
'last_name' => '',
   
'lat_cos' => '',
   
'lat_rad' => '',
   
'lat_lon' => '',
  );

}

function geolocation_data_property_info($name = NULL) {

  // Build an array of basic property information for the geolocation field.
 
$properties = array(
   
'lat' => array(
     
'label' => t('Latitude'),
    ),
   
'lng' => array(
     
'label' => t('Longitude'),
    ),
   
'lat_sin' => array(
     
'label' => t('Sine of Latitude'),
    ),
   
'lat_cos' => array(
     
'label' => t('Cosine of Latitude'),
    ),
   
'lat_rad' => array(
     
'label' => t('Radian Latitude'),
    ),
   
'lat_lon' => array(
     
'label' => t('Latitude,Longitude'),
    ),
  );

  // Add the default values for each of the address field properties.
 
foreach ($properties as $key => &$value) {
   
    switch (
$key) {
   
      case
'lat_lon':
       
$value += array(
         
'description' => !empty($name) ? t('!label of field %name', array('!label' => $value['label'], '%name' => $name)) : '',
         
'type' => 'text',
         
'getter callback' => '_MYMODULE_geolocation_entity_property_verbatim_get',
         
'setter callback' => '_MYMODULE_geolocation_entity_property_verbatim_set',
        );
        break;
   
      default:
       
$value += array(
         
'description' => !empty($name) ? t('!label of field %name', array('!label' => $value['label'], '%name' => $name)) : '',
         
'type' => 'text',
         
'getter callback' => 'entity_property_verbatim_get',
         
'setter callback' => 'entity_property_verbatim_set',
        );
        break;
   
    }

}

return $properties;
}

function _MYMODULE_geolocation_entity_property_verbatim_get($data, array $options, $name, $type, $info) {
  if (
is_array($data) && isset($data['lat']) && isset($data['lng'])) {
    return
$data['lat'] . ',' . $data['lng'];
  }
  return
'';
}

function _MYMODULE_geolocation_entity_property_verbatim_set(&$data, $name, $value, $langcode, $type, $info) {
 
// TODO
 
return;
}
?>

I added this code to a custom module, renamed function calls (as necessary), and enabled. Update the solr index to add the new fields to the index.
- URL: /admin/config/search/search_api/index/people/fields
- Expand "Add Related Fields"
- Choose Geolocation, click Add fields
The above will expose the following fields now available to the index:
- Geolocation » Latitude
- Geolocation » Longitude
- Geolocation » Sine of Latitude
- Geolocation » Cosine of Latitude
- Geolocation » Radian Latitude
- Geolocation » Latitude,Longitude
Enable "Geolocation » Latitude,Longitude" and save changes.

Index the content.
- URL: /admin/config/search/search_api/index/people/status
- Click: Index now
- note: if you had already indexed the content, you'll probably need to clear it first
In my environment, I got the following confirmation message:
Successfully indexed 7 items.

I find it to be very helpful to verify the xml response from Solr directly after making changes to the index/schema.
The following URL structure will query solr for all results and return all fields:

Ex URL: http://drupal7.vm:8080/solr/select/?q=&fl=*

A sample XML document response.


 
  http://drupal7.vm/user/3
  people-3
  people
  3
  3
 
    nashua
    nashua@example.com
    nashua
    nashua
    42.933692,-72.278141
 

  3
 
  http://drupal7.vm/user/3
 
    42.933692,-72.278141
 

 
    nashua
 

 
    nashua
 

 
    nashua@example.com
 

 
    nashua
 

Take note the field name in the following XML, it is used in the next file edit.


  42.933692,-72.278141

Update the solr schema.xml configuration and add the geospatial fieldType and field data.

# Edit file: /var/lib/tomcat6/solr/conf/schema.xml
# Just prior to the closing "" tag, I inserted: (around line 287)
   
   
   

# And, just after the opening "" tag, I inserted:
   
   

Restart Tomcat

$ /etc/init.d/tomcat6 restart

Since the schema and solr data types have been updated, the content will have to be re-indexed.
- URL: /admin/config/search/search_api/index/people/status
- click: Clear index
- click: Index now

Returning to the solr query above will now show updated xml: (note: no longer an array)

42.933692,-72.278141

Verify the native solr geospatial searching is working using the following query syntax:
URL: http://drupal7.vm:8080/solr/select/?q=&fl=*&fq={!geofilt sfield=t_field_geolocation:lat_lon pt=42.933692,-72.278141 d=100}
By putting a distance parameter of 100 (kilometers) and Nashua NH coordinates, I get 2 results: Nashua and Portsmouth, awesome.

Create a solr integrated view.
- URL: /admin/structure/views/add
- View name: People
- Show: People
- Create a Page [checked]
- Path: people
- Continue & edit
Note: at this point, you have full reign over view configuration. For this tutorial, I set the format to Grid, and added some fields:
- Geolocation: Latitude,Longitude (indexed)
- Indexed User: Email
- Indexed User: First Name
- Indexed User: Last Name
- Indexed User: Name
Save the view when edits are complete.

Browsing to the view will show something like this:
Ex URL: http://drupal7.vm.people
People View

The next chunk of custom code modifies the solr query executed and adds geospatial filtering.
@see: hook_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query)

<?php
function MYMODULE_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {

 

$lat = 42.933692;
 
$lng = -72.278141;
 
$distance = 100;

 

$call_args['params']['fq'][] = "{!geofilt sfield=t_field_geolocation:lat_lon pt={$lat},{$lng} d={$distance}}";

}

?>

The above code will limit the view's results using the hardcoded coordinates.
People View 2

Clearly, it works but there are loose ends to tie..
- automatically fetch a user's coordinates to store in the geolocation field
- add a search form to the people view page to allow the user to search for a location (instead of hard coded coordinates, blah)
- translate the user's location search input to coordinates using an API

Hopefully, I can find more time to elaborate on this tutorial in the near future! Cheers.