Is it possible to combine the world's most amazing prototyping language (aka Python) with JavaScript?

Yes, it is. Welcome to PyV8!


Prerequisites

So, first we need some libraries and modules:

  1. Boost with Python Support

    • On Ubuntu/Debian you just run apt-get install libboost-python-dev; on Fedora/RHEL use your package manager.
    • On Mac OS X:

      • If you are using Homebrew, run this:

      brew install boost --with-python

  2. PyV8 Module

    (You need Subversion installed for this)

    mkdir pyv8
    cd pyv8
    svn co http://pyv8.googlecode.com/svn/trunk/
    cd trunk
    

    If you are on Mac OS X, you need to set these first:

    export CXXFLAGS='-std=c++11 -stdlib=libc++ -mmacosx-version-min=10.8'
    export LDFLAGS=-lc++
    

    Now just do this:

    python ./setup.py install

    And wait!

    (A word of advice: if you installed Boost from your OS packages, make sure you build PyV8 with the same Python version that Boost was compiled against.)

  3. Luck ;)

    Meaning: if this doesn't work, you'll have to ask Google.
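Once the install has finished, a quick smoke test tells you whether the module actually loads. This is just a sketch: check_pyv8 is a made-up helper name, and all it does is evaluate 1+2 in a fresh JavaScript context, returning None when PyV8 cannot be imported (for example because the Boost linkage is broken).

```python
def check_pyv8():
    """Return the result of evaluating 1+2 in V8, or None if PyV8 is missing."""
    try:
        import PyV8  # fails here if the build or the Boost linkage is broken
    except ImportError:
        return None
    ctx = PyV8.JSContext()  # a fresh JavaScript context
    ctx.enter()
    try:
        return ctx.eval("1+2")
    finally:
        ctx.leave()


if __name__ == "__main__":
    result = check_pyv8()
    if result is None:
        print("PyV8 is not importable")
    else:
        print("PyV8 works: 1+2 = %s" % result)
```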

Now, how does it work?

Easy, easy, my friend.

The question is, why should we use JavaScript inside a Python tool?

Well, while doing some crazy stuff with our ElasticSearch cluster, I wrote a small Python script to do some nifty parsing and correlation. After not even 30 minutes I had a command-line tool that reads a YAML file containing ES queries written in YAML format and gives me an automated way to query more than one ES cluster.

So, let's say you have a YAML like this:

title:  
  name: "Example YAML Query File"
esq:  
  hosts:
    es_cluster_1:
      fqdn: "localhost"
      port: 9200
    es_cluster_2:
      fqdn: "localhost"
      port: 10200
    es_cluster_3:
      fqdn: "localhost"
      port: 11200
indices:  
  - index:
      id: "all"
      name: "_all"
      all: true
  - index:
      id: "events_for_three_days"
      name: "[events-]YYYY-MM-DD"
      type: "failover"
      days_from_today: 3
  - index:
      id: "events_from_to"
      name: "[events-]YYYY-MM-DD"
      type: "failover"
      interval:
        from: "2014-08-01"
        to: "2014-08-04"
query:  
  on_index:
    all:
      filtered:
        filter:
          term:
            code: "INFO"
    events_for_three_days:
      filtered:
        filter:
          term:
            code: "ERROR"
    events_from_to:
      filtered:
        filter:
          term:
            code: "DEBUG"

No, this is not really what we are doing :) But I think you get the idea.
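The [events-]YYYY-MM-DD names are patterns for daily indices, so a tool like this has to turn them into concrete index names before querying. Here is a rough sketch of how that expansion could look. The function names and parameters are my own invention for illustration. It assumes the bracketed part is a literal prefix followed by an ISO date, and that days_from_today: 3 means "today and the two days before".

```python
from datetime import date, timedelta


def expand_index_names(pattern, start, end):
    """Expand '[prefix]YYYY-MM-DD' into one index name per day, start..end inclusive."""
    if not pattern.startswith("[") or "]" not in pattern:
        return [pattern]  # plain name such as "_all": nothing to expand
    prefix = pattern[1:pattern.index("]")]
    days = (end - start).days
    return [prefix + (start + timedelta(days=n)).isoformat() for n in range(days + 1)]


def last_days(pattern, days_from_today, today=None):
    """Expand the pattern for the last `days_from_today` days, ending today."""
    today = today or date.today()
    first = today - timedelta(days=days_from_today - 1)
    return expand_index_names(pattern, first, today)


# The "events_from_to" interval from the YAML above
print(expand_index_names("[events-]YYYY-MM-DD", date(2014, 8, 1), date(2014, 8, 4)))
```

Since the search call accepts a comma-separated index string, joining the resulting list with "," would be one way to hand it over.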

Now, in this example, we have three different ElasticSearch clusters to search in. All three hold different data, but they share the same event format.
So, my idea was to generate reports from the requested data, either for a single ES cluster or correlated over all three.
I wanted to have that functionality inside the YAML file, so everybody who writes such a YAML file can also add some processing code.
Well, the result set of an ES search query is a JSON blob, and thanks to elasticsearch.py it gets converted to a Python dictionary.
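To make that concrete: a search response from elasticsearch.py is a plain dict, with the matching documents under hits.hits and each document's own fields under _source. The sample below is hand-written, not real data (the msg and code fields are assumed from the examples in this post), and source_docs is a hypothetical helper.

```python
# A hand-written stand-in for what es.search(...) returns
sample_response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "events-2014-08-01", "_id": "1",
             "_source": {"code": "DEBUG", "msg": "disk check started"}},
            {"_index": "events-2014-08-01", "_id": "2",
             "_source": {"code": "DEBUG", "msg": "disk check finished"}},
        ],
    },
}


def source_docs(response):
    """Return the original documents (the _source of every hit) as a list."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


for doc in source_docs(sample_response):
    print(doc["msg"])
```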

Hmm... so why don't you put Python code inside the YAML and eval it in your Python script?

Well, if you have ever written frontend/backend web apps, you know it's pretty difficult to write frontend Python scripts that run inside your browser. So, JavaScript to the rescue.
And everybody knows how easy it is to deal with JSON object structures in JavaScript. So why don't we use this knowledge and invite users who are not familiar with Python to participate?

Now, think about an idea like this:

title:  
  name: "Example YAML Query File"
esq:  
  hosts:
    es_cluster_1:
      fqdn: "localhost"
      port: 9200
    es_cluster_2:
      fqdn: "localhost"
      port: 10200
    es_cluster_3:
      fqdn: "localhost"
      port: 11200
indices:  
  - index:
      id: "all"
      name: "_all"
      all: true
  - index:
      id: "events_for_three_days"
      name: "[events-]YYYY-MM-DD"
      type: "failover"
      days_from_today: 3
  - index:
      id: "events_from_to"
      name: "[events-]YYYY-MM-DD"
      type: "failover"
      interval:
        from: "2014-08-01"
        to: "2014-08-04"
query:  
  on_index:
    all:
      filtered:
        filter:
          term:
            code: "INFO"
    events_for_three_days:
      filtered:
        filter:
          term:
            code: "ERROR"
    events_from_to:
      filtered:
        filter:
          term:
            code: "DEBUG"
processing:  
    for:
        report1: |
            function find_in_collection(collection, search_entry) {
                for (var entry in collection) {
                    if (search_entry['msg'] == collection[entry]['msg']) {
                        return collection[entry];
                    }
                }
                return null;
            } 
            function correlate_cluster_1_and_cluster_2(collections) {
                collection_cluster_1 = collections["cluster_1"]["hits"]["hits"];
                collection_cluster_2 = collections["cluster_2"]["hits"]["hits"];
                similar_entries = [];
                for (entry in collection_cluster_1) {
                    similar_entry = null;
                    similar_entry = find_in_collection(collection_cluster_2, collection_cluster_1[entry]);
                    if (similar_entry != null) {
                        similar_entries.push(similar_entry);
                    }
                }
                result = {'similar_entries': similar_entries};
                return(result)
            }
            var result = correlate_cluster_1_and_cluster_2(collections);
            // this will return the data to the python method result 
            result
output:  
    reports:
        report1: |
            {% for similar_entry in similar_entries %}
            {{ similar_entry.msg }}
            {% endfor %}

(This is not my actual code, I just scribbled it down, so don't lynch me if this fails)
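If the JavaScript in report1 is hard to follow, this is roughly what it is meant to compute, written in plain Python: for every hit from cluster 1, look for the first hit from cluster 2 with the same msg, and collect those matches. It is a sketch only. The function name and the sample data are mine, and like the JavaScript it assumes each hit carries msg at the top level.

```python
def correlate(collections):
    """For each cluster_1 hit, collect the first cluster_2 hit with the same msg."""
    hits_1 = collections["cluster_1"]["hits"]["hits"]
    hits_2 = collections["cluster_2"]["hits"]["hits"]
    similar_entries = []
    for entry in hits_1:
        # first hit in cluster 2 whose msg matches, or None if there is none
        match = next((other for other in hits_2 if other["msg"] == entry["msg"]), None)
        if match is not None:
            similar_entries.append(match)
    return {"similar_entries": similar_entries}


# Tiny made-up result sets, shaped like the dicts used in the JavaScript
collections = {
    "cluster_1": {"hits": {"hits": [{"msg": "disk full"}, {"msg": "login ok"}]}},
    "cluster_2": {"hits": {"hits": [{"msg": "disk full"}, {"msg": "cache miss"}]}},
}
print(correlate(collections))  # {'similar_entries': [{'msg': 'disk full'}]}
```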

So, actually, I am passing a Python dict with all the query result sets from the ES clusters (defined at the top of the YAML file) to a PyV8 context object. I can then access those collections inside my JavaScript and return a JavaScript hash/object.
In the end, after the JavaScript processing, there could be a Jinja template inside the YAML file, and we can pass the JavaScript results into this template to print a nice report.
There are many things you can do with this.

So, let's see it in Python code:

# -*- coding: utf-8 -*-
# This will be a short form of this,
# so don't expect that this code will do the reading and validation
# of the YAML file

from elasticsearch import Elasticsearch  
import PyV8  
from jinja2 import Template

class JSCollections(PyV8.JSClass):  
    def __init__(self, *args, **kwargs):
        super(JSCollections, self).__init__()
        self.collections = {}
        if 'collections' in kwargs:
            self.collections=kwargs['collections']

    def write(self, val):
        print(val)

if __name__ == '__main__':  
    # hosts and ports mirror the first two clusters of the YAML example
    es_cluster_1 = Elasticsearch([{"host": "localhost", "port": 9200}])
    es_cluster_2 = Elasticsearch([{"host": "localhost", "port": 10200}])
    # the same filtered query is sent to both clusters
    query = {"query": {"filtered": {"filter": {"term": {"code": "DEBUG"}}}}}
    collections = {}
    collections['cluster_1'] = es_cluster_1.search(index="_all", body=query, size=100)
    collections['cluster_2'] = es_cluster_2.search(index="_all", body=query, size=100)
    # expose the result sets to JavaScript as the global "collections"
    js_ctx = PyV8.JSContext(JSCollections(collections=collections))
    js_ctx.enter()
    #
    # here comes the javascript code
    #
    process_result = js_ctx.eval("""
            function find_in_collection(collection, search_entry) {
                for (var entry in collection) {
                    if (search_entry['msg'] == collection[entry]['msg']) {
                        return collection[entry];
                    }
                }
                return null;
            } 
            function correlate_cluster_1_and_cluster_2(collections) {
                collection_cluster_1 = collections["cluster_1"]["hits"]["hits"];
                collection_cluster_2 = collections["cluster_2"]["hits"]["hits"];
                similar_entries = [];
                for (entry in collection_cluster_1) {
                    similar_entry = null;
                    similar_entry = find_in_collection(collection_cluster_2, collection_cluster_1[entry]);
                    if (similar_entry != null) {
                        similar_entries.push(similar_entry);
                    }
                }
                result = {'similar_entries': similar_entries};
                return(result)
            }
            var result = correlate_cluster_1_and_cluster_2(collections);
            // this will return the data to the python method result 
            result
    """)
    # back to python
    print("RAW Process Result: {0}".format(process_result))
    # create a jinja2 template and print it with the results from javascript processing
    template = Template("""
        {% for similar_entry in similar_entries %}
        {{ similar_entry.msg }}
        {% endfor %}
    """)
    print(template.render(process_result))

Again, I just wrote this down; it's not the actual code, so I don't know if it really works.

But still, this is pretty simple.

You can even use JavaScript events or JS debuggers, or create your own server-side browsers. You can find such examples in the demos directory of the PyV8 source tree.

So, this was all a 30-minute proof of concept. Last night I refactored the code, and this morning I thought: well, let's write a real library for this. So, there may be some code on GitHub over the weekend. I'll let you know.

Oh, before I forget: the idea of writing all this in a YAML file came from working with Juniper's JunOS PyEZ library, which takes a similar approach. But they use the YAML file as a description for autogenerated Python classes. Very nifty.