Why cMQL

  • less code, up to 3x less
  • clear structure
  • clear notation

For these 3 basic reasons, see "What is cMQL" and the examples on cMQL-Play.

The other 3 reasons are that cMQL is

  • a good choice for querying and data processing
  • like MQL written with (); both are functional and homoiconic
  • simple, with easy interop and portable queries, because it uses Clojure features

Here we focus on these last 3 reasons.

Querying, Data processing

Data software

  • MongoDB chose MQL, a functional, homoiconic language
    • Commands
    • Query operators
      MQL is not a general-purpose programming language and it is missing features, but it looks like functional programming.
  • Apache Hadoop is based on MapReduce, which comes from functional programming
  • Apache Spark is written in Scala
  • Apache Kafka is written in Java and Scala
  • RethinkDB chose to make ReQL functional

The reason is that data processing with pipelines is a form of functional programming.
Functional programming also avoids mutable state, which makes parallel data processing easier.

Functional programming

In functional programming

  • we use a series of nested calls instead of a series of assignments (we use constants; the result of a function does not update the state of a variable, it goes to the next function as an argument)
  • we have independent functions that we pass as arguments or return as results

*We can still have mutable state, and we can use names to hold values, but the above is the main style of functional programming.
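
The two points above can be sketched in plain Clojure. This is only an illustration: the names prices, double-price and multiplier are made up for the example and are not part of cMQL.

```clojure
;; All names here (prices, double-price, multiplier) are made up for this sketch.
(def prices [10 20 30])

;; An independent function that we pass as an argument.
(defn double-price [p]
  (* 2 p))

;; Nested calls: the result of map goes directly to reduce as an argument.
;; No variable is updated along the way.
(def total
  (reduce + (map double-price prices)))   ; 120

;; A function that returns a function.
(defn multiplier [n]
  (fn [x] (* x n)))

(def tripled
  (map (multiplier 3) prices))            ; (30 60 90)
```

Here total and tripled are only names for results; nothing is reassigned after it is defined.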

Data processing

Data processing can be seen as a form of functional programming

  • we use nested calls for pipelines (data goes from one function to the next)
  • we have independent functions that we pass as arguments to define the transformations (functions as arguments, for example map/filter/reduce etc.)

In data processing we focus on returning data, and less on returning functions.
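
As a small sketch of this idea in plain Clojure (not cMQL), the data below goes through filter, then map, then reduce. The orders data and the name order-total are assumptions made only for the example.

```clojure
;; Hypothetical sample data for this sketch.
(def orders [{:item "pen"  :qty 2 :price 3}
             {:item "book" :qty 1 :price 12}
             {:item "bag"  :qty 0 :price 25}])

;; filter -> map -> reduce, written as nested calls.
;; Each step takes a function as argument and returns data.
(def order-total
  (reduce +
          (map (fn [o] (* (:qty o) (:price o)))
               (filter (fn [o] (pos? (:qty o)))
                       orders))))   ; 18
```

Every step returns data (a sequence or a number), and the functions only describe the transformation.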

Pipelines

In data processing we commonly use 2 types of pipelines

  • 1 data source (like aggregation stages)
    (a function takes as input the output of 1 function only)
  • many data sources (like aggregation operators)
    (a function takes as input the outputs of many functions; its arguments are function calls)

Pipeline programming is a form of functional programming.
The output of a function becomes the input of the next function.

1 data source

One data source, like the aggregation stages in MQL. Each function has at most 1 function call as an argument, and possibly more arguments that are not function calls.

(f3 (f2 (f1 arg1)
        arg2
        arg3)
    arg4)

Because the tree is almost a line (the extra arguments are just leaves, not trees), we can use a "trick" and make the nested calls look like a line.

data -> f1 -> f2 -> f3 ... -> final_data
(-> (f1 data) f2 f3 ...) ;; Clojure
transform(f1).transform(f2).transform(f3)... ;; Java 8+, we need lambdas

This allows us to avoid the nested calls in the simple case where each function has at most 1 function call as an argument.
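
A minimal runnable sketch of the same trick, using only clojure.core functions. The names nested and threaded are made up for the example. The -> macro puts the previous result as the first argument of the next call, and the extra arguments stay as plain leaves.

```clojure
;; Nested form: we read it from the inside out.
(def nested
  (* (+ (inc 1) 10) 3))   ; 36

;; Threaded form: the same calls, read as a line from top to bottom.
;; (+ 10) becomes (+ <previous result> 10), and so on.
(def threaded
  (-> 1
      inc
      (+ 10)
      (* 3)))             ; 36
```

Both forms compile to the same nested calls; the threaded one only changes how we read them.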

Many data sources

Many data sources, like the aggregate operators.
A function can take inputs that are the results of other function calls.

The above trick cannot work here. Even if we tried to make a tree look like a line, it would be very hard to understand what it does (we would have to imagine the tree structure).
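
A small plain Clojure sketch of the tree case. The data and the names scores, avg and best are assumptions made only for this example. Here max takes two function calls as arguments, and avg itself takes two, so there is no single line of data to follow.

```clojure
;; Hypothetical sample data for this sketch.
(def scores {:math [90 70]
             :art  [60 80]})

;; avg has two function calls as arguments of /.
(defn avg [xs]
  (/ (reduce + xs) (count xs)))

;; max also has two function calls as arguments.
;; The calls form a tree, so we keep the nested syntax.
(def best
  (max (avg (:math scores))
       (avg (:art scores))))   ; 80
```

With indentation the tree is still readable, which is why the nested syntax matters for this kind of pipeline.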

Nested syntax

When we nest code, we rely on the programming language's syntax to keep it readable.

How do we nest things?
We start with a symbol that marks the nesting level, and we put the code inside it.

Example

{
  st1
  st2
  {
    st3
    st4
    {
      st5
      st6
    }
  }
}
where => read the special symbol { (find the nesting level)
what => read the code inside it (find what the code does)
We also need to check the outer and inner levels to get the whole picture.

Query example

We know that "takis", in the employers array of dept 1, has a new child {:name "elpida" :age 2}, and we want to add it.

(insert :testdb.testcoll
        {:dept 1
         :employers [{:id 1
                      :name "takis"
                      :children [{:name "nikos"
                                  :age 5}
                                 {:name "helen"
                                  :age 8}]}]})

cMQL has an operator that makes it easy to read and update nested documents and arrays.

(update- :testdb.testcoll
         (uq (= :dept 1)
             (replace-root (assoc-in :ROOT. ["employers"
                                             {:icond (= :v.name. "takis")}
                                             "children"
                                             {:icond (count :a.)}]
                                     {:name "elpida"
                                      :age 2}))))
It means
  • get the key "employers"
  • get the index X where the element (:v.) at position X satisfies (= :v.name. "takis")
  • get the key "children"
  • get the index X that equals the end of the array
    (we want to add at the end of an array of unknown size)
These paths are very easy to use; for more see Collections/Nested.

MQL

MQL is a nested language (aggregate operators can take many arguments, so we cannot just put them in a line like stages). Data processing is nested programming, so we will be nesting code anyway, and this is where cMQL can help.

Clojure

Reasons for selecting Clojure for cMQL

The sections above explained why functional programming helps with data processing.
The sections below explain how Clojure's features and its similarities with MQL helped in making cMQL.

Functional and homoiconic

For the reasons above, functional programming supports data processing.

Homoiconic

  • means that the code is written in the language's own data structures
  • homoiconic languages have very simple syntax (code in Clojure is lists and vectors, code in MQL is maps and arrays)
  • we can do metaprogramming with them, processing code to generate code
    (see Macros below)
  • MQL is also homoiconic, so Clojure looked like a natural fit (see below)
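
A minimal sketch of "code is data" in plain Clojure. The names code, new-code and the others are made up for the example. A quoted call is just a list, so we can inspect it, build a new list from it, and evaluate either one.

```clojure
;; A quoted call is an ordinary list: the symbol + followed by 1 and 2.
(def code '(+ 1 2))

(def code-is-list (list? code))        ; true
(def result (eval code))               ; 3

;; Process code to generate code: swap the operator, keep the arguments.
(def new-code (cons '* (rest code)))   ; (* 1 2)
(def new-result (eval new-code))       ; 2
```

MQL code can be processed in a similar way, because it is made of maps and arrays.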

MQL similarities

Both are functional and homoiconic (code is written in the language's data structures).
Both are also made to be easy to use from other languages.

The programming style is the same, so Clojure experience carries over to MQL and the opposite; someone who likes MQL is very likely to like Clojure too.

MQL looks like Clojure written with {}.
MQL uses {f-name ...}
Clojure uses (f-name ...)

MQL

{'$setIntersection' : ['$w' ,'$z']}

cMQL

(intersection :w :z)

cMQL wraps MQL without changing the programming style.

JSON like literals

In Clojure we have maps, and we can use them to represent the data that we will insert.

Clojure has many data structures, like lists, vectors, sets and maps.

We can write cMQL queries, and whenever we want we can also use raw MQL inside them.

For example, we could even write very mixed code like the one below: the stage operator in MQL, the reference in cMQL, one aggregate operator in cMQL and the other in MQL.

(q acoll
   {"$addFields" {:a (+ 1 {"$add" [2 3]})}})

Macros

Clojure has powerful macros. Because it is a homoiconic language, we can write code that generates code. Macros run before the Clojure code runs, and they produce the final code.

Macros are very important for making "new languages".
This way the user can write much less code, or can even write invalid Clojure code that the macro processes to produce valid Clojure code.

With macros, we are free to make the new language the way we want it to be. For example, macros allowed us to use core language names like reduce/map/let etc.

In cMQL we do

(map (fn [:m.] (* (+ :m. 1) 2)) :myarray)

Not

(m-map (m-fn [:m.] (mul (add :m. 1) 2)) :myarray)
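
To show the idea of a macro that accepts code that is not valid Clojure by itself, here is a minimal sketch. The macro infix is hypothetical and is not part of cMQL. The form (1 + 2) would fail as a normal call, but the macro receives it as a list and rewrites it before evaluation.

```clojure
;; Hypothetical macro for this sketch, not part of cMQL.
;; It receives the unevaluated list (a op b) and returns the list (op a b).
(defmacro infix [[a op b]]
  (list op a b))

;; The code that the macro produces.
(def expansion (macroexpand-1 '(infix (1 + 2))))   ; (+ 1 2)

;; The macro runs first, so (1 + 2) is never called as a function.
(def value (infix (1 + 2)))                        ; 3
```

The same mechanism lets a library reuse names like map or reduce and rewrite their arguments into a different target.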

Keywords

In Clojure we have keywords. Here we use them for fields and variables.

:myfield = a field reference
:myvar. = a variable (it has a . at the end, or at the start like :.myvar)

This allowed us to avoid the "$$" and "$" prefixes, which are hard to read.
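
To make the notation concrete, the sketch below maps the keyword forms above to MQL style strings. It is only an illustration of the notation: the helper keyword->mql is hypothetical, and it is not how cMQL is implemented.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper for this sketch, not part of cMQL.
;; A dot at the end or at the start marks a variable ("$$"),
;; anything else is treated as a field reference ("$").
(defn keyword->mql [k]
  (let [n (name k)]
    (cond
      (str/ends-with? n ".")   (str "$$" (subs n 0 (dec (count n))))
      (str/starts-with? n ".") (str "$$" (subs n 1))
      :else                    (str "$" n))))

(def field-ref (keyword->mql :myfield))   ; "$myfield"
(def var-ref   (keyword->mql :myvar.))    ; "$$myvar"
(def var-ref-2 (keyword->mql :.myvar))    ; "$$myvar"
```

The keyword form keeps the name itself clean, and the dot is the only extra mark we have to read.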

Simple and practical

Clojure allowed us to remove MQL's verbosity without inventing a new language.
The simple Clojure way was used.

(reduce (fn [sum n] (+ sum n)) 0 myarray)             ; Clojure
(reduce (fn [:sum. :n.] (+ :sum. :n.)) 0 :myarray)    ; cMQL
{"$reduce" {"input" "$myarray",                       ; MQL
            "initialValue" 0,
            "in" {"$let" {"vars" {"sum" "$$value", "n" "$$this"},
                          "in" {"$add" ["$$sum" "$$n"]}}}}}

Run in many Drivers

Clojure is made to be a hosted language; it does not run only on the JVM. Clojure targets the JVM, and ClojureScript targets JavaScript.

There are other Clojure implementations too, but cMQL works only for Java and JavaScript for now.
This gives us easy interop and cMQL queries that are portable between the drivers cMQL supports.

Comparison

cMQL uses Clojure features and the similarities with MQL to make the query builder simple.

If Java or JavaScript had been used to make the query builder, we would get

  • a non-functional style, not natural for querying and data processing
  • nesting of function calls that is not readable
  • a syntax and programming model different from MQL
  • no JSON literals (in Java)
  • no macros to make the query builder less verbose
  • no keywords to use for the fields
  • queries that run only in Java or only in JavaScript

The result would be a verbose, non-portable query builder, not well suited to data processing.
It would also be harder to make.

Existing query builders can also be very incomplete.

The queries below are simple and small, about 5 lines each, and cMQL is already ~2x shorter even there.
But cMQL is not just a way to make easy queries even easier; for big or complex queries it is a way to avoid "code explosion". See also example3 and example4.

Java

  • Java MongoDB official driver
    does not cover most aggregate operators (only stages, filters (some query operators) and accumulators)
    There are 100+ more aggregate operators that we have to write in MQL to use
    (there is no let/map/reduce/filter etc.)
  • Java is not like MQL, so making a Java query builder is harder

A small example in Java that uses only covered operators (with operators that are not covered, the difference could be 4x+)

cMQL = 79 characters
Java Query = 188 characters (~2x)

(q zips
   (= :state "TX")
   (group :city {:totalPop :pop})
   [:!_id :totalPop]
   (sort :!totalPop))

Mongoose

  • Mongoose supports JSON-like literals, and raw MQL is generally used.
  • Its query builder offers even less than the Java one.
  • The result is a mix of MQL and the query builder.

It has the same problems as MQL; for more see the examples.

Spring Data MongoDB

The Spring Data MongoDB query builder offers aggregate operators.

cMQL = 92 characters
Spring = 180 characters (2x)

(q coll
   [:firstname
    :lastname
    {:created (map (fn [:u.] (str :u.firstname. " " :u.lastname.)) :created)}])

Even in such a small example we can see many of the problems described above

  • map is not clear
  • the function that we use is not clear, neither its body nor its arguments
  • the nesting is not clear
  • we have extra words like and/as/valueOf/concatValueOf/andApply
  • we also have a mixed notation for references and variables with "$$" and "$"

This is not a problem of the Spring query builder itself. The problem is that without macros we cannot make a nice query builder, and that we try to wrap MQL (functional, homoiconic) with a procedural language and then write functional code with it.