Query.jl v0.9.x released

20 December 2017

I just released Query.jl v0.9.0. The new version adds the @take and @drop standalone query operators and brings pretty printing to uncollected queries.

Pretty printing

In previous versions queries displayed a really awful mess of internal data when they were displayed in the REPL. In practice one always had to collect a query into something like a DataFrame to get a nice view of the query result. The new version changes that and provides a nice output for any query, even an uncollected one. Here is an example:

julia> using FileIO, Query, CSVFiles

julia> filename = "https://gist.githubusercontent.com/davidanthoff/bebfd24c1a3f32f576eb61bee77f5944/raw/dd9233ad860037a2155f3a9ca3c37eb2d5572573/testdata2.csv";

julia> load(filename) |> @map({_.Year, _.Cause_Name})
15028x2 query result
Year │ Cause_Name
─────┼───────────────────────
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
1999 │ Unintentional Injuries
... with 15018 more rows

The pretty printing should work for the values returned from any of the query operators. The output format is heavily inspired by R’s tibbles.

I hope this will make interactive work much more pleasant because it should be easier to build up more complicated queries step by step, while periodically running a query to check intermediate results.

I also plan to add this to the whole tabular file IO of the iterable tables ecosystem at a later date (e.g. CSVFiles.jl, FeatherFiles.jl, ExcelFiles.jl, StatFiles.jl etc.).

The @take and @drop query commands

Those are fairly straightforward: both of these filter elements out of a sequence. @take limits the number of elements to some upper maximum, and @drop skips a number of elements. Here is an example of how one can use these:

using FileIO, Query, CSVFiles

filename = "https://gist.githubusercontent.com/davidanthoff/bebfd24c1a3f32f576eb61bee77f5944/raw/dd9233ad860037a2155f3a9ca3c37eb2d5572573/testdata2.csv"

load(filename) |> 
    @filter(_.Cause_Name!="All Causes" && !isnull(_.Age_adjusted_Death_Rate)) |> 
    @groupby(_.Cause_Name) |> 
    @map({cause=_.key, death_rate=sum(_..Age_adjusted_Death_Rate)}) |>
    @orderby_descending(_.death_rate) |> 
    @drop(2) |> 
    @take(3) |>
    save("output.feather")

This example showcases a whole range of features, including the use of the @drop and @take operations. The official documentation for these two new operators is in the “Experimental Features” section in the Query.jl documentation.

Any feedback on these new features (and old ones) is most welcome, and of course any help with the overall package would also be fantastic!

This post is being discussed here.