Main Types
DataValueTables.AbstractDataValueTable
DataValueTables.DataValueTable
DataValueTables.SubDataValueTable
DataValueTables.AbstractDataValueTable — Type.
An abstract type for which all concrete types expose a database-like interface.
Common methods
An AbstractDataValueTable is a two-dimensional table with Symbols for column names. An AbstractDataValueTable is also similar to an Associative type in that it allows indexing by a key (the columns).
The following are normally implemented for AbstractDataValueTables:
describe: summarize columns
dump: show structure
hcat: horizontal concatenation
vcat: vertical concatenation
names: column names
names!: set column names
rename!: rename columns based on keyword arguments
eltypes: eltype of each column
length: number of columns
size: (nrows, ncols)
head: first n rows
tail: last n rows
convert: convert to an array
DataValueArray: convert to a DataValueArray
completecases: boolean vector of complete cases (rows with no nulls)
dropna: remove rows with null values
dropna!: remove rows with null values in-place
nonunique: indexes of duplicate rows
unique!: remove duplicate rows
similar: a DataValueTable with similar columns as d
Indexing
Table columns are accessed (getindex) by a single index that can be a symbol identifier, an integer, or a vector of either. If a single column is selected, just the column object is returned. If multiple columns are selected, an AbstractDataValueTable is returned.
d[:colA]
d[3]
d[[:colA, :colB]]
d[[1:3; 5]]
Rows and columns can be indexed like a Matrix, with the added feature of indexing columns by name.
d[1:3, :colA]
d[3,3]
d[3,:]
d[3,[:colA, :colB]]
d[:, [:colA, :colB]]
d[[1:3; 5], :]
setindex! works similarly.
DataValueTables.DataValueTable — Type.
An AbstractDataValueTable that stores a set of named columns.
The columns are normally AbstractVectors stored in memory, particularly a Vector, DataValueVector, or CategoricalVector.
Constructors
DataValueTable(columns::Vector{Any}, names::Vector{Symbol})
DataValueTable(kwargs...)
DataValueTable() # an empty DataValueTable
DataValueTable(t::Type, nrows::Integer, ncols::Integer) # an empty DataValueTable of arbitrary size
DataValueTable(column_eltypes::Vector, names::Vector, nrows::Integer)
DataValueTable(ds::Vector{Associative})
Arguments
columns: a Vector{Any} with each column as contents
names: the column names
kwargs: the key gives the column name, and the value is the column contents
t: element type of all columns
nrows, ncols: number of rows and columns
column_eltypes: element type of each column
ds: a vector of Associatives
Each column in columns should be the same length.
Notes
Most of the default constructors convert columns to DataValueArray. The base constructor, DataValueTable(columns::Vector{Any}, names::Vector{Symbol}), does not convert to DataValueArray.
A DataValueTable is a lightweight object. As long as columns are not manipulated, creation of a DataValueTable from existing AbstractVectors is inexpensive. For example, indexing on columns is inexpensive, but indexing by rows is expensive because copies are made of each column.
Because column types can vary, a DataValueTable is not type stable. For performance-critical code, do not index into a DataValueTable inside of loops.
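One common way to follow this advice is a function barrier: pull the column out once, then run the loop inside a function that receives the column itself. The sketch below uses a plain Vector{Any} of columns as a stand-in for a table so that it runs without the package; sum_indexing_in_loop and sum_column are hypothetical names chosen for illustration. With a real table, the column would presumably be extracted with the column indexing shown above (for example dt[:C]) before calling the inner function.

```julia
# Stand-in for a table (assumption): columns with different element types
# held in an untyped container, so the container itself is not type stable.
columns = Any[collect(1:10), rand(10)]

# Pattern to avoid: the untyped container is indexed on every iteration,
# so the compiler cannot infer the element type inside the loop.
function sum_indexing_in_loop(cols)
    total = 0.0
    for i in 1:length(cols[2])
        total += cols[2][i]
    end
    return total
end

# Function barrier: the loop receives the column directly, so Julia
# compiles a method specialized for the column's concrete type.
function sum_column(col::AbstractVector)
    total = zero(eltype(col))
    for x in col
        total += x
    end
    return total
end

c = columns[2]      # extract the column once, outside the loop
sum_column(c)       # the hot loop runs on a concretely typed vector
```

Both functions compute the same sum; the difference is only where the untyped lookup happens.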
Examples
dt = DataValueTable()
v = ["x","y","z"][rand(1:3, 10)]
dt1 = DataValueTable(Any[collect(1:10), v, rand(10)], [:A, :B, :C]) # columns are Arrays
dt2 = DataValueTable(A = 1:10, B = v, C = rand(10)) # columns are DataValueArrays
dump(dt1)
dump(dt2)
describe(dt2)
DataValueTables.head(dt1)
dt1[:A] + dt2[:C]
dt1[1:4, 1:2]
dt1[[:A,:C]]
dt1[1:2, [:A,:C]]
dt1[:, [:A,:C]]
dt1[:, [1,3]]
dt1[1:4, :]
dt1[1:4, :C]
dt1[1:4, :C] = 40. * dt1[1:4, :C]
[dt1; dt2] # vcat
[dt1 dt2] # hcat
size(dt1)
DataValueTables.SubDataValueTable — Type.
A view of row subsets of an AbstractDataValueTable.
A SubDataValueTable is meant to be constructed with view. A SubDataValueTable is used frequently in split/apply sorts of operations.
view(d::AbstractDataValueTable, rows)
Arguments
d: an AbstractDataValueTable
rows: any indexing type for rows, typically an Int, AbstractVector{Int}, AbstractVector{Bool}, or a Range
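As a rough analogy for the row index types listed above, the sketch below applies Base's view to a plain Vector standing in for a single column. It illustrates only the index kinds and the fact that a Base view shares memory with its parent; it is not a statement about how SubDataValueTable is implemented, and the variable names are illustrative.

```julia
col = collect(10:10:80)          # one stand-in column with 8 rows

v_range = view(col, 1:6)         # rows selected by a range
v_ints  = view(col, [1, 3, 5])   # rows selected by an AbstractVector{Int}
v_mask  = view(col, col .> 40)   # rows selected by a Bool mask

# A Base view does not copy: writing through it changes the parent array.
v_range[1] = -1
col[1]                           # now -1
```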
Notes
A SubDataValueTable is an AbstractDataValueTable, so expect that most DataValueTable functions should work. Such methods include describe, dump, nrow, size, by, stack, and join. Indexing is just like a DataValueTable; copies are returned.
To subset along columns, use standard column indexing, as that creates a view to the columns by default. To subset along rows and columns, use column-based indexing with view.
Examples
dt = DataValueTable(a = repeat([1, 2, 3, 4], outer=[2]),
                    b = repeat([2, 1], outer=[4]),
                    c = randn(8))
sdt1 = view(dt, 1:6)
sdt2 = view(dt, dt[:a] .> 1)
sdt3 = view(dt[[1,3]], dt[:a] .> 1) # row and column subsetting
sdt4 = groupby(dt, :a)[1] # indexing a GroupedDataValueTable returns a SubDataValueTable
sdt5 = view(sdt1, 1:3)
sdt1[:,[:a,:b]]