Utilities

Return element types of columns

eltypes(dt::AbstractDataValueTable)

Arguments

  • dt : the AbstractDataValueTable

Result

  • ::Vector{Type} : the element type of each column

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
eltypes(dt)

DataValueTables.head - Function.

Show the first or last part of an AbstractDataValueTable

head(dt::AbstractDataValueTable, r::Int = 6)
tail(dt::AbstractDataValueTable, r::Int = 6)

Arguments

  • dt : the AbstractDataValueTable

  • r : the number of rows to show

Result

  • ::AbstractDataValueTable : the first or last part of dt

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
head(dt)
tail(dt)

Indexes of complete cases (rows without null values)

completecases(dt::AbstractDataValueTable)

Arguments

  • dt : the AbstractDataValueTable

Result

  • ::Vector{Bool} : a Boolean mask with one entry per row, true where the row contains no null values

See also dropna and dropna!.
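
The relationship between the Boolean mask and row removal can be sketched with plain vectors. This is an illustration only, not the package's implementation: it assumes Julia 1.0 or later, uses `missing` as a stand-in for a null DataValue, and the variable names are hypothetical.

```julia
# Two columns of equal length; `missing` stands in for a null value.
x = [0.1, missing, 0.3, 0.4]
y = ["a", "b", missing, "c"]

# A row is complete when none of its entries is null.
mask = [!ismissing(a) && !ismissing(b) for (a, b) in zip(x, y)]

# Indexing with the mask keeps only the complete rows.
kept_x = x[mask]
kept_y = y[mask]
```

Indexing every column with the same mask is the plain-vector analogue of what dropna is described as doing for a table.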

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt[[1,4,5], :x] = DataValue()
dt[[9,10], :y] = DataValue()
completecases(dt)

StatsBase.describe - Function.

Summarize the columns of an AbstractDataValueTable

describe(dt::AbstractDataValueTable)
describe(io, dt::AbstractDataValueTable)

Arguments

  • dt : the AbstractDataValueTable

  • io : optional output descriptor

Result

  • nothing

Details

If the column's base type derives from Number, describe computes the minimum, first quartile, median, mean, third quartile, and maximum. Nulls are filtered out and reported separately.

For boolean columns, report trues, falses, and nulls.

For other types, show column characteristics and number of nulls.
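
The numeric summary described above can be approximated with the Statistics standard library. This is a sketch under stated assumptions, not the code describe actually runs: it assumes Julia 1.0 or later, uses `missing` in place of a null DataValue, and `numeric_summary` is a hypothetical helper name.

```julia
using Statistics  # standard library in Julia 1.0 and later

# Hypothetical helper, not part of DataValueTables: the six statistics
# listed for numeric columns, with nulls counted separately.
function numeric_summary(col::AbstractVector)
    vals = collect(skipmissing(col))      # filter nulls before computing
    return (lo    = minimum(vals),
            q1    = quantile(vals, 0.25), # first quartile
            med   = median(vals),
            avg   = mean(vals),
            q3    = quantile(vals, 0.75), # third quartile
            hi    = maximum(vals),
            nulls = count(ismissing, col))
end

s = numeric_summary([1.0, missing, 2.0, 3.0, 4.0, 5.0])
```

Here `quantile` uses Julia's default interpolation rule, which may differ from the rule the package applies.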

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
describe(dt)

DataValues.dropna - Function.

Remove rows with null values.

dropna(dt::AbstractDataValueTable)

Arguments

  • dt : the AbstractDataValueTable

Result

  • ::AbstractDataValueTable : the updated copy

See also completecases and dropna!.

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt[[1,4,5], :x] = DataValue()
dt[[9,10], :y] = DataValue()
dropna(dt)

dropna(X::AbstractVector)

Return a vector containing only the non-missing entries of X, unwrapping DataValue entries. A copy is always returned, even when X does not contain any missing values.

DataValues.dropna! - Function.

Remove rows with null values in-place.

dropna!(dt::AbstractDataValueTable)

Arguments

  • dt : the AbstractDataValueTable

Result

  • ::AbstractDataValueTable : the updated version

See also dropna and completecases.

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt[[1,4,5], :x] = DataValue()
dt[[9,10], :y] = DataValue()
dropna!(dt)

dropna!(X::AbstractVector)

Remove missing entries of X in-place and return a Vector view of the unwrapped DataValue entries. If no missing values are present, this is a no-op and X is returned.

dropna!(X::DataValueVector)

Remove missing entries of X in-place and return a Vector view of the unwrapped DataValue entries.

Base.dump - Function.

Show the structure of an AbstractDataValueTable, in a tree-like format

dump(dt::AbstractDataValueTable, n::Int = 5)
dump(io::IO, dt::AbstractDataValueTable, n::Int = 5)

Arguments

  • dt : the AbstractDataValueTable

  • n : the number of levels to show

  • io : optional output descriptor

Result

  • nothing

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dump(dt)

Set column names

names!(dt::AbstractDataValueTable, vals; allow_duplicates=false)

Arguments

  • dt : the AbstractDataValueTable

  • vals : column names, normally a Vector{Symbol} the same length as the number of columns in dt

  • allow_duplicates : if false (the default), an error will be raised if duplicate names are found; if true, duplicate names will be suffixed with _i (i starting at 1 for the first duplicate).
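
The suffixing rule for allow_duplicates=true can be illustrated with a small standalone function. This is a sketch, not the package's implementation: `suffix_duplicates` is a hypothetical name, and the real function may resolve edge cases differently (for example, when a generated name such as :a_1 also appears literally later in the list).

```julia
# Hypothetical helper: make a list of column names unique by appending
# _1, _2, ... to repeated names, in order of appearance.
function suffix_duplicates(names::Vector{Symbol})
    seen = Set{Symbol}()
    out = Symbol[]
    for name in names
        candidate = name
        i = 0
        while candidate in seen          # name already taken: try next suffix
            i += 1
            candidate = Symbol(name, "_", i)
        end
        push!(seen, candidate)
        push!(out, candidate)
    end
    return out
end

suffix_duplicates([:a, :b, :a])   # [:a, :b, :a_1]
```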

Result

  • ::AbstractDataValueTable : the updated result

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
names!(dt, [:a, :b, :c])
names!(dt, [:a, :b, :a])  # throws ArgumentError
names!(dt, [:a, :b, :a], allow_duplicates=true)  # renames second :a to :a_1

Indexes of duplicate rows (a row that is a duplicate of a prior row)

nonunique(dt::AbstractDataValueTable)
nonunique(dt::AbstractDataValueTable, cols)

Arguments

  • dt : the AbstractDataValueTable

  • cols : a column indicator (Symbol, Int, Vector{Symbol}, etc.) specifying the column(s) to compare

Result

  • ::Vector{Bool} : indicates whether the row is a duplicate of some prior row

See also unique and unique!.
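
The "duplicate of some prior row" rule can be sketched without the package by treating each row as a tuple. `duplicate_mask` is a hypothetical helper written for illustration, assuming Julia 1.0 or later; it is not how nonunique is implemented.

```julia
# Hypothetical helper: mark each row that repeats an earlier row.
function duplicate_mask(rows::AbstractVector)
    seen = Set{eltype(rows)}()
    mask = Vector{Bool}(undef, length(rows))
    for (k, row) in enumerate(rows)
        mask[k] = row in seen    # true only if an equal row came earlier
        push!(seen, row)
    end
    return mask
end

rows = [(1, "a"), (2, "b"), (1, "a"), (2, "c"), (2, "b")]
duplicate_mask(rows)   # [false, false, true, false, true]
```

The first occurrence of a row is never flagged; only its later repeats are.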

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt = vcat(dt, dt)
nonunique(dt)
nonunique(dt, 1)

Rename columns

rename!(dt::AbstractDataValueTable, from::Symbol, to::Symbol)
rename!(dt::AbstractDataValueTable, d::Associative)
rename!(f::Function, dt::AbstractDataValueTable)
rename(dt::AbstractDataValueTable, from::Symbol, to::Symbol)
rename(f::Function, dt::AbstractDataValueTable)

Arguments

  • dt : the AbstractDataValueTable

  • d : an Associative type that maps the original name to a new name

  • f : a function that takes the old column name (a Symbol) and returns the new column name (a Symbol)
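
What the function form does to the column names can be shown on a plain vector of symbols. The snippet uses only Base Julia and assumes the function is applied to each name in turn; `old_names` is a hypothetical stand-in for a table's column names.

```julia
# A renaming function: Symbol in, Symbol out.
f = x -> Symbol(uppercase(string(x)))

old_names = [:i, :x, :y]
new_names = map(f, old_names)   # [:I, :X, :Y]
```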

Result

  • ::AbstractDataValueTable : the updated result

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
rename(x -> Symbol(uppercase(string(x))), dt)
rename(dt, Dict(:i=>:A, :x=>:X))
rename(dt, :y, :Y)
rename!(dt, Dict(:i=>:A, :x=>:X))

Base.unique - Function.

Delete duplicate rows

unique(dt::AbstractDataValueTable)
unique(dt::AbstractDataValueTable, cols)
unique!(dt::AbstractDataValueTable)
unique!(dt::AbstractDataValueTable, cols)

Arguments

  • dt : the AbstractDataValueTable

  • cols : column indicator (Symbol, Int, Vector{Symbol}, etc.) specifying the column(s) to compare

Result

  • ::AbstractDataValueTable : the updated version of dt with unique rows.

When cols is specified, the returned DataValueTable contains complete rows, retaining in each case the first row for which dt[cols] is unique.

See also nonunique.
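
The "first instance wins" behavior when cols is given can be sketched with rows stored as tuples and a key function that picks the compared columns. `first_by_key` is a hypothetical helper for illustration only, not the package's implementation.

```julia
# Hypothetical helper: keep the first full row for each distinct key.
function first_by_key(rows::AbstractVector, key)
    seen = Set()
    kept = eltype(rows)[]
    for row in rows
        k = key(row)
        if !(k in seen)          # first time this key appears
            push!(seen, k)
            push!(kept, row)     # keep the complete row, not just the key
        end
    end
    return kept
end

rows = [(1, "a"), (2, "b"), (1, "c"), (2, "b"), (3, "a")]
first_by_key(rows, first)   # compare on column 1 only
# [(1, "a"), (2, "b"), (3, "a")]
```

Row (1, "c") is dropped even though it differs from (1, "a") in the second column, because only the first column is compared.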

Examples

dt = DataValueTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt = vcat(dt, dt)
unique(dt)   # doesn't modify dt
unique(dt, 1)
unique!(dt)  # modifies dt

unique(A::CategoricalArray)
unique(A::DataValueCategoricalArray)

Return levels which appear in A, in the same order as levels (and not in their order of appearance). This function is significantly slower than levels since it needs to check whether levels are used or not.
