# t-SNE (t-Stochastic Neighbor Embedding)

Julia implementation of L.J.P. van der Maaten and G.E. Hinton's t-SNE visualisation technique.
The scripts in the `examples` folder require the Plots, MLDatasets and RDatasets Julia packages.
## Installation

```julia
julia> Pkg.add("TSne")
```
## Basic API usage

```julia
tsne(X, ndims, reduce_dims, max_iter, perplexity; [keyword arguments])
```

Apply t-SNE (t-Distributed Stochastic Neighbor Embedding) to `X`, i.e. embed its points (rows) into `ndims` dimensions while preserving close neighbours. Returns the `points×ndims` matrix of the calculated embedded coordinates.
Positional arguments:

- `X`: `AbstractMatrix` or `AbstractVector`. If `X` is a matrix, then rows are observations and columns are features.
- `ndims`: dimension of the embedded space.
- `reduce_dims`: the number of leading dimensions of the PCA of `X` to use for t-SNE; if 0, all available dimensions are used.
- `max_iter`: maximum number of iterations for the optimization.
- `perplexity`: the perplexity is related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can produce significantly different results.
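As a quick orientation, here is a minimal sketch of a call using only the positional arguments; the matrix shape and parameter values are arbitrary choices for illustration, not recommendations:

```julia
using TSne

# Hypothetical data: 150 observations (rows) with 10 features (columns).
X = randn(150, 10)

# Embed into 2 dimensions, using all available PCA dimensions (reduce_dims = 0),
# 1000 optimization iterations, and a perplexity of 15.
Y = tsne(X, 2, 0, 1000, 15.0)

size(Y)  # (150, 2): one 2-D point per input row
```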
### Optional Arguments

- `distance`: if `true`, specifies that `X` is a distance matrix; if of type `Function` or `Distances.SemiMetric`, specifies the function to use for calculating the distances between the rows (or the elements, if `X` is a vector) of `X`.
- `pca_init`: whether to use the first `ndims` dimensions of the PCA of `X` as the initial t-SNE layout; if `false` (the default), the method is initialized with a random layout.
- `max_iter`: how many iterations of t-SNE to do.
- `perplexity`: the number of "effective neighbours" of a datapoint; typical values are from 5 to 50, the default is 30.
- `verbose`: output informational and diagnostic messages.
- `progress`: display a progress meter during t-SNE optimization.
- `min_gain`, `eta`, `initial_momentum`, `final_momentum`, `momentum_switch_iter`, `stop_cheat_iter`, `cheat_scale`: low-level parameters of the t-SNE optimization.
- `extended_output`: if `true`, returns a tuple of the embedded coordinates matrix, point perplexities, and the final Kullback-Leibler divergence.
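A sketch combining a few of these keywords, again on synthetic data (the parameter values are arbitrary, and the destructured names are local variables chosen to match the extended-output description above):

```julia
using TSne, Distances

X = randn(150, 10)

# Use a Distances.jl semimetric instead of the default distance, and request
# the extended output: (embedding, point perplexities, final KL divergence).
Y, point_perplexities, kldiv = tsne(X, 2, 0, 500, 25.0;
                                    distance = CosineDist(),
                                    extended_output = true,
                                    verbose = true)
```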
## Example usage

```julia
using TSne, Statistics, MLDatasets

rescale(A; dims=1) = (A .- mean(A, dims=dims)) ./ max.(std(A, dims=dims), eps())

alldata, allabels = MNIST.traindata(Float64);
data = reshape(permutedims(alldata[:, :, 1:2500], (3, 1, 2)),
               2500, size(alldata, 1)*size(alldata, 2));
# Normalize the data; this should be done if there are large scale differences in the dataset
X = rescale(data, dims=1);

Y = tsne(X, 2, 50, 1000, 20.0);

using Plots
theplot = scatter(Y[:,1], Y[:,2], marker=(2,2,:auto,stroke(0)), color=Int.(allabels[1:size(Y,1)]))
Plots.pdf(theplot, "myplot.pdf")
```
## Command line usage

```
julia demo-csv.jl haveheader --labelcol=5 iris-headers.csv
```

Creates `myplot.pdf` with the t-SNE result visualized using Gadfly.jl.