# Fuzzy Cluster - Java

Download the Cluster Java App.

## Control Section

**Load** the Data File specified by the Input Section.

**Cluster** the Data using values from the Parameters Section.

**View** the resulting file as specified in the Output Section.

**Graph** the resulting file as specified in the Graph Section.

**Exit** the Program when you are done.

## Input Section

The working directory is where the program will try to read the input file and write the output file.

The **Change** buttons allow you to change the working directory and the data file.

The data file, such as "butter.dat", is a text file with one N-dimensional data item per row, its values separated by white space (spaces or tabs). For example, three-dimensional data would have rows like:

2.0 3.4 128.0

When you load the data, the program will try to determine the number of data items (rows) and the dimensionality of the data (values per row).
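The loading step can be sketched as below. This is a minimal illustration, not the app's actual code: the class `DataFileLoader` and its methods are hypothetical, and it assumes blank lines are ignored and that the first non-blank row fixes the dimensionality, so a row with a different number of values is rejected.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader (not the app's actual code): one data item per row,
// values separated by spaces or tabs, dimensionality taken from the first row.
final class DataFileLoader {

    static double[][] parse(String text) {
        List<double[]> rows = new ArrayList<>();
        int dims = -1;
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = trimmed.split("\\s+"); // any run of spaces or tabs
            if (dims == -1) {
                dims = fields.length; // first row fixes the dimensionality
            } else if (fields.length != dims) {
                throw new IllegalArgumentException("Data item " + (rows.size() + 1)
                        + " has " + fields.length + " values, expected " + dims);
            }
            double[] row = new double[dims];
            for (int j = 0; j < dims; j++) {
                row[j] = Double.parseDouble(fields[j]);
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    // Reads the data file from the working directory.
    static double[][] load(Path workingDir, String fileName) throws IOException {
        return parse(Files.readString(workingDir.resolve(fileName)));
    }

    public static void main(String[] args) throws IOException {
        double[][] data = args.length == 2
                ? load(Paths.get(args[0]), args[1])
                : parse("2.0 3.4 128.0\n1.5\t2.2 96.0\n");
        System.out.println(data.length + " items, "
                + (data.length == 0 ? 0 : data[0].length) + " dimensions");
    }
}
```

Run with a working directory and file name (for example `. butter.dat`) to load a real file; with no arguments it parses a two-row inline sample.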

## Output Section

Enter the desired number of clusters and the maximum number of iterations.

By default, the data points are randomly assigned to the initial clusters. You can instead supply a crisp initial partition file: uncheck "Random Initial Assignment" and you will be asked to select a partition file.

The partition file, such as "book.par", is a text file with one row per cluster and one column per data point; a 1 means the point belongs to that cluster and a 0 means it does not. With five data points, the first row of "book.par" describes the first cluster:

1 0 0 0 1

This row indicates that Cluster One contains the first and last data points.
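Both ways of building the initial partition can be sketched in a few lines. This is an illustration under assumptions, not the app's code: the class `InitialPartition` and its methods are hypothetical, and the validation (every entry is 0 or 1 and every point is in exactly one cluster) simply encodes what "crisp" means rather than a documented check in the program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helpers (not the app's actual code). A crisp partition is stored
// as u[cluster][point] with entries 0 or 1, matching the file layout:
// one row per cluster, one column per data point.
final class InitialPartition {

    // Random seeding: each point goes into exactly one randomly chosen cluster.
    static int[][] random(int numClusters, int numPoints, Random rng) {
        int[][] u = new int[numClusters][numPoints];
        for (int point = 0; point < numPoints; point++) {
            u[rng.nextInt(numClusters)][point] = 1;
        }
        return u;
    }

    // Parses partition-file text and checks that every point is in exactly one cluster.
    static int[][] parse(String text) {
        List<int[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = line.trim().split("\\s+");
            int[] row = new int[fields.length];
            for (int j = 0; j < fields.length; j++) {
                row[j] = Integer.parseInt(fields[j]);
                if (row[j] != 0 && row[j] != 1) {
                    throw new IllegalArgumentException("Entries must be 0 or 1");
                }
            }
            if (!rows.isEmpty() && row.length != rows.get(0).length) {
                throw new IllegalArgumentException("Every row needs one entry per data point");
            }
            rows.add(row);
        }
        int[][] u = rows.toArray(new int[0][]);
        for (int point = 0; u.length > 0 && point < u[0].length; point++) {
            int count = 0;
            for (int[] cluster : u) {
                count += cluster[point];
            }
            if (count != 1) {
                throw new IllegalArgumentException("Data point " + (point + 1)
                        + " is in " + count + " clusters; a crisp partition needs exactly 1");
            }
        }
        return u;
    }

    // 1-based indices of the points in the given 0-based cluster.
    static List<Integer> members(int[][] u, int cluster) {
        List<Integer> result = new ArrayList<>();
        for (int point = 0; point < u[cluster].length; point++) {
            if (u[cluster][point] == 1) {
                result.add(point + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] u = parse("1 0 0 0 1\n0 1 1 1 0\n");
        System.out.println("Cluster One: " + members(u, 0)); // prints "Cluster One: [1, 5]"
    }
}
```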

## Parameters Section

### Initial

Every clustering algorithm in the program needs two basic parameters:

- an initial number of clusters and
- a maximum possible number of iterations.

### Fuzzy

Crisp clustering assigns each point to exactly one cluster.

Fuzzy clustering spreads each point's membership across the clusters, so a point can belong partly to several clusters.

#### Fuzzy Clustering Parameters

- **Exponent m**: weights outliers; larger values of m decrease the effect of outliers. m must be greater than one: if m is one, the program will crash.
- **Stopping**: fuzzy clustering ends when the change between iterations is less than this value (such as 0.05).
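The role of m and the stopping value can be seen in a sketch of the textbook fuzzy c-means (FCM) updates. This assumes the app's fuzzy clustering is standard FCM; the class `FuzzyCMeans` is hypothetical, and measuring "change between iterations" as the largest change in any membership is also an assumption. The membership update raises distance ratios to the power 2/(m − 1), which divides by zero when m = 1.

```java
// Sketch of textbook fuzzy c-means (FCM), assuming the app uses FCM.
// Hypothetical class, not the app's code. Memberships are u[cluster][point].
final class FuzzyCMeans {

    static double[][] cluster(double[][] x, double[][] u, double m,
                              double stopping, int maxIterations) {
        if (m <= 1.0) {
            // The exponent 2/(m-1) below is undefined when m == 1.
            throw new IllegalArgumentException("Exponent m must be greater than 1");
        }
        int c = u.length, n = x.length, dims = x[0].length;
        double[][] v = new double[c][dims]; // cluster centers
        for (int iter = 0; iter < maxIterations; iter++) {
            // 1. Centers: means of the points weighted by u^m.
            for (int i = 0; i < c; i++) {
                double[] num = new double[dims];
                double den = 0.0;
                for (int k = 0; k < n; k++) {
                    double w = Math.pow(u[i][k], m);
                    for (int d = 0; d < dims; d++) {
                        num[d] += w * x[k][d];
                    }
                    den += w;
                }
                if (den == 0.0) {
                    throw new IllegalStateException("Cluster " + (i + 1) + " is empty");
                }
                for (int d = 0; d < dims; d++) {
                    v[i][d] = num[d] / den;
                }
            }
            // 2. Memberships: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
            double maxChange = 0.0;
            double[][] next = new double[c][n];
            for (int k = 0; k < n; k++) {
                double[] dist = new double[c];
                int zeroAt = -1;
                for (int i = 0; i < c; i++) {
                    dist[i] = distance(x[k], v[i]);
                    if (dist[i] == 0.0) {
                        zeroAt = i;
                    }
                }
                for (int i = 0; i < c; i++) {
                    if (zeroAt >= 0) {
                        next[i][k] = (i == zeroAt) ? 1.0 : 0.0; // point sits on a center
                    } else {
                        double sum = 0.0;
                        for (int j = 0; j < c; j++) {
                            sum += Math.pow(dist[i] / dist[j], 2.0 / (m - 1.0));
                        }
                        next[i][k] = 1.0 / sum;
                    }
                    maxChange = Math.max(maxChange, Math.abs(next[i][k] - u[i][k]));
                }
            }
            u = next;
            // 3. Stopping test (assumed measure): largest membership change.
            if (maxChange < stopping) {
                break;
            }
        }
        return u;
    }

    static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] start = {{1, 0, 1, 0}, {0, 1, 0, 1}}; // crisp initial partition
        double[][] u = cluster(x, start, 2.0, 0.05, 100);
        System.out.println(java.util.Arrays.deepToString(u));
    }
}
```

Each column of the result sums to one, so a point's membership is shared among the clusters rather than assigned to a single one.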

### ISODATA

k-means uses a fixed number of clusters.

ISODATA, by contrast, can change the number of clusters: it splits large clusters and merges close ones.

#### ISODATA Parameters

- **Min Cluster Size**: the minimum number of samples each cluster must contain.
- **Max Distance**: if two clusters are closer than this distance, they may be merged.
- **Max Sigma**: a cluster is too big, and may be split, if its standard deviation exceeds this value.
- **Cluster Number Target**: the approximate number of clusters ISODATA will try to end up with.
- **Max Pairs To Merge**: the maximum number of cluster pairs that can be merged in each iteration.
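How Max Distance and Max Pairs To Merge interact can be sketched with the textbook ISODATA merge step: collect the cluster pairs whose centers are closer than Max Distance, merge the closest pairs first, merge each cluster at most once per iteration, and stop after Max Pairs To Merge merges. This is an assumption about how the app behaves, not its code; `IsodataMerge` and its nested `Cluster` type are hypothetical, and the merged center is the size-weighted mean of the two centers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the textbook ISODATA merge step (not the app's code).
final class IsodataMerge {

    static final class Cluster {
        final double[] center;
        final int size; // number of samples in the cluster

        Cluster(double[] center, int size) {
            this.center = center;
            this.size = size;
        }
    }

    static List<Cluster> merge(List<Cluster> clusters, double maxDistance, int maxPairsToMerge) {
        // Candidate pairs {distance, i, j} whose centers are closer than maxDistance.
        List<double[]> candidates = new ArrayList<>();
        for (int i = 0; i < clusters.size(); i++) {
            for (int j = i + 1; j < clusters.size(); j++) {
                double dist = distance(clusters.get(i).center, clusters.get(j).center);
                if (dist < maxDistance) {
                    candidates.add(new double[] {dist, i, j});
                }
            }
        }
        candidates.sort(Comparator.comparingDouble(p -> p[0])); // closest pairs first

        boolean[] used = new boolean[clusters.size()];
        List<Cluster> result = new ArrayList<>();
        int merged = 0;
        for (double[] p : candidates) {
            if (merged == maxPairsToMerge) {
                break;
            }
            int i = (int) p[1], j = (int) p[2];
            if (used[i] || used[j]) {
                continue; // each cluster merges at most once per iteration
            }
            Cluster a = clusters.get(i), b = clusters.get(j);
            int size = a.size + b.size;
            double[] center = new double[a.center.length];
            for (int d = 0; d < center.length; d++) {
                center[d] = (a.size * a.center[d] + b.size * b.center[d]) / size;
            }
            result.add(new Cluster(center, size));
            used[i] = true;
            used[j] = true;
            merged++;
        }
        for (int i = 0; i < clusters.size(); i++) {
            if (!used[i]) {
                result.add(clusters.get(i)); // unmerged clusters carry over unchanged
            }
        }
        return result;
    }

    static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        List<Cluster> before = List.of(
                new Cluster(new double[] {0, 0}, 2),
                new Cluster(new double[] {1, 0}, 2),
                new Cluster(new double[] {10, 0}, 4));
        System.out.println(merge(before, 2.0, 1).size() + " clusters after merging");
    }
}
```

A splitting step driven by Max Sigma and Min Cluster Size would run alongside this in a full ISODATA iteration; it is omitted to keep the sketch short.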

## Graph Section

Use the **X** and **Y** dropdowns to choose which dimensions of the data are mapped to X and Y.

**Save** will save the window as a PNG file.
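A minimal sketch of what the graph and **Save** steps involve: project each data item onto the two chosen dimensions, draw it colored by cluster, and write the image with the standard `javax.imageio.ImageIO.write`. The class `ScatterPng`, its color palette, and the use of one crisp cluster label per point (for example, each point's highest fuzzy membership) are assumptions for illustration, not the app's implementation.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical sketch (not the app's actual code): plot two chosen dimensions
// of the data, colored by cluster label, and save the image as a PNG.
final class ScatterPng {

    static final Color[] PALETTE = {Color.RED, Color.BLUE, Color.GREEN, Color.MAGENTA, Color.ORANGE};

    static BufferedImage render(double[][] data, int[] labels, int xDim, int yDim,
                                int width, int height) {
        int margin = 10;
        double minX = Double.POSITIVE_INFINITY, maxX = Double.NEGATIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] row : data) {
            minX = Math.min(minX, row[xDim]);
            maxX = Math.max(maxX, row[xDim]);
            minY = Math.min(minY, row[yDim]);
            maxY = Math.max(maxY, row[yDim]);
        }
        double spanX = maxX > minX ? maxX - minX : 1.0; // avoid dividing by zero
        double spanY = maxY > minY ? maxY - minY : 1.0;

        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        for (int k = 0; k < data.length; k++) {
            int px = margin + (int) Math.round((data[k][xDim] - minX) / spanX * (width - 2 * margin));
            // Image rows grow downward, so flip Y to put larger values higher up.
            int py = height - margin
                    - (int) Math.round((data[k][yDim] - minY) / spanY * (height - 2 * margin));
            g.setColor(PALETTE[labels[k] % PALETTE.length]);
            g.fillOval(px - 3, py - 3, 6, 6);
        }
        g.dispose();
        return img;
    }

    static void save(BufferedImage img, File file) throws IOException {
        if (!ImageIO.write(img, "png", file)) {
            throw new IOException("No PNG writer available");
        }
    }

    public static void main(String[] args) throws IOException {
        double[][] data = {{2.0, 3.4, 128.0}, {1.5, 2.2, 96.0}, {8.0, 7.5, 12.0}};
        int[] labels = {0, 0, 1};
        File out = File.createTempFile("graph", ".png");
        save(render(data, labels, 0, 1, 400, 300), out); // X = dimension 1, Y = dimension 2
        System.out.println("Saved " + out);
    }
}
```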