
How to name clusters in cluster analysis Videos

Technical Course: Cluster Analysis: Tutorial with an Example

This is a clip from the Clustering module of our course on data analytics by Gaurav Vohra, founder of Jigsaw Academy. Jigsaw Academy is an award winning ...

User Comments

Tayeenul Hoque commented on 13 Feb 2015

Can you please explain or suggest which cluster analysis should be applied to Likert-scale ordinal data? Is it K-means, hierarchical, or two-step clustering?

Jigsaw Academy commented on 18 Feb 2015

Hi Tayeenul. With many variables, PCA is useful for isolating the variables that are highly associated with one another, so that you can work with a few attributes with little loss of information. Based on your example of 4 (exogenous/endogenous) categorical variables with 20 items each (per record), you could do the clustering without applying PCA, even if the variables are highly correlated.

Tayeenul Hoque commented on 13 Feb 2015

Is it necessary to conduct CATPCA (categorical principal component analysis) before starting the cluster analysis? And after CATPCA, how should I proceed with the cluster analysis, and with which method? I have four exogenous variables, which contain 20 items.

Jigsaw Academy commented on 13 Feb 2015

One option is to treat Likert data as equal interval data, in which case you can use K-means using any distance measure (Euclidean, for example). Cheers, Jigsaw Faculty Team
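As a rough illustration of the suggestion above, here is a minimal sketch of K-means with Euclidean distance on Likert responses treated as equal-interval numbers. The responses, the choice of k=2, and the function names are assumptions made up for the example; this is not the course's own code, and a library implementation would normally be preferable.

```python
import random

def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iterations=100, seed=0):
    # Plain Lloyd's algorithm: alternate assignment and centroid update.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical data: six respondents answering three 5-point Likert items.
responses = [
    [1, 2, 1], [2, 1, 2], [1, 1, 2],   # mostly "disagree"
    [5, 4, 5], [4, 5, 4], [5, 5, 4],   # mostly "agree"
]
labels, centroids = kmeans(responses, k=2)
print(labels)
```

Treating the codes 1 to 5 as interval data is itself an assumption: it presumes the gap between "agree" and "strongly agree" is the same as any other adjacent gap.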

Cluster Analysis In SPSS (Hierarchical, Non-hierarchical & Two-step)

Sorry about the issues with audio - somehow my mic was being funny*** In this video, I briefly speak about different clustering techniques and show how to run ...

User Comments

StatArena commented on 14 Aug 2013

The purpose of both techniques is pretty similar, i.e. to classify objects into categories. The difference, however, is that in DA the categories are always known, whereas in CA they may or may not be known. Knowledge of the categories in DA is used by the DA algorithm to derive the classification rule. Copy-paste from a book chapter: "Although both cluster analysis and discriminant analysis classify objects (or cases) into categories, discriminant analysis requires you to know group membership CTD

StatArena commented on 14 Aug 2013

...for the cases used to derive the classification rule. The goal of cluster analysis is to identify the actual groups. For example, if you are interested in distinguishing between several disease groups using discriminant analysis, cases with known diagnoses must be available. Based on these cases, you derive a rule for classifying undiagnosed patients. In cluster analysis, you don't know who or what belongs in which group. You often don't even know the number of groups." Hope it helps.
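To make the "known group membership" point concrete, here is a small sketch that derives a classification rule from labelled cases and applies it to an undiagnosed one. It uses a nearest-centroid rule as a simplified stand-in for discriminant analysis (it is not Fisher's LDA), and the measurements and group names are invented for illustration.

```python
def fit_centroids(cases, labels):
    # Derive the rule from cases whose group membership is already known:
    # here the "rule" is simply the mean vector of each group.
    sums, counts = {}, {}
    for x, y in zip(cases, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def classify(centroids, x):
    # Assign a new case to the group with the nearest mean (Euclidean).
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )

# Hypothetical diagnosed patients (two measurements each) and their groups.
diagnosed = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]]
groups = ["A", "A", "B", "B"]

rule = fit_centroids(diagnosed, groups)
prediction = classify(rule, [2.9, 3.0])  # an undiagnosed patient
print(prediction)
```

A cluster analysis would start from the same measurements but without the `groups` list, and would have to propose the groups itself.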

Elias Tsolis (Estatistics) commented on 09 Nov 2015

Nice vid StatArena! Added to "Statistics Videos" & Featured CHs: youtube.com/channel/UCJcBCQAwoNOWueJYpHc8RWA. I'm a Psychologist/Statistician (MSc). I do statistical analysis using PSPP (sim. to SPSS) and R (lavaan, semtools, semplots; sim. to AMOS). I'm a writer of statistical articles: "estatistics.eu/articles". You may check my stuff too and LiCS. You can reach me by email: [email protected] or by Face: EstatisticsEU. LiCS: Like, Comment, Share/Subscribe

Vikas Shrivastava commented on 25 Jul 2013

Nice. Just one query: can respondents in a cluster be electronically separated and analysed further as a separate group? I come from the health industry, where we have flow cytometry data gated on scatterplots in specialized software. Can SPSS do the same to select a cluster for subsequent analysis?

StatArena commented on 28 Apr 2013

You cluster, say, respondents on the basis of something (e.g. height, income, gender etc). These become the variables and bases for clustering. For answer to second part of your question, please watch at 2.19. Thanks for watching and commenting.

StatArena commented on 25 Jul 2013

Cluster analysis can create a new nominal variable for the clusters (as shown in the video you watched). To analyze each group further, you could either use the "select cases" or the split file option in SPSS under "Data" menu.
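Outside SPSS, the same idea (cluster membership saved as a nominal variable, then used to split or filter the cases) can be sketched in a few lines of Python. The records, the `cluster` field name, and the helper function are hypothetical; they only mirror what "split file" and "select cases" do conceptually.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical respondents; "cluster" is the membership variable that a
# cluster analysis would have saved alongside the original variables.
respondents = [
    {"id": 1, "income": 30, "cluster": 1},
    {"id": 2, "income": 80, "cluster": 2},
    {"id": 3, "income": 34, "cluster": 1},
    {"id": 4, "income": 90, "cluster": 2},
]

def split_by(rows, key):
    # Rough analogue of "split file": one list of rows per cluster value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

groups = split_by(respondents, "cluster")

# Analyse each cluster separately, here with a simple mean income.
mean_income = {c: mean(r["income"] for r in rows) for c, rows in groups.items()}

# Rough analogue of "select cases": keep only one cluster for further work.
cluster_two = [r for r in respondents if r["cluster"] == 2]
print(mean_income, [r["id"] for r in cluster_two])
```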

StatArena commented on 19 Dec 2012

Unless "Classify" is known by some other label in version 21, your SPSS program/software seems to be incomplete (i.e. it does not include all the modules). Can not comment for sure as I'm yet to use v.21. Thank you.

cwa commented on 01 Apr 2015

In SPSS, how do you generate the BIC and log likelihood values for two-step cluster analysis? I see these reported in papers as indices of fit

smart2mill commented on 28 Apr 2013

great video and you have my like but I was wondering how do you choose the variables for clustering? Do they always have to be scalar?

sidkings commented on 02 Jul 2013

Wa'alaikum asalaam akhi. I'm doing an SPSS course on Market segmentation. Very helpful video indeed.

30Spiritual commented on 03 Oct 2012

Very well done sir! I was looking for a good video on this topic for some time now. Thanks !

Niels Louwes commented on 19 Dec 2012

When I click on Analyze, I do not get all those options such as Classify. I have SPSS 21.

Cluster Analysis in Data Mining tutorial 22 Divisive Clustering Algorithms

Part of the Data Mining Specialization » Learn how to take scattered data and organize it into groups for use in many applications, such as market analysis and ...

Unsupervised Machine Learning - Hierarchical Clustering with Mean Shift Scikit-learn and Python

This machine learning tutorial covers unsupervised learning with Hierarchical clustering. This is clustering where we allow the machine to determine how many ...

User Comments

Sagnik Saha commented on 28 Oct 2015

Unsupervised Machine Learning part 3? When are you going to upload it? I am really interested in machine learning, so please upload it as soon as possible.

Sagnik Saha commented on 28 Oct 2015

+sentdex Actually, I am not finding any tutorial as good as yours. That's why I asked. Can you give me some links, please, if possible?

sentdex commented on 28 Oct 2015

+Sagnik Saha Probably still a few months away from revisiting ML. There are tons of resources for learning ML though, doesn't have to be me. Hit up Google search.

Fikre Beyene commented on 17 Jul 2015

Great Video, super informative. What would be a better unsupervised learning method for sample sizes larger than 10,000?

sentdex commented on 18 Jul 2015

+Fikre Beyene google "choosing the best estimator" and find the image from scikit learn. It's like a flowchart of what to choose under various circumstances.

Sheli Ostrow commented on 16 Apr 2015

Thanx! this was so useful! are you planning to do a PCA tutorial?

sentdex commented on 16 Apr 2015

+Sheli Ostrow Definitely, PCA is a must have topic. Not sure when, but it's on the list.

Momshad Alvee commented on 12 Feb 2015

Hello, I know this is unrelated to this video, but it's kind of important and I would appreciate your help. About a year ago you made a comment thanking someone for the 5k good and 5k bad tweets from some dataset on this video: watch?v=ytUHvMNnzZk. I cannot access this dataset. Do you still have it, or could you kindly point me to where I can get similar data? Thank you, keep up the awesome videos :)

Momshad Alvee commented on 16 Feb 2015

plzzz reply, I have a module depending on this

Momshad Alvee commented on 12 Feb 2015

+sentdex Would you kindly send me the dataset at [email protected], if you can find it, that is? The link for the dataset is broken, and after hours of searching the internet for something similar, I couldn't find anything.

sentdex commented on 12 Feb 2015

I have it somewhere, definitely backed up in a few places. I wouldn't get rid of a dataset like that one. It was really good, lots of tricky lines. Was actually planning to find it again to use in some of these videos with NLTK. I do not know where you can find another labeled dataset like it, but schools tend to host all kinds of epic data sets. You can email me if you want, I'll dig it up possibly. It's definitely buried deep, but I am sure I still have it.

Danilo Canivel commented on 05 Feb 2015

Thanks for the videos, man, they're awesome. I've watched them all in 2 days ;). Please don't stop making them! Don't worry that most of the subscribers aren't viewing them right now. It's brand new territory for most of them. By doing this today, you make a name for yourself in the future, because no one else is doing it today... If you leave it now, someone else will probably do it. This will be the most important development area for the next few years, and you are surely making your name in it with this series. Please don't stop, please!!!! Or create a paid series, I'd pay for it for sure! Anyway, thanks mate, very good teaching.

Danilo Canivel commented on 06 Feb 2015

+sentdex Thanks mate. I've already done the Udacity Nanodegree (200/month) and some other courses at edX and Coursera too; none has the same approach as yours, very practical and simple. Well, I hope you keep going, and I will donate for sure!

sentdex commented on 06 Feb 2015

I don't want to have a paywall for content. I thought about Indiegogo/Kickstarter campaigns for specific niche series in the past, but meh, they take too much time to set up and promote; I'd rather just spend that time teaching the content. I may have something more specific for putting money towards specific development series via pythonprogramming.net. I'm currently rewriting that and am almost done with the porting. You're free to show your support for specific content via donations at the moment, that's about it. I have all the stats for this series, and I can compare interest at release and through time to any of the others. Next topic will be Kivy. I won't be leaving scikit-learn behind for good. I am not too worried about someone else doing a series on the same topic, I'd welcome it. It'll probably take anyone who hasn't already started releasing content longer to get to this point than it will for me to do Kivy and come back. We need more people covering these topics anyways. I could easily have an entire year dedicated to scikit-learn and machine learning. Stay tuned. Coursera has a nice free machine learning course. Covers a lot of really great stuff. If you need your ML fix, head there :P

Chris Smith commented on 05 Feb 2015

Not to babble on about this stuff, but I found a commercial product at: //info.prelert.com/anomaly-detective-for-security-use-cases which has an appropriate use case for network security. I guess it's promising, if they can code it, then maybe I can too using python/scikit. The company I work for would never buy such a product, they're cheap, and only use open source or my code.

Chris Smith commented on 05 Feb 2015

+sentdex By "open source" I meant software like ruby, python, and rails ... they have attorneys and avoid any software that involves payment. Although they do pay for support on some software, like elasticsearch ... a common approach. I'm not sure I share your confidence about getting this to work in python/scikit, but I'm going to give a try.

sentdex commented on 05 Feb 2015

Many companies require open source software, so they can run code in-house. I wonder what you mean by "open source," and whether they know that the traditional "open source" license may actually still be a paid license, especially when used commercially. What you want to do is certainly doable, and within the scope of something easily done with very little customization from scikit-learn.

Chris Smith commented on 05 Feb 2015

I just noticed that I may be able to use machine learning to spot anomalies in network traffic ... my day job is network security using snort, suricata, python/rails apps. There are a lot of scholarly papers on anomaly detection, but no actual example code that I've found. I don't know if it will work as there is a lot of network traffic even on small networks. If anyone knows of any existing code for anomaly detection please let me know.

sentdex commented on 05 Feb 2015

+Chris Smith You should go at it yourself! Just take it one step at a time. Size of data shouldn't change things too much, besides possibly locking out some algos, or processing time. How big are we talking?

Chris Smith commented on 05 Feb 2015

+sentdex Thanks, that looks on target. I know how to gather network data, but still need to figure out how to train or how much to train, and how/when to spot anomalies ... the scale of the data is just a bit intimidating. Hopefully I can find some "beginner" code, or just start with what I've learned from these videos.

sentdex commented on 05 Feb 2015

//scikit-learn.org/stable/modules/outlier_detection.html
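For a sense of what the simplest kind of "beginner" anomaly detection looks like, here is a minimal standard-library sketch that flags unusual values with a robust z-score based on the median absolute deviation (MAD). It is a stand-in for illustration, not one of the scikit-learn estimators from the linked page, and the traffic figures, the 3.5 threshold, and the function name are all assumptions.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    # Return the indexes of values whose modified z-score exceeds the
    # threshold. The median and MAD are used instead of mean and standard
    # deviation so that the outliers do not distort the baseline.
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]

# Hypothetical traffic volume per minute (e.g. kilobytes) on a small network.
traffic = [120, 130, 125, 118, 122, 127, 900, 121]
anomalies = mad_outliers(traffic)
print(anomalies)
```

On real network data a single feature is rarely enough; the same idea would need to be applied to several features per time window, which is where the multivariate methods on the scikit-learn page come in.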

PASW Hierarchical Clustering Analysis

Geog 181: Spatial Analysis. Homework 3 Question 4 instructions. Shows how to run a hierarchical clustering analysis in PASW.

User Comments

OLDILAENDER commented on 10 Sep 2012

No I meant the tool to make the "youtube video" not the software you used for clustering. sorry for being imprecise.

OLDILAENDER commented on 09 Jun 2012

hi, could you tell me which tool/software you use to make the video?

u511003 commented on 09 Sep 2012

PASW, earlier called SPSS, made by IBM Corporation.

DoHazards commented on 10 Sep 2012

Sorry for the slow response. I used Camtasia.

DarkLordOfThDoomZone commented on 29 Oct 2011

00:45 is that really the right name?

LabVIEW Tutorial #6: Clusters

Location of code and other future goodies: //paulfmcgowan.com/ Lesson #6 on LabVIEW shows the usefulness of clusters. Working on a more complex ...

User Comments

Paul McGowan, Jr. commented on 03 Mar 2013

You give the cluster building icon a template that tells it what "type" of cluster you'll be working with. It's just like initializing a very customized variable of a specific type.
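For readers coming from text-based languages, a rough analogy is a record type whose fields are fixed by a template. The sketch below uses a Python dataclass; the `SensorReading` type and its fields are hypothetical, and the comparison to a LabVIEW cluster constant is only an analogy, not a description of how LabVIEW works internally.

```python
from dataclasses import dataclass, fields

@dataclass
class SensorReading:
    # The defaults play the role of the constant "template": they fix which
    # elements exist, in what order, and what type each one has.
    channel: int = 0
    label: str = ""
    value: float = 0.0

template = SensorReading()  # an "empty" instance that defines the shape
reading = SensorReading(channel=3, label="temp", value=21.5)

element_names = [f.name for f in fields(template)]
print(element_names, reading)
```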

me commented on 26 Feb 2013

Do you mean you specify the number of elements of the cluster or just that they are (each) constant?

shenaruto commented on 17 Mar 2012

Thank you for all your tutorials so far. They are great basics for LabVIEW learners.

Paul McGowan, Jr. commented on 26 Feb 2013

The constant tells the cluster building icon what the cluster will be composed of.

Paul McGowan, Jr. commented on 25 May 2012

I will have to look into this, but I'm sure I can get something together.

me commented on 26 Feb 2013

Hello What's that constant (at 3:20) for? regards

Paul McGowan, Jr. commented on 25 Apr 2012

I mis-numbered them. This should be #7...

Paul McGowan, Jr. commented on 17 Mar 2012

@shenaruto Thanks for watching

comebackata2 commented on 24 Apr 2012

where is tutorial 7?

me commented on 04 Mar 2013

OK, thanks

Pronesniper commented on 17 Mar 2012

Thanks!

TITLE: DOCUMENT CLUSTERING FOR FORENSIC ANALYSIS: AN APPROACH FOR IMPROVING COMPUTER INSPECTION

TITLE: DOCUMENT CLUSTERING FOR FORENSIC ANALYSIS: AN APPROACH FOR IMPROVING COMPUTER INSPECTION NAME: J. ANITHA JOSEPHINE ...

Large-scale cluster management at Google with Borg

Authors: Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes Abstract: Google's Borg system is a cluster manager ...

User Comments

sk commented on 21 Oct 2015

Does anyone know what a link shard is? I read the paper and it doesn't really explain what exactly a link shard is.

Joseph Jacks commented on 20 Nov 2015

+sk "For performance scalability, each Borgmaster replica runs a stateless link shard to handle the communication with some of the Borglets; the partitioning is recalculated whenever a Borgmaster election occurs. For resiliency, the Borglet always reports its full state, but the link shards aggregate and compress this information by reporting only differences to the state machines, to reduce the update load at the elected master. If a Borglet does not respond to several poll messages its machine is marked as down and any tasks it was running are rescheduled on other machines. If communication is restored the Borgmaster tells the Borglet to kill those tasks that have been rescheduled, to avoid duplicates. A Borglet continues normal operation even if it loses contact with the Borgmaster, so currently-running tasks and services stay up even if all Borgmaster replicas fail."
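The "reporting only differences" part of that passage can be illustrated with a small sketch: a full state report comes in, and only what changed since the previous report is passed on. The dictionary layout, task names, and function name are assumptions for illustration; the quoted text does not specify Borg's actual data formats.

```python
_MISSING = object()  # sentinel for keys absent from the previous report

def state_diff(previous, current):
    # Compare two full state reports and keep only what changed.
    changed = {
        key: value
        for key, value in current.items()
        if previous.get(key, _MISSING) != value
    }
    removed = sorted(key for key in previous if key not in current)
    return {"changed": changed, "removed": removed}

# Hypothetical full reports from one machine at two consecutive polls.
last_report = {"task-1": "running", "task-2": "running"}
new_report = {"task-1": "running", "task-2": "failed", "task-3": "pending"}

update = state_diff(last_report, new_report)
print(update)
```

The design point is that the sender can stay simple and always send everything, while the intermediate layer absorbs the cost of computing the much smaller update.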
