In this post I'm rambling about my failed attempt to find new niches, skills, and new types of business areas in the data I'm messing with.
First I made this small table in my database like this:
To make the word cloud, I take two consecutive weeks and compute a score for each word that emphasizes its growth, so a word shows up larger if its count increased a lot over the previous week.
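The post doesn't give the exact scoring formula, but a minimal sketch of a growth-emphasizing score might look like this (the ratio and the +1 smoothing are my assumptions, and the counts are made up):

```python
# Hypothetical growth score: a word gets a big weight when its count
# jumps relative to the previous week. The +1 smoothing avoids dividing
# by zero for words that didn't appear at all last week.
def growth_score(count_prev: int, count_curr: int) -> float:
    return (count_curr + 1) / (count_prev + 1)

weekly_counts = {
    # word: (count last week, count this week) -- made-up numbers
    "docker": (40, 120),   # sharp rise -> large score
    "python": (500, 510),  # popular but flat -> score near 1
}
sizes = {w: growth_score(prev, curr) for w, (prev, curr) in weekly_counts.items()}
```

A score near 1 means no growth; the font sizes in the cloud would then be scaled from these values.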
Someone recommended normalizing the words with nltk.stem.snowball.EnglishStemmer and nltk.wordnet.morphy, which brought it down to about 125,000 unique words. Still, a lot of them made no sense, since they weren't actual words; they should have been checked against a dictionary/lexicon.
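To make the normalization step concrete: a real run would use nltk.stem.snowball.EnglishStemmer as mentioned above, but here is a self-contained toy suffix-stripper standing in for it, just to show how inflected forms collapse onto a shared key so their counts merge:

```python
# Toy stand-in for a real stemmer (e.g. nltk.stem.snowball.EnglishStemmer):
# strip a few common suffixes so inflected forms map to one stem.
def crude_stem(word: str) -> str:
    for suffix in ("ing", "ers", "er", "ies", "es", "s"):
        # only strip when a reasonably long root remains
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: len(word) - len(suffix)]
    return word

words = ["developer", "developers", "developing"]
stems = {crude_stem(w) for w in words}  # all three collapse to "develop"
```

Note this is exactly why stemming alone isn't enough: the output ("develop" here, or worse with a crude rule set) isn't guaranteed to be a dictionary word.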
Anyway, this is what the word cloud looked like (I used d3-cloud to draw it):
What a mess...
I was thinking of grabbing a larger lexicon of the English language to check words against, to make sure they're valid and not just random strings (there are a lot of misspellings in the data).
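The filtering itself is a simple membership test. A sketch, where the tiny word set is a hypothetical stand-in for a real lexicon file (e.g. one word per line, loaded into a set):

```python
# Hypothetical lexicon; a real run would load a full word list from disk,
# e.g. lexicon = set(open("words.txt").read().split())
lexicon = {"cloud", "data", "word", "python", "week"}

def valid_words(tokens):
    # keep only tokens that appear in the lexicon
    return [t for t in tokens if t.lower() in lexicon]

tokens = ["cloud", "pyton", "data", "asdfgh"]  # includes a typo and junk
print(valid_words(tokens))  # -> ['cloud', 'data']
```

Looking up every token in a set is O(1) per word, so even millions of tokens filter quickly.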
I think the technology names (libraries/frameworks) would be in D2 without D1.
Eventually I found such a lexicon here, except it's from 2013 and things have evolved since then, so I didn't use it.
So the problem with the word cloud is that you're trying to cram a lot of information into it: how many words can you really fit in a 1000x1000 rectangle? (The author of d3-cloud warns about this by linking to this article.) The layout algorithm places words at randomized positions and tries to fit them in, so every time you build the word cloud you get a different layout, even if your data is the same.
I'll look into this some more, but without the word clouds.