web*hunting

Nick Gallimore

Natural Language Talent in the UK

without comments

As you may or may not know, my primary recruiting speciality these days is the Natural Language space. As a linguist by background, I’ve always been utterly fascinated by human language, and the continued evolution of commercial applications of Natural Language theory means it’s now possible for me to make a living out of the industry. My interests at this stage are primarily Computational (this is where the excitement is happening, believe me), and broadly speaking cover Natural Language Processing, Computational Linguistics, Machine Learning & Sentiment Analysis.

Talent-wise, this is a really interesting space in the UK right now. Here’s a little brochure I made outlining where it’s at and where it’s going:

For more information, drop me a line at nick @ gallimore.biz, or come and join the growing NLP Careers group on LinkedIn here.

Written by nick

December 7th, 2011 at 11:32 am

Google Custom Search Engines

without comments

I’ve been interested in different ways of using the web to find people for as long as I can remember. Back in 2005 (before LinkedIn’s real emergence – yes, that’s before LinkedIn!) my colleagues and I would talk extensively about what resourcing would look like in the future.  Most were convinced that the “de facto” latest methods of resourcing at that point (job-boards, on-line advertising) – which, at the time, were just seriously cool and a real differentiator for those who knew how to use them effectively – would continue to be the primary candidate sourcing tools we’d use for the foreseeable future.

However, some of us could see a future where the web became continually more and more open.  A world where information about people and what those people are doing would continually move from behind paywalls into the “open” web.  We could see that resourcing in the future would become less about investment in specific technology and more about investment in people-sourcing time and skills.

We would spend a lot of time playing with general internet searching techniques (we used to call it “Google hacking”, primarily to make it sound cool although actually it was just Google searching), always with the aim of finding the personal websites or on-line portfolios of individuals who might fit our jobs.  We were fairly successful with it too – placing a decent number of (primarily PhD-level) candidates found this way.

Eventually, I wrote this custom search engine on Google to help us. Looking back at it now, it feels fairly old hat – but for the time it was written it was brilliant, even if I do say so myself. Have a play with it and feel free to share. It should be fairly self-explanatory – just punt in some skill keywords and/or locations, hit search and then use the refinements to switch between guaranteed CVs and personal sites.

For what it’s worth by the way, I expect the shift from closed information to open information to continue and probably gather pace over the next few years. The web is becoming more open and more accessible every day. The next five years are going to be very interesting from a people sourcing perspective – as the “battle” for visibility and brand amongst the on-line population intensifies, and as social network use plateaus and eventually falls away as sites continue to try to drive increases in profitability.

Written by nick

August 24th, 2011 at 8:29 am

Posted in webhunting


Automating your LinkedIn Xray Results with Web Scrapers

without comments

So for a while now I’ve been wondering if there’s a solution to a very real problem with searching LinkedIn for suitable candidates.  That problem is time: it takes ages to take the results of a search, collate them somewhere, assimilate all the information and put it into a sensible format.  And I think I have found the answer: web scrapers.

Let me give you an example.  Recently, we’ve seen growing demand for something called Cucumber.  Yes, Cucumber.  It’s a Behaviour Driven Development tool which is increasingly used as a development aid and to automate tests.  It’s a cool name, I know.

So anyway, let’s imagine we want to generate some names of people who have used or are using cucumber.  Typically, LinkedIn is a decent place to start, so let’s run a simple X-ray:

site:uk.linkedin.com cucumber (inurl:pub OR inurl:in) -intitle:cucumber -inurl:groups -inurl:jobs -inurl:dir

(If you can’t be bothered to copy and paste, here’s a link to the results)

 

Now before you speculate too much, I included the -intitle:cucumber because it transpires that there’s a surprising number of people with the surname Cucumber, and this is a neat way of filtering those out (with the small cost that anyone whose name is Cucumber but who also happens to be a Cucumber expert gets filtered out too, since names appear in the page title).
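If you run this kind of X-ray for lots of different keywords, it can be handy to build the string in code rather than retype it. Here’s a minimal Python sketch; the function name and its `site` parameter are made up for illustration, and only the operators themselves come from the search above.

```python
def linkedin_xray(keyword, site="uk.linkedin.com"):
    """Build the LinkedIn X-ray search string for a single keyword.

    Hypothetical helper: it only assembles text, it does not run the search.
    """
    parts = [
        f"site:{site}",
        keyword,
        "(inurl:pub OR inurl:in)",   # public profile URLs only
        f"-intitle:{keyword}",       # drop pages with the keyword in the title
        "-inurl:groups",
        "-inurl:jobs",
        "-inurl:dir",
    ]
    return " ".join(parts)


print(linkedin_xray("cucumber"))
```

Swap in a different keyword (or a different country sub-domain via `site`) and paste the result into the search box.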

Our search has generated 119 results (obviously we could filter this down further by location if we wanted to).  That’s 119 potential candidates.  This is good.

So what next?  Well, this is where it gets interesting.  So normally, the recruiter’s process would be fairly mundane – visit each result, make a list of name, current company and any other interesting information (links to personal sites, perhaps).  With that list made, the fun can begin.

It’s the list-making that will bug most people.  It’s boring.  It takes aaaages. Distractions happen.  People accidentally shut the browser window and have to start again.  It’s a potentially search-destroying process.

That was often how it was for me until I discovered web scrapers.  My personal favourite is Outwit Hub, because it’s a Firefox extension (so easy to install) and because it’s free (a premium version is available and is probably worth investigating, to be fair).  What Outwit allows us to do is scrape the information from our search results and dump it straight into an Excel or CSV file in just a few clicks.

Now, I’m not going to bore you with an in-depth tutorial (there’s plenty of others out there already), but Outwit allows you to create your own scraper – you define the different pieces of information you want to extract from a page, and then you define where in the HTML Outwit needs to look to find that information, by telling it what code immediately precedes and follows the information you want.  In the case of a list of information (like search results, for instance), Outwit can use the code snippets you give it to recognise each item in that list.
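To make the “what comes before, what comes after” idea concrete, here’s a rough Python sketch of the same principle. This is not how Outwit is implemented (I don’t know its internals); the markup, the names and the `extract_between` helper are all invented for illustration.

```python
import csv
import io


def extract_between(html, before, after):
    """Return every piece of text that sits between `before` and `after`."""
    items = []
    pos = 0
    while True:
        start = html.find(before, pos)
        if start == -1:
            break
        start += len(before)
        end = html.find(after, start)
        if end == -1:
            break
        items.append(html[start:end].strip())
        pos = end + len(after)
    return items


# Invented markup standing in for a page of search results.
sample = (
    '<li><span class="name">Jane Doe</span><span class="org">Acme Ltd</span></li>'
    '<li><span class="name">John Roe</span><span class="org">Example plc</span></li>'
)

names = extract_between(sample, '<span class="name">', "</span>")
orgs = extract_between(sample, '<span class="org">', "</span>")

# Dump the paired-up results as CSV text (written to memory here, not to disk).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["name", "company"])
writer.writerows(zip(names, orgs))
csv_text = out.getvalue()
print(csv_text)
```

The pairing with `zip` assumes every result has both fields; real pages are messier, which is exactly why a purpose-built scraper is worth having.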

I’ve built scrapers for Google searches, Bing searches and what I call “In-App” LinkedIn searches (i.e. a search run using LinkedIn’s own interface), but theoretically you could use this every single time you run a web search for candidates.  In my case, my LinkedIn X-Ray scraper takes that search result screen in the link above and turns it into:


(Because I’m extra nice, here’s a screenshot of the scraper itself, for you lazy folk to copy):

This is such a time-saver that it’s become pretty much my de-facto method of grabbing my search results.  And when you consider some of the other information you can “rip” from a search (like LinkedIn account ids, for instance, or web links on public LinkedIn profiles, or e-mail addresses on “In-App” LinkedIn profiles), you realise that the boundaries of what LinkedIn offers the resourcer are somewhat further out than you used to think…

Written by nick

June 22nd, 2011 at 9:21 pm

Posted in webhunting


Sourcing with Github

without comments

I’ve been playing with Github as a people sourcing tool for a little while now. On the face of it, it should be a pretty decent source of tecchie people. After all, it’s an open platform whose users share code, so its user-base should be full of hardcore coders.

The chance to really test it out came along when a client asked for help with a search for Test Development Engineers. These are the people who create test harnesses and automation scripts, as well as physically running tests. They’re a nightmare to find because most people who are qualified to do it (in this case they’ll need Python, Perl or Unix scripting skills) don’t want to work in a predominantly test environment and because the actual number of people capable of doing it is pretty limited in the first place.

They are, however, definitely the kind of people who will hang out on Github. They’re proper tecchies. Given that I know that the client I’m working for has pretty much exhausted more mainstream techniques for finding people, this is a great testing ground for running a search on Github.

So after a fair bit of playing around, it seems that there are some great things about Github (for webhunting purposes) and some less great things. On the plus side:

- Individual user profiles contain the user’s location
- Individual user profiles contain some information about their core language skills
- User profiles seem to get indexed by Google and can be X-Ray searched
- Most users seem to have personal websites linked from their Github pages, giving us access to contact details in a good %age of cases, allowing us to build target lists
- A good %age of users also use other social platforms (Twitter in particular) so can be found and traced that way
- Once you’ve signed up you can freely message people (although I am guessing that this wouldn’t endear you to the majority of users and instinctively wouldn’t be my approach)

On the less useful side though:

- User profiles contain no visible employment/bio section (that would be too much to ask, I guess) so we’re working from very, very limited information.
- Finding usable information is time-consuming – pay-back is likely to be relatively low. This isn’t a particularly efficient route to a talent pool.

There is an advanced search function which offers the option of searching by location and by the language that each user has used in submissions to the site’s public repositories. This is OK, but it doesn’t seem to support Boolean (I’ve not exactly tested it through the roof, though). So I X-rayed using this:

site:github.com location cambridge (python OR perl OR shell) -“Cambridge, MA” -“Cambridge, Ontario” -inurl:blog

(The “-inurl:blog” keeps blog posts on Github out of the results, leaving users’ profiles only.) I’ve got 140 results, and looking through them I reckon that at least 75% of those people will be contactable. With limited professional information about them, that contact will have to be focussed on asking for expertise/referrals, although I’m expecting/hoping for a decent response rate. I’ll let you know how I get on once I have some responses.
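The fiddly part of a location X-ray like this is remembering to knock out the namesake places. A small Python sketch of how that string could be put together is below; `github_xray` and its parameters are hypothetical names of my own, and it uses plain straight quotes around the excluded phrases.

```python
def github_xray(place, skills, exclude_phrases=(), exclude_url_terms=("blog",)):
    """Assemble a Github X-ray string for one location and a set of skills.

    Hypothetical helper: it only builds the text of the search.
    """
    parts = ["site:github.com", "location", place]
    parts.append("(" + " OR ".join(skills) + ")")
    # Exact phrases to exclude, e.g. other towns that share the name.
    parts += [f'-"{phrase}"' for phrase in exclude_phrases]
    # URL fragments to exclude, e.g. blog posts rather than profiles.
    parts += [f"-inurl:{term}" for term in exclude_url_terms]
    return " ".join(parts)


query = github_xray(
    "cambridge",
    ["python", "perl", "shell"],
    exclude_phrases=["Cambridge, MA", "Cambridge, Ontario"],
)
print(query)
```

Changing the town or the skill list then becomes a one-line edit rather than a careful bit of retyping.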

Written by nick

May 26th, 2011 at 4:01 pm

Posted in webhunting


Google AROUND…playing with some seriously simple search strings

without comments

The best searches are nearly always the simple ones.

I’ve been playing around a lot with Google’s AROUND operator. It’s not documented particularly well, but this operator allows you to search for documents where one search term is featured in the vicinity of another. For instance:

“coffee shop” AROUND(2) bicester

returns pages where the term “coffee shop” is close to the word Bicester. The higher the value in the brackets after AROUND, the wider the proximity between each term can be.
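To get a feel for what “in the vicinity of” means, here’s a rough local imitation in Python. Google doesn’t document how it counts the distance, so this sketch assumes the number is the maximum count of words allowed in between the two terms; the function name and the sample sentences are mine.

```python
import re


def within_n_words(text, phrase, word, n):
    """True if `word` occurs with at most `n` words between it and `phrase`.

    An approximation for illustration only, not Google's actual matching.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    phrase_tokens = phrase.lower().split()
    target = word.lower()
    k = len(phrase_tokens)
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] != phrase_tokens:
            continue
        # Look n + 1 positions either side of the phrase.
        before = tokens[max(0, i - n - 1):i]
        after = tokens[i + k:i + k + n + 1]
        if target in before or target in after:
            return True
    return False


near = within_n_words("The best coffee shop in central Bicester is tiny",
                      "coffee shop", "bicester", 2)
far = within_n_words("The best coffee shop in central Bicester is tiny",
                     "coffee shop", "bicester", 1)
print(near, far)
```

With two words allowed in between (“in central”) the sentence matches; tighten it to one and it doesn’t.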

There are relatively complex ways to use this, particularly when you’re in the mindset of “CV Hunting”. For instance, you can use it to group skills and job titles together to reduce the number of false positives (think: Java AROUND(2) “Software Developer”).

But the beauty of it is when you break it down to a really simple level – and focus it on finding people, not CVs (or profiles). For instance, I used this really effectively on a search for a Web Designer in Cambridge:

(“I work as” OR “I am a” OR “I am currently” OR “I currently work” OR “working for”) AROUND(3) (“web designer” OR “web developer” OR (design AND web)) AND XHTML AND CSS AND (Cambridge OR Cambridgeshire OR Cambs)

Beautifully simple, yet really effective.
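The string above is really just four lists glued together: ways people introduce themselves, job titles, skills and places. If you want to reuse the pattern for other roles, a Python sketch along these lines would do it. The helper names are made up, and it writes straight quotes around phrases.

```python
def any_of(terms):
    """Join alternatives with OR, quoting multi-word phrases.

    Terms that are already bracketed, like "(design AND web)", are left alone.
    """
    quoted = [
        f'"{t}"' if " " in t and not t.startswith("(") else t
        for t in terms
    ]
    return "(" + " OR ".join(quoted) + ")"


def people_query(intros, titles, skills, places, distance=3):
    """Build an 'intro phrase AROUND(n) job title' search with skills and places."""
    proximity = f"{any_of(intros)} AROUND({distance}) {any_of(titles)}"
    return " AND ".join([proximity] + list(skills) + [any_of(places)])


query = people_query(
    intros=["I work as", "I am a"],
    titles=["web designer", "(design AND web)"],
    skills=["XHTML", "CSS"],
    places=["Cambridge", "Cambs"],
)
print(query)
```

Only the operators are taken from the search above; everything else is just one way of organising the pieces.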

Written by nick

May 24th, 2011 at 10:00 am

Posted in webhunting


Bing “contains” Operator: a great tool for blog search

without comments

Bing, the search engine it took me ages to get around to, definitely has some interesting features that set it aside from Google. One of my favourites is the “contains” operator, which allows you to search for pages that contain links to particular types of file.

For instance: venice contains:(wmv) returns pages which feature the word Venice and at least one link to a wmv file.

For the purposes of web*hunting, this has many potential uses. For instance, you could use it to search for pages with links to .docs or .pdfs (a kind of extension of a filetype: search for CVs, assuming that a good %age of people who put their CV on-line do so using a doc or a pdf and a nice search if you want some guaranteed CVs as opposed to reams of shitty job ads).

Probably my favourite use of it though is to find bloggers. I like finding bloggers because they’re easy to engage with, often lead you onto other relevant people (tip: always check the most recent comments) and are nearly always interested in hearing from you. Being able to find them (and no, Google Blog search doesn’t count) is (was), though, a bit of a pain.

Using contains: on Bing, and assuming that most blogs have some sort of feed on them – either an RSS, Atom or XML file – we can use Venice contains:(rss atom xml), and we should theoretically get loads of bloggers in our results.
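If you’ve already got a page in hand and want to check the same thing yourself, the idea is easy to imitate locally. The Python sketch below lists the links on a page that point at files with given extensions. It is only an analogue of what contains: appears to do, not a description of how Bing works, and the class, function and sample markup are my own inventions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def links_to_filetypes(html, extensions):
    """Return the links in `html` whose path ends in one of `extensions`."""
    collector = LinkCollector()
    collector.feed(html)
    wanted = tuple("." + ext.lower() for ext in extensions)
    return [h for h in collector.hrefs
            if urlparse(h).path.lower().endswith(wanted)]


# Invented page fragment for illustration.
page = ('<link rel="alternate" href="/feed.rss">'
        '<a href="cv.pdf">CV</a><a href="/about">About</a>')

feeds = links_to_filetypes(page, ["rss", "atom", "xml"])
cvs = links_to_filetypes(page, ["pdf", "doc"])
print(feeds, cvs)
```

A page with something in `feeds` is probably a blog; one with something in `cvs` may well be a personal site with a CV on it.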

What other uses for contains can you come up with?

Written by nick

May 23rd, 2011 at 2:26 pm

Posted in webhunting


What is web*hunting?

without comments

…it’s like head-hunting, but it’s web. Webhunting is a fast-growing methodology of finding specific people, people with particular backgrounds, skills or competencies using the information available on the web.

(It has to be said, webhunting is far from an official term. I just made it up, literally a minute ago.)

As the web becomes increasingly open, so do the possibilities for the people-hunting trade (by “open” I mean both open as in the open provision of information and open as in accessible to those who want to share information about themselves). More and more people want to and have the means to share more and more information about themselves on-line. The amount of data about who people are, where people are and what people are doing is growing all the time.

As an industry, webhunting seeks to find ways of making sense of all that data, and use it to connect people with need. That is what it is, and that is what I do.

Written by nick

May 20th, 2011 at 12:09 pm

Posted in webhunting