Intro:
Upwork, formerly Elance-oDesk, is a global freelancing platform where businesses and independent professionals connect and collaborate remotely. It is one of the largest freelancer marketplaces. I have been working on this platform since 2014 as a software developer. When I use Upwork to search for interesting jobs, the results are not very satisfying, and the page displaying the details of a job is not very straightforward. So I decided to write a program that makes it easier to deal with job info on Upwork. Considering that this program needs to fetch job data from Upwork, serve web pages with a better UI, and do some statistical work to help freelancers find jobs, I chose Python for the task.
Here it is.
As you can see, this program contains three main components. The first part is a spider that fetches data from Upwork and inserts it into the database. Fortunately, Upwork provides an official Python package, so I can get this done without parsing HTML. The second part is a web app that displays the job data with a better UI; among the many options for this, Django seems a good choice. The third part is an application that analyzes the job data and returns results based on statistical analysis. For example, the data can tell me which technologies are becoming popular, and what kinds of jobs I could find if I learned a given technology (no one learns PHP unless they want to develop web apps).
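To give a flavor of what the third component could compute, here is a minimal sketch. The job records and their "skills" field are hypothetical, invented for illustration; they are not the real Upwork response shape.

```python
from collections import Counter

def top_skills(jobs, n=3):
    """Count how often each skill tag appears across job records."""
    counter = Counter()
    for job in jobs:
        counter.update(job.get("skills", []))
    return counter.most_common(n)

# Hypothetical job records for illustration
jobs = [
    {"title": "Build a scraper", "skills": ["python", "scrapy"]},
    {"title": "Django web app", "skills": ["python", "django"]},
    {"title": "WordPress fix", "skills": ["php"]},
]

print(top_skills(jobs, 1))  # -> [('python', 2)]
```

The same counting idea extends to budgets, client countries, or anything else the job records carry.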
Now I will introduce the program part by part. Let's start with the first part, the spider. You can find all the code below here
Spider
Upwork provides official API libraries in many languages, such as Python, PHP, Ruby, Java, and so on. All you need to do is follow the API Reference. First, register an application to get a key pair at https://www.upwork.com/services/api/apply, and then install python-upwork in an empty virtualenv (if you have no idea what a virtualenv is, please check http://docs.python-guide.org/en/latest/dev/virtualenvs/). How do you use python-upwork? Don't worry: in the python-upwork project (https://github.com/upwork/python-upwork) you can find a directory named example, which contains a good tutorial on how to use the library.
Now let's use python-upwork to search for jobs with the keyword "python", based on the example in the package.
import upwork
from pprint import pprint

PUBLIC_KEY = ""
SECRET_KEY = ""


def desktop_app():
    """Emulation of desktop app.

    Your keys should be created with project type "Desktop".

    Returns: ``upwork.Client`` instance ready to work.
    """
    print "Emulating desktop app"
    public_key = PUBLIC_KEY
    secret_key = SECRET_KEY

    client = upwork.Client(public_key, secret_key)
    verifier = raw_input(
        'Please enter the verification code you get '
        'following this link:\n{0}\n\n> '.format(
            client.auth.get_authorize_url()))

    print 'Retrieving keys.... '
    access_token, access_token_secret = client.auth.get_access_token(verifier)
    print 'OK'

    # For further use you can store ``access_token`` and
    # ``access_token_secret`` somewhere
    client = upwork.Client(public_key, secret_key,
                           oauth_access_token=access_token,
                           oauth_access_token_secret=access_token_secret)
    return client


if __name__ == '__main__':
    client = desktop_app()
    try:
        print "Get jobs"
        pprint(client.provider_v2.search_jobs({'q': 'python'}))
    except Exception, e:
        print "Exception at %s %s" % (client.last_method, client.last_url)
        raise e
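Once the search results come back, each job is just a plain dict you can filter and reshape. Here is a small sketch of the kind of post-processing I mean; the field names (title, snippet) are assumptions for illustration, not guaranteed to match the real response.

```python
def pick_jobs(jobs, keyword):
    """Keep only jobs whose title or snippet mentions the keyword."""
    keyword = keyword.lower()
    return [job for job in jobs
            if keyword in job.get("title", "").lower()
            or keyword in job.get("snippet", "").lower()]

# Hypothetical search results for illustration
sample = [
    {"title": "Fix a Python spider", "snippet": "Scrapy experience a plus"},
    {"title": "Logo design", "snippet": "Vector art"},
]

print(pick_jobs(sample, "python"))  # keeps only the first job
```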
If you run the Python script above, you will get a list of Python jobs in JSON format. However, the shortcoming of this script is that every time you run it, you need to copy the auth URL, paste it into a web browser, and then copy the verifier back to let the spider pass the auth, which is not easy to use. Can we make the program authenticate automatically, without any human interruption? Of course! Now I will introduce a better way to do this. We can control a web browser programmatically to open the URL and bring the verifier back. Since Selenium allows automated control of real browsers on real operating systems, and it can also drive a headless browser named PhantomJS, we can integrate this into the spider.
PhantomJS is most conveniently installed through npm, so first we install Node and npm using nvm (maybe the best Node version manager; you can find it on GitHub), then install Selenium with pip install selenium. Now we can make the spider pass the auth automatically. Here is the code snippet:
def get_verifier(url, browser):
    browser.get(url)
    if "log in and get to work" in browser.page_source.lower():
        LOGGER.info('try to log in at {browser.current_url}'.format(browser=browser))
        # try to log in
        browser.find_element_by_xpath(
            "//input[@id='login_username']").send_keys(USERNAME)
        browser.find_element_by_xpath(
            "//input[@id='login_password']").send_keys(PASSWORD)
        browser.find_element_by_xpath("//div[@class='checkbox']//label").click()
        browser.find_element_by_xpath(
            "//button[@type='submit']").click()
        LOGGER.info('used password to log in')
    else:
        LOGGER.info('used cookies to log in')
    output = auth_get_token(browser)
    return output


def auth_get_token(browser):
    """
    Authorize access, then extract the verifier token and return it.
    """
    msg = browser.find_element_by_xpath(
        "//div[@class='oNote']"
    ).text
    if not msg:
        browser.find_element_by_xpath(
            "//button[@type='submit']").click()
        msg = browser.find_element_by_xpath(
            "//div[@class='oNote']"
        ).text
    output = msg[msg.rindex("=") + 1:]
    return output
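The last line of auth_get_token simply takes everything after the final "=" in the note text. Isolated, with a made-up note string for illustration, the extraction looks like this:

```python
def extract_verifier(msg):
    """Take everything after the last '=', mirroring auth_get_token."""
    return msg[msg.rindex("=") + 1:]

# A hypothetical note text containing the OAuth verifier
note = "Copy this verifier back to the application: oauth_verifier=a1b2c3"
print(extract_verifier(note))  # -> a1b2c3
```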
Now the spider passes the auth automatically, which is very convenient, but it only returns a few jobs and then quits. Can we make the spider search for jobs continuously, and make it more extensible? The answer is yes.
class Client(object):

    lock = RLock()

    def __init__(self):
        self.client = self.get_client()

    def get_client(self):
        """Emulation of desktop app.

        Your keys should be created with project type "Desktop".

        Returns: ``upwork.Client`` instance ready to work.
        """
        client = upwork.Client(PUBLIC_KEY, SECRET_KEY)
        url = client.auth.get_authorize_url()
        if USERNAME and PASSWORD:
            browser = None
            try:
                browser = create_browser()
                verifier = get_verifier(url, browser)
            finally:
                # quit the browser even if auth fails; guard against
                # create_browser itself having failed
                if browser is not None:
                    browser.quit()
        else:
            verifier = raw_input(
                'Please enter the verification code you get '
                'following this link:\n{0}\n\n> '.format(url))
        LOGGER.debug('Retrieving keys.... ')
        access_token, access_token_secret = client.auth.get_access_token(verifier)
        LOGGER.debug('OK')
        client = upwork.Client(PUBLIC_KEY, SECRET_KEY,
                               oauth_access_token=access_token,
                               oauth_access_token_secret=access_token_secret)
        return client

    def search_jobs(self, *args, **kargs):
        LOGGER.debug("search_jobs enter lock")
        with self.lock:
            try:
                LOGGER.debug(threading.currentThread().name + " search_jobs got lock")
                sleep(2)
                result = self.client.provider_v2.search_jobs(*args, **kargs)
                LOGGER.debug(threading.currentThread().name + " search_jobs got result")
                return result
            except Exception as e:
                LOGGER.exception(e)
Note that an RLock is used so that the client can be shared safely in a multithreaded program, so we can use it like this:
# Job_Finder is a subclass of Thread
job_finder = Job_Finder(client, KEY_LS)
job_finder.start()
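Job_Finder itself is not shown in this post, so here is a minimal sketch of what such a Thread subclass might look like. The class body and the FakeClient stand-in are my assumptions, written so the sketch runs without touching Upwork; the real version would pass the shared, locked Client from above.

```python
import threading

class Job_Finder(threading.Thread):
    """Search each keyword through a shared client and collect results."""

    def __init__(self, client, keywords, rounds=1):
        threading.Thread.__init__(self)
        self.client = client
        self.keywords = keywords
        self.rounds = rounds
        self.results = []

    def run(self):
        for _ in range(self.rounds):
            for keyword in self.keywords:
                # the shared client serializes API calls with its RLock
                jobs = self.client.search_jobs({'q': keyword})
                if jobs:
                    self.results.extend(jobs)

# A stand-in client so the sketch runs without network access
class FakeClient(object):
    lock = threading.RLock()

    def search_jobs(self, query):
        with self.lock:
            return [{'title': 'job for %s' % query['q']}]

finder = Job_Finder(FakeClient(), ['python', 'django'])
finder.start()
finder.join()
print(len(finder.results))  # -> 2
```

Because every search_jobs call goes through the same lock, you can start several Job_Finder threads with different keyword lists and they will not trample each other.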
We can add some methods to extract the job info and insert the data into the database for later use. In the next post, I will describe the web app that lets the user check jobs in a cleaner UI.
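For the database step, here is a minimal sketch using sqlite3. The table layout and field names are my assumptions; the real spider can map the API response into whatever schema fits.

```python
import sqlite3

def save_jobs(conn, jobs):
    """Create the jobs table if needed and upsert job records by id."""
    conn.execute("CREATE TABLE IF NOT EXISTS jobs "
                 "(id TEXT PRIMARY KEY, title TEXT, snippet TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO jobs (id, title, snippet) VALUES (?, ?, ?)",
        [(j["id"], j["title"], j.get("snippet", "")) for j in jobs])
    conn.commit()

# Hypothetical job record; an in-memory database keeps the demo self-contained
conn = sqlite3.connect(":memory:")
save_jobs(conn, [{"id": "42", "title": "Python spider", "snippet": "Scrapy"}])
print(conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0])  # -> 1
```

Using INSERT OR REPLACE keyed on the job id means the spider can re-crawl the same search without producing duplicate rows.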
Thanks for your patience. You can check all the code above here; I have pushed the code.