When developing web spiders with the Scrapy framework, scrapy shell is very helpful because it lets you debug code quickly. However, I found that I had to type many import statements in every session, which is time consuming.
So I dove into the Scrapy source code to find a way to make scrapy shell more comfortable to use. Here is my solution.
Fewer imports
The re, json, urlparse, and copy modules are very useful when writing spiders, so it would be convenient to have them available as soon as scrapy shell starts. Let us get started.
Edit scrapy/utils/console.py in the Scrapy source code:
import re, json, urlparse, copy
The import goes at the top of console.py so that the modules are available inside that file. Then, inside start_python_console, add them to the namespace:
namespace["re"] = re
namespace["json"] = json
namespace["urlparse"] = urlparse
namespace["copy"] = copy
The namespace dict is used as the local namespace of the IPython shell, so adding the modules to it lets you use re, json, urlparse, and copy directly after running scrapy shell.
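To illustrate why adding entries to the namespace dict is enough, here is a minimal sketch that does not touch Scrapy itself. It uses the standard library's code.InteractiveInterpreter as a stand-in for the shell. build_namespace is a hypothetical helper name, and the urllib.parse fallback is an assumption added so the sketch also runs on Python 3, where urlparse is no longer a top-level module.

```python
import code
import copy
import json
import re

try:
    import urlparse  # Python 2 module name, as used in this post
except ImportError:
    # Assumption: on Python 3 the same functions live in urllib.parse.
    from urllib import parse as urlparse


def build_namespace(namespace=None):
    """Return the namespace dict with the helper modules pre-loaded.

    Hypothetical helper; it mirrors the lines added to console.py.
    """
    if namespace is None:
        namespace = {}
    namespace.update({"re": re, "json": json, "urlparse": urlparse, "copy": copy})
    return namespace


# Stand-in for the shell: run one line of "user input" against the namespace.
interpreter = code.InteractiveInterpreter(locals=build_namespace())
interpreter.runsource("host = urlparse.urlparse('http://example.com/a?b=1').netloc")
print(interpreter.locals["host"])  # example.com
```

Any name placed in the dict is visible to code typed into the console without an import, which is all the console.py change relies on.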
Compare URLs
Sometimes I need to quickly compare two URLs and find the difference between them, so I added a compare function to my scrapy shell:
import difflib

def compare(str1, str2):
    a = str1.splitlines()
    b = str2.splitlines()
    d = difflib.Differ()
    diff = d.compare(a, b)
    print '\n'.join(diff)
It uses difflib from the Python standard library and helps me get things done quickly.
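Because a URL is a single line, difflib reports the whole string as changed. As a possible variation (not part of my console.py), the sketch below first splits each URL into one line per component and query parameter, so only the parts that differ are reported. url_to_lines and compare_urls are hypothetical names, and the import fallback is an assumption so the sketch runs on both Python 2 and Python 3.

```python
import difflib

try:
    from urlparse import urlparse, parse_qsl  # Python 2
except ImportError:
    from urllib.parse import urlparse, parse_qsl  # Python 3


def url_to_lines(url):
    """Split a URL into one line per component and query parameter."""
    parts = urlparse(url)
    lines = ["scheme: " + parts.scheme, "host: " + parts.netloc, "path: " + parts.path]
    for pair in parse_qsl(parts.query, keep_blank_values=True):
        lines.append("param: %s=%s" % pair)
    return lines


def compare_urls(url1, url2):
    """Return only the removed ('- ') and added ('+ ') lines of the diff."""
    diff = difflib.ndiff(url_to_lines(url1), url_to_lines(url2))
    return [line for line in diff if line.startswith(("- ", "+ "))]


print(compare_urls("http://example.com/list?page=1&sort=asc",
                   "http://example.com/list?page=2&sort=asc"))
# ['- param: page=1', '+ param: page=2']
```

Returning a list instead of printing makes the helper easy to reuse. Wrap the call in a print, as above, to get the same behaviour as compare.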
Here is the final console.py; you can copy it and use it as you wish.
import re, json, urlparse, copy
import difflib

def compare(str1, str2):
    a = str1.splitlines()
    b = str2.splitlines()
    d = difflib.Differ()
    diff = d.compare(a, b)
    print '\n'.join(diff)

def start_python_console(namespace=None, noipython=False, banner=''):
    """Start Python console binded to the given namespace. If IPython is
    available, an IPython console will be started instead, unless `noipython`
    is True. Also, tab completion will be used on Unix systems.
    """
    if namespace is None:
        namespace = {}
    namespace["re"] = re
    namespace["json"] = json
    namespace["urlparse"] = urlparse
    namespace["copy"] = copy
    namespace["compare"] = compare
    try:
        try: # use IPython if available
            if noipython:
                raise ImportError()
            try:
                try:
                    from IPython.terminal import embed
                except ImportError:
                    from IPython.frontend.terminal import embed
                sh = embed.InteractiveShellEmbed(banner1=banner)
            except ImportError:
                from IPython.Shell import IPShellEmbed
                sh = IPShellEmbed(banner=banner)
            sh(global_ns={}, local_ns=namespace)
        except ImportError:
            import code
            try: # readline module is only available on unix systems
                import readline
            except ImportError:
                pass
            else:
                import rlcompleter
                readline.parse_and_bind("tab:complete")
            code.interact(banner=banner, local=namespace)
    except SystemExit: # raised when using exit() in python code.interact
        pass
Update on 2015-05-27
I think it is better to use scrapy shell with an IPython profile, so I modified scrapy/utils/console.py to do that:
def start_python_console(namespace=None, noipython=False, banner=''):
    """Start Python console binded to the given namespace. If IPython is
    available, an IPython console will be started instead, unless `noipython`
    is True. Also, tab completion will be used on Unix systems.
    """
    if namespace is None:
        namespace = {}
    try:
        try: # use IPython if available
            if noipython:
                raise ImportError()
            # start_ipython launches a full IPython application, which loads
            # the default profile and its startup scripts. An ImportError
            # here falls through to the plain console below.
            from IPython import start_ipython
            start_ipython([], user_ns=namespace)
        except ImportError:
            import code
            try: # readline module is only available on unix systems
                import readline
            except ImportError:
                pass
            else:
                import rlcompleter
                readline.parse_and_bind("tab:complete")
            code.interact(banner=banner, local=namespace)
    except SystemExit: # raised when using exit() in python code.interact
        pass
I created a file called $HOME/.ipython/profile_default/startup/01-scrapy.py, which IPython runs every time it starts with the default profile. It contains this:
import re, json, urlparse, copy
import difflib

def compare(str1, str2):
    a = str1.splitlines()
    b = str2.splitlines()
    d = difflib.Differ()
    diff = d.compare(a, b)
    print '\n'.join(diff)
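If you are not sure where the startup directory is on a given machine, a small sketch like the one below can print the expected path. startup_file_path is a hypothetical helper. It asks IPython for its configuration directory when IPython.paths is importable, and otherwise assumes the conventional ~/.ipython location.

```python
import os


def startup_file_path(filename="01-scrapy.py", profile="default"):
    """Return the path where IPython is expected to look for the startup script."""
    try:
        # Available in recent IPython releases; honours the IPYTHONDIR variable.
        from IPython.paths import get_ipython_dir
        ipython_dir = get_ipython_dir()
    except ImportError:
        # Assumption: fall back to the conventional default location.
        ipython_dir = os.path.join(os.path.expanduser("~"), ".ipython")
    return os.path.join(ipython_dir, "profile_" + profile, "startup", filename)


print(startup_file_path())
```

Files in that directory run in lexicographic order, which is why the name starts with a number.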