Getting started with conventions in Python
References:
- Software Engineering for Data Scientists in Python (Datacamp)
- https://sphinx-rtd-tutorial.readthedocs.io/
- https://realpython.com/
PEP 8 Cheat sheet: https://cheatography.com/jmds/cheat-sheets/python-pep8-style-guide/
If you are a data scientist who wants to write clean, high-quality scripts, I highly recommend following a consistent style guideline: it reduces complexity, makes your code more readable, and makes maintenance easier. Here are some tips commonly used for Python scripts:
1. Check PEP 8 with pycodestyle/flake8/black
PEP 8 is the standard style guide for Python code. You can check whether several Python files follow the PEP 8 conventions with pycodestyle:
import pycodestyle

style_checker = pycodestyle.StyleGuide()  # a StyleGuide instance
result = style_checker.check_files(['file1.py', 'file2.py'])  # PEP 8 check
print(result.total_errors)  # number of violations found
Flake8 also reports every warning in your files. It is a wrapper tool that combines pyflakes, a static checker for logical errors such as unused imports, with pycodestyle. Make sure Flake8 is installed for the Python version you are using, then open a terminal and run:
$ flake8 file.py
What happens if your code violates PEP 8? Are you going to fix everything by hand? Don’t worry: Black is an autoformatter that rewrites your files in a consistent style, which fixes most formatting violations automatically. Note that Black's default line length is 88 characters, but you can change it to 79 as per the PEP 8 guideline:
$ black --line-length=79 file.py
2. Naming
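By now you are familiar with common Python terms such as variable, function, class, module, and package. PEP 8 recommends a naming style for each: lowercase words separated by underscores for variables, functions, methods, and modules; CapWords for classes; short, all-lowercase names for packages; and uppercase words separated by underscores for constants.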
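As a rough illustration of the PEP 8 naming styles, the sketch below uses names invented for this example (MAX_RETRIES, DataLoader, count_files, count_retries); only the styles themselves come from PEP 8.

```python
# Module file names are lowercase with underscores, e.g. data_loader.py.
# Package names are short and all lowercase, e.g. mypackage.

# Constant: uppercase words separated by underscores.
MAX_RETRIES = 3


# Class: CapWords, with no underscores between words.
class DataLoader:
    def __init__(self, file_names):
        self.file_names = file_names

    # Method: lowercase words separated by underscores.
    def count_files(self):
        # Variable: lowercase words separated by underscores.
        file_count = len(self.file_names)
        return file_count


# Function: lowercase words separated by underscores.
def count_retries(attempts):
    return min(attempts, MAX_RETRIES)
```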
3. Blank lines
- Surround top-level functions and classes with two blank lines
class FirstClass:
    pass


class SecondClass:
    pass


def function():
    return None
- Surround method definitions inside classes with a single blank line
class MyClass:
    def first_method(self):
        return None

    def second_method(self):
        return None
4. Refactoring code
You should refactor longer functions into smaller units, which improves both readability and modularity. For example, you can break down this code
import math


def polygon_area(n_sides, side_len):
    """Find the area of a regular polygon

    :param n_sides: number of sides
    :param side_len: length of polygon sides
    :return: area of polygon

    >>> round(polygon_area(4, 5))
    25
    """
    perimeter = n_sides * side_len
    apothem_denominator = 2 * math.tan(math.pi / n_sides)
    apothem = side_len / apothem_denominator
    return perimeter * apothem / 2
into 3 smaller functions:
def polygon_perimeter(n_sides, side_len):
    return n_sides * side_len


def polygon_apothem(n_sides, side_len):
    denominator = 2 * math.tan(math.pi / n_sides)
    return side_len / denominator


def polygon_area(n_sides, side_len):
    # Delegate each step to its own helper.
    perimeter = polygon_perimeter(n_sides, side_len)
    apothem = polygon_apothem(n_sides, side_len)
    return perimeter * apothem / 2
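One practical benefit of smaller units is that each one can be checked on its own. A minimal sketch, restating the two helpers so that it runs by itself; the input values (a square with side 5) are just an illustrative choice:

```python
import math


def polygon_perimeter(n_sides, side_len):
    return n_sides * side_len


def polygon_apothem(n_sides, side_len):
    return side_len / (2 * math.tan(math.pi / n_sides))


# Each helper can be verified in isolation, here for a square of side 5.
print(polygon_perimeter(4, 5))          # 20
print(round(polygon_apothem(4, 5), 1))  # 2.5
```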
5. Comments (for yourself and collaborators)
- Use complete sentences that start with a capital letter
- Limit each comment line to 72 characters
- Start each line with a # followed by a single space
- Separate inline comments by two or more spaces from the statement
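The comment rules above can be sketched as follows. The function `mean` is a hypothetical example written for this illustration.

```python
def mean(values):
    # Guard against an empty list so that the division below
    # cannot raise a ZeroDivisionError.
    if not values:
        return 0.0

    total = sum(values)  # Two spaces separate the code from the #.
    return total / len(values)
```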
6. Docstrings (for users)
"""[Summary]

:param [ParamName]: [ParamDescription]
:type [ParamName]: [ParamType](, optional)
...
:raises [ErrorType]: [ErrorDescription]
...
:return: [ReturnDescription]
:rtype: [ReturnType]

:Example:

.. note:: can be useful to emphasize
    important feature
.. seealso:: :class:`MainClass2`
"""
7. Line length
- Limit each line to 79 characters
- Backslashes can break lines, although PEP 8 prefers implied continuation inside parentheses, brackets, or braces
- Break lines before binary operators
from mypackage import module1, \
    module2, module3


def function(arg_one, arg_two,
             arg_three, arg_four):
    return arg_one
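Breaking a line before the operator can be sketched as below. The variable names and values are made up for illustration.

```python
gross_wages = 5000
taxable_interest = 120
dividends = 80
ira_deduction = 300

# Each operator starts its line, next to the operand it applies to.
# The surrounding parentheses make backslashes unnecessary.
income = (gross_wages
          + taxable_interest
          + dividends
          - ira_deduction)

print(income)  # 4900
```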
8. Whitespace around operators
Surround the following binary operators with a single space on either side:
- Assignment operators (=, +=, -=, and so forth)
- Comparisons (==, !=, >, <, >=, <=) and (is, is not, in, not in)
- Booleans (and, not, or)
- More than one operator in a statement: adding a single space before and after each operator can look confusing. Instead, add whitespace only around the operators with the lowest priority. The same applies to if statements with multiple conditions:
# Recommended
y = x**2 + 5
z = (x+y) * (x-y)

# Not recommended
y = x ** 2 + 5
z = (x + y) * (x - y)

# Recommended
if x>5 and x%2==0:
    print(x)

# Not recommended
if x > 5 and x % 2 == 0:
    print(x)
9. Line up the closing brace/bracket with the first character of the line that starts the construct
list_of_numbers = [
    1, 2, 3,
    4, 5, 6
]
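The same alignment works for other bracketed constructs. A small sketch with a dictionary and a function call, where `plot_settings` and `total` are hypothetical names chosen for this example:

```python
# The closing brace lines up with the "p" of plot_settings.
plot_settings = {
    "width": 800,
    "height": 600,
}

# The closing "])" lines up with the "t" of total.
total = sum([
    1, 2, 3,
    4, 5, 6,
])

print(total)  # 21
```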