My best practices for Python

Christof Buchbender

Also hosted here: https://hera.ph1.uni-koeln.de/~buchbend/python_best_practices.html

Use Python3

Although Python 2.7 is still in use and remains the default on some distributions, Python 3 is the standard: Python 2 reached its end of life on 1 January 2020, and important packages such as astropy no longer release new versions for it. So if you are just starting to use Python, or do not have legacy code that needs Python 2, you should switch to Python 3 now. Good references are:

Use pip (or conda) instead of distribution packages (via e.g. apt-get)

To install and upgrade Python packages you do not have to rely on the package manager of your OS. I recommend using pip or conda to install, upgrade and maintain your Python packages.

Use Virtual Environments for Python

No matter which Python version you use, virtual environments make working with Python easier. A virtual environment is a container into which all packages and dependencies are installed. One can set up multiple virtual environments in parallel and switch between them. Personally, I use the helper package virtualenvwrapper to maintain my virtual environments, which are set up with virtualenv, for both Python 2 and Python 3. However, Python 3 comes with a native implementation of virtual environments; see https://docs.python.org/3/library/venv.html for the details. If you only use Python 3, this is the recommended way to employ virtual environments.
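As a minimal sketch of the native approach, the standard library module venv can also be driven from Python itself. The directory name below is an arbitrary choice for illustration, and with_pip=False is used only to keep the example fast and offline. In practice one would usually run python3 -m venv followed by a directory name on the command line and let it install pip.

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical location for the environment; any writable directory works.
env_dir = Path(tempfile.mkdtemp()) / "my_env"

# Create the environment. with_pip=False skips bootstrapping pip,
# which keeps this sketch quick; set it to True for real use.
venv.create(env_dir, with_pip=False)

# Every virtual environment contains a pyvenv.cfg file describing it.
print((env_dir / "pyvenv.cfg").read_text())
```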

Package Everything

Even single scripts that are longer than just a few lines benefit from being packaged as a proper python package. Packages have the following benefits:

  • Control of the dependencies that are needed to run the python program
  • Usage of the package from other python programs via import
  • Distribution of the program to collaborators
  • Creating command line programs for general usage of the python program (via entry_points)

How to package a python script

from setuptools import setup

setup(name='kosma_update_fits_header',
      version="1.0",
      description=('Tool to update the fits header'),
      include_package_data=True,
      packages=['kosma_update_fits_header'],
      url='',
      author='Christof Buchbender',
      author_email='buchbend@ph1.uni-koeln.de',
      license='MIT',
      install_requires=['pandas',
                        'astropy',
                        'argcomplete',
                        'pyyaml'],
      entry_points={"console_scripts": [
          ('kosma_update_fits_header = '
           'kosma_update_fits_header.kosma_update_fits_header:main'),
          ('remove_all_emu_scans = '
           'kosma_update_fits_header.remove_all_emu_scans:main'),
          ('fits_file_info = '
           'kosma_update_fits_header.find_and_filter_fits_files:main'),
          ('fits_check_for_simulator = '
           'kosma_update_fits_header.find_and_filter_fits_files:'
           'check_for_simulation_scans'),
          ('fits_correct_pointing_model = '
           'kosma_update_fits_header.kosma_update_fits_header:'
           'correct_pointing_model_main'),
          ('get_scan_range = '
           'kosma_update_fits_header.find_and_filter_fits_files:'
           'get_scan_range')
          ]},
      zip_safe=False)
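Each console_scripts entry above has the form command = package.module:function, and the named function is called without arguments when the command is run. The following is a minimal sketch of what such a function might look like. The argument names and the printed message are hypothetical and are not taken from the actual kosma_update_fits_header package.

```python
import argparse


def build_parser():
    """Create the command line parser (argument names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Tool to update the fits header")
    parser.add_argument("fits_file", help="path of the FITS file to process")
    parser.add_argument("--dry-run", action="store_true",
                        help="only report what would be changed")
    return parser


def main(argv=None):
    """Entry point; argv=None makes argparse read sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    mode = "would update" if args.dry_run else "updating"
    print("{} header of {}".format(mode, args.fits_file))
    return 0


# Calling main() with an explicit list makes it easy to try out or test.
main(["example.fits", "--dry-run"])
```

Accepting an optional argv list is a design choice: the installed command still works unchanged, while the same function can be called directly from tests or from a notebook.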

Write PEP8 compliant code

The coding style guidelines for Python that are defined in the PEP8 document are a de facto standard in the Python community. Following this style guide makes your program easily readable by other Python developers and leads to a consistent programming style. One can use linter programs that check whether your code complies with the guidelines.

One of the most useful rules is IMHO not to use lines that are longer than 79 characters. This accomplishes two things. First, one can open two or three scripts side by side on a normal Full-HD screen. Second, one is forced to write code that is easier to read: in Python one can produce pretty incomprehensible lines of code by nesting function calls into each other.
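To illustrate the second point, here is a small sketch with made-up data: the same computation written once as a single nested expression (deliberately longer than the line limit) and once split into named steps.

```python
import math

values = [3.0, 4.0, 12.0]

# Hard to read: everything nested into one overlong expression.
norm_nested = round(math.sqrt(sum(map(lambda v: v ** 2, filter(lambda v: v > 0, values)))), 2)

# Easier to read: one step per line, each with a descriptive name.
positive = [v for v in values if v > 0]
squares = [v ** 2 for v in positive]
norm_split = round(math.sqrt(sum(squares)), 2)

print(norm_nested, norm_split)  # prints: 13.0 13.0
```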

Make functions and classes as simple as possible

Try to make functions and classes as concise as possible. In Python it is easy to create objects or functions that do everything at once, but in the long run this makes debugging the function or class very difficult. One should rather use many small functions or classes with very concise tasks and appropriate names.
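A minimal sketch of this idea, with function names and input data made up for illustration: instead of one function that parses text and computes a statistic in one go, each step gets its own small function that can be tested and debugged separately.

```python
def parse_line(line):
    """Convert one 'name,value' text line into a (name, float) tuple."""
    name, value = line.split(",")
    return name.strip(), float(value)


def mean(numbers):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(numbers) / len(numbers)


def mean_of_lines(lines):
    """Combine the small helpers: parse all lines, then average."""
    values = [parse_line(line)[1] for line in lines]
    return mean(values)


print(mean_of_lines(["a, 1.0", "b, 2.0", "c, 6.0"]))  # prints: 3.0
```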

Use the logging package

Python comes with a very flexible package for logging, the standard library module logging.

The logging system has different severity levels and can be configured to write to the screen, to files or to other output streams. This makes it easy to write consistent debugging messages from your program. For example, one can always include the file name and line number or the function name in each message. A first simple example:

import logging
logging.warning("This is a simple Python logger example")

WARNING:root:This is a simple Python logger example

The format and the threshold level can be configured:

# Importing module
import logging

###
# Next two lines only needed for jupyter notebooks
from importlib import reload
reload(logging)
###

# Create and configure logger
logging.basicConfig(format='%(asctime)s %(name)s %(levelname)s %(message)s')

# Get the root logger
logger = logging.getLogger()

# Set the threshold of the logger to DEBUG
logger.setLevel(logging.DEBUG)

# Test messages
logger.debug("Debug Message")
logger.info("Information")
logger.warning("Warning")
logger.error("Error")
logger.critical("Critical")

2021-06-16 13:45:34,575 root DEBUG Debug Message
2021-06-16 13:45:34,576 root INFO Information
2021-06-16 13:45:34,577 root WARNING Warning
2021-06-16 13:45:34,578 root ERROR Error
2021-06-16 13:45:34,579 root CRITICAL Critical

To write the messages to a file instead of the screen, pass a filename to basicConfig:

###
# Next two lines only needed for jupyter notebooks
from importlib import reload
reload(logging)
###

logging.basicConfig(filename="newfile.log",
                    format='%(asctime)s %(name)s %(levelname)s %(message)s')

# After the reload the old logger object is stale, so fetch the root
# logger again and set its threshold
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

logger.debug("Debug Message")
logger.info("Information")
logger.warning("Warning")
logger.error("Error")
logger.critical("Critical")

Contents of newfile.log:

2021-06-16 13:46:41,005 root DEBUG Debug Message
2021-06-16 13:46:41,006 root INFO Information
2021-06-16 13:46:41,007 root WARNING Warning
2021-06-16 13:46:41,008 root ERROR Error
2021-06-16 13:46:41,009 root CRITICAL Critical
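The examples above log the time, the logger name and the level. To also include the file name, line number and function name mentioned earlier, the format string can use the standard LogRecord attributes filename, lineno and funcName. The sketch below uses a dedicated logger with its own handler, so it does not depend on the root logger configuration. The logger name and the function are made up for illustration.

```python
import io
import logging

# Write into an in-memory buffer so the result can be inspected;
# logging.StreamHandler() without arguments would write to stderr.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter(
    "%(levelname)s %(filename)s:%(lineno)d %(funcName)s() %(message)s"))

example_logger = logging.getLogger("example")  # hypothetical logger name
example_logger.setLevel(logging.DEBUG)
example_logger.addHandler(handler)
example_logger.propagate = False  # do not pass records on to the root logger


def read_scan():
    """Hypothetical function that emits one debug message."""
    example_logger.debug("reading scan")


read_scan()
print(buffer.getvalue(), end="")
```

Each line then has the form LEVEL file:line function() message, which makes it much easier to find where a message was emitted.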

If possible write tests for your functions (if your script is bigger than just a few small functions)

https://docs.python.org/3/library/unittest.html
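Here is a minimal sketch of what such tests can look like with the unittest module linked above. The function dot and the test class are made up for illustration. The suite is run programmatically so that the snippet works both in a script and in a notebook.

```python
import unittest


def dot(a, b):
    """Return the dot product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


class TestDot(unittest.TestCase):
    def test_orthogonal_vectors(self):
        self.assertEqual(dot((1, 0), (0, 1)), 0)

    def test_general_case(self):
        self.assertEqual(dot((1, 2), (3, 4)), 11)


# Collect the tests of the class and run them.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDot)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project one would normally keep the tests in a separate file and run them with python -m unittest from the command line.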

Make use of the Object orientation of python

Python is an object-oriented language, but one can get by without ever designing one's own classes. Classes are not always necessary, but sometimes they are superior to functions and more convenient to use.

Example class:

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Component-wise addition returns a new Vector
        return Vector(
            x=self.x + other.x,
            y=self.y + other.y
        )

    def __mul__(self, other):
        # Dot product of two vectors returns a plain number
        return self.x * other.x + self.y * other.y

    def __str__(self):
        return "<{},{}>".format(self.x, self.y)

    def __repr__(self):
        return "<{},{}>".format(self.x, self.y)


a = Vector(1, 2)
b = Vector(3, 4)

print("a+b =", a+b)
print("a*b =", a*b)

a+b = <4,6>
a*b = 11

type(a*b)

int

Version Control all your scripts

Use e.g. git to version control your programs. This allows:

  • Sharing and developing code with collaborators
  • Easy tracking of changes and reverting destructive changes

https://git.ph1.uni-koeln.de/

Jupyter notebook and Jupyter lab for interactive data analysis

Jupyter lets you create notebooks that contain "live code". Code is separated into individual cells, and the output of each cell is inserted back into the notebook.

Twelve-Factor App

This is a guideline on how to write good programs in general, not just in Python. It is a good reference for best practices in software development.

https://12factor.net/

My favorite Python packages (to name but a few)

  • basic tools:

    • pandas: Python Data Analysis Library
    • astropy: A common core package for Astronomy in Python
    • matplotlib: Python 2D (and 3D) plotting library
    • numpy: support for large, multi-dimensional arrays and matrices and a large collection of high-level mathematical functions
    • scipy: Python-based ecosystem of open-source software for mathematics, science, and engineering.
    • click: Command line Interface Creation Kit
    • dynaconf: Configuration Management for Python.
  • databases:

    • sqlalchemy: Python SQL toolkit and Object Relational Mapper

  • distributed/parallel computing

  • machine learning:

    • astroml: Machine Learning for Astronomy
    • scikit-learn: Machine Learning in Python
    • tensorflow: open-source software library for dataflow programming
  • web development (e.g. for data presentation):

    • flask: microframework for Python
    • django: high-level Python Web framework
    • jinja2: full featured template engine for Python
  • miscellaneous:

  • webscraping (gathering data online):

    • beautifulsoup: Python library for pulling data out of HTML and XML files
  • NOT PYTHON:

    • vue: JavaScript framework useful for flexible frontends

Good python books (IMHO)

There are many introductory books and tutorials. The books I chose here go deeper than typical introductions.