Installing Python
The easiest way to get not only Python but all the Python libraries needed for data science work is to download the Anaconda distribution.
Detailed instructions are available at https://docs.anaconda.com/anaconda/install/linux/. I prefer to use Python version 3 but the instructions show you how to download Python 2 too.
There is also a choice between Anaconda and Miniconda. Miniconda, as the name suggests, is a mini version of Anaconda that contains only the conda package manager and its dependencies. If you prefer to have conda plus over 720 open source packages, install Anaconda.
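Whichever installer you pick, it can be worth confirming afterwards that the interpreter on your path is Python 3 and that the libraries you expect are importable. The following is a minimal sketch; the helper name missing_packages and the package list are my own illustrations, not part of the Anaconda instructions.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level packages from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list only; replace it with the libraries you actually need.
    wanted = ["numpy", "pandas", "matplotlib"]
    print("Python version:", sys.version.split()[0])
    print("Running Python 3:", sys.version_info.major == 3)
    print("Missing packages:", missing_packages(wanted))
```

With a full Anaconda install the missing list should normally be empty; with Miniconda you would expect to install most of these yourself with conda.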
Installing PostgreSQL
You will definitely need a database for your data science projects. At some point working on really large datasets is more efficient within a database. I chose to go with PostgreSQL 12.
Installation instructions can be found here: https://www.postgresql.org/download/linux/ubuntu/ but I’ve also copied the instructions for Ubuntu 18.04 LTS.
- Create the file
/etc/apt/sources.list.d/pgdg.list
and add a line for the repository
deb http://apt.postgresql.org/pub/repos/apt/ bionic-pgdg main
- Import the repository signing key, and update the package lists
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt update
- Finally install the application
sudo apt install postgresql-12
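The repository line above is specific to Ubuntu 18.04, whose codename is bionic. If you are on a different release, the sketch below shows one way to build the equivalent line from /etc/os-release. Both function names are hypothetical helpers of mine, and I am assuming the PostgreSQL apt repository publishes a matching "-pgdg" suite for your release, which you should check on the download page.

```python
def codename_from_os_release(text):
    """Extract VERSION_CODENAME from the contents of /etc/os-release, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VERSION_CODENAME":
            return value.strip().strip('"')
    return None


def pgdg_source_line(codename):
    """Build the apt source line for the given Ubuntu codename."""
    return f"deb http://apt.postgresql.org/pub/repos/apt/ {codename}-pgdg main"


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as fh:
            codename = codename_from_os_release(fh.read())
    except OSError:
        codename = None
    if codename is None:
        print("Could not determine the release codename")
    else:
        # Paste the printed line into /etc/apt/sources.list.d/pgdg.list
        print(pgdg_source_line(codename))
```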
Changing the location of the database to your SSD
If you remember how I built my Ubuntu Analytics server, I added both an SSD (solid-state drive) and an HDD (hard disk drive) for storage. The idea was that my database and any other application that needed fast data read/write capabilities would sit on the SSD. Since PostgreSQL would be handling a lot of data, it was an ideal candidate to move to the SSD. The good news is that it has a parameter called data_directory that just needs the folder path on your SSD.
DigitalOcean has a great tutorial on just that.
Once that’s done, the database reads and writes its data on the SSD, which should make it noticeably faster.
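To make the config change concrete, here is a small sketch that rewrites the data_directory line in the text of postgresql.conf. It only covers the edit itself: the existing data files still have to be copied to the new location while PostgreSQL is stopped. The function name and the /mnt/ssd path are hypothetical, and the default locations in the comments are what I would expect for PostgreSQL 12 on Ubuntu, so verify them on your own machine.

```python
import re

# Matches an active or commented-out data_directory line, without crossing lines.
_DATA_DIR = re.compile(r"^[ \t]*#?[ \t]*data_directory[ \t]*=.*$", re.MULTILINE)


def set_data_directory(conf_text, new_path):
    """Return conf_text with data_directory pointing at new_path."""
    new_line = f"data_directory = '{new_path}'"
    if _DATA_DIR.search(conf_text):
        # Replace only the first match; a callable avoids backslash escapes.
        return _DATA_DIR.sub(lambda match: new_line, conf_text, count=1)
    # No existing setting: append one at the end of the file.
    return conf_text.rstrip("\n") + "\n" + new_line + "\n"


if __name__ == "__main__":
    # Assumed default on Ubuntu: /etc/postgresql/12/main/postgresql.conf
    sample = "data_directory = '/var/lib/postgresql/12/main'\nport = 5432\n"
    print(set_data_directory(sample, "/mnt/ssd/postgresql/12/main"))
```

Note that any trailing comment on the original line is dropped by the replacement.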
Installing JupyterLab
I found a good set of instructions on this website: https://agent-jay.github.io/2018/03/jupyterserver/
I did make one change to the instructions. To reach JupyterLab at a localhost address from your own machine, you have to create an SSH tunnel to the server. That’s what the last set of instructions was doing:
ssh -N -f -L 8888:localhost:9999 [email protected] #Change the specifics as required
Instead, I started JupyterLab with the IP address of my server itself so I could reach it directly without using localhost. So if your headless Ubuntu server is at 192.168.1.2 on your LAN, you can use the command as follows:
jupyter lab --no-browser --ip=192.168.1.2 --port=12345
You can now access JupyterLab at https://192.168.1.2:12345 without opening an SSH tunnel to the server.
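If the page does not load, it helps to first confirm from another machine on the LAN that the port is reachable at all. Below is a minimal sketch using only the standard library; port_is_open is a name I made up, and the address and port are simply the example values from the command above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # Example values from this post; change them to your own server and port.
    print(port_is_open("192.168.1.2", 12345))
```

A False result points to the network side (wrong IP, firewall, server not running) rather than to a JupyterLab setting.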
If you get a 403 error when you try to open a notebook, folder restrictions may be the cause. Make sure to change the default notebook directory in the config file to a location you have write access to.
c.NotebookApp.notebook_dir = '/absolute/path/to/folder'
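Before putting a path into that setting, you can check that it really is an absolute, existing directory that your user can write to. This is a sketch under my own assumptions: the helper name notebook_dir_problem is hypothetical, and it should be run as the same user that starts JupyterLab.

```python
import os


def notebook_dir_problem(path):
    """Return a short description of why `path` is unsuitable, or None if it looks fine."""
    if not os.path.isabs(path):
        return "not an absolute path"
    if not os.path.isdir(path):
        return "not an existing directory"
    # Creating notebooks needs both write and traverse permission on the folder.
    if not os.access(path, os.W_OK | os.X_OK):
        return "not writable by the current user"
    return None


if __name__ == "__main__":
    # Placeholder from the config line above; substitute your real folder.
    print(notebook_dir_problem("/absolute/path/to/folder"))
```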
Updated: Mar 26, 2020