# Data Science Pair Interview

## Requirements

- Python 3.11 (you can change this in the `pyproject.toml` file)
- [Poetry](https://python-poetry.org/)
- [Docker](https://www.docker.com/)
- [Ngrok](https://ngrok.com/)
- [GNU Make](https://www.gnu.org/software/make/)

I recommend using [asdf](https://asdf-vm.com/) to manage your Python versions and Poetry installation.

## Usage

### Get Set Up

Create your local environment with:

```shell
make install
```

Ensure you have created and validated your account with [Ngrok](https://ngrok.com/).

### Make Your Dataset

`src/interview/data.py` contains an example function to build the classic [Iris dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html). You can extend this as you see fit to create one or more custom datasets relevant to your business. There is a Poetry script to run the `make_data` function and build the dataset, which you can run with:

```shell
make data
```
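
If you want a dataset other than Iris, a custom builder could look like the sketch below. This is not the repository's `make_data` function: the name `make_custom_data`, the column names, and the churn rule are all hypothetical, and it sticks to the standard library so it runs without extra dependencies.

```python
import csv
import random
from pathlib import Path


def make_custom_data(path: Path, n_rows: int = 100, seed: int = 0) -> Path:
    """Write a small synthetic churn-style dataset to a CSV file (hypothetical example)."""
    rng = random.Random(seed)  # seeded so every interview starts from the same data
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "monthly_spend", "tenure_months", "churned"])
        for customer_id in range(n_rows):
            tenure = rng.randint(1, 60)
            spend = round(rng.uniform(10.0, 200.0), 2)
            # Made-up rule: customers with shorter tenure churn more often.
            churned = int(rng.random() < 1.0 / (1.0 + tenure / 12.0))
            writer.writerow([customer_id, spend, tenure, churned])
    return path
```

Calling `make_custom_data(Path("data/custom.csv"))` would write 100 rows plus a header. Fixing the seed keeps the data identical across candidates, which makes their results easier to compare.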

### Run the Notebook with Docker

Build the container and run the notebook with:

```shell
make build run
```

This will copy your data and notebook to the container, install any packages you specified in your `pyproject.toml` file, then run the notebook in the container. Jupyter will be available on port 8888. Take note of the authentication token.
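
If you would rather not copy the token by hand, a small helper can pull it out of the startup output. This is a minimal sketch, assuming Jupyter prints a URL containing `token=<hex string>` when it starts; `extract_token` is a hypothetical name and not part of this repository.

```python
import re
from typing import Optional

# Assumption: the startup output contains a URL such as
# http://127.0.0.1:8888/lab?token=<hex string>
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9a-f]+)")


def extract_token(log_text: str) -> Optional[str]:
    """Return the first token found in Jupyter's startup output, or None."""
    match = TOKEN_PATTERN.search(log_text)
    return match.group(1) if match else None
```

Paste the container's output into `extract_token` as a string. A return value of `None` means no token URL was found in the text.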

### Set Up a Tunnel

Start Ngrok with `ngrok http 8888`. This will give you the URL you will share with the candidate, as illustrated below.

![Ngrok](ngrok.png)

By following this link, the candidate can log in to JupyterLab running in the container on your machine, using the Jupyter authentication token you noted above.
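
The public URL can also be read programmatically rather than from the terminal. The sketch below assumes the Ngrok agent exposes its local inspection API at the default address `http://127.0.0.1:4040/api/tunnels` and that JupyterLab is served under `/lab`. The helper names `fetch_tunnels` and `share_url` are hypothetical and not part of this repository.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the ngrok agent's local inspection API is on its default address.
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def fetch_tunnels(api_url: str = NGROK_API) -> dict:
    """Query the local ngrok agent for its active tunnels."""
    with urlopen(api_url, timeout=5) as response:
        return json.load(response)


def share_url(tunnels: dict, token: str) -> str:
    """Build the candidate's link from the first HTTPS tunnel and the Jupyter token."""
    query = urlencode({"token": token})
    for tunnel in tunnels.get("tunnels", []):
        public_url = tunnel.get("public_url", "")
        if public_url.startswith("https://"):
            # Assumption: JupyterLab lives under /lab and accepts ?token=...
            return f"{public_url}/lab?{query}"
    raise LookupError("no HTTPS tunnel found; is `ngrok http 8888` running?")
```

With Ngrok running, `share_url(fetch_tunnels(), token)` would return a single link with the token embedded, so the candidate does not have to enter it separately. Anyone holding that link can open the notebook, so share it only with the candidate.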