# Data Science Pair Interview
## Requirements
- Python 3.11 (you can change this in the `pyproject.toml` file)
- [Poetry](https://python-poetry.org/)
- [Docker](https://www.docker.com/)
- [Ngrok](https://ngrok.com/)
- [GNU Make](https://www.gnu.org/software/make/)
I recommend using [asdf](https://asdf-vm.com/) to manage your Python versions and Poetry installation.
## Usage
### Get Set Up
Create your local environment with:
```shell
make install
```
Ensure you have created and validated your account with [Ngrok](https://ngrok.com/).
### Make your Dataset
`src/interview/data.py` contains an example function to build the classic [Iris dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html). You can extend this as you see fit to create one or more custom datasets relevant to your business. There is a Poetry script to run the `make_data` function and build the dataset, which you can run with:
```shell
make data
```
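If you want a custom dataset instead of Iris, a function along these lines could sit next to the existing one in `src/interview/data.py`. This is only a sketch: the name `make_custom_data`, the column names, the output path and the toy churn rule are all hypothetical and not part of this repository. It uses only the standard library so it runs without extra dependencies.

```python
import csv
import random
from pathlib import Path


def make_custom_data(path: Path, n_rows: int = 100, seed: int = 0) -> Path:
    """Write a small synthetic dataset to `path` as CSV (hypothetical example)."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "monthly_spend", "churned"])
        for customer_id in range(n_rows):
            monthly_spend = round(rng.uniform(5.0, 200.0), 2)
            # Toy rule (an assumption for illustration): low spenders churn more often.
            churn_probability = 0.6 if monthly_spend < 50 else 0.1
            churned = int(rng.random() < churn_probability)
            writer.writerow([customer_id, monthly_spend, churned])
    return path
```

Calling `make_custom_data(Path("data/custom.csv"))` would write a header row plus 100 data rows. To have `make data` build it, you would still need to call it from whatever entry point the Poetry script uses.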
### Run the Notebook with Docker
Build the container and run the notebook with:
```shell
make build run
```
This will copy your data and notebook to the container, install any packages you specified in your `pyproject.toml` file, then run the notebook in the container. Jupyter will be available on port 8888. Take note of the authentication token.
### Set Up a Tunnel
Start Ngrok with `ngrok http 8888`. This will give you the URL you will share with the candidate, as illustrated below.
![Ngrok](ngrok.png)
By following this link, the candidate can log in to Jupyter Lab running in the container on your machine, using the Jupyter authentication token you noted above.
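If you would rather send the candidate a single link than a URL and a separate token, Jupyter also accepts the token as a `token` query parameter. The helper below is a minimal sketch of building such a link. The function name `candidate_url` is hypothetical, and it assumes Jupyter Lab is served at its default `/lab` path.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def candidate_url(ngrok_url: str, token: str) -> str:
    """Combine the Ngrok forwarding URL and the Jupyter token into one link."""
    parts = urlsplit(ngrok_url)
    query = urlencode({"token": token})  # URL-encodes the token safely
    # Assumption: Jupyter Lab is reachable at the default "/lab" path.
    return urlunsplit((parts.scheme, parts.netloc, "/lab", query, ""))


# Placeholder values, not a real tunnel or token.
print(candidate_url("https://example.ngrok-free.app", "abc123"))
# prints "https://example.ngrok-free.app/lab?token=abc123"
```

Anyone who has this link can open the notebook, so treat it like a password and stop the tunnel once the interview is over.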