## Coding style

Consistency in coding style is important throughout the project. We recommend following the official Python style guide (PEP 8). Key points include:

- Indentation: Use consistent indentation (PEP 8 recommends 4 spaces per level) and never mix tabs and spaces.
- Line Length: Keep lines reasonably short (PEP 8 sets a limit of 79 characters); in notebooks, avoid exceeding half the width of a cell.
- Comments: Use # for comments and write them clearly to explain the code.
- Whitespace: Put a single space around binary operators and after commas in function arguments, and keep spacing consistent within control structures.
- Imports: Organize import statements in logical groups and in the recommended order (standard library, third-party packages, local modules).

## Pull Request Process
- Create a separate Git branch for each pull request to keep changes consistent and easy to review.
- Code should follow the official Python style guide (PEP 8).
- We use docstrings for documentation.
- We use pytest with plain `assert` statements for testing. Contributions that include test cases are easier to accept because we can check for errors more reliably.

## Code of Conduct
Each contributor shall adhere to the examples set below:

- Use welcoming and inclusive language
- Be respectful of differing opinions and viewpoints
- Be cognizant of intention when providing constructive criticism
- Recognize the best in others
- Show empathy towards community members

Each contributor should avoid behaviours such as:

- Sexualized language and unwanted sexual attention or advances
- Trolling, insults, derogatory comments, and political attacks
- Public or private harassment
- Ignoring privacy concerns of the community
- Other conduct which could reasonably be considered inappropriate in a professional setting

## Development tools, GitHub infrastructure, and practices used

Throughout this project, our team applied software engineering tools and practices to a real-life data science and machine learning workflow. We used Git and GitHub for version control and collaborated through pull requests, code reviews, and issue tracking. These tools let us manage changes systematically, document design changes, and keep a traceable project history.

We structured the project as a Python package using a pyproject.toml file, which encouraged reproducibility and made dependency management explicit. We also wrote tests with pytest and integrated them into a GitHub Actions CI pipeline, so the tests ran automatically on each push and pull request. This kept code quality intact and helped us catch errors quickly.
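As a minimal sketch of the kind of test such a CI pipeline would run, the example below pairs a small documented function with pytest-style tests that use plain `assert` statements. The function `clip_fraction` and its behaviour are hypothetical and are not taken from this project's codebase; in a real package the tests would normally live in a separate module under a `tests/` directory.

```python
"""Illustrative only: `clip_fraction` is a hypothetical helper, not project code."""


def clip_fraction(value: float) -> float:
    """Clamp a number to the closed interval [0, 1].

    Parameters
    ----------
    value : float
        The number to clamp.

    Returns
    -------
    float
        `value` limited to the range 0.0 to 1.0.
    """
    return max(0.0, min(1.0, float(value)))


def test_clip_fraction_leaves_values_in_range_unchanged():
    # A value already inside [0, 1] should come back as-is.
    assert clip_fraction(0.25) == 0.25


def test_clip_fraction_clamps_out_of_range_values():
    # Values below 0 or above 1 are pulled back to the nearest bound.
    assert clip_fraction(-3) == 0.0
    assert clip_fraction(7.5) == 1.0
```

By default, pytest collects functions named `test_*` from files named `test_*.py` or `*_test.py`, so saving the tests under such a file name and running `pytest` from the project root is enough to execute them.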

For documentation, our team used Quarto and quartodoc to generate user-facing documentation directly from the codebase. We used Netlify to automate the build and deployment, which kept the documentation in sync with our code. Furthermore, we adopted organizational practices such as a standard code design and a clear, well-documented README.

## Scaling the project: tools and practices

If we were to scale up this project, we would implement stricter CI checks and add branch protection rules that require the relevant tests to pass before merging. Furthermore, we would adopt semantic versioning and automated release workflows to keep our deployments stable.
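Semantic versioning encodes the kind of change in the version number itself (MAJOR.MINOR.PATCH). A rough sketch of the bump rule that an automated release step might apply is shown below. The function name `bump_version` and the `part` labels are illustrative assumptions, not the API of an existing tool, and the sketch handles plain three-part versions only.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next semantic version after bumping `part`.

    Hypothetical helper for illustration; it accepts plain
    MAJOR.MINOR.PATCH strings only (no pre-release or build metadata).
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":  # breaking change: reset minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":  # backwards-compatible feature: reset patch
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("1.4.2", "minor"))  # 1.5.0
```

In practice a maintained release tool would likely replace a hand-written helper like this; the point is only that the bump rule is mechanical enough to automate.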

For infrastructure, we would use Docker to ensure a consistent environment and documentation across development and deployment, which would also support scalable computation and storage if the project had to handle a larger dataset. We would also monitor logging and dependency security so that the project remains reliable as it grows.

Overall, these tools and practices would support the reproducibility, collaboration, and reliability of the project, and would scale naturally from small academic projects to production-level systems.