Data Science Research (BFftF)
Big Data is rapidly gaining ground throughout society, which makes sound structures for using data responsibly essential. This requires a real understanding of the business and, more broadly, of the context of the issue at hand. Ideally, the study of Big Data starts from one or more research groups working in a multidisciplinary way (both within HZ and with external partners). The socially relevant issues that can be tackled in this way are highly diverse, and they provide a range of authentic professional situations for teaching. HZ’s priority themes, such as Land & Water, Tourism & Business and Industry & Logistics, naturally form the first filter. These in turn link to, for instance, the national top sectors and the national academic agenda. Big Data issues call for data-driven decision-making, which is the essence of the Data Science field. Within the context of these priorities, the focus is on Data Science in a Sustainable Digital Delta.
In this minor you will learn how to set up, carry out and justify practice-oriented research. You do this partly by attending instruction sessions (lectures) but, above all, by carrying out your own research project. This research will likely be multidisciplinary, so you may work together with students from other disciplines. You will be supported by an experienced researcher and a process-supporting lecturer. The instruction is designed to optimally support the research that you carry out. The Cross Industry Standard Process for Data Mining (CRISP-DM) structures both the phases of your research and the instruction that you receive (Shearer, 2000) […] CRISP-DM is a cyclical, iterative process that you will go through more than once. Also note that many learning goals are optional: you are free to choose the ones that match your research.
Data science research starts with business understanding: you talk to stakeholders within the organization to understand exactly which goals they want to achieve. You learn to recognize possible applications of Data Science and help the organization formulate a specific research question. You work in a team of students, some with a technical focus and others with a more content-related focus. Taking this into account, you define and allocate roles and tasks for the research as a whole.
When the research question is clear, you start looking for ways to answer it. While exploring the available datasets, you translate the business question into data questions that can be answered. You carry out exploratory data analysis to get to know the datasets and visualize the data to better understand their contents. In practice you may need to return to business understanding to find the right match between business needs and available data.
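As a minimal sketch of a first exploratory pass, the snippet below profiles each column of a small table: it counts missing values and summarizes numeric columns with the standard `statistics` module. The table, its column names (`station`, `water_level`, `rainfall`) and its values are hypothetical and serve only as illustration. In practice a library such as pandas would typically be used; this version sticks to the standard library so it runs anywhere.

```python
import statistics

# Hypothetical sample: readings per measuring station (None = missing value)
rows = [
    {"station": "A", "water_level": 1.2, "rainfall": 3.0},
    {"station": "A", "water_level": None, "rainfall": 5.5},
    {"station": "B", "water_level": 0.8, "rainfall": None},
    {"station": "B", "water_level": 1.0, "rainfall": 2.5},
]


def profile(rows):
    """Return a per-column summary: missing count, plus basic stats for numeric columns."""
    summary = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric column: range and mean of the non-missing values
            info.update(min=min(present), max=max(present), mean=statistics.mean(present))
        else:
            # Categorical column: number of distinct non-missing values
            info["distinct"] = len(set(present))
        summary[col] = info
    return summary


for col, info in profile(rows).items():
    print(col, info)
```

A profile like this quickly shows which columns have gaps and whether value ranges look plausible, which feeds directly into the questions you bring back to the stakeholders.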
During data preparation you take the datasets that you identified as input and decide what needs to be done before a model can be applied. You learn the difference between structured and unstructured data, and you may need to apply techniques such as natural language processing to prepare the data further. You need to recognize the measurement level of variables and enrich the datasets by creatively extracting new features. To achieve all of this, you will probably use languages such as SQL to manipulate the data, and apply techniques to retrieve and store data via files, APIs or databases. Before modeling can start, you choose a level of analysis and aggregate the data accordingly. Often you will need a way to handle missing data or apply other data cleansing techniques. Some of these preparation steps only become apparent once modeling has started; in that case you go back to this phase and do some extra preparation.
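To illustrate choosing a level of analysis and dealing with missing values in SQL, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. It assumes a hypothetical `readings` table with hourly measurements and aggregates them to one row per station per day. SQLite is only one possible choice; the same query pattern works in most SQL databases.

```python
import sqlite3

# In-memory database with hypothetical hourly sensor readings (NULL = missing value)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, day TEXT, hour INTEGER, water_level REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("A", "2024-01-01", 0, 1.2),
        ("A", "2024-01-01", 1, None),
        ("A", "2024-01-01", 2, 1.4),
        ("B", "2024-01-01", 0, 0.8),
        ("B", "2024-01-01", 1, 1.0),
        ("B", "2024-01-02", 0, None),
    ],
)

# Aggregate from the hourly level to the daily level per station.
# AVG ignores NULLs; COUNT(*) - COUNT(col) counts the missing readings per group.
query = """
SELECT station,
       day,
       AVG(water_level)              AS mean_level,
       COUNT(*) - COUNT(water_level) AS n_missing
FROM readings
GROUP BY station, day
ORDER BY station, day
"""
daily = conn.execute(query).fetchall()
for row in daily:
    print(row)
conn.close()
```

Note that a group with only missing values (station B on 2024-01-02) yields a NULL average, returned to Python as `None`. Whether to drop, impute or flag such rows is a preparation decision you make in consultation with the research question.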
You learn to think about the differences between modeling approaches, such as supervised and unsupervised models, and between modeling tasks like classification and regression. If you have specific hypotheses and the dataset is relatively small, you will apply statistical testing to evaluate the significance of research findings. With larger datasets you will usually calculate and interpret performance metrics and apply cross-validation to evaluate models. You will learn about different types of models: you compare linear models and decision trees, and explore more complex models like tree ensembles, neural networks or support vector machines. You will learn how to choose between models and understand how the bias-variance tradeoff affects model performance.
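To make cross-validation and performance metrics concrete, the following minimal sketch compares two simple regression models with 5-fold cross-validation, using mean squared error (MSE) as the metric. The data is synthetic and hypothetical, and the helper names (`fit_mean`, `fit_linear`, `k_fold_mse`) are illustrative only. Libraries such as scikit-learn offer ready-made versions of all of this; the plain-Python version just shows the mechanics.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = statistics.mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Simple linear regression y = a + b*x, fitted by ordinary least squares."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def k_fold_mse(xs, ys, fit, k=5, seed=0):
    """Average MSE of the model produced by `fit`, evaluated over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # fixed seed so the folds are reproducible
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - model(xs[i])) ** 2 for i in test]
        scores.append(statistics.mean(errors))
    return statistics.mean(scores)


# Hypothetical data: a noisy linear relationship y ≈ 2x + 1
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

print("mean baseline MSE:", k_fold_mse(xs, ys, fit_mean))
print("linear model MSE: ", k_fold_mse(xs, ys, fit_linear))
```

The mean baseline is a high-bias model: it ignores `x` entirely, so its cross-validated error stays close to the variance of `y`. The linear model matches the structure of this data, so its error drops towards the noise level. Because every fold is scored on data the model did not see during fitting, the comparison reflects generalization rather than fit to the training set.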
After finishing the modeling phase, you review the Data Science results by verifying whether the business goals have been achieved. You also review the Data Science process and critically examine each step. You will probably identify ways to improve your existing model, which is valuable input for further research. You document your findings and scripts so that the results are reproducible, and finally report the Data Science research findings in a well-structured document.
Evaluation can lead either to a new CRISP-DM cycle or to deployment of a model that has proven to meet the business goals. In the latter case, you can go on to build an end-to-end application that implements the model. During this phase you work out the implementation details together with your stakeholder(s). If the data stream is very large or needs to be processed in real time, you can consider big data technology to provide a faster and more scalable solution.
Are you interested in working and learning in these projects? You can do so by participating in the minor Becoming Fit for the Future (30 ECTS, September to January or February to June). Download the brochure (PDF) or find the same information on the educational platform Learn (access with an HZ account only).
Participants with a background in or strong affinity with the following subjects can join this minor:
- Civil Engineering
- Logistics Engineering
- Marine Officer
- Water Management
Minimum grade: 5.5