Last year I did not get the second-year extension of my postdoctoral fellowship (SEP-PRODEP) at the Physics Institute of BUAP. In other words, I was unemployed. I then found a way to help in my family business by selling jicama seeds. This adventure led me to launch a website called jicameros.com and to run Facebook ads every 15 days to sell the seeds in my country (Mexico). With my new digital skills, I experienced firsthand that the most valuable resource we can have today is data. Data helped me make very important decisions and get the most out of my limited budget. Fortunately, the year was not lost: while I was selling goods online, I was also drafting a manuscript called Silver Antimony Sulfide Selenide Thin‐Film Solar Cells via Chemical Deposition, which was published in March 2021.
Moreover, I used my free time to develop coding skills. Looking on the web, I realized that the first step in understanding data was to learn a programming language. For me, that new language was, of course, Python.
Why learn Python?
There are lots of programming languages out there. To choose one, I focused on my background in materials science. In my day-to-day work, I develop thin films for solar cells, which produces data at every stage, from the chemical deposition experiments to the material characterization. The characterization gives me structural, optical, and electrical data that I have to analyze to determine thin-film properties such as crystal structure, band gap (Eg), and electrical conductivity (σ). It does not sound easy, but with a solid background in semiconductor physics and some Excel knowledge, all the analyses can be done. However, the process is tedious, and the graphs are not good enough for publication. For that reason, I used Origin (OriginLab), because our research group at IER-UNAM had some licenses. With this software, we got decent graphs for our papers. The problem came when I left that research group in 2019 and began to do private research. Without an Origin license, I realized I had to find an alternative for all the analysis. The problem is where the opportunity emerged: I found that calculations on matrices (data) can be done with Python + NumPy, and data visualization with Python + Matplotlib. The following picture shows the command line and the version of each library I have installed on my machine.
For materials science analysis, you will need Python, NumPy, and Matplotlib installed on your machine.
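If you prefer to check the versions from Python itself instead of the command line, a minimal sketch could look like the one below. The helper name installed_versions is made up for this example; it relies only on the standard-library importlib.metadata module (Python 3.8 or newer) and reports a package as missing instead of failing.

```python
import sys
from importlib import metadata


def installed_versions(packages=("numpy", "matplotlib")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print the interpreter version first, then each library.
print("Python", sys.version.split()[0])
for name, version in installed_versions().items():
    print(name, version or "not installed")
```

If a library shows up as "not installed", it can usually be added with pip or conda before going any further.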
The problem now was to learn all that stuff:
- Python language.
- Import data.
- Clean data.
- Process data (calculation and determination of optoelectrical properties).
- Data visualization.
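To make the list concrete, here is a rough sketch of the import, clean, process, and visualize steps on a tiny made-up transmittance table. Everything in it is an assumption for illustration: the numbers, the 200 nm thickness, the function names, and the simplified relation alpha = -ln(T) / d, which ignores reflection. It is not the analysis behind any published result.

```python
import io
import numpy as np

# Made-up two-column text (wavelength in nm, transmittance in %), standing in
# for a file exported by a spectrometer.
RAW = """# wavelength_nm  T_percent
400  2.1
500  8.4
600  nan
700  41.0
800  63.5
"""

THICKNESS_CM = 200e-7  # assumed film thickness: 200 nm expressed in cm


def import_data(source):
    """Import: read whitespace-separated columns, skipping '#' comment lines."""
    return np.genfromtxt(source, comments="#")


def clean_data(table):
    """Clean: drop every row that contains a NaN (a missing reading)."""
    return table[~np.isnan(table).any(axis=1)]


def process_data(table, thickness_cm):
    """Process: absorption coefficient alpha = -ln(T) / d, reflection ignored."""
    wavelength_nm = table[:, 0]
    transmittance = table[:, 1] / 100.0
    alpha = -np.log(transmittance) / thickness_cm
    return wavelength_nm, alpha


def plot_data(wavelength_nm, alpha, filename="alpha_vs_wavelength.png"):
    """Visualize: save a labeled plot of alpha against wavelength."""
    import matplotlib.pyplot as plt  # imported here so the other steps need only NumPy

    fig, ax = plt.subplots()
    ax.plot(wavelength_nm, alpha, "o-")
    ax.set_xlabel("Wavelength (nm)")
    ax.set_ylabel("Absorption coefficient (1/cm)")
    fig.savefig(filename, dpi=300)
    plt.close(fig)


table = clean_data(import_data(io.StringIO(RAW)))
wavelength_nm, alpha = process_data(table, THICKNESS_CM)
print(table.shape)  # one row was dropped because of the NaN
# plot_data(wavelength_nm, alpha)  # uncomment to write the PNG file
```

Keeping each step in its own small function makes it easier to swap the synthetic text for a real file later without touching the rest.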
Learning Python for materials science
There are no courses with that name. To learn the basics of Python, I looked on YouTube, where I found an excellent 5-hour course by Nana called Python from Zero to Hero. In this interactive course, Nana uses a learn-by-doing strategy to teach Python syntax, loops, conditional statements, etc. I highly recommend taking the entire course in just a week, because when you finish it you will have the basic skills to apply Python to your simplest data analysis tasks.
Python Tutorial for Beginners: Learn Python in 5 hours
Once I finished the Python course, I continued with some exercises I found on the web. At this stage, I avoided the beginner's trap of repeating the basic courses. My advice is to keep this in mind: "Do not repeat the beginner's Python course." Instead, I integrated Python programming into my workflow. That way, I found that new libraries (for example, scikit-learn) turn up when you google a specific solution to a problem. For example, if you don't know how to import ASCII data into your Python interface, you should google these lines:
- How to import file.txt into Python
- How to import file.csv into NumPy
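Those searches will most likely lead you to NumPy's loadtxt function. A minimal sketch is shown below; the in-memory strings stand in for file.txt and file.csv so the example runs anywhere, and the numbers are invented.

```python
import io
import numpy as np

# Stand-ins for files on disk; with real data, pass a path such as "file.txt".
txt_file = io.StringIO("1.0 10.0\n2.0 20.0\n3.0 30.0\n")
csv_file = io.StringIO("x,y\n1.0,10.0\n2.0,20.0\n3.0,30.0\n")

# Whitespace-separated text: loadtxt returns a 2-D array, one row per line.
txt_data = np.loadtxt(txt_file)

# Comma-separated text with one header line to skip.
csv_data = np.loadtxt(csv_file, delimiter=",", skiprows=1)

# Split the table into one array per column.
x, y = txt_data[:, 0], txt_data[:, 1]
print(txt_data.shape, csv_data.shape)
```

For files with missing values, np.genfromtxt is the usual alternative, because it fills the gaps with NaN instead of raising an error.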
With the basics of any programming language, you will learn by doing (coding), and if you forget some syntax, you always have Google to look for the solution. So do not be afraid: try every piece of code you find, but with a critical mind. I mean, you should know what you want to implement and what results you expect to obtain. Of course, do the first calculation on paper, so you can check the code's result against it.
Jupyter Notebook is the easy way to start with Python for data analysis
Once I had learned to write some Python scripts, I found that the easy way to learn Python is with interactive Jupyter notebooks. This software launches an interactive interface in the web browser (Safari for me), where you can manage folders and create Jupyter notebooks. The following picture shows the interface where I store all my data, divided into folders by material characterization technique.
- EDS: Energy Dispersive Spectroscopy
- Hotplate: Temperature annealing profiles
- Profiler_IFUAP: Thickness measurement
- etc.
Inside each folder, I create the file.ipynb notebook that contains the analysis I am looking for. I take care to create one notebook per analysis. This strategy lets me organize my data and find the required graphs days or months later. In the following picture, you can see five Jupyter notebooks that contain the band gap analysis for some thin-film materials I have been working on this semester. Take a look at the directories and at how the files are named. For example, the name 20220614_direct_Allowed_Eg_AgSbSSe2.ipynb tells me that on June 14, 2022, I determined the direct-allowed band gap of AgSbSSe2 thin films. The filename and date are linked to the laboratory logbook, where I keep more details about the calculations and the film characterization.
When you open the file, you will see all the code I wrote to analyze band gaps. In the next picture, I show only the last plots I obtained when determining the direct-allowed transition of AgSbSSe2 thin films. The full results will remain private until the research work is published.
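The naming convention is simple enough to automate. The sketch below builds a name of the form date_analysis_material.ipynb and reads the date back from it; the two helper functions are hypothetical names invented for this example, not part of Jupyter.

```python
from datetime import date, datetime


def notebook_name(day, analysis, material):
    """Build 'YYYYMMDD_<analysis>_<material>.ipynb' from its parts."""
    return f"{day:%Y%m%d}_{analysis}_{material}.ipynb"


def notebook_date(filename):
    """Recover the date from the first eight characters of the name."""
    return datetime.strptime(filename[:8], "%Y%m%d").date()


name = notebook_name(date(2022, 6, 14), "direct_Allowed_Eg", "AgSbSSe2")
print(name)                 # 20220614_direct_Allowed_Eg_AgSbSSe2.ipynb
print(notebook_date(name))  # 2022-06-14
```

Putting the date first in year-month-day order means that sorting the files by name also sorts them chronologically.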
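Since the real notebook stays private, here is only a generic sketch of how a direct-allowed band gap is commonly estimated with the Tauc relation, where (αhν)² grows linearly with hν above the gap and the line is extrapolated to zero. This is an assumption about the method, not a copy of the notebook: the spectrum is synthetic, the 1.50 eV gap and the fit range are arbitrary illustration values, and the function name is invented.

```python
import numpy as np


def direct_allowed_gap(photon_energy_ev, alpha, fit_range_ev):
    """Estimate Eg by fitting a line to (alpha*h*nu)^2 vs h*nu inside the
    chosen fit range and extrapolating that line to zero."""
    low, high = fit_range_ev
    mask = (photon_energy_ev >= low) & (photon_energy_ev <= high)
    tauc_y = (alpha[mask] * photon_energy_ev[mask]) ** 2
    slope, intercept = np.polyfit(photon_energy_ev[mask], tauc_y, 1)
    return -intercept / slope


# Synthetic spectrum that follows the direct-allowed relation exactly.
EG_TRUE = 1.50  # eV, arbitrary illustration value (not a measured result)
energy = np.linspace(1.2, 2.5, 131)  # photon energy in eV
alpha = np.zeros_like(energy)        # absorption coefficient in 1/cm
above = energy > EG_TRUE
alpha[above] = 1e5 * np.sqrt(energy[above] - EG_TRUE) / energy[above]

eg = direct_allowed_gap(energy, alpha, fit_range_ev=(1.7, 2.3))
print(f"Estimated Eg = {eg:.2f} eV")
```

With measured data the result depends on which linear region you choose for the fit, so it is worth plotting (αhν)² against hν and checking the range by eye before trusting the number.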
Don’t forget to certify your new skills
Yes, you should certify your new skills to show the people who may hire you in the future that you have experience doing data analysis with Python. I found a great course called Python Crash Course by the IER-UNAM (Renewable Energy Institute, UNAM). This course teaches all the basics of Python in just 20 hours by programming things related to engineering and renewable energy. It was in this course that I discovered another library called pandas, which is widely used in data analysis. But that, my friends, is another story.
The Python Crash Course was developed by Dr. Guillermo Barrios and his colleagues; you can find the repository here: https://github.com/AltamarMx/crash_course_Python . If you are interested in taking the certification at UNAM, follow this link: Crash Course de Python y Jupyter Notebook