
I am facing a challenge while iterating through rows of a pandas dataframe in Python. Specifically, I am reading data from an excel sheet using pandas and then iterating through the data using df.iterrows(), processing it further to automate a work flow using python/selenium. Each row of my spreadsheet belongs to job applicants and their attributes are getting captured in different columns. However, since a person can have multiple degrees their educational details are captured in columns as degree1, specialation1, college1, degree2, specialiation2, college2, etc. Maximum 5 qualifications can be filled in. During iteration, I want to breakout and then loop through rows of education (degree, specialization, college) for one person. How to accomplish this task? I am attaching a github link of the sample data for reference text which basically has the same data as is pasted below:
python
| 
					 1 2 3 4 5  | 
						# Sample data sr_no   old_emp_id  name    address mobile  degree1 specialisation1 college1    degree2 specialisation2 college2    degree3 specialisation3 college3    emp_status 1   24  Amit    ABC Road    356363474   Computer Science    Robotics    IIT Delhi   MSC ML  MIT PHD AI  Harvard full-time 2   34  Samit   Xyz Road    367474748   Bachelor of Arts    Economics   Delih Univ  Masters of Eco  Internatioal Relation   Delhi Univ  PHD Foreign Trade   Delhi Univ  part-time 3   56  Richard PTC Street  363637677   Bsc Biology Mumbai Univ Masters of Science  Microbiology    Mumbai Univ PHD Communicable disease    Mumbai Univ part-time  | 
					
I tried grouping it using a custom function presented below. But it does not give the desired results.
| 
					 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28  | 
						# Custom function to group attributes differently def group_attributes_diff(df):     new_data =[]     for i in range(0, len(df),35):         candidate_info={}         for j in range(i,i+7+1):             row = df.iloc[j]             degree_name = row['degree_name' + str(int(j-i)//7+1)]             specialisation = row["specialisation" + str(int(j - i)//7 + 1)] if "specialisation" + str(int(j - i)//7 + 1) in row else None             course_start_date = row["course_start_date" + str(int(j - i)//7 + 1)] if "course_start_date" + str(int(j - i)//7 + 1) in row else None             course_end_date = row["course_end_date" + str(int(j - i)//7 + 1)] if "course_end_date" + str(int(j - i)//7 + 1) in row else None             marks_grades = row["marks_grades" + str(int(j - i)//7 + 1)] if "marks_grades" + str(int(j - i)//7 + 1) in row else None             university = row["university" + str(int(j - i)//7 + 1)] if "university" + str(int(j - i)//7 + 1) in row else None             course_type = row["course_type" + str(int(j - i)//7 + 1)] if "course_type" + str(int(j - i)//7 + 1) in row else None             education={                 "degree_name":degree_name,                 "specialisation":specialisation,                 "course_start_date":course_start_date,                 "course_end_date":course_end_date,                 "marks_grades":marks_grades,                 "university":university,                 "course_type":course_type             }             candidate_info["Education "+str(int(j-i)//7+1)] = education         new_data.append(candidate_info)     return pd.DataFrame(new_data) test_df = group_attributes_diff(df.copy()) print(test_df.to_excel('education.xlsx'))  | 
					
Thanks in advance for your inputs!