In this tutorial, we are going to learn how to perform a range lookup using pandas’ library in Python.
This lesson is a bit more advanced and is gear toward users with at least some experience using pandas for data analysis. But if you don’t have experience working with pandas or just started with pandas, you can follow along as I will be going everything in detail.
Buy Me a Coffee? Your support is much appreciated!
PayPal Me: https://www.paypal.me/jiejenn/5
Venmo: @Jie-Jenn
I will be showing you two different approaches. First method is using the numpy dot function, and the 2nd method is to create a pandas interval index object.
Example 1:
import numpy as np
import pandas as pd
df_RawData = pd.DataFrame({'Dates': {0: '06-01', 1: '06-02', 2: '06-03', 3: '06-04', 4: '06-05'},
'Value': {0: 3, 1: 7, 2: 13, 3: 16, 4: 22}, })
df_LookupTable = pd.DataFrame({'Start': {0: 1, 1: 6, 2: 11, 3: 16},
'Stop': {0: 5, 1: 10, 2: 15, 3: 20},
'Color': {0: 'Red', 1: 'Orange', 2: 'Green', 3: 'Blue'}, })
result = np.dot(
(df_RawData['Value'].values[:, None] >= df_LookupTable['Start'].values) &
(df_RawData['Value'].values[:, None] <= df_LookupTable['Stop'].values),
df_LookupTable['Color']
)
print(result)
df_RawData['Color'] = result
Example 2:
import numpy as np
import pandas as pd
dfStudents = pd.DataFrame({
'Student': ['Joe', 'Mark', 'Mary' ,' Jane', 'Lucy', 'Tom', 'Jerry'],
'Overall Score': np.random.randint(49, 100, 7)
})
dfGradeLookup = pd.DataFrame({
'Grade': ['A', 'B', 'C', 'D', 'F'],
'Min': [90, 80, 70, 60, 0],
'Max': [100, 89, 79, 69, 59]
})
val = dfGradeLookup.loc[:, 'Min':'Max'].apply(tuple, 1).tolist()
print(val)
indx = pd.IntervalIndex.from_tuples(val, closed='both')
print(indx)
dfStudents['Grade'] = dfGradeLookup.loc[indx.get_indexer(dfStudents['Overall Score']), 'Grade'].values
print(dfStudents)