QUESTION 16

A machine learning engineer is trying to perform batch model inference. They want to get predictions using the linear regression model saved at the path model_uri for the DataFrame batch_df.
batch_df has the following schema: customer_id STRING
The machine learning engineer runs the following code block to perform inference on batch_df using the linear regression model at model_uri:
[Exhibit: code block image not included]
In which situation will the machine learning engineer's code block perform the desired inference?

Correct Answer: A
The code block performs the desired inference when the model at model_uri was logged together with its Feature Store feature set. batch_df contains only the customer_id key, so the feature values the model needs are not in the DataFrame itself; they have to be looked up from the Feature Store. When the feature set was logged with the model, the feature lookup metadata is packaged with it, so batch scoring can join the required features on customer_id and pass them to the model.
References:
✑ Databricks documentation on Feature Store: Feature Store in Databricks

QUESTION 17

A machine learning engineer wants to parallelize the inference of group-specific models using the Pandas Function API. They have developed the apply_model function that will look up and load the correct model for each group, and they want to apply it to each group of DataFrame df.
They have written the following incomplete code block:
[Exhibit: code block image not included]
Which piece of code can be used to fill in the above blank to complete the task?

Correct Answer: A
To parallelize the inference of group-specific models with the Pandas Function API in PySpark, call applyInPandas on the grouped DataFrame. It invokes a Python function once per group, passing that group's rows as a pandas DataFrame, and combines the returned pandas DataFrames into a single Spark DataFrame with the declared schema.
prediction_df = (
    df.groupby("device_id")
    .applyInPandas(apply_model, schema=apply_return_schema)
)
In this code:
✑ groupby("device_id"): Groups the DataFrame by the "device_id" column.
✑ applyInPandas(apply_model, schema=apply_return_schema): Applies the apply_model function to each group and specifies the schema of the returned DataFrame.
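The question does not show apply_model itself, so the following is only a minimal sketch of what such a function might look like. The MODELS dictionary, the ScaledModel class, and the "reading" column are hypothetical stand-ins for a real model lookup and real features. The per-group call is simulated locally with plain pandas so the sketch runs without a Spark cluster; on Spark the same function would be passed to applyInPandas as in the line above.

```python
import pandas as pd


class ScaledModel:
    """Hypothetical stand-in for a trained per-device model."""

    def __init__(self, slope: float):
        self.slope = slope

    def predict(self, features: pd.DataFrame) -> pd.Series:
        # Toy prediction: scale the single input feature.
        return features["reading"] * self.slope


# Hypothetical lookup table; a real apply_model would load the model
# for the group from wherever the models are stored.
MODELS = {"a": ScaledModel(2.0), "b": ScaledModel(10.0)}


def apply_model(group_pdf: pd.DataFrame) -> pd.DataFrame:
    """Receive one group's rows as a pandas DataFrame and return predictions."""
    # Every row in the group shares the same key, so read it from the first row.
    device_id = group_pdf["device_id"].iloc[0]
    model = MODELS[device_id]
    return pd.DataFrame(
        {
            "device_id": group_pdf["device_id"],
            "prediction": model.predict(group_pdf[["reading"]]),
        }
    )


# Schema of the DataFrame returned by apply_model, as a DDL string.
apply_return_schema = "device_id string, prediction double"

# Local simulation of the per-group call that Spark would perform.
local_pdf = pd.DataFrame(
    {"device_id": ["a", "a", "b"], "reading": [1.0, 2.0, 3.0]}
)
local_predictions = pd.concat(
    [apply_model(group) for _, group in local_pdf.groupby("device_id")],
    ignore_index=True,
)
```

The columns returned by apply_model have to match the names and types declared in apply_return_schema, since Spark uses that schema to build the resulting DataFrame.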
References:
✑ PySpark Pandas UDFs Documentation

QUESTION 18

A data scientist has been given an incomplete notebook from the data engineering team.
The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.
Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

Correct Answer: A
To use the pandas API on Spark, the data scientist can run the following code block:
import pyspark.pandas as ps
df = ps.DataFrame(spark_df)
This code imports the pandas API on Spark and converts the Spark DataFrame spark_df into a pandas-on-Spark DataFrame, allowing the data scientist to use familiar pandas functions for further feature engineering.
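As a rough illustration of the kind of pandas-style feature engineering this enables, the sketch below uses only fillna, assign, and column arithmetic, which the pandas API on Spark mirrors from pandas. The function name and the total_spend and num_orders columns are hypothetical and do not come from the notebook in the question. The function is exercised on a plain pandas DataFrame so it runs without a cluster; the assumption is that the same function could be called on the pandas-on-Spark DataFrame created above.

```python
import pandas as pd


def add_spend_features(df):
    """Add a ratio feature using only pandas-style calls.

    Written against the shared pandas-style API (fillna, assign, column
    arithmetic), so it is expected to accept either a pandas DataFrame
    or a pandas-on-Spark DataFrame.
    """
    # Treat missing spend as zero before computing the ratio.
    filled = df.fillna({"total_spend": 0.0})
    return filled.assign(
        spend_per_order=filled["total_spend"] / filled["num_orders"]
    )


# Local check with plain pandas and made-up values.
local_pdf = pd.DataFrame(
    {"total_spend": [100.0, None, 30.0], "num_orders": [4, 2, 3]}
)
features = add_spend_features(local_pdf)

# On a cluster (not executed here), the equivalent call would be:
#   import pyspark.pandas as ps
#   features = add_spend_features(ps.DataFrame(spark_df))
```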
References:
✑ Databricks documentation on pandas API on Spark: pandas API on Spark