Databricks Databricks-Certified-Professional-Data-Engineer Practice Test : Free Online Preparation

Question 1
- Which of the following commands allows data engineers to perform an insert-only merge?
  A : MERGE INTO orders
  USING new_orders
  ON orders.orders_id = new_orders.orders_id
  WHEN MATCHED
  INSERT *
  
  B : MERGE INTO orders
  USING new_orders
  ON orders.orders_id = new_orders.orders_id
  WHEN NOT MATCHED
  INSERT *
  
  C : MERGE INTO orders
  USING new_orders
  ON orders.orders_id = new_orders.orders_id
  WHEN MATCHED
  INSERT *
  WHEN NOT MATCHED
  IGNORE *
  
  D : MERGE INTO orders
  USING new_orders
  ON orders.orders_id = new_orders.orders_id
  WHEN NOT MATCHED
  INSERT *
  WHEN MATCHED
  IGNORE *
  
  E : MERGE INTO orders
  USING new_orders
  ON orders.orders_id = new_orders.orders_id
  WHEN NOT MATCHED
  INSERT *
  ELSE IGNORE *
  
  Answer: B
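  For reference, a minimal sketch of the insert-only merge as it would be written in Databricks SQL, reusing the `orders` and `new_orders` tables from the question. Note that Delta Lake's MERGE syntax places the keyword THEN between each WHEN clause and its action, which the options as printed omit.

  ```sql
  -- Insert-only merge: source rows with no matching orders_id in the target
  -- are inserted; rows that already match are left untouched because no
  -- WHEN MATCHED clause is given.
  MERGE INTO orders
  USING new_orders
  ON orders.orders_id = new_orders.orders_id
  WHEN NOT MATCHED THEN
    INSERT *
  ```

  Since there is no WHEN MATCHED branch, the statement can only append rows, so re-running it against the same source does not insert the same orders twice.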
Question 2
- A junior data engineer has manually configured a series of jobs using the Databricks Jobs UI. Upon reviewing their work, the engineer realizes that they are listed as the "Owner" for each job. They attempt to transfer "Owner" privileges to the "DevOps" group, but cannot successfully accomplish this task. Which statement explains what is preventing this privilege transfer?
  
  A : Databricks jobs must have exactly one owner; "Owner" privileges cannot be assigned to a group.
  
  B : The creator of a Databricks job will always have "Owner" privileges; this configuration cannot be changed.
  
  C : Other than the default "admins" group, only individual users can be granted privileges on jobs.
  
  D : A user can only transfer job ownership to a group if they are also a member of that group.
  
  E : Only workspace administrators can grant "Owner" privileges to a group.
  
  Answer: A
Question 3
- The data engineering team maintains the following code:
  Assuming that this code produces logically correct results and the data in the source table has been deduplicated and validated, which statement describes what will occur when this code is executed?
  
  A : The silver_customer_sales table will be overwritten by aggregated values calculated from all records in the gold_customer_lifetime_sales_summary table as a batch job.
  
  B : A batch job will update the gold_customer_lifetime_sales_summary table, replacing only those rows that have different values than the current version of the table, using customer_id as the primary key.
  
  C : The gold_customer_lifetime_sales_summary table will be overwritten by aggregated values calculated from all records in the silver_customer_sales table as a batch job.
  
  D : An incremental job will leverage running information in the state store to update aggregate values in the gold_customer_lifetime_sales_summary table.
  
  E : An incremental job will detect if new rows have been written to the silver_customer_sales table; if new rows are detected, all aggregates will be recalculated and used to overwrite the gold_customer_lifetime_sales_summary table.
  
  Answer: C
Question 4
- A junior developer complains that the code in their notebook isn't producing the correct results in the development environment. A shared screenshot reveals that while they're using a notebook versioned with Databricks Repos, they're using a personal branch that contains old logic. The desired branch, named dev-2.3.9, is not available from the branch selection dropdown. Which approach will allow this developer to review the current logic for this notebook?
  A : Use Repos to make a pull request, then use the Databricks REST API to update the current branch to dev-2.3.9
  
  B : Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch
  
  C : Use Repos to checkout the dev-2.3.9 branch and auto-resolve conflicts with the current branch
  
  D : Merge all changes back to the main branch in the remote Git repository and clone the repo again
  
  E : Use Repos to merge the current branch and the dev-2.3.9 branch, then make a pull request to sync with the remote repository
  
  Answer: B
Question 5
- The data engineering team has a Silver table called ‘sales_cleaned’ where new sales data is appended in near real-time.
  They want to create a new Gold-layer entity against the ‘sales_cleaned’ table to calculate the year-to-date (YTD) of the sales amount. The new entity will have the following schema:
  country_code STRING, category STRING, ytd_total_sales FLOAT, updated TIMESTAMP
  It’s enough for these metrics to be recalculated once daily. But since they will be queried very frequently by several business teams, the data engineering team wants to cut down the potential costs and latency associated with materializing the results.
  Which of the following solutions meets these requirements?
  A : Define the new entity as a view to avoid persisting the results each time the metrics are recalculated
  
  B : Define the new entity as a global temporary view since it can be shared between notebooks or jobs that share computing resources.
  
  C : Configure a nightly batch job to recalculate the metrics and store them as a table overwritten with each update
  
  D : Create multiple tables, one per business team, so the metrics can be queried quickly and efficiently.
  
  E : All of the above solutions meet the requirements since Databricks uses the Delta Caching feature
  
  Answer: C
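  As an illustration, the statement a nightly job might run could look like the sketch below. The target table name `gold_sales_ytd` and the source columns `amount` and `sale_date` are assumptions made for the example; the question only gives the source table name and the output schema.

  ```sql
  -- Recompute the YTD metrics once and persist them, so that frequent
  -- queries read a small precomputed table instead of re-aggregating
  -- sales_cleaned every time.
  -- gold_sales_ytd, amount, and sale_date are hypothetical names.
  CREATE OR REPLACE TABLE gold_sales_ytd AS
  SELECT
    country_code,
    category,
    CAST(SUM(amount) AS FLOAT) AS ytd_total_sales,
    current_timestamp() AS updated
  FROM sales_cleaned
  WHERE year(sale_date) = year(current_date())
  GROUP BY country_code, category;
  ```

  A view, by contrast, would rerun this aggregation on every query, which is the cost and latency the team wants to avoid.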
