Apache Spark Python Source for 0-1 Knapsack Approximation Solution

Over on the other blog, posted a 0-1 Knapsack approximation solution in Python for Apache Spark. The source code for this was tested in a local installation on Ubuntu 16.04 for Apache Spark pre-built for Hadoop 2.7 and later.

The original source code contains setup code for the Spark / Python environment, as well as a Scala solution. I do not think it will run in Databricks without some modification, although it’s possible to copy the knapsack.py file contents into the same Spark Python notebook entry as the test code to try it out.

Original 0-1 approximation source code is over on Github.

 

Merging DataFrames Columns for Apache Spark in Python without a Key Column

At the Databricks Q&A site, looked at how to take DataFrames of identical length in rows andperform merge on columns into one DataFrame without an existing key.

If we include the function monotonically_increasing_id() to give an increasing IDs to rows, then merging with join works. Not shown here we could pad the smaller dataframe (fewer rows) to be the same length as the longest. This is needed to make monotonically_increasing_id() give the same IDs for both of the DataFrames.

# For two Dataframes that have the same number of rows, merge all columns, row by row.

# Get the function monotonically_increasing_id so we can assign ids to each row, when the
# Dataframes have the same number of rows.
from pyspark.sql.functions import monotonically_increasing_id

#Create some test data with 3 and 4 columns.
df1 = sqlContext.createDataFrame([("foo", "bar","too","aaa"), ("bar", "bar","aaa","foo"), ("aaa", "bbb","ccc","ddd")], ("k", "K" ,"v" ,"V"))
df2 = sqlContext.createDataFrame([("aaa", "bbb","ddd"), ("www", "eee","rrr"), ("jjj", "rrr","www")], ("m", "M" ,"n"))

# Add increasing Ids, and they should be the same.
df1 = df1.withColumn("id", monotonically_increasing_id())
df2 = df2.withColumn("id", monotonically_increasing_id())

# Perform a join on the ids.
df3 = df2.join(df1, "id", "outer").drop("id")
df3.show()

And at Github have created a GitHub repository to store Spark code snippets as I work on them.

Darrell Ulm profile WordPress.org

Learning more about WordPress and Drupal migrations and best ways to perform the database migrations for Views/Ubercart in Drupal and how to recreate in WooComerce. Darrell Ulm’s WordPress.org profile links are below:

Darrell Ulm WordPress Support Page
Darrell Ulm WordPress Main Profile Page

Finding well traveled WordPress plug-ins and learning about WP-API for custom plug-in development in PHP, similar to Drupal 5, 6, and 7.

Blogger, Tumblr