A quick pass at the Scala version of an approximation solution to the 0-1 knapsack problem is very close to the Python version. The post of this is over here.

Using a window function via SQL may have been a better way to go to find the partial sums of weights, after calculating all the ratios of item profits over the select function used:

sum(weights) OVER (ORDER BY ratio desc) as partSumWeights

and the Windows function would likely be more clear.

Apache Spark should be able to be used for all types of problems that can be expressed directly using data-parallelism, and with specific coding, more complex algorithms.

It will be interesting to see how Spark Dataframes perform in terms of run-time analysis for more complex parallel algorithms, well more complex than the greedy 0-1 knapsack.