Clean up of the Master Price List with Python

Frankie Jay
Oct 30, 2021
3 min read

Our acquisition by a larger company has involved expansion of our product range by over 30k SKUs (Stock-Keeping Unit). The reference for pricing of all of these items has been a singular excel spreadsheet that has passed through the hands of two different managers like a relay baton, becoming increasingly bloated,

This is used by all members of the organisation (customer service, sales, supply chain) to find prices and is stored on Teams. Because of its size its so slow to use on teams, and usually requires a download to work properly.

The most complex sheet is the table of reusable surgical instrument products with their pricing. This table also contains the pricing information used to calculate margins and list pricing for each item.

To implement a price increase the last Sales Manager hid the original columns and copied these across to change the margins, making the file even larger. In addition there are a large amount of items marked No Longer Available (***NLA***) that could be cleaned from the file.

Unfortunately attempts to modify the file result in a memory error, and is also quite tedious in excel.

While that Sales Manager countered the large file size by demanding a new computer, we will use Python to make the following changes:

Create an archive file containing all the item codes and the original pricing information - transfer, landed, list, margin and margin %, removing the April price increase information.
Remove the items marked ***NLA*** into a separate file to archive (as sometimes their descriptions contain the alternate codes to use).
Create a current file containing all current item codes and current pricing information - transfer, landed, list, margin and margin %.

I used Google Colab out of convenience, and we can see the approx 30k entries and different columns on the uploaded df.

1. Split off the last 5 columns.

This is to create an archived list of the original pricing information, so that we can keep this as a copy, and then remove off the 'current file'.

The current file has the calculations for the price increase in the last 5 rows as shown below.

We can remove those columns to obtain the original price list before the price increase as shown below.

Export it into its own excel file like so

And download it to reformat and make prettier.

2. Separate out No Longer Available items.

To archive into a separate file for later reference, in case we are asked to find equivalents to old items as sometimes the description is tagged with the new product code.

From our file we were able to identify 4761 product codes that were marked NLA (and some other occasional terms)