If you need to work with dates prior to 1900 in Excel, you may expect a few tough challenges. Microsoft Excel date functions can only use dates between January 1, 1900 and December 31, 9999. For dates in this range, Excel stores them as a date serial number, where

January 1, 1900 =1

January 2, 1900 =2

January 3, 1900 =3

Etc…

December 31, 1900 =365…

Oops! Not so fast! Excel mistakenly considers 1900 as a leap year, which was not the case, so Excel stores December 31, 1900 as date serial number 366. This is a known issue, and you can read here (why this error exists and Microsoft chooses not to fix it.)

I am writing this article on May 27, 2021, corresponding in Excel to the date serial number 44343.

The picture below shows how Excel reacts if we format column A cells as Text and leave column B cells with General format, and then type the same date strings in both columns:

Dealing with Dates prior to 1900 in Excel with Power Query | Dealing with Dates prior to 1900 in Excel with Power Query | Dealing with Dates prior to 1900 in Excel with Power Query | Solve and Excel - Celia AlvesSolve and Excel - Celia AlvesSolve and Excel - Celia Alves

Logic tells us that any date before January 1, 1900 should be assigned a negative serial number, but unfortunately, Excel does not do that. Any dates before January 1, 1900 or after December 31, 9999 will be stored and displayed as Text by Excel, which does not allow to make any calculations with them or sort them. Therefore, calculating ages or duration intervals between two dates where at least one of those dates is outside the range January 1, 1900 – December 31, 9999 is not a straightforward task in Excel. It can be accomplished only with VBA or creative formulas.

 

The picture below shows how Excel reacts if we format column D cells as Text and column E cells as custom Short Date format, and then type the same numbers in both columns:

Dealing with Dates prior to 1900 in Excel with Power Query | Solve and Excel - Celia Alves

I recently went through the exercise of importing from a web page data about the United States presidencies, using Power Query to determine the total duration in days of each presidency. The data looks like this:

Dealing with Dates prior to 1900 in Excel with Power Query | Solve and Excel - Celia Alves

Each presidency’s start and end dates are provided in the same cell, but that is nothing that Power Query cannot handle quite easily. Power Query imports the dates as Text data type, and then we can separatthe two dates for each presidency in separate columns, getting the following: 

Dealing with Dates prior to 1900 in Excel with Power Query | Solve and Excel - Celia Alves

When we convert the data type of the two columns to date, Power Query has no issues recognizing dates before 1900 or calculating the difference between the presidency starting and end dates.

Dealing with Dates prior to 1900 in Excel with Power Query | Solve and Excel - Celia Alves

So, if we need to perform calculations with dates before January 1, 1900 in Excel, Power Query is probably the easiest method to use. Besides, the M language offers a fantastic array of functions to work with dates, time values and durations. It is an amazing world of possibilities!

M Language Date functions

M Language DateTime functions

M Language Duration functions

 

There is a “gotcha,” though: The calculation results can be imported into Excel as numbers as long as they are not dates prior to 1900. Any dates in that range need to be converted to the data type text in Power Query before loading the data into Excel, or Excel will not recognize them.

Dealing with Dates prior to 1900 in Excel with Power Query | Solve and Excel - Celia Alves

Discrepancies in dates between Excel and Power Query (not just for dates prior to 1900)

The process of dealing with dates with Power Query looks pretty straightforward in the example above. However, if you need to work with dates prior to 1900, there is a lot more to be aware of.

If we load into Power Query the dates we had entered in Excel as Text at the beginning of this post, this is what happens:

Dealing with Dates prior to 1900 in Excel with Power Query | Solve and Excel - Celia Alves
  • Dates before January 1, 1900 are recognized correctly by Power Query.
  • The Serial Date value determined by Excel to match “January 1, 1900” is converted by Power Query to one day before (December 31, 1899). The same happens to the dates up to February 28, 1900. This has to do with that error of considering 1900 as a leap year mentioned above. VBA behaves the same way as Power Query and does this type of adjustment as Charley Kyd explains here.
  • The dates after December 31, 9999 are not recognized by Power Query.
  • When we load the dates back into Excel, maintaining the second column with data type date, the dates that had been recognized as prior to January 1, 1900 are loaded incorrectly as “1900-01-00.”
Dealing with Dates prior to 1900 in Excel with Power Query | Solve and Excel - Celia Alves

If we load into Power Query the dates we had entered in Excel as serial numbers  at the beginning of this post, this is what happens: 

Dealing with Dates prior to 1900 in Excel with Power Query | Solve and Excel - Celia Alves
  • Serial date values for dates after December 31, 9999 produce an overflow error in Power Query.
  • After we eliminate those values from the data set, Power Query recognizes dates prior to January 1, 1900 using the same approach explained above, where the date serial value 1 is matched to December 31, 1899 instead of January 1, 1900.
  • When we load the dates back into Excel, maintaining the second column with data type date, we will see a mismatch between the date assigned by Power Query and the date assigned by Excel for date serial numbers up to 60. These include not only all the dates prior to 1900, but also all the dates from January 1, 1900 to February 28, 1900.

Both Power Query and VBAdate serial numbering system begin one day earlier than Excel, on December 31, 1899 instead of January 1, 1900. This allows to exclude February 29, 1900 as a possible date and have Power Query and VBAdate serial numbers matching Excelfrom March 1, 1900 onwards. 

A difference between VBA and Power Query date serial numbering systems seems to be that VBA can track dates back to January 1, 0100, and Power Query can convert dates preceding that date. 

However, it is interesting to see that when new load this data into Excel, dates between January 1, 0100 and December 31, 1899 come with a negative serial date number assigned to thembut the dates before January 1, 0100 are loaded as blank cells into Excel and are not flagged as errors by Power Query. 

In summary, if you need to work with dates before March 1, 1900 in Excel and Power Query, always be mindful and very and cautious about: 

  • the dates that you enter in Excel,  
  • how you enter the dates (typing text, typing serial numbers, copying and pasting from other sources, etc.,)  
  • the number format applied to the cell before and after you enter your data in Excel, 
  • the transformations that Power Query applies to your data. 

When loading the results from Power Query into Excel, always convert date values to data type text in Power Query before loading the query results to Excel.

Other surprising things I learned about Dates 

When I started this post, I thought it would be a concise one, something like “Hey, look at that! Power Query converts dates prior to 1900 correctly!” Then, as I researched and experimented more to validate my theory, I discovered that historians, scientists, and professionals dealing with old dates in Excel have a tough job operating with those date values. They cannot rely on a simple DATEDIF function in Excel to calculate date intervals. Plus, they have to account for several other external factors that might have impacted how the date was recorded and how it should now be interpreted.

As Charley Kyd explains, today, we use the Gregorian calendar in most countries, but in the past, most of the Western world used regional variations of the Julian calendar. The conversion from Julian to Gregorian calendar occurred in different moments for different regions over several centuries. These episodes make some dates not exist in certain regions, and some “ships’ logs could show that they arrived at one port before they departed their previous port.

The video below shows the exercise of importing and cleaning the data about the US presidencies from Wikipedia into Excel using Power Query, step by step. At time stamp 1:12:58 starts the discussion about needing to convert dates earlier than 1900 to Text in Power Query before loading them to Excel.

References:

Other content about this topic: