Star size distribution: A correction
I recently posted about the size distribution of stars found in the HYG database. Since then I've explored the data further, and stumbled across a feature of the dataset which impacted the results. This is a nice way of saying that I made a mistake and got the results wrong. Don't trust everything you read on the Internet.
While I originally obtained a distribution looking like this:
the correct distribution would look like this:
As you can see, the two low peaks in the right part of the diagram vanished completely. This brings down the median and mean values, and places the arithmetic mean much nearer to the median and the geometric mean, invalidating some of the points I made in my earlier post. What happened?
The HYG database contains the available data on a large number of stars, but unfortunately not everything is known about every star. While measuring the apparent brightness of a star is fairly simple (just point your telescope at it and count the photons), determining the distance can be much trickier.
A common way to measure distance is the parallax method. This method uses the apparent yearly motion of the star in the sky relative to more distant stars as seen from Earth, caused by Earth's movement around the Sun. The greater the distance to a star, the smaller its parallax. For very distant stars the parallax can thus be extremely small and difficult to measure accurately, and the uncertainty in the measurement is often comparable to or even larger than the measured parallax itself. In such cases astronomers are unable to reliably determine the distance using the parallax method.
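As a rough illustration of the relation between parallax and distance, here is a minimal Python sketch. It uses the standard definition that a parallax of one arcsecond corresponds to one parsec, with the parallax given in milliarcseconds. The function names and the 20% cutoff for a "reliable" measurement are my own assumptions for the example, not something taken from the HYG database.

```python
def parallax_distance_pc(parallax_mas):
    """Distance in parsec from a parallax given in milliarcseconds."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive to give a distance")
    return 1000.0 / parallax_mas


def is_reliable(parallax_mas, sigma_mas, max_rel_error=0.2):
    """True if the relative parallax uncertainty is below the cutoff.

    The cutoff of 0.2 is a hypothetical choice for this sketch.
    """
    return parallax_mas > 0 and sigma_mas / parallax_mas <= max_rel_error


# A nearby star: large parallax, small relative uncertainty.
near = parallax_distance_pc(100.0)       # 10 parsec
near_ok = is_reliable(100.0, 1.0)        # relative error 0.01

# A distant star: the uncertainty is as large as the parallax itself.
far = parallax_distance_pc(1.0)          # 1000 parsec
far_ok = is_reliable(1.0, 1.0)           # relative error 1.0
```

With a measurement uncertainty of about one milliarcsecond, the nearby star's distance is well determined, while the star at 1000 parsec has an uncertainty as large as the signal.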
However, such stars are not excluded from the HYG database; instead they are designated by having their distance set to $10^5$ parsec, a value much larger than the furthest actual distance in the catalogue at around $10^3$ parsec. The distribution of distances in the original dataset is shown in the plot below, clearly demonstrating that the stars with a distance of $10^5$ parsec are not part of the natural distribution, but rather an artefact in the data.
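A minimal sketch of how such placeholder rows could be separated from measured distances is given below. It assumes the catalogue is read as CSV with a distance column called `dist` (in parsec); the column name, the helper name and the tiny inline sample are assumptions made for the example, so check them against the actual file before use.

```python
import csv
import io

# Placeholder distance (in parsec) marking an unreliable parallax.
PLACEHOLDER_DIST_PC = 100000.0


def drop_placeholder_distances(rows, dist_column="dist"):
    """Keep only rows whose distance is below the placeholder value."""
    return [row for row in rows
            if float(row[dist_column]) < PLACEHOLDER_DIST_PC]


# Tiny made-up sample standing in for the real catalogue file.
sample = io.StringIO("id,dist\n1,4.2\n2,100000.0000\n3,987.5\n")
rows = list(csv.DictReader(sample))

kept = drop_placeholder_distances(rows)
print(len(rows) - len(kept), "placeholder rows dropped")
```

Comparing numerically rather than matching the string "100000" avoids depending on how many decimals the file happens to use.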
My mistake in the original analysis was to not exclude this category of stars. Stars at such a huge distance would need to be extremely luminous in order to be visible even with powerful telescopes, and so the database reports very large values for the stars' luminosities, which translate to large radii. Variations in temperature smooth out the distribution of radii obtained. This explains the peaks around $10^3$ to $10^4$ $R_{\odot}$ in the original size distribution plot.
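To see why a placeholder distance inflates the radii, here is a rough sketch of the chain from apparent magnitude and distance to radius. It assumes the standard distance modulus, a solar absolute magnitude of 4.83, a solar effective temperature of 5772 K and the Stefan-Boltzmann relation. It ignores bolometric corrections and extinction, and it is not necessarily how the HYG database derives its luminosities. The function names are my own.

```python
import math

M_SUN = 4.83     # assumed absolute visual magnitude of the Sun
T_SUN = 5772.0   # assumed effective temperature of the Sun in kelvin


def absolute_magnitude(apparent_mag, dist_pc):
    """Distance modulus: M = m - 5 * log10(d / 10 pc)."""
    return apparent_mag - 5.0 * math.log10(dist_pc / 10.0)


def luminosity_solar(abs_mag):
    """Luminosity in solar units from the absolute magnitude."""
    return 10.0 ** (0.4 * (M_SUN - abs_mag))


def radius_solar(lum_solar, temp_k):
    """Stefan-Boltzmann: R / R_sun = sqrt(L / L_sun) * (T_sun / T)^2."""
    return math.sqrt(lum_solar) * (T_SUN / temp_k) ** 2


def radius_from_catalogue(apparent_mag, dist_pc, temp_k):
    """Radius in solar radii from magnitude, distance and temperature."""
    abs_mag = absolute_magnitude(apparent_mag, dist_pc)
    return radius_solar(luminosity_solar(abs_mag), temp_k)


# The same magnitude 8 star with a Sun-like temperature, once at a
# measured distance of 100 parsec and once at the placeholder distance.
r_measured = radius_from_catalogue(8.0, 100.0, T_SUN)
r_placeholder = radius_from_catalogue(8.0, 1e5, T_SUN)
print(r_placeholder / r_measured)  # ratio of the two assumed distances
```

Since luminosity scales with distance squared and radius with the square root of luminosity, the computed radius scales linearly with the assumed distance. Under these assumptions, a magnitude 8 star with a Sun-like temperature placed at $10^5$ parsec comes out between $10^3$ and $10^4$ $R_{\odot}$.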
Excluding these stars gives this distribution of distances instead:
While this definitely seems more natural, one might still notice a conspicuous cutoff at $10^3$ parsec, the cause of which is likely to be artificial. The cutoff corresponds roughly to the current upper limit of distance measurements using the parallax method, so a plausible explanation would be that distances larger than $10^3$ parsec are not determined reliably enough to be included (i.e., they would end up in the "$10^5$ parsec" category).
Since this artefact is much less extreme than the former, and as extraneous data are much easier to correct for than missing data, I think that the present state of the data is good enough for further analysis - with the caveat that it covers only stars within a distance of $10^3$ parsec. I will follow up soon with one or more posts discussing this analysis, including explanations for the distribution of star sizes.