
WG Participant List

WG Scope

Naming authorities:
  • testing, validation and quality control.

WG Proposed Outcome

  • make sure quality allows automated processing, content review
  • testing suites in repositories
  • opening/access to metadata

WG Pages

Naming Authority Discussion


3 Comments

  1. Dear SPASE Registry WG,


    During our last couple of SMWT meetings, we have slowly been developing a "recipe" for assigning NamingAuthorities (NAs) for SPASE descriptors of heliophysics data resources. I'm providing that recipe here as a starting point for the WG to consider:

    Possible scheme for assigning NAs for observational data resources:

    For space-based resources, use…

    • Use major oversight/funding agency (NASA, ESA, CNES, JAXA,…) for resources from missions that fall well within agency management;
    • Use instrument PI institution or its logical oversight agency for resources from joint space mission projects, e.g., SOHO, Cluster, etc.;

     

    For ground-based resources, use…

    • Use major oversight/funding agency (e.g., NSF) for resources from observatories or instruments (e.g., DKIST) that fall well within agency management;
    • Use project name as NA for major community-based data resources, such as ASWS, ISWI, IUGONET, Madrigal, SuperDARN, SuperMAG, GONG,…etc.
    • Use project name for multiagency-funded projects (e.g., Nançay)
    • Use instrument PI institution if none of the above applies.


    Application starts from the top bullet to the bottom bullet in succession, stopping at the most logical level at which resource provenance can be assigned. The NA does not have to be the responsible entity that maintains the SPASE documents for the resources in question because that function could be assigned to a third party (e.g., the SMWT). 
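
    The top-to-bottom application of the bullets can be sketched as a small decision function. This is only an illustration of the ordering, not an agreed implementation: the `Resource` class, its field names, and `assign_naming_authority` are hypothetical, and deciding whether a resource falls "well within agency management" remains a human judgment that the caller encodes by filling in `managing_agency` or leaving it empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical summary of the facts needed to pick an NA."""
    space_based: bool
    managing_agency: Optional[str] = None  # set only if one agency clearly manages it
    project_name: Optional[str] = None     # community-based or multiagency project
    pi_institution: Optional[str] = None   # instrument PI institution or its oversight agency


def assign_naming_authority(resource: Resource) -> str:
    """Walk the recipe from the top bullet down and stop at the first match."""
    # First bullet of both lists: a single oversight/funding agency.
    if resource.managing_agency:
        return resource.managing_agency
    # Middle bullets of the ground-based list only: use the project name.
    if not resource.space_based and resource.project_name:
        return resource.project_name
    # Last bullet of both lists: fall back to the PI institution.
    if resource.pi_institution:
        return resource.pi_institution
    raise ValueError("no rule applies; provenance has to be settled by hand")


# Example: a ground-based community project with no single managing agency.
print(assign_naming_authority(Resource(space_based=False, project_name="SuperMAG")))
```

    Note that the function returns only the NA; who maintains the SPASE documents is a separate assignment, as described above.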


    Please note also that provenance applies only to inanimate objects like data resources, observatories, and instruments. It does not apply to people, who may only have affiliations, ORCIDs, etc. The SMWT thus recommends that SMWG be retained as the NA for Person descriptions.


    In order to ensure progress can be made on the NA issue at a reasonable rate, so other work that depends on it can move forward, please try to provide your input by 12/31/2020.


    Cheers,

    Shing 


  2. Hi Shing, 

    The proposed scheme (and the one currently in use) is embedding metadata into the Resource ID. This is fine in a controlled and limited environment (e.g., NASA context only, as it was at the beginning).

    When several entities start implementing a SPASE registry tree for their own resources and creating SPASE Resource records, problems start to appear. We briefly discussed this issue at the last telecon: how to discover resources registered in the CNES/CDPP tree? How to keep the NSSDC archive tree up to date? What about ground-based observatories?

    There are 2 questions: 

    • the search (how to find a resource?)
    • the maintenance (how to update a resource?)

    We are dealing with the 2nd bullet here: the maintenance of the resource descriptors (this includes the initial release, i.e., the "naming" step). The various SPASE resources are not all equivalent, and are managed by different entities. There are "generic" resources (e.g., Person), there are resources managed by agencies (e.g., Observatory, Instrument...), and others managed by data centres (Repository, NumericalData, DisplayData...).

    This step of mapping the maintenance responsibilities to the resources is required. However, the mapping alone will not be enough to decide, since some agencies will neither register nor maintain their resources, even though those observational resources can be referred to in NumericalData or DisplayData...

    Moreover, for resources derived from several data products (from various instruments or observatories), there can be several funding agencies, and thus the "funding-agency based" naming authority might be an issue. 

    An obvious option is then to set the root element of the SPASE resource ID to the entity that released and maintains the resource. This implies that naming authorities should then mainly be data centres (or be managed by data centres). In other words, a data centre cannot create resource names outside its naming scope (i.e., its tree).

    I would advocate adopting this solution, which would not change the current setup too much, and would mean relying on metadata management authorities rather than on "oversight, funding or PI" agencies. The SPASE trees would still be managed through the HPDE GitHub repository, so that they are accessible and findable.

    If this scheme seems promising enough, we could try to draft a registry reorganisation path. 

    PS: Lastly, I come back to the first question listed at the beginning: there is a need for a search engine, since data of interest will appear in several SPASE trees.

  3. Hi Baptiste,

    Thanks for your comments, which I think relate to several intertwined issues. I see no convenient way to refer and respond to them without losing their context, so I'll just reproduce them below and provide my responses in bold italics in order. Regular text is your original comments.

    --------

    The proposed scheme (and the one currently in use) is embedding metadata into the Resource ID. This is fine in a controlled and limited environment (e.g., NASA context only, as it was at the beginning).

    What is the scheme currently in use? Yes, NA is part of a SPASE ResourceID, but there are problems with how NA is currently defined and used in practice, such as in SMWG. As you pointed out, more problems arise if/when we generalize the use of NA as currently defined and used to non-NASA data products. 


    When several entities start implementing a SPASE registry tree for their own resources and creating SPASE Resource records, problems start to appear. We briefly discussed this issue at the last telecon: how to discover resources registered in the CNES/CDPP tree? How to keep the NSSDC archive tree up to date? What about ground-based observatories?

    Before I respond to your first question, I'd like to point out that NA is not an effective search term when looking for data to support science analysis. Observatory and MeasurementType are more appropriate for that type of resource discovery. Rather, NA is more suitable for searching the inventory of data that come under the jurisdiction/control of the given NA. So, the response to your resource discovery question is context-dependent. More specifically, if a given SPASE tree is constructed properly, i.e., if the relationships between an NA and the products under its management control are well-defined, then the entire data inventory under that NA should be easily identifiable and discoverable. In the case of CNES vs. CDPP, I would prefer to use CNES for the NA as it has the ultimate management authority. In the proposed NA model, the NA doesn't have to be the entity that creates and maintains the SPASE metadata, so CDPP can still act as the designated representative of CNES to maintain the SPASE metadata they have created. By the same token, another organization can do the same, under CNES.
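
    As a rough illustration of the inventory use case, the sketch below groups Resource IDs by their NA. It assumes the usual `spase://<NA>/<ResourceType>/...` layout of a ResourceID; the function names and the example IDs are made up for illustration and do not refer to actual registry entries.

```python
from collections import defaultdict

PREFIX = "spase://"


def naming_authority(resource_id: str) -> str:
    """Return the first path element after 'spase://', i.e. the NA."""
    if not resource_id.startswith(PREFIX):
        raise ValueError(f"not a SPASE ResourceID: {resource_id!r}")
    return resource_id[len(PREFIX):].split("/", 1)[0]


def inventory_by_na(resource_ids):
    """Group ResourceIDs into one list per naming authority."""
    inventory = defaultdict(list)
    for resource_id in resource_ids:
        inventory[naming_authority(resource_id)].append(resource_id)
    return dict(inventory)


# Hypothetical IDs, for illustration only.
ids = [
    "spase://NASA/NumericalData/ExampleMission/ExampleInstrument/PT1M",
    "spase://CNES/NumericalData/ExampleMission/ExampleProduct",
    "spase://CNES/DisplayData/ExampleMission/ExamplePlot",
]
for na, members in inventory_by_na(ids).items():
    print(na, len(members))
```

    This only lists what sits under each NA; it says nothing about who maintains the records, which is the point of separating the NA from its designee.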

    For Q2, all NAs (or their designees) are responsible for maintaining all the SPASE metadata in their own trees. The ASWS is a good example.

    For Q3, GBO itself is a problematic NA, which is why we're trying to tackle this problem.


    There are 2 questions: 

    • the search (how to find a resource?)

    This is somewhat addressed above. It is context-dependent.

    • the maintenance (how to update a resource?)

    See my response to Q2 above. Or am I still missing something?


    We are dealing with the 2nd bullet here: the maintenance of the resource descriptors (this includes the initial release, i.e., the "naming" step). The various SPASE resources are not all equivalent, and are managed by different entities. There are "generic" resources (e.g., Person), there are resources managed by agencies (e.g., Observatory, Instrument...), and others managed by data centres (Repository, NumericalData, DisplayData...).

    The issue you raised here doesn't seem to relate to how an NA is assigned, so it is a different topic. But yes, the designation of an NA is important and is needed even for the creation of a ResourceID. In fact, we (the SMWT) are in the process of migrating many of the NAs registered previously in the Git Registry, from VxOs to VSPO to NASA. We're also working on identifying all the non-NASA products (registered previously under various VxOs and CDAWeb products under NASA) and trying to put them back under their proper NAs. This means that we're also creating new NAs as needed.

    The ultimate goal here is to have all the data products (Numerical, Display, Catalogs, etc.) put under their appropriate NAs, with each NA heading a SPASE tree. This can be done for all inanimate objects like data, observatories, instruments, etc., and the SPASE model can handle all this information sufficiently well. Person is the only resource type that doesn't fit the NA model. The SMWT suggests keeping SMWG as the NA for Person information.



    This step of mapping the maintenance responsibilities to the resources is required. However, the mapping alone will not be enough to decide, since some agencies will neither register nor maintain their resources, even though those observational resources can be referred to in NumericalData or DisplayData...

    I don't quite follow what you meant by "mapping of maintenance responsibilities". It seems to be related to the situation mentioned above, where CNES would designate CDPP to be the entity responsible for the SPASE metadata that CDPP has created (on behalf of or at the behest of CNES). The chain of management control/authority and the designation of a representative are critical for ensuring the proper maintenance of metadata within a SPASE tree. If a SPASE registry (tree) fails, then it would fall to the agency with the ultimate authority to provide the resources to fix things. Otherwise, their resources would not be easily discoverable. This of course assumes that the SPASE tree is shared openly. If not, then it doesn't matter.


    Moreover, for resources derived from several data products (from various instruments or observatories), there can be several funding agencies, and thus the "funding-agency based" naming authority might be an issue. 

    Whoever sponsored the creation of the derived data product (e.g., the proposal funding agency), or the PI institution of the funded research project (as the designated representative of the funding agency), would be the NA for the derived product. An interesting question here is: what would be the proper values for Observatory and Instrument if the product involves data from different platforms?

    An obvious option is then to set the root element of the SPASE resource ID to the entity that released and maintains the resource. This implies that naming authorities should then mainly be data centres (or be managed by data centres). In other words, a data centre cannot create resource names outside its naming scope (i.e., its tree).

    But a data center can in fact create data products. The OMNI dataset was created originally by the NSSDC and is now maintained by the SPDF.


    I would advocate adopting this solution, which would not change the current setup too much, and would mean relying on metadata management authorities rather than on "oversight, funding or PI" agencies. The SPASE trees would still be managed through the HPDE GitHub repository, so that they are accessible and findable.

    If this scheme seems promising enough, we could try to draft a registry reorganisation path. 


    I'm not quite following what you're saying here. What solution are you advocating? The one proposed above or something else?


    PS: Lastly, I come back to the first question listed at the beginning: there is a need for a search engine, since data of interest will appear in several SPASE trees.

    Yes, but a search engine is a separate matter from the NA designation protocol and the SPASE Registry structure.


    Cheers,

    Shing